CI/CD Best Practices

Parallelisation

The easiest way to speed up CI builds is to split your tasks so they run in parallel on multiple machines. Once the workload is parallelised, total build time is bounded by the slowest chain of dependent steps rather than by the sum of all steps. Look closely at the steps that don’t immediately seem able to run independently, and verify that assumption. Knowing your dependency graph inside-out is going to help a lot.
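As a rough illustration of why the dependency graph matters, the sketch below compares the serial build time with the best-case parallel time. The step names, durations, and dependencies are hypothetical, and it assumes Python 3.9+ (for `graphlib`) and enough machines to run every ready step at once.

```python
from graphlib import TopologicalSorter

# Hypothetical step durations (minutes) and the steps each one depends on.
durations = {"build": 4, "lint": 2, "unit": 6, "e2e": 9, "deploy": 1}
depends_on = {
    "build": set(),
    "lint": set(),
    "unit": {"build"},
    "e2e": {"build"},
    "deploy": {"lint", "unit", "e2e"},
}


def wall_clock_minutes(durations, depends_on):
    """Earliest finish time of the pipeline, assuming unlimited machines."""
    finish = {}
    # static_order() yields every step after all of its dependencies.
    for step in TopologicalSorter(depends_on).static_order():
        start = max((finish[dep] for dep in depends_on[step]), default=0)
        finish[step] = start + durations[step]
    return max(finish.values())


serial = sum(durations.values())
parallel = wall_clock_minutes(durations, depends_on)
print(f"serial: {serial} min, parallel: {parallel} min")
# prints "serial: 22 min, parallel: 14 min"
```

In this made-up graph the longest chain is build, then e2e, then deploy, so shortening the lint step would not change the total at all.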

What is “sharding”?

In distributed computing, we talk about splitting up a process across multiple machines. To do this, we separate a workload into “shards”: deterministic subsets of tests or actions, so the same inputs always produce the same split. In CI, this often means batching all of our unit tests into a fixed number of shards, which are then run concurrently.
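One simple way to get a deterministic split is to sort the test names and deal them out round-robin. This is only a sketch of the idea with hypothetical test names; real test runners may use a different strategy (for example hashing or timing data).

```python
def select_shard(test_ids, shard_index, num_shards):
    """Return the tests belonging to one shard (shard_index is zero-based)."""
    if not 0 <= shard_index < num_shards:
        raise ValueError("shard_index must be in the range [0, num_shards)")
    # Sorting first makes the split independent of discovery order.
    return sorted(test_ids)[shard_index::num_shards]


# Hypothetical test names.
tests = ["test_login", "test_cart", "test_search", "test_checkout", "test_profile"]

print(select_shard(tests, 0, 2))  # ['test_cart', 'test_login', 'test_search']
print(select_shard(tests, 1, 2))  # ['test_checkout', 'test_profile']
```

Every test lands in exactly one shard, and each machine can compute its own share without talking to the others.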

Fan-out/Fan-in

This pattern involves running a common build/compilation step to set up the code for testing, then fanning out to run a set of acceptance tests in parallel, and finally fanning in again to run the deploy job once every test job has finished.
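The shape of the pattern can be sketched in a few lines of Python. The `build`, `run_suite`, and `deploy` functions are hypothetical stand-ins for real jobs, and threads stand in for separate CI machines.

```python
from concurrent.futures import ThreadPoolExecutor


def build() -> str:
    # Stand-in for the shared compile step; returns a hypothetical artefact name.
    return "app-build"


def run_suite(artefact: str, suite: str) -> bool:
    # Stand-in for one acceptance suite run against the shared artefact.
    return True


def deploy(artefact: str) -> str:
    # Stand-in for the deploy job.
    return f"deployed {artefact}"


def pipeline(suites):
    artefact = build()  # single common step
    # Fan-out: every suite runs concurrently against the same artefact.
    with ThreadPoolExecutor(max_workers=max(1, len(suites))) as pool:
        results = list(pool.map(lambda suite: run_suite(artefact, suite), suites))
    # Fan-in: deploy only after every suite has reported back.
    if not all(results):
        raise RuntimeError("acceptance tests failed; skipping deploy")
    return deploy(artefact)


print(pipeline(["api", "ui", "smoke"]))  # prints "deployed app-build"
```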

Trading space for time

You can make a trade-off here to parallelise things that generate artefacts, by trading storage space for speed. Yes, if the build fails, the artefacts aren’t useful, but if you can spare the storage space and handle cleanup/archival on a fixed schedule elsewhere, you can save time by not requiring code correctness before you start building the artefacts. If the Docker image build takes 5 minutes and the full static analysis suite also takes 5 minutes, waiting for the analysis feedback before building the image costs you 10 minutes in total, where running both at once would take about 5.
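A minimal sketch of that speculative build, assuming hypothetical `static_analysis` and `build_image` stand-ins and a made-up image tag: both start at the same time, and the analysis result only decides whether the finished image gets published.

```python
from concurrent.futures import ThreadPoolExecutor


def static_analysis() -> bool:
    # Stand-in for the analysis suite; True means the code passed.
    return True


def build_image() -> str:
    # Stand-in for the image build; returns a hypothetical image tag.
    return "registry.example/app:candidate"


def speculative_build(analyse=static_analysis, build=build_image):
    # Start both jobs together instead of gating the build on the analysis.
    with ThreadPoolExecutor(max_workers=2) as pool:
        analysis = pool.submit(analyse)
        image = pool.submit(build)
        passed, tag = analysis.result(), image.result()
    # If the analysis failed, the image still exists and takes up storage
    # until the scheduled cleanup removes it.
    return {"image": tag, "publish": passed}
```

Calling `speculative_build(analyse=lambda: False)` returns the same image tag with `publish` set to `False`, which is the wasted artefact the trade-off accepts.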

Test Sharding

Node - Cypress

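Cypress can parallelise tests across CI machines using the CLI. Parallel runs must be recorded, which is why --parallel is combined with --record; the --ci-build-id value ties the machines of one build together.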

cypress run --record --parallel --ci-build-id $BUILD_TAG

Node - Jest

The Jest CLI takes a --shard flag that splits your tests so that you can run different sets of tests on different machines.

jest --shard=1/3

Python - pytest

pytest has no built-in sharding, so this uses the pytest-shard plugin, which adds the --shard-id and --num-shards options. Shard IDs start at 0.

pytest --shard-id=0 --num-shards=2
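Note that the two runners count shards differently: Jest's shard index starts at 1, while pytest-shard's starts at 0. The sketch below builds both command lines from a zero-based machine index and a machine count, which is an assumption about what your CI provider exposes. The function and parameter names are hypothetical.

```python
def jest_shard_args(node_index: int, node_total: int) -> list[str]:
    """Jest shards are 1-based, so shift the zero-based machine index by one."""
    return ["jest", f"--shard={node_index + 1}/{node_total}"]


def pytest_shard_args(node_index: int, node_total: int) -> list[str]:
    """pytest-shard IDs are 0-based, so the machine index is used as-is."""
    return ["pytest", f"--shard-id={node_index}", f"--num-shards={node_total}"]


print(jest_shard_args(0, 3))    # ['jest', '--shard=1/3']
print(pytest_shard_args(0, 2))  # ['pytest', '--shard-id=0', '--num-shards=2']
```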

How-to for CI providers

CircleCI

To run a set of concurrent jobs, you will need to add a workflows section to your existing .circleci/config.yml file.

The simple example below shows the default workflow orchestration with two concurrent jobs. The workflows key needs to have a unique name. In this example, the unique name is build_and_test. The jobs key is nested under the uniquely named workflow, and contains the list of job names. Since the jobs have no dependencies, they will run concurrently.

version: 2.1

jobs:
  build:
    docker:
      - image: cimg/<language>:<version TAG>
    steps:
      - checkout
      - run: <command>
  test:
    docker:
      - image: cimg/<language>:<version TAG>
    steps:
      - checkout
      - run: <command>
workflows:
  build_and_test:
    jobs:
      - build
      - test

GitHub Actions

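# Jobs in one workflow already run in parallel unless linked with `needs`.
# This block groups runs by ref and cancels any run still in progress
# when a newer one starts, freeing runners for the latest commit.
concurrency:
  group: ${{ github.ref }}
  cancel-in-progress: true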

Jenkins

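// createBarrier/awaitBarrier are not core Pipeline steps; this assumes a barrier plugin is installed.
// Each of the three parallel branches below waits at the barrier until all three have arrived.
def barrier = createBarrier count: 3
boolean out = false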
parallel(
        await1: {
            awaitBarrier barrier
            echo "out=${out}"
        },
        await2: {
            awaitBarrier (barrier){
                sleep 2 //simulate a long time execution.
            }
            echo "out=${out}"
        },
        await3: {
            awaitBarrier (barrier){
                sleep 3 //simulate a long time execution.
                out = true
            }
            echo "out=${out}"
        }
)

Travis CI

Say you want to split up your unit tests and your integration tests into two different build jobs. They’ll run in parallel and fully utilize the available build capacity for your account.

Here’s an example of how to use this feature in your .travis.yml:

env:
  - TEST_SUITE=units
  - TEST_SUITE=integration

Then you change your script command to use the new environment variable to determine the script to run.

script: "bundle exec rake test:$TEST_SUITE"

Travis CI will determine the build matrix based on the environment variables and schedule two jobs, one for each value of TEST_SUITE.