CI/CD Best Practices

Caching

Caching is temporary data storage. It means reusing previously built assets, artifacts, or information so that a build starts from an earlier checkpoint instead of from scratch. This cuts redundant work and shortens build time.

Benefits of caching

There are two primary benefits of caching:

  1. Build Speed: By working from previous data, we don’t have to repeat actions like downloading dependencies or testing code that hasn’t changed.
  2. Network Load Management: Many modern CI processes fetch data from external services, for example when downloading dependencies. Caching the results reduces not only the time those actions take, but also the load they place on the external services.

What is a cache key?

A cache key is the identifier for our cache, and is how we will refer to and extract data later.

We often use dynamic information in our cache keys. For example, some folks will choose to include the branch name so that every feature branch has a separate cache.

A very common piece of dynamic information is a “checksum” of a file. A checksum is a computed hash that represents the contents of a file. It’s one-way, meaning that you can generate a checksum from a file but you cannot regenerate the file from the checksum. This makes it a useful, compact identifier for a file’s contents: when the contents change, so does the checksum.
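As a minimal sketch of how such a checksum could be computed, the snippet below hashes a file with SHA-256 from Python's standard library. The function name `file_checksum` and the choice of SHA-256 are assumptions for illustration; CI providers compute their checksums with their own built-in helpers.

```python
import hashlib


def file_checksum(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of the file at `path` (hypothetical helper)."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so a large file does not have to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `file_checksum("package-lock.json")` before and after a dependency change would give two different digests, which is what makes the checksum useful inside a cache key.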

There is no one-size-fits-all pattern, and we’d encourage you to mix and match patterns depending on what suits your needs.

Update on the mainline only

Only rebuild/update/upload the cache on the master/main/release branch.

Cache key example:

<static-branch-name>-<checksum>-<static-version>
main-1234567890asdfghjkl-v1

Pro: Easy to share cache with all builds, because it’s based on what’s been merged into the mainline.

Con: Testing caching can be hard, as you have to remove any branch restrictions on the cache update and be careful to use a separate cache key so that you don’t affect anyone else’s build. That is, you need to make manual changes that should not be committed to the final product.
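The mainline-only rule can be sketched as a small decision function. This is an illustration, not any CI provider's API: the name `cache_actions` and the set of branch names are assumptions you would adapt to your own repository.

```python
# Assumption: these branch names are illustrative; use your repository's mainline.
MAINLINE_BRANCHES = frozenset({"main", "master", "release"})


def cache_actions(branch: str) -> dict:
    """Every build may restore the cache; only mainline builds upload it."""
    return {
        "restore": True,
        "upload": branch in MAINLINE_BRANCHES,
    }
```

A feature-branch build would restore but never upload, which is why testing a caching change under this pattern means temporarily loosening the branch check.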

Separate branch, separate cache

Each branch maintains a separate cache that isn’t shared with any other branch.

Cache key example:

<branch>-<checksum>-<static-version>
my-feature/ticket-123-1234567890asdfghjkl-v1

Pro: Easy to test caching changes.

Con: The first build on every new branch will take longer as it doesn’t benefit from a cache.

A cache per build, to share information

Every build/commit gets a separate cache, and doesn’t share it outside the build. This pattern is primarily used to pass information between different stages in a build, such as unique build assets.

Cache key example:

<commit-sha>-<checksum>-<static-version>
ae23bc1-1234567890asdfghjkl-v1

Pro: Can share information between machines and network locations.

Con: No data is shared with any other build/commit and so it doesn’t contribute to any speedup in subsequent builds.
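The three patterns differ only in the first component of the key. The sketch below makes that explicit; the `Strategy` enum and the `cache_key` function are hypothetical names introduced for illustration, and the checksum is passed in as a precomputed string.

```python
from enum import Enum


class Strategy(Enum):
    MAINLINE = "mainline"      # update on the mainline only
    PER_BRANCH = "per-branch"  # separate branch, separate cache
    PER_BUILD = "per-build"    # a cache per build


def cache_key(strategy: Strategy, checksum: str, *, branch: str,
              commit_sha: str, mainline: str = "main",
              version: str = "v1") -> str:
    """Build '<prefix>-<checksum>-<static-version>' for the chosen pattern."""
    if strategy is Strategy.MAINLINE:
        prefix = mainline      # static branch name, shared by all builds
    elif strategy is Strategy.PER_BRANCH:
        prefix = branch        # changes with every branch
    else:
        prefix = commit_sha    # changes with every commit
    return f"{prefix}-{checksum}-{version}"
```

With the placeholder checksum used in the examples above, the three strategies reproduce the three example keys.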

Advice

Picking a cache key

When choosing a cache key, you’ll (generally) want a combination of three components:

  1. A description: Something that describes the intended use of the cache (e.g. npm-dependencies)
  2. A checksum: Often, we’re storing the output of some process. A good cache key includes the identifier for the inputs used to generate that output. For example, the checksum of a lockfile (e.g. package-lock.json, poetry.lock) represents the inputs of the dependencies we downloaded/output.
  3. A static version: Caches are not infallible, and sometimes we can end up with a corrupted or otherwise bad cache. To bust it and start over, it’s useful to have something like -v1 at the end of the key, so that the cache can be invalidated with a single commit.
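Putting the three components together might look like the sketch below. The function name `build_cache_key`, the use of SHA-256, and the exact separator are assumptions for illustration only.

```python
import hashlib
from pathlib import Path


def build_cache_key(description: str, lockfile: str, version: int = 1) -> str:
    """Return '<description>-<lockfile checksum>-v<version>' (hypothetical helper)."""
    # The lockfile stands in for the inputs that produced the cached output.
    checksum = hashlib.sha256(Path(lockfile).read_bytes()).hexdigest()
    return f"{description}-{checksum}-v{version}"
```

For example, `build_cache_key("npm-dependencies", "package-lock.json")` changes whenever the lockfile changes, and bumping `version` to 2 produces a fresh key even when the lockfile is untouched.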

How-to for package managers

Go - go.mod

Directories:

  • ~/go/pkg/mod

Cache key:

# <architecture>-<lockfile checksum>-<static version>
{{ arch }}-{{ checksum "go.sum"  }}-v1

To find the directories to be cached, you can run the following command:

go env GOMODCACHE

Java - Gradle

Directories:

  • ~/.gradle/caches
  • ~/.gradle/wrapper

Cache key:

# <lockfile checksum>-<static version>
{{ checksum "dependencies.lockfile"  }}-v1

Java - Maven

Directories:

  • ~/.m2/repository

Cache key:

# <pom.xml checksum>-<static version>
# Maven has no lockfile by default, so the checksum of pom.xml stands in for one
{{ checksum "pom.xml" }}-v1

Node - NPM

Directories:

  • ./node_modules

Cache key:

# <lockfile checksum>-<static version>
{{ checksum "package-lock.json"  }}-v1

Node - PNPM

Directories:

  • ./pnpm-store

Cache key:

# <lockfile checksum>-<static version>
{{ checksum "pnpm-lock.yaml"  }}-v1

Node - yarn

Environment variables:

  • YARN_CACHE_FOLDER=".cache"

Directories:

  • ./node_modules
  • ./.cache

Cache key:

# <lockfile checksum>-<static version>
{{ checksum "yarn.lock"  }}-v1

Python - Pipenv

Directories:

  • ~/.local/share/virtualenvs/venv

Cache key:

# <lockfile checksum>-<static version>
{{ checksum "Pipfile.lock"  }}-v1

Python - poetry

Environment variables:

  • POETRY_CACHE_DIR=".cache" – Set the caching directory to be in the project
  • POETRY_VIRTUALENVS_IN_PROJECT="true" – Set the virtualenv directory to be in the project

Directories:

  • .venv
  • .cache

Cache key:

# <lockfile checksum>-<static version>
{{ checksum "poetry.lock"  }}-v1

Ruby - Bundler

Directories:

  • ~/.bundle

Cache key:

# <lockfile checksum>-<static version>
{{ checksum "Gemfile.lock"  }}-v1

How-to for CI providers

CircleCI

steps:
  - restore_cache:
      keys:
        # when lock file changes, use increasingly general patterns to restore cache
        - gradle-repo-v1-{{ .Branch }}-{{ checksum "dependencies.lockfile" }}
        - gradle-repo-v1-{{ .Branch }}-
        - gradle-repo-v1-
  - save_cache:
      paths:
        - ~/.gradle/caches
        - ~/.gradle/wrapper
      key: gradle-repo-v1-{{ .Branch }}-{{ checksum "dependencies.lockfile" }}
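To illustrate how a list of increasingly general restore keys behaves, here is a simplified model in Python. It assumes that each key is tried in order as a prefix, and that when several stored caches match, the most recently saved one wins. The function `find_cache` and the `stored` mapping are hypothetical; this is not CircleCI's implementation.

```python
from typing import Dict, List, Optional


def find_cache(restore_keys: List[str], stored: Dict[str, float]) -> Optional[str]:
    """Return the stored key to restore, or None on a cache miss.

    `stored` maps each saved cache key to the time it was saved.
    """
    for candidate in restore_keys:
        # Each restore key is treated as a prefix of the stored keys.
        matches = [key for key in stored if key.startswith(candidate)]
        if matches:
            # Among several matches, prefer the most recently saved cache.
            return max(matches, key=lambda key: stored[key])
    return None
```

Under this model, a build whose lockfile just changed misses on the exact key, then falls back to the newest cache from the same branch, and finally to the newest cache from any branch.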

GitHub Actions

name: Caching with npm
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3

      - name: Cache node modules
        id: cache-npm
        uses: actions/cache@v3
        env:
          cache-name: cache-node-modules
        with:
          # npm cache files are stored in `~/.npm` on Linux/macOS
          path: ~/.npm
          key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-build-${{ env.cache-name }}-
            ${{ runner.os }}-build-
            ${{ runner.os }}-

      - if: ${{ steps.cache-npm.outputs.cache-hit != 'true' }}
        name: List the state of node modules
        continue-on-error: true
        run: npm list

      - name: Install dependencies
        run: npm install

      - name: Build
        run: npm run build

      - name: Test
        run: npm test

Jenkins

arbitraryFileCache(
    path: 'my-cache',
    cacheValidityDecidingFile: 'package-lock.json',
    includes: 'node_modules',
    excludes: '**/*.generated'
)

Travis CI

language: python
cache: pip

You can also manually specify which directories will be cached, though you can’t choose the cache key.

cache:
  directories:
    - $HOME/.cache/pip
before_cache:
  - rm -f $HOME/.cache/pip/log/debug.log

Generic (container)

To do this generically, we’re going to leverage a public plugin for DroneCI and hijack its inputs a little. Below, we’ll detail the docker command that runs this cache, to give you a picture of the common inputs.

To run this verbatim, delete the blank lines and comments.

docker run --rm \
    -v "$(pwd)":/app \
    -e DRONE_REPO=octocat/hello-world \
    -e DRONE_REPO_BRANCH=main \
    -e DRONE_COMMIT_BRANCH=main \
    
    # restore = cache download 
    -e PLUGIN_RESTORE=false \
    # rebuild = cache upload
    -e PLUGIN_REBUILD=true \
 
    # Cloud storage location and authentication
    -e PLUGIN_BUCKET=<bucket> \
    -e AWS_ACCESS_KEY_ID=<token> \
    -e AWS_SECRET_ACCESS_KEY=<secret> \
 
    # To prevent partial cache unpacking, use `gzip` or `zstd` 
    -e PLUGIN_ARCHIVE_FORMAT="zstd" \
 
    # A comma-separated list of directories to cache
    -e PLUGIN_MOUNT=/app/node_modules \
 
    # A Go-templated string to represent the cache key
    -e PLUGIN_CACHE_KEY='{{ .Commit.Branch }}-{{ checksum "package-lock.json" }}-v1' \
    meltwater/drone-cache
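The rebuild (upload) and restore (download) steps that the command above configures can also be sketched without any plugin. The snippet below is a rough illustration that uses a local directory as a stand-in for the cloud bucket; `save_cache`, `restore_cache`, and the archive naming are all assumptions, and it is not how drone-cache is implemented.

```python
import tarfile
from pathlib import Path


def _archive_path(key: str, store: str) -> Path:
    # Branch names may contain "/", which is not valid in a file name.
    return Path(store) / (key.replace("/", "_") + ".tar.gz")


def save_cache(key: str, directory: str, store: str) -> Path:
    """Archive the contents of `directory` into `store` under `key` (rebuild)."""
    archive = _archive_path(key, store)
    archive.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary name first so readers never see a partial archive.
    tmp = archive.parent / (archive.name + ".tmp")
    with tarfile.open(str(tmp), "w:gz") as tar:
        for child in sorted(Path(directory).iterdir()):
            tar.add(str(child), arcname=child.name)
    tmp.replace(archive)
    return archive


def restore_cache(key: str, directory: str, store: str) -> bool:
    """Unpack the archive for `key` into `directory`; return False on a miss."""
    archive = _archive_path(key, store)
    if not archive.exists():
        return False
    Path(directory).mkdir(parents=True, exist_ok=True)
    # Only extract archives you created yourself; tar files can be malicious.
    with tarfile.open(str(archive), "r:gz") as tar:
        tar.extractall(directory)
    return True
```

A build would call `restore_cache` before installing dependencies and `save_cache` afterwards, using a key built from the branch, the lockfile checksum, and a static version.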