Caching
Caching is transient/temporary data storage. It’s the act of reusing previously made assets/artifacts/information to speed up your build by working from a previous checkpoint. By doing this, we can reduce redundant actions and improve computation speed.
Benefits of caching
There are two primary benefits of caching:
- Build Speed: By working from previous data, we don’t have to repeat actions like downloading dependencies or testing code that hasn’t changed.
- Network Load Management: Many modern CI processes pull data from external services, for example when downloading dependencies. By caching these actions, you don’t just reduce the time they take, but also the load they create on those services.
What is a cache key?
A cache key is the identifier for our cache, and is how we will refer to and extract data later.
We often use dynamic information in our cache keys. For example, some folks will choose to include the branch name so that every feature branch has a separate cache.
A very common piece of dynamic information is a “checksum” of a file. A checksum is a computed hash that represents the contents of a file. It’s one-way, meaning that you can generate a checksum from a file but you cannot regenerate the file from its checksum. This makes it useful as a compact identifier for a file’s contents: when the contents change, so does the checksum.
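As a rough illustration of the checksum idea, here is a minimal Python sketch. SHA-256 is an assumption made for the example; CI providers may use a different hash internally, and `file_checksum` is a hypothetical helper name.

```python
import hashlib


def file_checksum(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files are not loaded into memory at once.
        while chunk := handle.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()
```

Calling `file_checksum("package-lock.json")` twice on an unchanged file returns the same string, while any edit to the file produces a different one, which is what makes it suitable as part of a cache key.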
Popular Patterns
There is no one-size-fits-all pattern, and we’d encourage you to mix and match patterns depending on what suits your needs.
Update on the mainline only
Only rebuild/update/upload the cache on the master/main/release branch.
Cache key example:
<static-branch-name>-<checksum>-<static-version>
main-1234567890asdfghjkl-v1
Pro: Easy to share cache with all builds, because it’s based on what’s been merged into the mainline.
Con: Testing caching changes can be hard, as you have to remove any branch restrictions on the cache update and be careful to use a separate cache key so that you don’t affect anyone else’s build. That is, you need to make manual changes that should not be committed to the final product.
Separate branch, separate cache
Each branch maintains a separate cache that isn’t shared with any other branch.
Cache key example:
<branch>-<checksum>-<static-version>
my-feature/ticket-123-1234567890asdfghjkl-v1
Pro: Easy to test caching changes.
Con: The first build on every new branch will take longer as it doesn’t benefit from a cache.
A cache per build, to share information
Every build/commit gets a separate cache, and doesn’t share it outside the build. This pattern is primarily used to pass information between different stages in a build, such as unique build assets.
Cache key example:
<commit-sha>-<checksum>-<static-version>
ae23bc1-1234567890asdfghjkl-v1
Pro: Can share information between machines and network locations.
Con: No data is shared with any other build/commit and so it doesn’t contribute to any speedup in subsequent builds.
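The three patterns above differ only in the first component of the key. The following Python sketch makes that explicit; the function names and the `MAINLINE` constant are hypothetical, chosen only for illustration.

```python
MAINLINE = "main"  # assumption: the name of your mainline branch


def mainline_key(checksum: str, version: str = "v1") -> str:
    """Update on the mainline only: every build reads the same key."""
    return f"{MAINLINE}-{checksum}-{version}"


def branch_key(branch: str, checksum: str, version: str = "v1") -> str:
    """Separate branch, separate cache: the branch name scopes the key."""
    return f"{branch}-{checksum}-{version}"


def build_key(commit_sha: str, checksum: str, version: str = "v1") -> str:
    """A cache per build: the commit SHA scopes the key to one build."""
    return f"{commit_sha}-{checksum}-{version}"


def may_upload_mainline_cache(current_branch: str) -> bool:
    """For the mainline-only pattern, only mainline builds write the cache."""
    return current_branch == MAINLINE
```

With the placeholder checksum used in the examples, `branch_key("my-feature/ticket-123", "1234567890asdfghjkl")` yields `my-feature/ticket-123-1234567890asdfghjkl-v1`.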
Advice
Picking a cache key
When choosing a cache key, you’ll (generally) want a combination of three components:
- A description: Something that describes the intended use of the cache (e.g. npm-dependencies).
- A checksum: Often, we’re storing the output of some process. A good cache key includes an identifier for the inputs used to generate that output. For example, the checksum of a lockfile (e.g. package-lock.json, poetry.lock) represents the inputs that determined the dependencies we downloaded.
- A static version: Caches are not infallible, and sometimes we end up with a corrupted or otherwise bad cache. To bust it and start over, it’s useful to have something like -v1 at the end of the key, so that the cache can be busted with a single commit.
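Putting the three components together might look like the sketch below. The `cache_key` helper and the use of SHA-256 are assumptions for illustration, not what any particular CI provider does internally.

```python
import hashlib
from pathlib import Path


def cache_key(description: str, lockfile: str, version: int = 1) -> str:
    """Build a key of the form <description>-<lockfile checksum>-v<version>."""
    # The lockfile's checksum stands in for the inputs that produced the cache.
    checksum = hashlib.sha256(Path(lockfile).read_bytes()).hexdigest()
    return f"{description}-{checksum}-v{version}"
```

Bumping `version` from 1 to 2 changes the key even though the lockfile is untouched, which is how a bad cache can be discarded with a single commit.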
How-to for package managers
Go - go.mod
Directories:
~/go/pkg/mod
Cache key:
# <architecture>-<lockfile checksum>-<static version>
{{ arch }}-{{ checksum "go.sum" }}-v1
To find the directories to be cached, you can run the following command:
go env GOMODCACHE
Java - Gradle
Directories:
~/.gradle/caches
~/.gradle/wrapper
Cache key:
# <lockfile checksum>-<static version>
{{ checksum "dependencies.lockfile" }}-v1
Java - Maven
Directories:
~/.m2/repository
Cache key:
# <lockfile checksum>-<static version>
{{ checksum "pom.xml" }}-v1
Node - NPM
Directories:
./node_modules
Cache key:
# <lockfile checksum>-<static version>
{{ checksum "package-lock.json" }}-v1
Node - PNPM
Directories:
./pnpm-store
Cache key:
# <lockfile checksum>-<static version>
{{ checksum "pnpm-lock.yaml" }}-v1
Node - yarn
Environment variables:
YARN_CACHE_FOLDER=".cache"
Directories:
./node_modules
./.cache
Cache key:
# <lockfile checksum>-<static version>
{{ checksum "yarn.lock" }}-v1
Python - pip (Pipenv)
Directories:
~/.local/share/virtualenvs/venv
Cache key:
# <lockfile checksum>-<static version>
{{ checksum "Pipfile.lock" }}-v1
Python - poetry
Environment variables:
POETRY_CACHE_DIR=".cache"
– Set the caching directory to be in the project
POETRY_VIRTUALENVS_IN_PROJECT="true"
– Set the virtualenv directory to be in the project
Directories:
.venv
.cache
Cache key:
# <lockfile checksum>-<static version>
{{ checksum "poetry.lock" }}-v1
Ruby - Bundler
Directories:
~/.bundle
Cache key:
# <lockfile checksum>-<static version>
{{ checksum "Gemfile.lock" }}-v1
How-to for CI providers
CircleCI
steps:
  - restore_cache:
      keys:
        # when lock file changes, use increasingly general patterns to restore cache
        - gradle-repo-v1-{{ .Branch }}-{{ checksum "dependencies.lockfile" }}
        - gradle-repo-v1-{{ .Branch }}-
        - gradle-repo-v1-
  - save_cache:
      paths:
        - ~/.gradle/caches
        - ~/.gradle/wrapper
      key: gradle-repo-v1-{{ .Branch }}-{{ checksum "dependencies.lockfile" }}
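To build intuition for the “increasingly general patterns” in the restore step, here is a simplified Python model of prefix-based fallback. It is only a sketch: the real lookup happens inside the CI service, and `pick_cache` and the in-memory `stored` mapping are hypothetical.

```python
from typing import Dict, List, Optional


def pick_cache(stored: Dict[str, float], keys: List[str]) -> Optional[str]:
    """Return the stored cache key to restore, or None on a full miss.

    `stored` maps each saved cache key to its creation time.
    `keys` lists candidate keys or key prefixes, most specific first.
    """
    for candidate in keys:
        matches = [name for name in stored if name.startswith(candidate)]
        if matches:
            # Among caches sharing the prefix, prefer the most recently created.
            return max(matches, key=lambda name: stored[name])
    return None
```

In this model, a build whose lockfile has changed misses on the full key, then falls back to the newest cache for its branch, and finally to the newest cache from any branch.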
GitHub Actions
Official documentation
name: Caching with npm
on: push
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Cache node modules
        id: cache-npm
        uses: actions/cache@v3
        env:
          cache-name: cache-node-modules
        with:
          # npm cache files are stored in `~/.npm` on Linux/macOS
          path: ~/.npm
          key: ${{ runner.os }}-build-${{ env.cache-name }}-${{ hashFiles('**/package-lock.json') }}
          restore-keys: |
            ${{ runner.os }}-build-${{ env.cache-name }}-
            ${{ runner.os }}-build-
            ${{ runner.os }}-
      - if: ${{ steps.cache-npm.outputs.cache-hit != 'true' }}
        name: List the state of node modules
        continue-on-error: true
        run: npm list
      - name: Install dependencies
        run: npm install
      - name: Build
        run: npm run build
      - name: Test
        run: npm test
Jenkins
Official documentation
arbitraryFileCache(
    path: 'my-cache',
    cacheValidityDecidingFile: 'package-lock.json',
    includes: 'node_modules',
    excludes: '**/*.generated'
)
Travis CI
Official documentation
language: python
cache: pip
You can also manually specify which directories will be cached, though you can’t choose the cache key.
cache:
  directories:
    - $HOME/.cache/pip
before_cache:
  - rm -f $HOME/.cache/pip/log/debug.log
Generic (container)
Unofficial documentation
To do this generically, we’re going to leverage a public plugin for DroneCI and hijack its inputs a little. Below, we’ll detail the docker command that runs this cache, to give you a picture of the common inputs.
To run this verbatim, delete the blank lines and comments.
docker run --rm \
  -v "$(pwd)":/app \
  -e DRONE_REPO=octocat/hello-world \
  -e DRONE_REPO_BRANCH=main \
  -e DRONE_COMMIT_BRANCH=main \
  # restore = cache download
  -e PLUGIN_RESTORE=false \
  # rebuild = cache upload
  -e PLUGIN_REBUILD=true \
  # Cloud storage location and authentication
  -e PLUGIN_BUCKET=<bucket> \
  -e AWS_ACCESS_KEY_ID=<token> \
  -e AWS_SECRET_ACCESS_KEY=<secret> \
  # To prevent partial cache unpacking, use `gzip` or `zstd`
  -e PLUGIN_ARCHIVE_FORMAT="zstd" \
  # A comma-separated list of directories to cache
  -e PLUGIN_MOUNT=/app/node_modules \
  # A Go-templated string to represent the cache key
  -e PLUGIN_CACHE_KEY='{{ .Commit.Branch }}-{{ checksum "package-lock.json" }}-v1' \
  meltwater/drone-cache
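If you would rather assemble the command in a script than strip comments by hand, the same inputs can be built as an argument list. This is a sketch only: `drone_cache_command` is a hypothetical helper, the environment variable names are taken from the command above, and treating restore as the inverse of rebuild is an assumption made for the example.

```python
from typing import Dict, List


def drone_cache_command(
    workdir: str,
    bucket: str,
    mount: str,
    cache_key: str,
    upload: bool,
    repo: str = "octocat/hello-world",
    branch: str = "main",
) -> List[str]:
    """Build the `docker run` argument list for one cache upload or download."""
    env: Dict[str, str] = {
        "DRONE_REPO": repo,
        "DRONE_REPO_BRANCH": branch,
        "DRONE_COMMIT_BRANCH": branch,
        # Assumption: a run either downloads (restore) or uploads (rebuild).
        "PLUGIN_RESTORE": str(not upload).lower(),
        "PLUGIN_REBUILD": str(upload).lower(),
        "PLUGIN_BUCKET": bucket,
        "PLUGIN_ARCHIVE_FORMAT": "zstd",
        "PLUGIN_MOUNT": mount,
        "PLUGIN_CACHE_KEY": cache_key,
    }
    command = ["docker", "run", "--rm", "-v", f"{workdir}:/app"]
    for name, value in env.items():
        command += ["-e", f"{name}={value}"]
    # Passing only the name makes Docker copy the value from the caller's
    # environment, so the credentials never appear in the argument list.
    for name in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY"):
        command += ["-e", name]
    command.append("meltwater/drone-cache")
    return command
```

The returned list can be handed to `subprocess.run(command, check=True)`; because no shell is involved, the Go template in the cache key needs no extra quoting.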