GitLab CI optimization

Posted on 23 September 2020, updated on 18 December 2023.

An efficient CI/CD workflow is an important part of the process of software development, testing, and deployment. At Padok we like to use GitLab CI, and I'll show you how you can optimize and speed up your GitLab CI pipeline while keeping the bill as low as possible.

Host your own runners

There are two ways to run CI/CD jobs on GitLab:

  • Using the runners provided by the host of your repository (for example gitlab.com). Those usually come with limits on minutes and machine specs that grow with the plan you subscribe to.
  • Deploying your own runners. The GitLab runner is nothing more than an open-source piece of software that you can run pretty much anywhere. Once installed and configured, it will register on your GitLab server and listen for the jobs it has to run. The jobs can run in Kubernetes pods, in Docker containers on virtual machines, or even as simple processes on the runner itself.

Deploying and hosting your own runners is a good way to tailor your GitLab CI setup to your own needs.

If you deploy the runners on Kubernetes (there is, of course, a Helm chart for that), you can specify the resources (CPU, memory, disk space, etc.) each job needs, so that it uses exactly the right amount.
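For example, the Kubernetes executor lets jobs overwrite their resource requests through predefined variables. A minimal sketch, assuming the runner's config.toml allows these overwrites:

```yaml
# .gitlab-ci.yml, assuming a runner using the Kubernetes executor
# whose config.toml allows resource request overwrites
build:
  stage: build
  variables:
    KUBERNETES_CPU_REQUEST: "2"      # this job's pod requests 2 CPUs...
    KUBERNETES_MEMORY_REQUEST: "4Gi" # ...and 4 GiB of memory
  script:
    - make build
```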

Being able to deploy your own runners is one of the best features of GitLab CI. It would be a shame not to make the most of it!

Rewrite your Dockerfile to make the most of the Docker cache

This advice is specific to a kind of job that is present in most CI workflows: the build of a Docker image in which your application will be shipped.

If you don’t build container images, you can skip this part.

Docker has a built-in caching mechanism that allows you to build images faster by re-using layers from a former build.

One important thing to know is that, during the build process, Docker creates multiple intermediate images, one per instruction in your Dockerfile. Before reusing a layer from a former build, Docker has to determine whether that layer is still valid.

It does so differently depending on the kind of instruction:

  • for a COPY instruction, it checks whether the files being copied have changed since the last build (their checksums, not just their timestamps)
  • for a RUN instruction, it checks whether the command string has changed
  • etc.

You should always write your Dockerfiles so that the instructions most likely to invalidate the cache come last. For example, copying your application code into the image should be one of the last steps.

If your Dockerfile looks like this:
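(The original snippet isn't reproduced here; what follows is a hypothetical Node.js example.)

```dockerfile
FROM node:14

WORKDIR /app

# Copying the whole codebase first means any code change
# invalidates this layer and everything after it...
COPY . .

# ...so dependencies are reinstalled on every build.
RUN npm install

CMD ["node", "index.js"]
```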

You should probably rewrite it so it looks like this:
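The same hypothetical Dockerfile, reordered so the cache survives code changes:

```dockerfile
FROM node:14

WORKDIR /app

# Copy only the dependency manifests first: this layer (and the
# npm install below) is reused as long as dependencies don't change.
COPY package.json package-lock.json ./

RUN npm install

# The application code changes most often, so it comes last.
COPY . .

CMD ["node", "index.js"]
```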

Reuse the docker cache from a former build

If you are using a Kubernetes runner for your GitLab CI, each build job runs in a fresh container with its own Docker daemon. Therefore, you can't reuse the cache as easily as explained in the previous section: as far as this Docker daemon knows, there was no former build.

What you can do is pull an image that you built earlier from your CI container registry before building the new one, and use the --cache-from option.
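Here is a sketch of such a job (the job name is a placeholder; $CI_REGISTRY_IMAGE and $CI_COMMIT_SHORT_SHA are GitLab's predefined variables):

```yaml
build:
  stage: build
  image: docker:20.10
  services:
    - docker:20.10-dind
  before_script:
    - docker login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
  script:
    # Pull the previous image; don't fail if it doesn't exist yet
    - docker pull "$CI_REGISTRY_IMAGE:latest" || true
    # Reuse its layers as build cache
    - docker build --cache-from "$CI_REGISTRY_IMAGE:latest" -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA" -t "$CI_REGISTRY_IMAGE:latest" .
    # Push both tags so the next pipeline can reuse the cache
    - docker push "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHORT_SHA"
    - docker push "$CI_REGISTRY_IMAGE:latest"
```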

So let’s see in detail what is done in the script part of this GitLab CI job:

  • An image tagged latest is pulled from the registry. If this image doesn't exist yet, the job shouldn't fail; hence the || true.
  • A new build is run with the --cache-from option. We specify the name and tag of the image pulled in the first step so its cache is reused. The built image is tagged with both the commit's short SHA and latest.
  • Both the short SHA and latest tagged images are pushed to the CI registry.

With this way of building your Docker images, each build reuses the cache from the previous one.

One downside is the time spent pulling and pushing the latest image, but the time saved by reusing its cache is usually far greater.

Re-think how your jobs use the cache

Pulling and pushing the GitLab CI cache can sometimes take more time than running the job itself.

That’s why it’s worth thinking about its usage and disabling it when possible.

Ask yourself this question each time you write a new job: Do I need some files from the cache to run this task?

If not, just disable it or use another cache more suited to the task.

Be careful: if you defined the cache globally rather than per job, you also have to explicitly disable it in jobs that don't need it:
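A hypothetical configuration with a global cache and one job that opts out:

```yaml
cache:
  paths:
    - node_modules/

test:
  script:
    - yarn test

deploy:
  # This job doesn't need node_modules, so don't waste
  # time pulling and pushing the cache
  cache: {}
  script:
    - ./deploy.sh
```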

In the example above, I have to specify an empty cache: {} so the cache is not used for this job.

NB: Don’t mistake artifacts for cache. Artifacts are the results of a job (build outputs, test reports, etc.). Don’t enable the cache on a job just to reuse files produced by a previous job: that’s what artifacts are for!
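A minimal sketch of artifacts passed between jobs (job names and paths are hypothetical):

```yaml
build:
  stage: build
  script:
    - yarn build
  artifacts:
    paths:
      - dist/

deploy:
  stage: deploy
  # Artifacts from jobs of previous stages are
  # downloaded automatically
  script:
    - ./deploy.sh dist/
```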

Allow a single pipeline per ref and make jobs interruptible

As you might have noticed, most latency and flakiness in GitLab CI occurs when many jobs run simultaneously (if not, please take a moment to thank whoever on your team is responsible for the marvel of a CI you've got there!).

A solution for that — other than giving more compute power to the architecture on which your runners run — is to lower the number of triggered pipelines.

Of course, just canceling pipelines for random commits could get you into trouble with the people who pushed them.

Nonetheless, some pipelines won't be missed by anyone if they don't complete: the non-HEAD pipelines. A non-HEAD pipeline is one that was triggered on the last commit of a branch before another push to that branch; the commit is no longer the last one, thus non-HEAD.

To put it simply, we only want to run a GitLab CI/CD pipeline on a commit if it's the HEAD of its branch. If, during the pipeline's lifetime, a new commit is pushed to the branch, the pipeline should be stopped. The pipeline running on the new commit (the new HEAD) continues as if nothing happened.

To do that, you first have to enable the feature in the GitLab web interface.

Go to Settings > CI/CD > General pipelines and check the "Auto-cancel redundant pipelines" box.

Once this box is checked, the old pipeline will be canceled as soon as it finishes its current job.

If you want to go further, you can mark your jobs as interruptible, so the pipeline execution is stopped even while a job is running. This is especially useful for test or build jobs that take more than a few minutes.
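Marking a job as interruptible is a one-line change (the job itself is hypothetical):

```yaml
test:
  stage: test
  # Let GitLab cancel this job mid-run when a newer
  # pipeline supersedes this one
  interruptible: true
  script:
    - yarn test
```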

Automatically rerun jobs that failed on known issues

Even with the best intentions, your GitLab CI may encounter issues that are beyond your control.

Whether it's network issues keeping you from downloading your dependencies or budget restrictions keeping you from scaling up your infrastructure, sometimes the only advice you can give someone whose pipeline has failed is "just run it again".

However, a lot of time may pass between the moment a pipeline fails and the moment the developer who owns it notices and reruns it. Plus, it's not only time: it's also toil that could be automated.

The retry keyword lets you automatically rerun jobs that fail for specific reasons; there is no reason to do ourselves something that a machine can do better:
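For example, a job can retry up to twice when the failure comes from the runner or from a timeout (the failure reasons below are GitLab's own values; the job itself is hypothetical):

```yaml
test:
  script:
    - yarn test
  retry:
    max: 2
    when:
      - runner_system_failure
      - stuck_or_timeout_failure
```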

I hope these few tips help you cut time off your CI pipelines. If you're not already using GitLab CI, you might want to check out this article on how to deploy a Kubernetes app with GitLab pipelines. Or if you're already familiar with GitLab CI, here's how to use it to generate testing environments on the fly with Kubernetes.