How to optimize your CI/CD for monorepo Node projects with GitLab CI

In recent years, many teams have adopted the monorepo pattern for source control of their projects. The main advantage of the monorepo approach is that all your code lives in one place, which lets developers change several modules in a single merge request. The main drawback is that it makes your CI/CD workflows more complicated and error-prone.

In this article, we’ll discuss how you can optimize your CI/CD workflows on Node projects using the monorepo approach with GitLab CI.

We’ll go over the CI side of the pipelines (lint and test jobs); the same approach can be used for CD (deploy jobs).

Project structure

Your project should look something like this:

Each module directory has:

  • a package.json file that contains all the dependencies the package relies on.
  • a .gitlab-ci.yml file that defines the jobs for the package.

You will also notice the following files in the root directory:

  • .gitlab-ci.yml
  • .lint-ci.yml
  • .test-ci.yml

.gitlab-ci.yml is the file GitLab reads to build your pipelines. In a monorepo, this root file should only define the stages and the default image for your jobs. Everything else is defined in other files that it includes.
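As a rough sketch, a root file along these lines would be enough. The stage names, the Node image tag and the packages/module-a and packages/module-b paths are assumptions made for illustration, as is the choice of listing each module file explicitly:

```yaml
# Root .gitlab-ci.yml: only stages, the default image and includes.
# Stage names, image tag and module paths are illustrative assumptions.
stages:
  - cache
  - lint
  - test

default:
  image: node:18

include:
  # Shared job definitions kept in the root directory
  - local: /.cache-ci.yml
  - local: /.lint-ci.yml
  - local: /.test-ci.yml
  # One entry per module (hypothetical module names)
  - local: /packages/module-a/.gitlab-ci.yml
  - local: /packages/module-b/.gitlab-ci.yml
```

Keeping the root file this small means that adding a module only requires one new include line.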

 

.test-ci.yml and .lint-ci.yml contain the job definitions for the test and lint jobs, respectively.

 

Here’s an example of what .test-ci.yml should look like:
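A minimal sketch of what this file could contain, assuming a hidden job template named .test that each module extends, a MODULE_PATH variable set by the module, and a test script in every package.json. All three are illustrative choices, not names taken from the original project:

```yaml
# .test-ci.yml: shared template for test jobs (sketch).
# ".test" and MODULE_PATH are hypothetical names chosen for this example.
.test:
  stage: test
  cache:
    # Dependencies are restored from the cache (see the Caching section),
    # so there is no npm install step in this job.
    key:
      files:
        - modules-sha
    paths:
      - node_modules/
      - packages/*/node_modules/
    policy: pull
  script:
    # Each module job sets MODULE_PATH to its own directory.
    - cd "$MODULE_PATH"
    # Assumes the package defines a "test" script in its package.json.
    - npm test
```

.lint-ci.yml can follow the same shape, with a .lint template that runs the package's lint script instead.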

 

Here’s an example of what a module’s .gitlab-ci.yml file should look like:
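One possible shape for such a file, assuming a module located at packages/module-a and shared .test and .lint templates defined in the root files. The job names, the MODULE_PATH variable and the use of rules:changes to skip untouched modules are assumptions of this sketch:

```yaml
# packages/module-a/.gitlab-ci.yml (hypothetical module name)
test:module-a:
  extends: .test   # template assumed to be defined in .test-ci.yml
  variables:
    MODULE_PATH: packages/module-a
  rules:
    # Only create the job when files of this module change.
    - changes:
        - packages/module-a/**/*

lint:module-a:
  extends: .lint   # template assumed to be defined in .lint-ci.yml
  variables:
    MODULE_PATH: packages/module-a
  rules:
    - changes:
        - packages/module-a/**/*
```

With this layout, a commit that only touches one module only creates the jobs of that module.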

 

You’ve probably noticed that I haven’t talked about the .cache-ci.yml file and that we don't install dependencies in the jobs (no npm install). Don’t worry, both points are covered in the following sections :)

Using lerna to manage your packages

 

Lerna is a tool for managing Node projects with multiple packages. Running lerna bootstrap at the root of the repository installs the dependencies of every package. By default, Lerna treats each package as an independent entity, meaning that it downloads all of a package’s dependencies into that package’s node_modules directory. With fewer than three packages this works fine, but with many packages the process becomes very redundant, because the same libraries are installed again for each package that uses them. The more libraries you download, the longer the lerna bootstrap step takes.

This has a huge impact on your CI/CD pipelines because to test your code, you need to install all the dependencies first. The more your project grows, the longer it takes lerna to install the dependencies and the longer your tests will need to wait before running. 

To mitigate this issue, Lerna introduced hoisting. The command 'lerna bootstrap --hoist' installs external dependencies at the repo root so they're available to all packages. Any binaries from these dependencies are linked into the dependent packages' node_modules/.bin/ directories so they're available for npm scripts. This ensures that you don’t download the same dependency twice: if you add 10 packages whose dependencies are already installed, your installation should take about as long as before. Moreover, hoisting greatly reduces the total size of the installed dependencies, which makes caching a very interesting option to speed up your CI/CD workflows.

Caching

 

Even though you now have an efficient way of installing your project dependencies, you might not want each job to wait five minutes to install them, especially if they don't change often. GitLab CI provides caching for exactly this situation. Here is an example of the .cache-ci.yml file I use on most of my projects:
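A hedged sketch of such a cache job follows. The job name, the cache stage and the packages/ layout are assumptions, and it presumes that lerna is listed in the root devDependencies in a version that still provides the bootstrap command:

```yaml
# .cache-ci.yml: builds the dependency cache (sketch).
# Job name, stage name and the packages/ layout are illustrative assumptions.
install-dependencies:
  stage: cache
  rules:
    # Only run when the dependency fingerprint changes.
    - changes:
        - modules-sha
  cache:
    key:
      # The cache key is derived from the modules-sha file.
      files:
        - modules-sha
    paths:
      - node_modules/
      - packages/*/node_modules/
    policy: push
  script:
    # Install root devDependencies (assumed to include lerna).
    - npm ci
    # Install package dependencies, hoisted to the repo root.
    - npx lerna bootstrap --hoist
```

Jobs that consume the cache would declare the same key and paths with policy: pull, so they download the dependencies without uploading them again.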

This job is triggered by changes to the modules-sha file, which is also used as the key for the cache. The modules-sha file is generated automatically by a pre-commit hook. The husky package makes it very easy to set up these hooks. Here is an example of what your package.json should look like:

As you can see, the modules-sha file is only modified when the package-lock.json files (which record the exact dependency versions of the Node project at a given time) change. This means that your cache job only runs when your dependencies change!

 

In this tutorial, we went over what you can optimize in your GitLab monorepos to make your CI/CD workflows faster. You can further optimize your GitLab CI workflows by working on your runners. Here are a few things you should try if you haven’t yet:

  • Running your GitLab runners on EKS
  • Using EKS + Fargate for your GitLab jobs
  • Trying different instance types for your EKS Nodes
  • Using custom Docker images for your jobs
Baptiste Guerin

Baptiste is a Site Reliability Engineer (SRE) at Padok. He works with a broad set of DevOps technologies, such as Kubernetes, G-Cloud, AWS, and GitLab CI.
