Your project should look something like this:
Each module directory has:
- a package.json file that contains all the dependencies the package relies on.
- a .gitlab-ci.yml file that defines the jobs for the package.
You will also notice the following files in the root directory:
- .gitlab-ci.yml is the file GitLab uses to determine which pipelines to run. In a monorepo, this file should only define the stages and the default image for your jobs. Everything else is defined in other files that you include from it.
- .test-ci.yml and .lint-ci.yml contain the job definitions for the test and lint jobs, respectively.
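To make the role of the root file concrete, here is a minimal sketch of a root .gitlab-ci.yml that only declares the stages, the default image, and the includes. The Node image tag, the stage names, and the packages/module-a and packages/module-b paths are assumptions for illustration, not taken from the original project:

```yaml
# Root .gitlab-ci.yml: stages, default image, and includes only.
# The image tag, stage names, and package paths are hypothetical.
default:
  image: node:14

stages:
  - cache   # rebuild the dependency cache when dependencies change
  - lint
  - test

include:
  # Shared job definitions stored at the repository root
  - local: '/.cache-ci.yml'
  - local: '/.lint-ci.yml'
  - local: '/.test-ci.yml'
  # One entry per module (hypothetical package names)
  - local: '/packages/module-a/.gitlab-ci.yml'
  - local: '/packages/module-b/.gitlab-ci.yml'
```

Recent GitLab versions also accept a wildcard local include (for example `/packages/*/.gitlab-ci.yml`), which avoids editing the root file every time a package is added.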
Here’s an example of what .test-ci.yml should look like:
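A minimal sketch of such a file, assuming every package defines an npm `test` script. The hidden template name `.test` and the `MODULE_PATH` variable are hypothetical names chosen for illustration:

```yaml
# .test-ci.yml: a hidden job (its name starts with a dot, so GitLab
# never runs it directly) that module jobs extend.
# MODULE_PATH is a hypothetical variable set by each extending job.
.test:
  stage: test
  # Reuse the node_modules directories built by the cache job,
  # without ever uploading them from a test job.
  cache:
    key:
      files:
        - modules-sha
    paths:
      - node_modules/
      - packages/*/node_modules/
    policy: pull
  script:
    - cd "$MODULE_PATH"
    - npm test
```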
Here’s an example of what one of the modules’ .gitlab-ci.yml files should look like:
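A sketch of a module file, assuming .test-ci.yml defines a hidden `.test` template that reads a `MODULE_PATH` variable (both names are hypothetical, as is the packages/module-a path):

```yaml
# packages/module-a/.gitlab-ci.yml (hypothetical package name and path)
test:module-a:
  extends: .test
  variables:
    MODULE_PATH: packages/module-a
  # Only run this job when files inside this package change.
  rules:
    - changes:
        - packages/module-a/**/*
```

Because `rules:changes` paths are relative to the repository root, each module's jobs only run when that module changes, which keeps pipelines short as the number of packages grows.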
You’ve probably noticed that I haven’t mentioned the .cache-ci.yml file yet, and that the jobs don’t install any dependencies (there is no npm install). Don’t worry, the following section answers both questions :)
Using lerna to manage your packages
Lerna is a tool for managing Node projects with multiple packages. Running lerna bootstrap at the root of the repository installs the dependencies of every package. Lerna treats each package as an independent entity, so it downloads all of a package’s dependencies into that package’s node_modules directory. With only two or three packages this works fine, but with many packages the process becomes highly redundant, because the same libraries are installed again for every package that uses them. The more libraries you download, the longer the lerna bootstrap step takes.
This has a huge impact on your CI/CD pipelines: to test your code, you first need to install all the dependencies. The larger your project grows, the longer lerna takes to install them, and the longer your tests wait before they can run.
To mitigate this issue, lerna introduced hoisting. The command `lerna bootstrap --hoist` installs external dependencies at the repo root so they’re available to all packages. Any binaries from these dependencies are linked into the node_modules/.bin/ directories of the packages that depend on them, so they remain available to npm scripts. As a result, a dependency shared by several packages is downloaded only once: if you add 10 packages whose dependencies (at the same versions) already exist in your project, installation should not take noticeably longer than before. Hoisting also greatly reduces the total size of the installed dependencies, which makes caching an attractive way to speed up your CI/CD workflows.
Even though you now have an efficient way to install your project’s dependencies, you probably don’t want every job to wait 5 minutes for the installation, especially when the dependencies rarely change. This is where GitLab CI’s caching comes in. Here is an example of the .cache-ci.yml file I use on most of my projects:
This job is triggered by changes to the modules-sha file, which is also used as the cache key. The modules-sha file is generated automatically by a pre-commit hook. The husky package makes these hooks very easy to set up; here is an example of what your package.json should look like:
As you can see, the modules-sha file is only modified when the package-lock.json files (which record the exact dependencies of the Node project at a given time) change. This means that your cache job only runs when your dependencies change!
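As a rough sketch of how such a cache job can be wired up, the file below rebuilds and uploads the cache only when modules-sha changes. The job name `install_dependencies`, the `cache` stage (which must be declared in the root .gitlab-ci.yml), and the assumption that lerna is a devDependency of the root package.json are all illustrative choices, not details from the original project:

```yaml
# .cache-ci.yml: rebuild and upload the dependency cache only when
# modules-sha changes. Job and stage names are hypothetical.
install_dependencies:
  stage: cache
  script:
    # Install the root devDependencies (assumed to include lerna).
    - npm ci
    # Install package dependencies, hoisting shared ones to the root.
    - npx lerna bootstrap --hoist
  cache:
    # GitLab derives the key from the last commit that changed modules-sha.
    key:
      files:
        - modules-sha
    paths:
      - node_modules/
      - packages/*/node_modules/
    policy: push
  rules:
    - changes:
        - modules-sha
```

Jobs that only consume the dependencies can read the same cache with `policy: pull`, so they never spend time re-uploading it.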
In this tutorial, we went over what you can optimize in your GitLab monorepos to make your CI/CD workflows faster. You can optimize your GitLab CI/CD workflows further by working on your runners. Here are a few things you should try if you haven’t yet:
- Running your GitLab runners on EKS
- Using EKS + Fargate for your GitLab jobs
- Trying different instance types for your EKS nodes
- Using custom Docker images for your jobs