Complex GitLab pipelines

Posted on 5 January 2023.

Let’s say you need to deploy multiple complex environments and you plan to use GitLab pipelines: how should you proceed? At Padok, we have already faced this issue and solved it using a combination of multi-project dynamic pipelines, artifacts, and dependency relationships between GitLab jobs.

What are we building here?

This article describes our solution to the question, the components we developed, and how we arranged them to work together. The objective here is to create the following pipeline structure:

(Diagram: the target pipeline structure)

GitLab is a web-based Git repository manager designed for team collaboration and productivity. It is a very powerful developer tool that keeps your teams in control, lets you optimize your workflows to match your deployment SLAs, and gives you tons of options to create complex CI/CD pipelines.

This last part is what really got us to consider GitLab CI as the solution to the problem at hand. Plus, it is already very popular, with many examples out there showing how to automate your DevOps lifecycle, deploy applications to Kubernetes, implement GitOps best practices, and much more!

How do you set up dynamic GitLab pipelines?

First things first, we need a master job that can run downstream pipelines dynamically. We want to create as many environments as the user requests in the input parameters: it can be just 1, 3 as in this example, or more. The question: how do you trigger the create-env job multiple times? The answer: dynamic GitLab pipelines.

Dynamic pipelines in GitLab CI are generated programmatically based on certain conditions or parameters. This means that the steps and tasks in the pipeline are not fixed, but can change depending on the input or context.

For example, you could use them to automatically run a job that has to do the same task hundreds of times, with each instance being almost identical to the others but with very subtle differences. It would be very tedious, and nearly impossible, to write out each variation of the same job by hand. Instead of writing thousands of lines of code, you can generate them with dynamic pipelines in GitLab CI.

Dynamic pipelines help manage complex or large CI/CD pipelines, where the tasks and dependencies can vary depending on the context. They allow teams to automate and customize their CI/CD processes, making them more efficient and effective.

# bootstrap-env/.gitlab-ci.yml
# bootstrap-env
# ├── .gitlab-ci.yml   <--
# ├── generate_templates.py
# └── requirements.txt

variables:
  ENVIRONMENTS:
    description: "User input: comma-separated list of environments"
    value: "dev,prod,staging"

stages:
  - templating
  - deployment

generate-templates:
  stage: templating
  image: python:3.10
  before_script:
    - pip install -r requirements.txt
  script:
    - python generate_templates.py --env $ENVIRONMENTS
  artifacts:
    paths:
      - environments.yml

deploy-envs:
  stage: deployment
  trigger:
    include:
      - artifact: environments.yml
        job: generate-templates
    strategy: depend

In this example, we use a custom Python script to generate a YAML configuration file of GitLab CI/CD jobs. This is the first stage of the pipeline: templating. In the second stage, deployment, we use the generated file to deploy every environment initially requested by the user in the ENVIRONMENTS variable.
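The generate_templates.py script itself can stay quite simple. Here is a minimal sketch of what it could look like, assuming PyYAML is listed in requirements.txt; a real version would render the full API-trigger logic shown in the next section instead of a placeholder echo.

# bootstrap-env/generate_templates.py (minimal sketch)
import argparse

import yaml  # PyYAML, assumed to be listed in requirements.txt


def build_jobs(environments):
    # Build one GitLab CI job definition per requested environment
    jobs = {}
    for env in environments:
        jobs[f"deploy-{env}"] = {
            "environment": env,
            # Placeholder: the real generated jobs trigger the 'create-env' pipeline
            "script": [f'echo "Bootstrapping the {env} environment"'],
        }
    return jobs


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--env", required=True, help="comma-separated list of environments")
    args = parser.parse_args()

    environments = [env.strip() for env in args.env.split(",") if env.strip()]
    with open("environments.yml", "w") as file:
        yaml.safe_dump(build_jobs(environments), file, sort_keys=False)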

As a result, the environments.yml configuration file will contain 3 jobs, each responsible for creating an environment: dev, prod, and staging.

(Screenshot: the pipeline jobs generated for the dev, prod, and staging environments)

Users can input the environments they need, and the templating stage will create as many jobs as there are environments to bootstrap.

Voilà! We have 3 child jobs for the 3 environments we asked for, but how do we make them actually create new environments?

How do you set up multi-project downstream GitLab pipelines?

Second, we want every step of the target pipeline architecture to live in a dedicated GitLab project. The problem is telling GitLab: “can you trigger the pipeline of this other project to create a new environment for me, please?”. GitLab actually offers two different ways to achieve this: using a trigger job or calling the API.

We will need both in this example because the maximum depth of downstream pipelines that can be triggered is 2, but here we need at least 3! Fortunately, triggering them with the API resets the counter, and you can use this technique to go as many levels down as you want. This is a cool trick to know, but it comes with its share of trade-offs.

On the plus side, both methods behave similarly in terms of parent-child representation. As you would expect from a trigger job, you can see the pipelines running in the parent project as well as in the child project, with a link back to the original.

However, on the negative side, the POST command syntax can become a little overwhelming when lots of variables are involved and need to be passed down to the child pipeline.

In addition, a parent job that calls for a child pipeline with the API doesn’t wait for that child pipeline to terminate. Instead, it exits with a success as soon as the curl command is correctly executed. This becomes a problem when jobs from later stages require these steps to be done. One workaround is to add a waiting loop after the API call.

We investigated multiple solutions, mostly loops in shell scripts that call one of the GitLab API endpoints to list project jobs, list pipeline jobs, or list pipeline bridges, depending on your needs; a sketch of such a loop is shown after the generated job below.

# bootstrap-env/environments.yml (generated by generate_templates.py)

deploy-staging:
  environment: staging
  variables: 
    GITLAB_PROJECT_ID: 123456789 # the project ID of 'create-env'
    GITLAB_REF: main
  script: 
    - > 
      curl --request POST
      --form "token=$CI_JOB_TOKEN"
      --form "ref=$GITLAB_REF"
      --form "variables[ENVIRONMENT]=$CI_ENVIRONMENT_NAME"
      "https://gitlab.com/api/v4/projects/$GITLAB_PROJECT_ID/trigger/pipeline"

With this, each generated job triggers the “create-env” pipeline. We call the GitLab API using curl and specify important parameters such as token, which is required to trigger multi-project pipelines. Additionally, we pass down relevant variables: in this example, we need an ENVIRONMENT variable for later stages.
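If a generated job also needs to wait for the downstream pipeline it just started (the waiting-loop workaround mentioned above), one possible sketch looks like the following. It assumes jq is available in the job’s image and that an API token with read_api scope is stored in a GITLAB_API_TOKEN CI/CD variable; both are assumptions on our side.

# Variant of the generated job with a waiting loop (sketch)

deploy-staging-and-wait:
  environment: staging
  variables:
    GITLAB_PROJECT_ID: 123456789 # the project ID of 'create-env'
    GITLAB_REF: main
  script:
    # Trigger the downstream pipeline and capture its ID from the API response
    - >
      PIPELINE_ID=$(curl --silent --request POST
      --form "token=$CI_JOB_TOKEN"
      --form "ref=$GITLAB_REF"
      --form "variables[ENVIRONMENT]=$CI_ENVIRONMENT_NAME"
      "https://gitlab.com/api/v4/projects/$GITLAB_PROJECT_ID/trigger/pipeline" | jq -r '.id')
    # Poll the pipelines API until the downstream pipeline reaches a final status
    - |
      while true; do
        STATUS=$(curl --silent --header "PRIVATE-TOKEN: $GITLAB_API_TOKEN" \
          "https://gitlab.com/api/v4/projects/$GITLAB_PROJECT_ID/pipelines/$PIPELINE_ID" | jq -r '.status')
        echo "Downstream pipeline $PIPELINE_ID is $STATUS"
        case "$STATUS" in
          success) break ;;
          failed|canceled|skipped) exit 1 ;;
        esac
        sleep 30
      done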

In turn, “create-env” triggers “add-resources” using:

# create-env/.gitlab-ci.yml

stages: 
  - resources

add-env-resources:
  stage: resources
  rules:
    - if: $CI_PIPELINE_SOURCE == "pipeline"
  trigger:
    project: "$GITLAB_GROUP/add-resources"
    branch: main
    strategy: depend

# [ ... ]

For multi-project pipelines, the trigger keyword accepts a GitLab project as a parameter, either as a simple string or with the project keyword. It is also worth mentioning that the trigger:project syntax is exclusive to Premium GitLab accounts. The strategy: depend option makes the parent pipeline's status depend on its children's status.

Also, notice the use of rules so that the job only runs when another pipeline triggers it. This helps avoid unintentional environment creation on code pushes and merge requests… which can rapidly turn into very serious problems!

Further down in the architecture, multi-project downstream pipelines work the same way: “resource1” calls “add-resource1” using the same trigger logic, and so on.

That is a problem solved. However, things get tricky when artifacts are involved.

How do you pass artifacts around between GitLab jobs?

Lastly, artifacts. We use them to convey information about the resources created throughout the procedure. In this example, resource2 needs information about resource1, while resource3 needs information about resource1 & resource2. An artifact is created at the end of each add-resource# pipeline, and we fetch them at the parent level to pass down variables to other children when creating new resources.

Don’t hesitate to go check the first diagram at the top of this article: in this section, we focus on the right-most side including all resource-related repositories and pipelines.
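To give an idea of what produces these artifacts, here is a hypothetical sketch of the final job in the add-resource1 project. The image, job, and file names are assumptions on our side; the important part is the artifacts section.

# add-resource1/.gitlab-ci.yml (hypothetical sketch)

add-resource1:
  image: ubuntu:22.04
  before_script:
    - apt-get update && apt-get install -y zip
  script:
    # Stand-in for the real resource creation: dump its outputs into a JSON file
    - echo '{"id": "resource1-1234"}' > resource1_information.json
    - zip resource1-outputs.zip resource1_information.json
  artifacts:
    paths:
      - resource1-outputs.zip
    expire_in: 1 week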

One of the main challenges is the dependency graph between jobs: parent jobs have to wait for their children to exit successfully before they can get the generated artifacts. Implementing a dependency strategy using stages is relatively easy with jobs in the same project:

# add-resources/.gitlab-ci.yml
# add-resources
# ├── .gitlab-ci.yml   <--
# ├── resource1.yml
# ├── resource2.yml
# └── resource3.yml

variables: 
  ARTIFACT_RESOURCE3: "resource3-outputs.zip"
  ARTIFACT_RESOURCE2: "resource2-outputs.zip"
  ARTIFACT_RESOURCE1: "resource1-outputs.zip"

stages:
  - resource1
  - resource2
  - resource3

add-resource1:
  stage: resource1
  trigger:
    include:
      - local: "resource1.yml"
    strategy: depend

add-resource2:
  stage: resource2
  trigger:
    include:
      - local: "resource2.yml"
    strategy: depend

add-resource3:
  stage: resource3
  trigger:
    include:
      - local: "resource3.yml"
    strategy: depend

By default, all artifacts from previous stages are passed down to later stages in GitLab. This raises issues with storage, security, and pipeline speed. To alleviate this, you can use the dependencies keyword, which lets you specify exactly which jobs' artifacts a job requires.
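For instance, a job can restrict itself to the artifacts of a single earlier job. The job names below are purely illustrative and not part of the pipelines above.

# Illustrative only: this job fetches the artifacts of 'build-resource1' and nothing else
package-outputs:
  stage: packaging # any stage later than the one 'build-resource1' runs in
  dependencies:
    - build-resource1
  script:
    - ls -lh resource1-outputs.zip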

On the other hand, when jobs are part of different projects in a multi-project downstream pipeline, the stage approach is not enough and you will have to use the needs keyword.

# add-resources/resource2.yml
# add-resources
# ├── .gitlab-ci.yml
# ├── resource1.yml
# ├── resource2.yml   <--
# └── resource3.yml

variables:
  GITLAB_PROJECT: add-resource2
  GITLAB_REF: main

# Parse artifacts to get information needed for resource2 pipeline
parse-resource1-artifact:
  before_script:
    - apt-get update && apt-get install -y unzip jq
  needs:
    - project: "$GITLAB_GROUP/add-resource1"
      job: add-resource1
      ref: main
      artifacts: true
  script:
    - echo "Parsing resource1 artifact from \"$ARTIFACT_RESOURCE1\"..."
    - >
      RESOURCE1_ID=$(unzip -p $ARTIFACT_RESOURCE1 resource1_information.json | jq -r '.id') &&
      echo "RESOURCE1_ID=$RESOURCE1_ID" >> parse.env
  # Expose RESOURCE1_ID to the jobs that need this one, via a dotenv report
  artifacts:
    reports:
      dotenv: parse.env

# Triggers the creation of a new resource2 resource
trigger-resource2-pipeline:
  variables:
    RESOURCE1_ID: $RESOURCE1_ID
  trigger:
    project: "$GITLAB_GROUP/$GITLAB_PROJECT"
    branch: $GITLAB_REF
    strategy: depend
    forward:
      pipeline_variables: true
  needs:
    - parse-resource1-artifact

The needs logic takes precedence over the stage logic: if both fields are set, the pipeline ignores the stage order and only follows needs. According to the official documentation, the same goes for artifacts:

"When a job uses needs, it no longer downloads all artifacts from previous stages by default, because jobs with needs can start before earlier stages complete".

In this example, we make sure the first job retrieves the artifact from the previous step before running the actual pipeline: in parse-resource1-artifact, we download the artifact, extract the information we need, and expose it as a dotenv variable so the trigger job can pass it down to the downstream pipeline.

For the sake of the example, we implemented a simplistic parsing logic, but the outputs can be custom JSON files, logs, Terraform states, Kubernetes API calls… you get the idea.

Conclusion

That's all folks! We hope you learned something useful today. Feel free to reach out to us, we are always happy to discuss and learn from you. Cheers 🤍.