Multi-layering: Terraform IaC from scratch to scale

Posted on 10 February 2021, updated on 6 August 2021.

When you start an Infrastructure as Code project from scratch with Terraform, you will certainly struggle with a few questions. Should I start now and fix it later, when it becomes a problem? Should I set up continuous integration and deployment? Should I split this project into multiple repositories? If so, what size should a single piece be?

On a recent project, my team faced these questions, and the answers are all about trade-offs. This article explores two different approaches to organizing your code: Monolith vs. Layers.

My goal is that, by the end of this article, you will be able to make an informed decision for your project and know how to start implementing it.

Terraform

First, we need to talk about Terraform's state file. It is the core piece of this article's comparison.

Terraform's state is the last known configuration of your infrastructure, as it was when Terraform last ran. By default, it is stored locally, but this is not a good practice. Instead, you should keep it remotely. Object storage services like S3 or GCS are great for this. This way, you can run Terraform from any computer that has access to the remote state file.

Terraform uses a lock to protect its state from concurrent edits.
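To make this concrete, here is a minimal sketch of a remote backend on S3 with locking. The bucket name, key, region, and DynamoDB table name are placeholders made up for illustration, and both the bucket and the table are assumed to exist already. Locking through a DynamoDB table is one option the S3 backend supports; check the backend documentation for the Terraform version you run.

```hcl
# backend.tf (sketch): keep the state in S3 instead of on a laptop.
terraform {
  backend "s3" {
    bucket  = "example-terraform-state" # hypothetical bucket name
    key     = "terraform.tfstate"       # path of the state object in the bucket
    region  = "eu-west-3"               # hypothetical region
    encrypt = true                      # encrypt the state object at rest

    # Hypothetical lock table; it needs a string partition key named "LockID".
    dynamodb_table = "example-terraform-locks"
  }
}
```

With this in place, a second terraform apply started while the first one is still running fails to acquire the lock instead of writing over the state.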

The state file contains information about every piece of infrastructure you manage with Terraform. If the project isn't split, the state file grows along with your infrastructure.

Keep these concepts in mind; the state's locking mechanism and growing size will come into play when discussing collaboration later on.

Another feature that makes Terraform powerful is modules. With Terraform, you can wrap some reusable code into a module you can call in another section of your code.

Modules improve your code's readability and, since v0.13, they support count and for_each, which makes duplicating resources easier. Modules are very popular, so you can find useful ones written by the community or even by the cloud providers.
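As a rough illustration of looping over a module, the sketch below calls one module three times with for_each. The module path, its name input variable, and the list of names are all hypothetical; the module itself is assumed to exist in the repository.

```hcl
# main.tf (sketch): one module call, repeated with for_each (Terraform >= 0.13).
module "bucket" {
  source   = "./modules/bucket" # hypothetical local module
  for_each = toset(["logs", "assets", "backups"])

  # "name" is an input variable the hypothetical module is assumed to declare.
  name = "example-${each.key}"
}
```

Each instance gets its own address, such as module.bucket["logs"], so removing one name from the set only destroys that instance.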

Here is some general advice I've gathered about the best practices you want to follow to keep your project stable, readable, and scalable:

Keep changes small to reduce the blast radius: the more you change at once, the higher the risk of breaking something.
Review your code: even if your code works, ask for a review to share the changes and challenge the implementation.
When possible, promote to production: apply your infrastructure changes to a pre-production environment to test them, and then apply them to the other environments (quick reminder: staging and testing environments are production environments for Ops).
As with any code, naming conventions are essential: define them beforehand, because renaming a resource later can force Terraform to recreate it.
Wait before automating: a project goes through a build phase with high-frequency changes before it reaches its runtime phase, and automation pays off once the code has settled.

Monolith

It's time to discuss architecture and the differences between two opposite approaches.

First, the monolithic approach consists of holding all the infrastructure configuration in a single state file. This is the natural evolution of an IaC repo, as every change is brought to the main repo. A monolithic Terraform project typically keeps all of its files, such as vpc.tf, postgresql.tf, and kubernetes.tf, in one main directory, alongside its modules.

There is a separation between files for readability, and there are modules to wrap reused code. However, behind the scenes, there is one and only one remote Terraform state file. Although this article leans in favor of layering, the monolithic approach does have some pros; here they are:

Terraform handles intra-project dependencies: since you apply all your code at once, you can refer to objects from one file in another. An example of this is managing dependencies on the VPC. This is the network resource to which you'll bind other infrastructure blocks like the Kubernetes cluster or the databases. In your postgresql.tf and kubernetes.tf, you can refer to the VPC created by vpc.tf using aws_vpc.main.id.
Terraform plans are executed over the entire infrastructure: this means that every time you commit a new change, you will be checking that the whole infrastructure matches the code you applied. This method enforces correctness on the state of the infrastructure, which is a good thing.
You can use workspaces to replicate your infrastructure across isolated environments: this can be used to promote changes to production. Terraform's workspaces are the simplest way to split state files, as Terraform will create one state file per workspace. This is, in a way, a sort of layering. Generally, workspaces are used to duplicate objects between environments, so you'll have testing, staging, integration, and production workspaces, and so on. This is done using the commands terraform workspace new <env> and terraform workspace select <env>. You will need to create separate .tfvars files, one for each environment.
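The first and third points can be sketched together as follows. The file names vpc.tf and kubernetes.tf and the reference aws_vpc.main.id come from the text above; the vpc_cidr variable, the tag, and the subnet (standing in for the cluster's network setup) are assumptions made for illustration.

```hcl
# variables.tf: the value comes from the per-environment .tfvars file.
variable "vpc_cidr" {
  type        = string
  description = "CIDR block of the VPC, assumed here to be a /16 range"
}

# vpc.tf
resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr

  tags = {
    # The workspace name makes the environment visible on the resource.
    Name = "main-${terraform.workspace}"
  }
}

# kubernetes.tf: refers directly to the VPC declared in vpc.tf.
resource "aws_subnet" "kubernetes" {
  vpc_id     = aws_vpc.main.id
  cidr_block = cidrsubnet(aws_vpc.main.cidr_block, 8, 1) # a /24 when the VPC is a /16
}
```

After terraform workspace select staging, running terraform apply -var-file=staging.tfvars (a hypothetical file name) applies this code to the staging state only. Because the subnet references aws_vpc.main.id, Terraform works out on its own that the VPC must be created first.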

Now you have a Terraform project in one main directory containing all created objects in your state; their dependencies are handled between files, and you can apply the same infrastructure to different environments.

This is fine if you are the only developer updating the project. Now we need to talk about collaboration. As your DevOps team grows, you may have more than one Pull Request open at a time on the repo. This will create conflicts: everything shares one state file and one lock, so only one person can apply at a time, and every plan covers the whole, ever-growing infrastructure. One way to avoid this is to break down your monolithic configuration into layers.

Layers

Splitting a huge codebase into smaller pieces is what we do when developing microservices. Since Infrastructure as Code is still code, it is subject to the same constraints and concepts. A layered IaC Terraform project gives each layer its own directory and its own state file.

This allows collaboration. As there is one state file per layer and per workspace, each team member can make changes to different layers without conflicting with coworkers' modifications to other layers.

You can run terraform apply on two layers simultaneously without worrying about the state lock, which is fantastic.
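One way to get one state file per layer is to give each layer directory its own backend configuration with a distinct key. This is only a sketch: the directory names, bucket, keys, and region are hypothetical, and the two blocks live in two separate directories, not in the same file.

```hcl
# foundation/backend.tf (hypothetical layer directory)
terraform {
  backend "s3" {
    bucket = "example-terraform-state" # hypothetical bucket, shared by all layers
    key    = "foundation/terraform.tfstate"
    region = "eu-west-3"
  }
}

# service/backend.tf (a different directory, initialized separately)
terraform {
  backend "s3" {
    bucket = "example-terraform-state"
    key    = "service/terraform.tfstate" # distinct key, so a distinct state
    region = "eu-west-3"
  }
}
```

Each directory is initialized and applied on its own, so a lock held on one layer's state never blocks work on another layer.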

You can decouple your architecture depending on its change frequency. In this video, Armin from HashiCorp explains that you may want to have three types of layers:

Bootstrap: used to kickstart your project, set up the backend, the organization, folders, etc.
Foundation: provisions resources like VPC networks and subnets, security policies, etc.
Service: includes everything else regarding your business activity.

With this segregation, the first two layers are less frequently applied than the third one, so they will not be affected by small changes to services. It also reduces execution time for each plan.
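The price of this split is that a reference like aws_vpc.main.id no longer works across layers. One possible way to pass values between layers is the terraform_remote_state data source, sketched below. The file names, bucket, key, region, output name, and subnet are assumptions for illustration, and the foundation layer is assumed to declare a resource named aws_vpc.main.

```hcl
# foundation/outputs.tf: expose only what other layers need.
output "vpc_id" {
  value = aws_vpc.main.id
}

# service/network.tf: read the foundation layer's outputs from its state.
data "terraform_remote_state" "foundation" {
  backend   = "s3"
  workspace = terraform.workspace # read the same environment as the current layer

  config = {
    bucket = "example-terraform-state" # hypothetical, must match the foundation backend
    key    = "foundation/terraform.tfstate"
    region = "eu-west-3"
  }
}

variable "service_subnet_cidr" {
  type = string
}

resource "aws_subnet" "service" {
  vpc_id     = data.terraform_remote_state.foundation.outputs.vpc_id
  cidr_block = var.service_subnet_cidr
}
```

Every output consumed this way is a dependency between layers, so the foundation layer has to be applied before the service layer can plan. Keeping the list of outputs short keeps that coupling manageable.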

It might also be a good idea to split the service layer, but how?

You can find articles that describe environment-oriented layers. This enables non-identical environments, like objects created in testing environments but not in production. In my opinion, this is not the best choice. Why? Because you don't want to sacrifice the fact that your environments are identical and therefore easier to maintain.

Other articles mention object-oriented layers, where each layer groups related objects rather than an environment, and these are the ones I prefer.

A common mistake is to make layers too small. This reduces readability and, if you're planning to automate the terraform apply via a CI as I did with CircleCI, it may be unsustainable to review every terraform plan diff for every layer every time a developer makes a pull request. Some projects have over 200 layers to manage!

For sizing a layer, keep these things in mind:

the target size of the project
making the layer meaningful
minimizing the dependencies between layers
the number of concurrent collaborators (during the build phase and at runtime)

Try to anticipate the idea of delegating a layer to a dedicated team, which may occur in the future.

I am confident that, with these insights in mind, multi-layering your Infrastructure as Code will help more than it hinders.

There are two main approaches to architecting an IaC project: a single monolith or multiple layers. You now know the implications of each. Depending on your target infrastructure’s size, the size of your team, and the immutability you need, you will need to choose one over the other.

Monoliths bring safety and readability, and they are straightforward to implement. Layers enable collaboration, agility, and flexibility regarding future needs.

At Padok, opinions are quite divided, as the two implementations can fit different projects. I hope you enjoyed this article. If you want to learn more about Terraform, I invite you to read this post about the newest version, 0.14.
