Posted on 7 July 2021, updated on 6 August 2021.

Ansible is an amazing tool for automating tasks in your deployment workflows, and it comes with very few constraints. The drawback of having so few constraints is that it can be quite difficult to evaluate the quality of one's code. This problem is often overlooked because "quality" is hard to quantify and measure.

However, when quality drops, we all feel the impact: simple tasks become tedious, collaboration within the team gets harder, and more and more regression bugs are introduced over time. In this article, I will show you some approaches you can take to effectively boost your code quality.

Reusing before rewriting

The first way to boost the quality of your code is to write as little code as possible. This is not only true for Ansible code but for any development project, really. If you don't write any code, you can't make any mistakes, right?

The Ansible community is very active in writing roles for all kinds of use cases, so before writing your own, keep in mind that whenever you want to automate something, chances are someone else has already done it.

Ansible Galaxy is the community registry for Ansible resources. It comes with a CLI, ansible-galaxy, which lets you install roles and collections like so:

# Install a role
ansible-galaxy role install role_name

# Install a collection of modules
ansible-galaxy collection install collection_name

If you do not find what you need online and do have to write a role, this CLI can also help you bootstrap the project following the official guidelines.

# Create a role from scratch
ansible-galaxy role init role_name
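
For reference, this command generates a skeleton roughly like the one below (the exact layout may vary slightly between ansible-galaxy versions):

role_name/
├── defaults/
├── files/
├── handlers/
├── meta/
├── tasks/
├── templates/
├── tests/
├── vars/
└── README.md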

Static code analysis

As your codebase grows, it will necessarily become more difficult to keep an eye on its quality. Are all parts of the code up to your team's standards? A good way to find out is to use static code analysis tools. Two tools are the community's standard when it comes to Ansible: yamllint and ansible-lint.

Yamllint will spot mistakes in your YAML syntax, which is at the core of your Ansible code. It will also ensure your code styling is consistent. This may not seem that important, but a consistent style actually helps a lot, especially in a very large codebase, and it makes it much easier to onboard a new team member and share your knowledge of the code.
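
As a quick sketch, yamllint can be configured per project with a .yamllint file at the root of your repository; the rule values below are illustrative, not recommendations:

# .yamllint -- example project configuration (values are illustrative)
extends: default
rules:
  line-length:
    max: 120        # relax the default 80-character limit
  indentation:
    spaces: 2       # enforce two-space indentation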

The second tool, ansible-lint, will check the logic of your playbooks and roles against a database of proven practices to avoid the most common pitfalls. It will make recommendations to help keep your code as maintainable as possible. Most of the time, ansible-lint comes with yamllint included, but it will depend on the version you install, so keep that in mind.

# With "-p" option, you get a condensed output. For full output, remove 
 # the "-p" option. It allows me to keep the example concise in this article.
 

 

 $ ansible-lint -p ./role_name
 ./tasks/main.yaml:146: [EANSIBLE0002] Trailing whitespace
 ./tasks/main.yaml:209: [EANSIBLE0002] Trailing whitespace 
 ./tasks/exec.yaml:16: [EANSIBLE0012] Commands should not change things if nothing needs doing
 ./tasks/exec.yaml:23: [EANSIBLE0013] Use shell only when shell functionality is required

In this example, you see 3 of the 18 default rules. To get the full list, simply use ansible-lint -L.

Both of these tools are very easy to set up, and when integrated into your git-flow, as part of your CI/CD, for example, they will be the guardians of your quality standards.
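
For instance, here is a minimal sketch of a CI job running both linters, in GitLab CI syntax (the job name, image, and role path are illustrative; adapt them to your own pipeline):

# .gitlab-ci.yml -- minimal lint job (names and paths are illustrative)
lint:
  image: python:3.9
  script:
    - pip install yamllint ansible-lint
    - yamllint .
    - ansible-lint -p ./role_name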

Testing your code

Unit testing plays a big part in avoiding regressions when developing new features. It is very commonly used in most programming languages, but did you know that there is such a thing as a testing framework for Ansible?

It is called Molecule, and it allows you to test your Ansible roles inside Docker containers. It is configured using YAML, and in its default setup, a config file looks like this:

---
dependency:
  name: galaxy
driver:
  name: docker
platforms:
  - name: instance
    image: docker.io/pycontribs/centos:8
    pre_build_image: true
provisioner:
  name: ansible
verifier:
  name: ansible

The default setup can be obtained by creating your role with molecule init role my-new-role --driver-name docker.
This command will bootstrap your role just like the ansible-galaxy role init command does, but it will add a default configuration for linting and testing.
In fact, Molecule can be installed alongside common linting tools, and you can then use it for linting as well!
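
Once the role is bootstrapped, the day-to-day workflow looks roughly like this (the exact set of subcommands, lint in particular, depends on your Molecule version):

molecule lint       # run the configured linters
molecule converge   # create the test instance and apply the role to it
molecule verify     # run the verifier against the instance
molecule test       # full sequence: create, converge, verify, destroy, among other steps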

If we look at this config file, we see that Molecule will use the Docker image docker.io/pycontribs/centos:8 to create an instance against which it will run tasks using a given provisioner (Ansible is the only supported provisioner as of today).
It will then test the state of the instance using a verifier. In this example, the verifier is Ansible, which means that all test verifications will be done through Ansible tasks.

Here is another example from the default setup:

---
# This is an example playbook to execute Ansible tests.

- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Example assertion
      assert:
        that: true

Ansible is not the only verifier available; you could also use Testinfra. Ansible is, however, the default and most common option.

Tests like these, which run your playbooks or roles against a test instance and verify that the final state matches the expected one, are always a huge help in boosting the quality of your code. Regressions become much easier to catch before they reach production, and the expected final state of your role ends up well documented within your tests.
Your README probably describes part of the expected state, like "have a working PostgreSQL server running", but this is only the tip of the iceberg. Knowing exactly how you expect your role to configure your instance will help your team collaborate better towards a common goal.
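
To make this concrete, here is a sketch of what a verifier playbook for that hypothetical PostgreSQL example could look like (the service name is an assumption; adjust it to your target system):

- name: Verify
  hosts: all
  gather_facts: false
  tasks:
    - name: Collect service facts
      service_facts:

    - name: Assert PostgreSQL is running (service name is hypothetical)
      assert:
        that: ansible_facts.services['postgresql.service'].state == 'running'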

Measuring performance

Complex Ansible roles can become slow as their task list grows. However, this can probably be improved by identifying bottlenecks in your execution pipeline. How can this be done? Using a feature of Ansible called "Callback Plugins".

What are callback plugins exactly?
Let's see with an example: the profile_tasks plugin.
Enabling it is as simple as adding it to the callback_whitelist parameter in the defaults section of your ansible.cfg file.

A quick tip: if you want to share ansible.cfg settings at a project level, you can actually create an ansible.cfg file at the root of your repo, which will override the machine's settings.

Here is what it looks like:

[defaults]
callback_whitelist = profile_tasks

During your role's execution, you will now see start and end timestamps for each task, and when the execution is done, you will get a summary listing the 20 longest tasks in your role.
Here, the callback plugin hooks into task execution and is called whenever a task starts.

Here is a sample output for a small role that installs a list of Homebrew packages on a Mac:

Tuesday 29 June 2021  08:56:32 +0200 (0:00:20.438)       0:03:49.384 **********
================================================================================
config_setup : Install homebrew packages ------------------------------- 180.29s
config_setup : Update homebrew ------------------------------------------ 23.26s
config_setup : homebrew cleanup ----------------------------------------- 20.44s
config_setup : Install homebrew cask packages ---------------------------- 1.82s
config_setup : Add custom brew taps -------------------------------------- 1.79s
Gathering Facts ---------------------------------------------------------- 1.32s
Gathering Facts ---------------------------------------------------------- 0.45s
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
total ------------------------------------------------------------------ 229.37s

This is only one example; here are the callback plugins I would recommend for measuring the performance of your Ansible code (you can enable several at once, as shown after the list):

  • profile_tasks to measure task performance in a role
  • profile_roles to measure the execution time of your roles
  • timer to measure a whole playbook's execution time
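
Enabling several of them is just a matter of listing them comma-separated (note that newer Ansible versions rename this setting to callbacks_enabled):

[defaults]
callback_whitelist = profile_tasks, profile_roles, timer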

From this information, you can then work towards improving things. Perhaps you have unnecessary blocks that don't need to execute? Perhaps some tasks could run in parallel, as in the sketch below? This will depend highly on your setup, but gathering information is always the first step of an investigation.
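
As a sketch of the parallel route, Ansible's async feature lets a slow task run in the background while the play moves on; the command and timings below are purely illustrative:

- name: Start a long-running command in the background
  command: /usr/local/bin/long_task   # hypothetical command
  async: 600   # give it up to 10 minutes to complete
  poll: 0      # do not wait for it here
  register: long_task

- name: Wait for the background task to finish
  async_status:
    jid: "{{ long_task.ansible_job_id }}"
  register: job_result
  until: job_result.finished
  retries: 30
  delay: 10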


I hope all these tips will help you improve the quality of your Ansible code and help your team work better together! I found these tips useful in my own experience, but I am certainly not all-knowing, and there are probably other ways to improve your code as well. Do not hesitate to share them with us on Twitter or LinkedIn :)