Posted on 9 February 2023.
Do you have a Virtual Machine running Linux that you want to monitor? Do you have several Docker containers that need monitoring but you don’t know where to start? With Ansible, we will see how to set up a full monitoring stack using Prometheus and Grafana to monitor VMs and/or Docker containers.
Let’s say that you have a machine running Virtual Machines (VMs) that themselves are running Docker. You can also have another real machine (or more!) that you want to monitor. This stack is really flexible, and that is the whole point. What is important is that you have something to monitor, whether it is VMs, real machines or both does not matter.
For today, we will use 3 VMs on a single machine, and we will monitor only the VMs.
Ok, so now that we all agree on what we have, let’s talk about what we want. One simple word: monitoring. For once in this blog, we will not use Kubernetes. Why? Because we want a simple way to start, and Kubernetes is not that simple for beginners.
So what do we want? We want all our VMs and real machines monitored, with a simple dashboard to visualize the data.
We also want it to answer these criteria:
- I can easily add a new machine or VM to monitor
- I can store my configuration as code using GitHub
- I can safely store my secrets on GitHub without revealing them to everyone who has access to the repo
I have the perfect solution to achieve these goals: Ansible.
How to achieve our objectives?
Before diving into the code, let’s talk about the pattern that we will use to monitor our infrastructure: the observer and the targets.
It is quite simple actually: we have a machine that monitors all the others. On this observer, we will install Prometheus and Grafana. On the targets, we will install the agents that will report the data of our VMs and their Docker containers: node-exporter and cAdvisor.
node-exporter will report metrics on the hardware and the OS (things like CPU consumption, for instance), whereas cAdvisor will focus solely on Docker-related metrics. One can work without the other, like on VM 2 in our diagram: there is no point in installing cAdvisor on VMs that do not run Docker.
Now that you have the theory, let’s dive deeper into our architecture. Our folders will be structured like that:
```
├── inventories    -> hosts inventory files
│   └── hosts.yml  -> describes the different hosts
├── playbooks      -> ansible playbooks
└── roles          -> ansible roles
```
And in our `hosts.yml`, you will find… the hosts! Whether a machine is an observer or a target, it is referenced here: the whole monitoring stack is described in this file.
```yaml
all:
  children:
    observer:
      hosts:
        padok-observer:
          ansible_host: 192.168.0.1
    target:
      hosts:
        padok-observer:
          ansible_host: 192.168.0.1
        padok-target-1:
          ansible_host: 192.168.0.10
        padok-target-2:
          ansible_host: 192.168.0.11
```
You might find something weird in this file: `padok-observer` is mentioned twice. This is because I lied to you earlier! The schema I gave you is missing something: self-monitoring.
With self-monitoring, the setup is way better: now we also have the first VM’s metrics. In fact, the observer is also a target. That’s why it is mentioned twice: once in the observer group, and once in the target group.
Now that everything is crystal-clear, we can start working on the observer.
Let’s set up the observer
For the observer, we will use three open-source applications:
- Prometheus is a free software application used for event monitoring. It records real-time metrics that are collected through network calls in a time series database.
- Grafana is an interactive visualization web application. It provides charts and graphs when connected to supported data sources such as the Prometheus server.
- Prometheus Alertmanager is a web application that handles alerts sent by client applications such as the Prometheus server. It allows us to forward these alerts to a wide range of software such as Slack, OpsGenie, etc.
These three will work together to form a full monitoring stack.
The configuration of Prometheus, Grafana, and Alertmanager is not the main topic of this tutorial. But you will find the entire codebase in our GitHub.
Moving back to Ansible. In order to make everything work correctly, in our Ansible roles we need to:
- Create the configuration folders
- Create the configuration files
- Create the application containers
This gives us the following role:
```yaml
- name: Create Folder /srv/prometheus if not exist
  file:
    path: /srv/prometheus
    mode: 0755
    state: directory

- name: Create Folder /srv/grafana if not exist
  file:
    path: /srv/grafana
    mode: 0755
    state: directory

- name: Create Folder /srv/alertmanager if not exist
  file:
    path: /srv/alertmanager
    mode: 0755
    state: directory

- name: Create prometheus configuration file
  copy:
    dest: /srv/prometheus/prometheus.yml
    src: prometheus_main.yml
    mode: 0644

- name: Create prometheus alert configuration file
  copy:
    dest: /srv/prometheus/prometheus_alerts_rules.yml
    src: prometheus_alerts_rules.yml
    mode: 0644

- name: Create grafana configuration files
  copy:
    dest: /srv/
    src: grafana
    mode: 0644

- name: Create alertmanager configuration file
  template:
    dest: /srv/alertmanager/alertmanager.yml
    src: alertmanager/alertmanager.j2
    mode: 0644

- name: Create Prometheus container
  docker_container:
    name: prometheus
    restart_policy: always
    image: "prom/prometheus:{{ prometheus_version }}"
    volumes:
      - /srv/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml
      - /srv/prometheus/prometheus_alerts_rules.yml:/etc/prometheus/prometheus_alerts_rules.yml
      - prometheus_main_data:/prometheus
    command: >
      --config.file=/etc/prometheus/prometheus.yml
      --storage.tsdb.path=/prometheus
      --web.console.libraries=/etc/prometheus/console_libraries
      --web.console.templates=/etc/prometheus/consoles
      --web.enable-lifecycle
    published_ports: "9090:9090"

- name: Create Grafana container
  docker_container:
    name: grafana
    restart_policy: always
    image: "grafana/grafana:{{ grafana_version }}"
    volumes:
      - grafana-data:/var/lib/grafana
      - /srv/grafana/provisioning:/etc/grafana/provisioning
      - /srv/grafana/dashboards:/var/lib/grafana/dashboards
    env:
      GF_AUTH_ANONYMOUS_ENABLED: "true"
      GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin"
    published_ports: "3000:3000"

- name: Create Alertmanager container
  docker_container:
    name: alertmanager
    restart_policy: always
    image: "prom/alertmanager:{{ alertmanager_version }}"
    volumes:
      - alertmanager-data:/data
      - /srv/alertmanager:/config
    command: >
      --config.file=/config/alertmanager.yml
      --log.level=debug
    published_ports: "9093:9093"
```
You might have some questions about some interesting particularities of the setup, for instance, how to handle a secret.
Using Ansible Vault and Jinja2 to handle a secret
In our monitoring configuration, we often have passwords or tokens. We call them secrets because, well, that’s what they are: secrets. Remember our objectives: we want to be able to store our code as well as our secrets safely on GitHub (or any other repo). But we don’t want our secrets to be readable by anyone! Well, worry no more: let me introduce to you Ansible Vault.
With a simple command, you can encrypt your secret. It will ask for an encryption password; do not forget it, as you will need it at every deployment:
```shell
ansible-vault encrypt_string "password" --ask-vault-pass
```
It will give you something like:

```
!vault |
  $ANSIBLE_VAULT;1.1;AES256
  64306663363562356132323065396635636630373031303739323666373262663961393132316333
  6135653763363566303331313639633030623530646239310a353236343035643132646230333466
  36336439376131333630346563323833313164353265313264643232373465633561663331396133
  3163303166373166390a396131303239356139653063616437363933333130393563646338663933
  3966
```
And voilà! Your secret is now encrypted. You can use it with Jinja2 through the `template` module (as we saw before). First, store the encrypted value in your variables file:
```yaml
---
prometheus_version: v2.40.1
grafana_version: "9.2.5"
alertmanager_version: v0.24.0
alertmanager_smtp_password: !vault |
  $ANSIBLE_VAULT;1.1;AES256
  64306663363562356132323065396635636630373031303739323666373262663961393132316333
  6135653763363566303331313639633030623530646239310a353236343035643132646230333466
  36336439376131333630346563323833313164353265313264643232373465633561663331396133
  3163303166373166390a396131303239356139653063616437363933333130393563646338663933
  3966
```
Then, in `roles/observer/templates/alertmanager.j2`, you can reference `alertmanager_smtp_password`, which will be decrypted when the template is applied:
```yaml
route:
  receiver: "mail"
  repeat_interval: 4h
  group_by: [ alertname ]

receivers:
  - name: "mail"
    email_configs:
      - smarthost: "outlook.office365.com:587"
        auth_username: "email@example.com"
        auth_password: "{{ alertmanager_smtp_password }}"
        from: "firstname.lastname@example.org"
        to: "email@example.com"
```
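The alerts that Alertmanager routes are defined in the `prometheus_alerts_rules.yml` file that the role copies onto the observer. Its content is not shown in this tutorial, but as an illustration, a minimal rule file could look like this (the `InstanceDown` rule, its threshold, and its labels are examples, not the repository’s actual rules):

```yaml
groups:
  - name: availability
    rules:
      # Fires when any scrape target has been unreachable for 5 minutes;
      # Alertmanager then routes the alert to the configured receiver.
      - alert: InstanceDown
        expr: up == 0
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Instance {{ $labels.instance }} is down"
```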
I want to deploy!
Enough talking, let’s deploy our observer! Thanks to the tags in our playbooks, we have the ability to deploy only the observer.
This is our playbook, `ansible/playbooks/monitoring.yml`:
```yaml
- name: Install Observability stack (targets)
  hosts: target
  tags:
    - monitoring
    - target
  roles:
    - ../roles/target

- name: Install Observability stack (observer)
  hosts: observer
  tags:
    - monitoring
    - observer
  roles:
    - ../roles/observer
```
In our Ansible Playbook command, we will specify that we want to execute only the roles with the `observer` tag:
```shell
ansible-playbook -i ansible/inventories/hosts.yml -u TheUserToExecuteWith ansible/playbooks/monitoring.yml -t observer --ask-vault-pass
```
That’s it! You now have a fully functional observer. You can access Prometheus on port 9090, and Grafana on port 3000. The only things missing now are the targets, and you will have installed a full monitoring stack.
Let’s set up the targeted ones
For the targeted ones, the setup will depend on what is on your machine. We will use two main agents: node-exporter and cAdvisor. As I’ve said before, node-exporter will focus on hardware and OS metrics, whereas cAdvisor will report Docker-related metrics.
Here, we will install both. This gives us this Ansible role:
```yaml
# The image tags are pinned through role variables (node_exporter_version
# and cadvisor_version here are assumed names, defined like the observer's
# version variables).
- name: Create NodeExporter
  docker_container:
    name: node-exporter
    restart_policy: always
    image: "prom/node-exporter:{{ node_exporter_version }}"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command: >
      --path.procfs=/host/proc
      --path.rootfs=/rootfs
      --path.sysfs=/host/sys
      --collector.filesystem.mount-points-exclude=^/(sys|proc|dev|host|etc)($$|/)
    published_ports: "9100:9100"

- name: Create cAdvisor
  docker_container:
    name: cadvisor
    restart_policy: always
    image: "gcr.io/cadvisor/cadvisor:{{ cadvisor_version }}"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    published_ports: "9101:8080"
```
These two will expose their metrics on a specific port (9100 for node-exporter, and 8080 for cAdvisor, published here as 9101) and serve an endpoint called `/metrics` for Prometheus to scrape.
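If you want to verify that both agents answer before wiring them into Prometheus, you could add a smoke-test task to the target role. These tasks are not part of the original role; they are a sketch using Ansible’s builtin `uri` module, which fails if the endpoint does not return a 200:

```yaml
# Optional smoke test (not in the original role): fail fast if an
# exporter is not reachable on its published port.
- name: Check that node-exporter serves /metrics
  uri:
    url: http://localhost:9100/metrics

- name: Check that cAdvisor serves /metrics
  uri:
    url: http://localhost:9101/metrics
```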
There is only one thing missing for our monitoring stack to work: Prometheus needs to be aware of the targeted ones. For that, we will add them to the `scrape_configs` section of the Prometheus config file (`prometheus_main.yml` in our role):
```yaml
scrape_configs:
  - job_name: prometheus
    scrape_interval: 30s
    static_configs:
      - targets: ["localhost:9090"]

  - job_name: node-exporter
    scrape_interval: 30s
    static_configs:
      - targets: ["192.168.0.1:9100", "192.168.0.10:9100", "192.168.0.11:9100"]

  - job_name: cadvisor
    scrape_interval: 30s
    static_configs:
      - targets: ["192.168.0.1:9101", "192.168.0.11:9101"]
```
You can now deploy the monitored ones. To do that, same as before, use the tags:
```shell
ansible-playbook -i ansible/inventories/hosts.yml -u TheUserToExecuteWith ansible/playbooks/monitoring.yml -t target --ask-vault-pass
```
Be careful: if you have modified the Prometheus config file, you will also need to redeploy the observer and restart Prometheus for the new configuration to take effect.
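Because our Prometheus container was started with `--web.enable-lifecycle`, a full restart is avoidable: POSTing to Prometheus’s `/-/reload` endpoint makes it re-read its configuration. A hypothetical task for the observer role (not part of the original code) could look like this:

```yaml
# Ask the running Prometheus to re-read its configuration file;
# this works because the container runs with --web.enable-lifecycle.
- name: Reload Prometheus configuration
  uri:
    url: http://localhost:9090/-/reload
    method: POST
```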
Add a targeted machine
To add a targeted machine or VM, the steps are quite easy:

- Make sure that you can connect to it through SSH
- Add the IP of the targeted machine and its hostname under `target` in the inventory (`inventories/hosts.yml`)
- Add the IP of the targeted machine to the `scrape_configs` of the observer’s Prometheus config file (`prometheus_main.yml`), under the `node-exporter` job (and under `cadvisor` if you also want to monitor the containers)
- Run the `ansible-playbook` command for both the observer and the targets
- That’s it!
We have now covered everything. All the codebase is available publicly.
Here is what it looks like (this is the node-exporter dashboard):
In fact, it might not be the end. Templating with Jinja2 could let you update the Prometheus config file automatically from the hosts in your inventory.
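For instance, if you turned `prometheus_main.yml` into a Jinja2 template deployed with the `template` module instead of `copy`, you could generate the `node-exporter` targets from the inventory rather than hard-coding the IPs. This is a sketch, assuming every host in the `target` group defines `ansible_host`:

```yaml
# prometheus_main.yml.j2 (hypothetical): scrape every host in the
# "target" group without listing IPs by hand.
scrape_configs:
  - job_name: node-exporter
    scrape_interval: 30s
    static_configs:
      - targets:
{% for host in groups['target'] %}
          - "{{ hostvars[host].ansible_host }}:9100"
{% endfor %}
```

With this in place, adding a machine to `hosts.yml` is enough to have it scraped on the next deployment.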
Also, remember when I told you that we would not use Kubernetes because it wasn’t simple enough? Well, I might have been wrong. Why? Because with Kube, the Prometheus operator does everything I’ve shown you automatically. The initial setup is longer, but it may be the better solution depending on your case. It is up to you!