Posted on 12 April 2021, updated on 15 March 2023.
I want to share my experience with evicted pods and, in particular, explain what the "evicted" state of a Pod means. Before we get into eviction, it is essential to understand how Kubernetes prioritizes pods through Quality of Service classes (QoS classes).
A little bit of vocabulary first! To follow this article, we need to speak the same language, so here are a few essential concepts.
Requests and limits:
- Requests: the parameter used to schedule a Pod. It is the minimum amount of a resource that a container needs to start. A request does not mean that the resource is dedicated to the Pod.
- Limits: the maximum amount of a resource that a container is allowed to use on the node.
QoS classes:
- "Guaranteed": Pods that have requests and limits set on both memory and CPU resources:
  - Every container in the Pod must have a CPU limit and a CPU request, and they must be the same.
  - Every container in the Pod must have a memory limit and a memory request, and they must be the same.
- "Burstable": Pods that do not meet the Guaranteed criteria but have at least one container with a CPU or memory request or limit.
- "Best effort": Pods without any requests or limits.
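To make the three classes concrete, here is a minimal Python sketch of the classification rules listed above. It is not the Kubernetes implementation: the function name `qos_class` and the input shape (a list of dicts mirroring each container's `resources` field) are assumptions made for illustration, and quantities are compared as plain strings without unit normalization.

```python
RESOURCES = ("cpu", "memory")


def qos_class(containers):
    """Return the QoS class for a list of container resource specs.

    Each container is a dict shaped like the `resources` field of a
    container spec: {"requests": {...}, "limits": {...}}.
    Assumption: quantities are compared as plain strings, so "1" and
    "1000m" would not be treated as equal here.
    """
    any_set = False
    all_guaranteed = True
    for c in containers:
        limits = c.get("limits", {})
        # Kubernetes defaults a missing request to the limit, if one is set.
        requests = {**limits, **c.get("requests", {})}
        for r in RESOURCES:
            if r in requests or r in limits:
                any_set = True
            # Guaranteed needs a limit and an identical request for CPU and memory.
            if r not in limits or requests.get(r) != limits[r]:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

For example, a single container with only a memory request, such as `qos_class([{"requests": {"memory": "2000Mi"}}])`, falls into the Burstable class.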
What is an evicted pod?
Now that we know what requests and limits are and that pods have classes, let's dive into the eviction process.
When a node runs low on disk or memory, a flag (a node condition such as MemoryPressure or DiskPressure) is set on the Kubernetes node to indicate that it is under pressure. This flag also restricts the scheduling of new pods on this node, and an eviction process then starts to free some resources.
The Kubelet of the node under pressure handles the eviction process. It fails Pods one by one until the node's resource usage is back under the eviction threshold: for each evicted Pod, the Kubelet terminates all of the Pod's containers and sets its PodPhase to Failed.
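The loop described above can be sketched in a few lines of Python. This is a toy simulation under stated assumptions, not Kubelet code: the function `evict_until_healthy`, the MiB units, and the idea that evicting a pod returns exactly its usage to the node are all simplifications introduced for this example.

```python
def evict_until_healthy(available_mib, threshold_mib, ranked_pods):
    """Simulate failing pods until available memory is back above the threshold.

    ranked_pods: list of (name, usage_mib) tuples, already sorted so that
    the first entry is the first eviction candidate.
    Returns (names of evicted pods, available memory afterwards).
    """
    evicted = []
    for name, usage_mib in ranked_pods:
        if available_mib >= threshold_mib:
            break  # the node is no longer under pressure
        evicted.append(name)        # the pod's phase would become Failed
        available_mib += usage_mib  # assumption: its memory is fully reclaimed
    return evicted, available_mib


# Illustrative values only: 50 MiB available, 100 MiB threshold.
print(evict_until_healthy(50, 100, [("a", 30), ("b", 40), ("c", 500)]))
# prints (['a', 'b'], 120)
```

The point of the sketch is that eviction stops as soon as the node is healthy again, so pods further down the ranking are left alone.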
If a Deployment manages the evicted Pod, the Deployment creates another Pod to be scheduled by Kubernetes.
How are resources freed?
The first thing the Kubelet does is free disk space by deleting non-running pods and their images (this is a quick win). Then, if this disk cleanup is not enough, the Kubelet evicts pods in this precise order:
- Best effort pods
- Burstable pods that use more of the resource under pressure than they requested
- Burstable pods that use less of the resource under pressure than they requested
For instance, let's take a node that is running out of memory (CPU shortage does not trigger eviction, since CPU usage can simply be throttled). A Pod that has a memory request and uses half of it will be evicted after a Pod that has a memory request but uses more than it requested.
As for Guaranteed pods, they are, in theory, safe in the context of an eviction.
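The ordering above can be expressed as a sort key. The sketch below only encodes the order as this article describes it; the real Kubelet ranking takes more inputs into account (pod priority, for instance). The dict keys `name`, `qos`, `usage_mib` and `request_mib` are hypothetical names chosen for the example, with usage and request referring to the resource under pressure.

```python
def eviction_tier(pod):
    """Lower tiers are evicted first, following the order given in this article."""
    if pod["qos"] == "BestEffort":
        return 0
    if pod["qos"] == "Burstable":
        # Burstable pods above their request go before those below it.
        return 1 if pod["usage_mib"] > pod["request_mib"] else 2
    return 3  # Guaranteed pods come last


def eviction_order(pods):
    """Return pod names sorted from first to last eviction candidate."""
    return [p["name"] for p in sorted(pods, key=eviction_tier)]


pods = [
    {"name": "db", "qos": "Guaranteed", "usage_mib": 900, "request_mib": 1000},
    {"name": "api", "qos": "Burstable", "usage_mib": 250, "request_mib": 500},
    {"name": "metrics", "qos": "Burstable", "usage_mib": 2800, "request_mib": 2000},
    {"name": "batch", "qos": "BestEffort", "usage_mib": 300, "request_mib": 0},
]
print(eviction_order(pods))
# prints ['batch', 'metrics', 'api', 'db']
```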
The most important point:
As you can see, it is essential to set requests and limits on your pods correctly.
A good approach is to run your critical applications as Guaranteed, most applications as Burstable, and non-critical, fault-tolerant applications as Best effort.
Use case: evicted Prometheus Pods
A few months ago, my Prometheus server pod was evicted. If you look at the Pod's events, you can see a message about memory usage exceeding the request, like this one:
Message: The node was low on resource: memory. Container prometheus-server was using 2890108Ki, which exceeds its request of 2000Mi.
Here are the requests configured on this pod:
$ k describe pods prometheus-server-5c949c44f7-rc9sv | grep -iA2 Requests
It is not shocking that the Pod consumes more than its memory request. The problem is that when the node running the Pod runs short of memory (which was my case here), the Pod is evicted quickly, right after the Best effort ones.
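To see how far over its request the container was, the two quantities from the event message can be converted to bytes and compared. This is a small sketch: the helper `to_bytes` is written for this example and only understands the binary suffixes (Ki, Mi, Gi, Ti), not the full Kubernetes quantity syntax.

```python
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def to_bytes(quantity):
    """Convert a quantity such as "2000Mi" to bytes (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # no suffix: the value is already in bytes


usage = to_bytes("2890108Ki")   # value from the event message
request = to_bytes("2000Mi")    # request from the event message
overshoot_mib = (usage - request) / 2**20
ratio = usage / request
print(f"{overshoot_mib:.0f} MiB over request ({ratio:.2f}x)")
# prints 822 MiB over request (1.41x)
```

In other words, the container was using roughly 40% more memory than it had requested, which places the Pod among the Burstable pods that exceed their request.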
To go further
If you want to know more about the eviction process and how to prevent pod eviction, I encourage you to read this article from the official Kubernetes documentation, which explains the configuration of "Out of Resource Handling" in more depth.
It covers eviction signals, eviction thresholds, and more.