5 minutes to set up horizontal pod autoscaling on a Kubernetes cluster

When facing significant load variations on your Kubernetes infrastructure, you usually do not want to over-provision it: both your wallet and the planet would suffer. A good solution is autoscaling. You can autoscale horizontally (number of instances) or vertically (resources per instance), on both your pods and your nodes. This article aims at quickly getting you started with horizontal pod autoscaling, and provides a quick way to test it using Locust.

Install metrics-server with helm to enable autoscaling

To enable Kubernetes autoscaling you need to install the metrics-server on your cluster. A quick way to do so is by using helm:

helm install stable/metrics-server --name metrics-server --namespace metrics

Note that if you are using Weave network on EKS you need to append the following to the command above: --set hostNetwork.enabled=false

You now have access to the hpa resource: kubectl get hpa

Also, a number of handy features come with the metrics-server, such as the ability to run kubectl top pod.

Note that the metrics-server is installed by default on GKE.

Set up autoscaling of Kubernetes pods

For instance, apply a LoadBalancer service and a deployment from the luksa repo:

kubectl apply -f https://raw.githubusercontent.com/luksa/kubernetes-in-action/master/Chapter09/kubia-deployment-and-service-v1.yaml

Then apply the following manifest with kubectl apply -f hpa.yaml.
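The hpa.yaml manifest is not reproduced here; based on the values discussed in this section (a 10% CPU target and a minimum of 2 replicas for the kubia deployment), a sketch using the autoscaling/v1 API could look like this. The maxReplicas value of 5 is an assumption, not something the article specifies:

```yaml
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: kubia
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: kubia                      # the deployment applied in the previous step
  minReplicas: 2
  maxReplicas: 5                     # assumption: pick an upper bound that fits your cluster
  targetCPUUtilizationPercentage: 10 # mean CPU utilization target, as a % of the pods' CPU request
```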

Basically, the controller manager periodically compares the mean resource utilization of the pods managed by the targeted deployment with the resource requests of those pods. If the ratio falls below or rises above the target value specified in the HPA, the controller triggers a scaling event within the boundaries of the min and max replicas. For more information you can check the documentation.
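The comparison the controller makes follows the scaling formula from the Kubernetes documentation, desiredReplicas = ceil(currentReplicas × currentUtilization / targetUtilization), which can be sketched as:

```python
import math

def desired_replicas(current_replicas: int,
                     current_utilization: float,
                     target_utilization: float) -> int:
    """Scaling formula from the Kubernetes HPA documentation:
    desired = ceil(current * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_utilization / target_utilization)

# 3 pods averaging 25% CPU against a 10% target -> scale up to 8 pods
print(desired_replicas(3, 25, 10))
```

Note that the real controller also applies a tolerance around the ratio (10% by default, via the --horizontal-pod-autoscaler-tolerance flag) before acting, to avoid flapping.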

In our case, you can describe the kubia pods to verify that the CPU request defaults to 100m, and we have set our target CPU utilization to 10% in the HPA. The controller manager queries the metrics-server every 15s (the default value of horizontal-pod-autoscaler-sync-period). If you leave the kubia app at rest for 5 min (the default value of horizontal-pod-autoscaler-downscale-stabilization), it will scale down from the 3 pods defined in the deployment to the 2 pods defined by the minReplicas boundary.

Horizontal autoscaling is now set up. Take my word for it, or follow me to the next section for a quick test.

Note that depending on your version of the autoscaling API, the syntax of the manifest might differ; you can refer to the documentation. Also note that you can scale replication controllers and replica sets as well as deployments.

Test the autoscaling with locust

Locust is a handy way to simulate load on your application. You can install it in seconds by following the documentation.

First, expose your Kubernetes service with kubectl port-forward svc/kubia 8080:80

Then copy the following script.
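A minimal locust-get-test.py matching the description below could look like this sketch, which assumes the pre-1.0 Locust API (where min_wait and max_wait are class attributes expressed in milliseconds):

```python
from locust import HttpLocust, TaskSet, task

class UserBehavior(TaskSet):
    @task
    def index(self):
        # GET on the root of the service exposed with kubectl port-forward
        self.client.get("/")

class WebsiteUser(HttpLocust):
    task_set = UserBehavior
    host = "http://localhost:8080"
    min_wait = 1000  # a simulated user waits between 1 and 5 seconds
    max_wait = 5000  # between two consecutive requests
```

On Locust 1.0 and later, HttpLocust and min_wait/max_wait were replaced by HttpUser and wait_time = between(1, 5).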

This small script defines an HTTP GET request on the localhost:8080 endpoint you just exposed. min_wait and max_wait define the boundaries of the waiting time of a simulated user between two requests.

You can now start the locust server locally with locust -f locust-get-test.py. A friendly user interface will be exposed on http://localhost:8089/.

You can now simulate, for instance, 100 users at a hatch rate of 10. Run kubectl get po -w and watch the pods pop up as the load increases.


We have set up horizontal pod autoscaling on a Kubernetes cluster and tested it with Locust in a few minutes. Subscribe to the newsletter for more articles on Kubernetes and autoscaling, and check this article for more Kubernetes quick tips.

Emmanuel Lilette


Emmanuel is a Site Reliability Engineer (SRE) at Padok. He is one of our architects, specialized in cloud (AWS and GCP) and in load testing with the Gatling tool.

What do you think? Leave your comments here!