Posted on 11 November 2019, updated on 15 March 2023.

When your Kubernetes infrastructure faces significant load variations, you usually do not want to over-provision it for the peaks. Autoscaling is a good solution. This article aims to get you started quickly with horizontal pod autoscaling.

Install metrics-server with helm to enable autoscaling

To enable Kubernetes autoscaling you need to install the metrics-server on your cluster. A quick way to do so is by using helm:

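helm install stable/metrics-server --name metrics-server --namespace metrics

The command above uses Helm 2 syntax. Helm 3 dropped the --name flag and takes the release name as the first argument instead:

helm install metrics-server stable/metrics-server --namespace metrics --create-namespace

Note that the stable chart repository has since been deprecated, so you may need to point Helm at the chart published by the metrics-server project instead.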

Note that if you are using Weave network on EKS you need to append the following to the command above: --set hostNetwork.enabled=false

The hpa resource is built into Kubernetes, but it now has the metrics it needs to act on. You can list autoscalers with kubectl get hpa

The metrics-server also brings other useful features, such as kubectl top pod and kubectl top node, which report the current CPU and memory usage of pods and nodes.

Note that the metrics-server is installed by default on GKE clusters on GCP.

Set up autoscaling of Kubernetes pods

For instance, apply a deployment and a load balancer service from the luksa repo:

kubectl apply -f https://raw.githubusercontent.com/luksa/kubernetes-in-action/master/Chapter09/kubia-deployment-and-service-v1.yaml

Then apply the following manifest with kubectl apply -f hpa.yaml
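A minimal sketch of what hpa.yaml could contain, written as a small Python script that prints the manifest as JSON (kubectl accepts JSON as well as YAML). The minimum of 2 replicas and the 10% CPU target come from this article. The deployment name kubia, the autoscaling/v1 API version and the maximum of 5 replicas are assumptions made for illustration.

```python
import json


def build_hpa_manifest(name="kubia", min_replicas=2, max_replicas=5,
                       target_cpu_percent=10):
    """Build a HorizontalPodAutoscaler manifest as a plain dict.

    Assumptions: the deployment is named `name`, and max_replicas=5 is an
    illustrative value, not one taken from the article.
    """
    return {
        "apiVersion": "autoscaling/v1",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            # The workload whose replica count the autoscaler controls.
            "scaleTargetRef": {
                "apiVersion": "apps/v1",
                "kind": "Deployment",
                "name": name,
            },
            "minReplicas": min_replicas,
            "maxReplicas": max_replicas,
            # Target mean CPU usage, as a percentage of the pods' CPU request.
            "targetCPUUtilizationPercentage": target_cpu_percent,
        },
    }


if __name__ == "__main__":
    print(json.dumps(build_hpa_manifest(), indent=2))
```

Redirecting the output of this script to hpa.yaml gives a file that kubectl apply -f hpa.yaml can consume, since JSON is valid YAML.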

Basically, the controller manager periodically compares the mean resource usage of the pods managed by the targeted deployment with the resource request of those pods. If the resulting ratio rises above or falls below the target value specified in the hpa, the controller scales the deployment up or down, within the bounds set by the min and max number of replicas. For more information, see the Kubernetes documentation on the Horizontal Pod Autoscaler.
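The rule can be sketched in a few lines of Python. This is a simplified version of the formula given in the Kubernetes documentation, desired = ceil(current replicas × current value / target value), with the default 10% tolerance. It ignores details such as pods that are not ready or missing metrics, and the function names are made up for this example.

```python
import math


def cpu_utilization_percent(usage_millicores, request_millicores):
    """Mean CPU usage of the pods, as a percentage of their CPU request."""
    mean_usage = sum(usage_millicores) / len(usage_millicores)
    return 100.0 * mean_usage / request_millicores


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas, max_replicas, tolerance=0.1):
    """Simplified HPA rule: scale by the usage/target ratio, then clamp."""
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: leave the replica count alone.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # Never go outside the min/max bounds set in the hpa.
    return max(min_replicas, min(max_replicas, desired))


# Two pods each using 40m of CPU with a 100m request, 10% target, bounds 2..5
# (the upper bound of 5 is an assumption for this example).
usage = cpu_utilization_percent([40, 40], 100)
print(desired_replicas(2, usage, 10, min_replicas=2, max_replicas=5))
```

With these numbers the utilization is 40%, four times the target, so the rule asks for 8 replicas and the upper bound caps it at 5. An idle app sits at 0% and ends up at the minimum.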

In our case, you can describe the kubia pods to verify that the cpu request defaults to 100m, and we have set our target cpu utilization to 10% in the hpa. The controller manager queries the metrics-server every 15s (default value of horizontal-pod-autoscaler-sync-period). So if you leave the kubia app at rest for 5 min (default value of horizontal-pod-autoscaler-downscale-stabilization), it will scale down from the 3 pods defined in the deployment to 2 running pods, the minimum number of replicas set in the hpa.

Horizontal autoscaling is now set. Take my word for it, or follow me to the next section for a quick test.

Note that the syntax of the manifest might differ depending on your version of the autoscaling API. You can refer to the documentation. Also note that an hpa can target replication controllers and replica sets as well as deployments.

Test the autoscaling with locust

Locust is a handy way to simulate load on your application. You can install it in a second following the documentation.

First, expose your Kubernetes service with kubectl port-forward svc/kubia 8080:80

Then copy the following script.

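In this small script, we define an HTTP GET request on localhost:8080, which you just exposed. min_wait and max_wait set the bounds of the time a simulated user waits between two requests. Locust 1.0 and later express this as wait_time = between(min, max), in seconds rather than milliseconds.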

You can now start the locust server locally with locust -f locust-get-test.py. A friendly user interface will be exposed on http://localhost:8089/.

You can now, for instance, simulate 100 users at a hatch rate (called spawn rate in Locust 1.0 and later) of 10, run kubectl get po -w and watch the pods pop up as the load increases.

We have set up horizontal pod autoscaling on a Kubernetes cluster and tested it with locust in a few minutes.