Summary

    Install metrics-server with helm to enable autoscaling

    To enable Kubernetes autoscaling, you need to install the metrics-server on your cluster. A quick way to do so is with helm:

    helm install stable/metrics-server --name metrics-server --namespace metrics
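
    Note that with Helm 3 the --name flag no longer exists; the release name is passed as the first positional argument instead:

    helm install metrics-server stable/metrics-server --namespace metrics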

    Note that if you are using the Weave network on EKS, you need to append --set hostNetwork.enabled=false to the install command.
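
    The full Helm 2 command then becomes:

    helm install stable/metrics-server --name metrics-server --namespace metrics --set hostNetwork.enabled=false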

    You now have access to the hpa (HorizontalPodAutoscaler) resource: kubectl get hpa

    Also, the metrics-server comes with a number of interesting features, such as the ability to run kubectl top pod.

    Note that the metrics-server is installed by default on GKE.

    Set up autoscaling of Kubernetes pods

    For instance, apply a LoadBalancer service and a deployment from the luksa kubernetes-in-action repo:

    kubectl apply -f https://raw.githubusercontent.com/luksa/kubernetes-in-action/master/Chapter09/kubia-deployment-and-service-v1.yaml

    Apply the following manifest with kubectl apply -f hpa.yaml:
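
    Something along these lines does the job, assuming the deployment from the manifest above is named kubia (the maxReplicas value here is an arbitrary choice, any value above the deployment's 3 replicas works for this demo):

    apiVersion: autoscaling/v1
    kind: HorizontalPodAutoscaler
    metadata:
      name: kubia
    spec:
      scaleTargetRef:
        kind: Deployment
        name: kubia
      minReplicas: 2
      maxReplicas: 5   # arbitrary upper bound for this demo
      targetCPUUtilizationPercentage: 10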

    Basically, the controller manager periodically compares the mean resource utilization of the pods managed by the targeted deployment with the resource request of those pods. If the ratio exceeds or falls under the target value specified in the hpa, the controller triggers a scaling within the boundaries of the min and max replicas. For more information you can check the Kubernetes documentation.
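
    Concretely, the desired replica count is ceil(currentReplicas × currentUtilization / targetUtilization): if our 3 pods average 20% cpu utilization against the 10% target, the controller scales to ceil(3 × 20 / 10) = 6 replicas, capped by maxReplicas.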

    In our case, you can describe the kubia pods to verify that the cpu request defaults to 100m, and we have set our target cpu utilization to 10% in the hpa. The controller manager polls the metrics-server every 15s (the default value of horizontal-pod-autoscaler-sync-period). If you then leave the kubia app at rest for 5 min (the default value of horizontal-pod-autoscaler-downscale-stabilization), it will scale down from the 3 replicas defined in the deployment to the 2 pods defined as the min replica boundary.
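
    To watch it happen (assuming the pods carry the app=kubia label and the hpa is named kubia, as in the manifests above):

    kubectl describe po -l app=kubia | grep -A 2 Requests
    kubectl get hpa kubia -w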

    Horizontal autoscaling is now set. Believe it, or follow me to the next section for a quick test.

    Note that depending on your version of the autoscaling api, the syntax of the manifest might differ; you can refer to the documentation. Also note that you can scale replication controllers and replica sets as well.
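
    For instance, with the autoscaling/v2beta2 api the same cpu target is expressed as a metrics list:

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: kubia
    spec:
      scaleTargetRef:
        kind: Deployment
        name: kubia
      minReplicas: 2
      maxReplicas: 5
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 10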

    Test the autoscaling with Locust

    Locust is a handy way to simulate load on your application. You can install it in seconds with pip install locust, as described in the documentation.

    First, expose your Kubernetes service with kubectl port-forward svc/kubia 8080:80

    Then copy the following script into a file named locust-get-test.py.
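
    A minimal version looks like this. It is written against the pre-1.0 Locust API (HttpLocust, with min_wait and max_wait in milliseconds); the wait values below are arbitrary placeholders:

    from locust import HttpLocust, TaskSet, task

    class UserBehavior(TaskSet):
        @task
        def index(self):
            # GET on the service we just port-forwarded
            self.client.get("/")

    class WebsiteUser(HttpLocust):
        task_set = UserBehavior
        host = "http://localhost:8080"
        min_wait = 1000  # arbitrary: a user waits at least 1s between requests
        max_wait = 5000  # arbitrary: and at most 5s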

    In this small script, we define an http get request on localhost:8080, which you just exposed. min_wait and max_wait define the boundaries of a user's waiting time between two requests.

    You can now start the Locust server locally with locust -f locust-get-test.py. A friendly user interface will be exposed at http://localhost:8089/.

    You can now, for instance, simulate 100 users at a hatch rate of 10; run kubectl get po -w and watch the pods pop up as the load increases.

    We have set up horizontal pod autoscaling on a Kubernetes cluster and tested it with Locust in a few minutes. Subscribe to the newsletter for more articles on Kubernetes and autoscaling. And check this article for more Kubernetes quick tips.