KEDA

Posted on 31 August 2023, updated on 11 December 2023.

In this little blog post, we’re going to discover how to automatically scale your Kubernetes applications in an event-driven way using KEDA.

Why do we want to automatically scale our applications?

As an SRE, you are responsible for the optimal functioning of applications, their resilience, and their availability. You want to be sure your workloads handle traffic perfectly, and autoscaling is a concept that can help with this.

What is KEDA?

KEDA, or Kubernetes-based Event Driven Autoscaler, is a Kubernetes controller that will autoscale your applications based on the number of events needing to be processed.

KEDA is based on the concept of Scalers, which are types of triggers or event sources from which we want to scale up our applications.

From your side, the only thing to do is to configure a ScaledObject (KEDA CRD) by choosing the scaler you want to use to automatically scale your application, as well as a few parameters, and KEDA will do the rest for you:

  • Monitor event sources
  • Create and manage the HPA lifecycle

As of today, there are 62 built-in scalers and 4 external scalers available.

What makes KEDA nice is that it is a lightweight component and that it relies on native Kubernetes components such as the HorizontalPodAutoscaler. From my point of view, its "Plug and Play" approach is just wonderful 🤩.
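To give you an idea before we dive in, this is roughly the shape of a ScaledObject; a minimal sketch where the resource name and the trigger are placeholders:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: my-app                  # placeholder name
spec:
  scaleTargetRef:
    name: my-app                # the Deployment (by default) you want to scale
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: <scaler-name>       # e.g. cron, rabbitmq, prometheus...
      metadata: {}              # scaler-specific parameters go here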

Deploy KEDA

Well, the easiest way to deploy KEDA is to use their official Helm Chart as follows:

helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
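A quick way to check that the deployment went well is to list the pods in the keda namespace; you should at least see the KEDA operator and the metrics API server (pod names will differ on your cluster):

kubectl get pods --namespace keda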

⚠️ In case you're deploying KEDA with ArgoCD, you may encounter issues regarding the length of the CRDs' annotations. As a workaround, you can use the template.spec.syncPolicy.syncOptions field with the ServerSideApply=true option specifically for KEDA. You can also disable CRD deployment via the Helm chart's values file, but you'll then have to find another way to deploy the KEDA CRDs.
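For illustration, here is a sketch of an Argo CD Application deploying the KEDA chart with that sync option; the chart version is hypothetical, and in an ApplicationSet the same block would sit under spec.template.spec.syncPolicy:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: keda
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://kedacore.github.io/charts
    chart: keda
    targetRevision: 2.12.0            # hypothetical chart version
  destination:
    server: https://kubernetes.default.svc
    namespace: keda
  syncPolicy:
    syncOptions:
      - CreateNamespace=true
      - ServerSideApply=true          # works around the CRD annotation size limit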

Automatically scale our sample web app based on the native cron scaler

It's time for us to play a little bit with KEDA!

Deploy our sample web app


For the demo, I'll use a small Golang web application that I made for this blog post. I deployed it using the following manifest:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: go-helloworld
  name: go-helloworld
spec:
  selector:
    matchLabels:
      app: go-helloworld
  template:
    metadata:
      labels:
        app: go-helloworld
    spec:
      containers:
        - image: rg.fr-par.scw.cloud/novigrad/go-helloworld:0.1.0
          name: go-helloworld
          resources:
            requests:
              cpu: "50m"
              memory: "64Mi"
            limits:
              memory: "128Mi"
              cpu: "100m"
---
apiVersion: v1
kind: Service
metadata:
  name: go-helloworld
spec:
  selector:
    app: go-helloworld
  ports:
    - protocol: TCP
      port: 8080
      name: http
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt
  name: go-helloworld
spec:
  rules:
  - host: helloworld.jourdain.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: go-helloworld
            port:
              number: 8080
  tls: # < placing a host in the TLS config will indicate a certificate should be created
  - hosts:
    - helloworld.jourdain.io
    secretName: go-helloworld-tls-cert

Configure KEDA to automatically scale our web app on working hours only


OK, let's imagine that we want our app to be available during working hours only. You might wonder why you would do this; there can be several reasons.

For instance, in a development environment, there is not necessarily a need to keep applications up and running around the clock. In a cloud environment, it could save you a lot of money depending on the number of apps / compute instances you have.

Well, let's do this! 🤑

To achieve this, we're going to use KEDA's native Cron scaler. Since the Cron scaler supports the Linux cron format, it even lets us scale our application within the working day 😄

To configure the Cron scaler, we'll use the ScaledObject CRD as follows:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: go-helloworld
spec:
  scaleTargetRef:
    name: go-helloworld
  triggers:
  - type: cron
    metadata:
      timezone: Europe/Paris
      start: 00 08 * * 1-5
      end: 00 18 * * 1-5
      desiredReplicas: "2"

⚠️ The ScaledObject must be in the same namespace as your application!

Let's dive a little bit into this configuration:

  • spec.scaleTargetRef is a reference of your Kubernetes Deployment / StatefulSet or other custom resource you want to scale
    • name (mandatory): name of your Kubernetes resource
    • kind (optional): kind of your Kubernetes resource, the default value is Deployment
  • spec.triggers is a list of triggers to activate the scaling of the target resource
    • type (mandatory): scaler name
    • metadata (mandatory): configuration parameters that the Cron scaler requires

With this configuration, my application will be up and running with two replicas from 08:00 until 18:00, Monday through Friday. Isn't that fantastic? 😀
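To double-check that KEDA picked up the configuration, you can look at the ScaledObject and at the HPA it manages on your behalf (KEDA typically names it keda-hpa-<scaledobject-name>):

kubectl get scaledobject go-helloworld
kubectl get hpa keda-hpa-go-helloworld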

Automatically scale our sample web app based on HTTP events with KEDA HTTP add-on (external scaler)

As you have seen, with all the scalers available, we can automatically scale our web application in many ways, for instance based on the number of messages in an AMQP queue.

Now that you understand how KEDA works, we're going to explore how KEDA can help us handle traffic spikes by automatically scaling our application based on HTTP events. To do so, we have two choices:

  • Use the Prometheus scaler
  • Use the KEDA HTTP external scaler, which works as an add-on

Since I don't have Prometheus installed on my demo cluster, I'm going to use the KEDA HTTP external scaler (the perfect excuse to introduce you to an external scaler 🙄).
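For the record, the Prometheus route would have looked roughly like this; this is only a sketch, assuming an NGINX ingress controller metric and a Prometheus server reachable at a hypothetical in-cluster address:

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: go-helloworld
spec:
  scaleTargetRef:
    name: go-helloworld
  triggers:
    - type: prometheus
      metadata:
        serverAddress: http://prometheus.monitoring.svc.cluster.local:9090   # hypothetical Prometheus URL
        query: sum(rate(nginx_ingress_controller_requests{host="helloworld.jourdain.io"}[2m]))   # hypothetical metric
        threshold: "100"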

 💡 The KEDA HTTP add-on is currently in beta. It is mainly maintained by the KEDA team.

Overview of the solution


The KEDA HTTP scaler is an add-on built on top of the KEDA core, and it comes with its own components: an operator, a scaler, and an interceptor. If you want to know more about their roles, feel free to read the official documentation. Anyway, to help you better understand how it all works, I made you a little diagram:

[Diagram: KEDA HTTP add-on overview]

Install KEDA HTTP add-on


As this scaler is not a built-in one, we'll have to install it. As specified in the official documentation, we can install it with a Helm Chart:

helm install http-add-on kedacore/keda-add-ons-http --namespace keda

If everything goes well, you should see the following new pods:

❯ k get pods -l app=keda-add-ons-http -o name
pod/keda-add-ons-http-controller-manager-5c8d895cff-7jsl8
pod/keda-add-ons-http-external-scaler-57889786cf-r45lj
pod/keda-add-ons-http-interceptor-5bf6756df9-wwff8
pod/keda-add-ons-http-interceptor-5bf6756df9-x8l58
pod/keda-add-ons-http-interceptor-5bf6756df9-zxvw

Configure an HTTPScaledObject for our web app


As I said before, the KEDA HTTP add-on comes with its own components, including an operator, which also means that it comes with its own CRDs. The HTTPScaledObject is a CRD managed by the KEDA HTTP add-on, and it is what we need to configure here. Let's create the HTTPScaledObject resource for our web app:

⚠️ The HTTPScaledObject resource must be created in the same namespace as your web app!

kind: HTTPScaledObject
apiVersion: http.keda.sh/v1alpha1
metadata:
    name: go-helloworld
spec:
    host: "helloworld.jourdain.io"
    targetPendingRequests: 10
    scaledownPeriod: 300
    scaleTargetRef:
        deployment: go-helloworld
        service: go-helloworld
        port: 8080
    replicas:
        min: 0
        max: 10

Here, we have configured our HTTPScaledObject to scale our app’s Deployment from 0 to 10 replicas, knowing that KEDA adds a pod whenever 10 requests are pending on the interceptor (requests not yet picked up by your application).
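Before wiring the traffic through the interceptor, you can quickly check that the add-on's operator picked up the resource (assuming you applied it in the app's namespace):

kubectl get httpscaledobject go-helloworld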

Adapt our web app's service and ingress resources


If you take a look at the diagram above, you can see that our web app's Ingress needs to reference the KEDA HTTP add-on's interceptor service instead of the web app's own service. Since an Ingress can't reference a service in another namespace, we are going to create a service of type ExternalName in the same namespace as our web app that points to the interceptor service in the keda namespace:

kind: Service
apiVersion: v1
metadata:
  name: keda-add-ons-http-interceptor-proxy
spec:
  type: ExternalName
  externalName: keda-add-ons-http-interceptor-proxy.keda.svc.cluster.local

Now, we need to re-configure the web app's ingress so that it refers to the newly created service:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
    cert-manager.io/cluster-issuer: letsencrypt
  name: go-helloworld
spec:
  rules:
  - host: helloworld.jourdain.io
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: keda-add-ons-http-interceptor-proxy
            port:
              number: 8080
  tls: # < placing a host in the TLS config will indicate a certificate should be created
  - hosts:
    - helloworld.jourdain.io
    secretName: go-helloworld-tls-cert

⚠️ You need to put the name of the new service, but pay attention to the port: it must be the interceptor service's port.

Let's try it!


To ensure that our configuration works well, I'm going to use k6, which is a load-testing tool. If you want to know more about k6, you can find a few resources on the Padok blog.

Enough advertising; let's move on! 😁 Here is the k6 script I'll use for the test (with one or two changes):

import { check } from 'k6';
import http from 'k6/http';

export const options = {
  scenarios: {
    constant_request_rate: {
      executor: 'constant-arrival-rate',
      rate: 100,
      timeUnit: '1s', // 100 iterations per second, i.e. 100 RPS
      duration: '30s',
      preAllocatedVUs: 50, // how large the initial pool of VUs would be
      maxVUs: 50, // if the preAllocatedVUs are not enough, we can initialize more
    },
  },
};

export function test(params) {
  const res = http.get('https://helloworld.jourdain.io'); // our demo app's URL (see the Ingress above)
  check(res, {
    'is status 200': (r) => r.status === 200,
  });
}

export default function () {
  test();
}

First, let's see what happens with 100 constant RPS:

❯ k6 run k6/script.js

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: k6/script.js
     output: -

  scenarios: (100.00%) 1 scenario, 50 max VUs, 1m0s max duration (incl. graceful stop):
           * constant_request_rate: 100.00 iterations/s for 30s (maxVUs: 50, gracefulStop: 30s)


     ✓ is status 200

     checks.........................: 100.00% ✓ 3001      ✗ 0
     data_received..................: 845 kB  28 kB/s
     data_sent......................: 134 kB  4.5 kB/s
     http_req_blocked...............: avg=792.54µs min=0s     med=1µs     max=137.85ms p(90)=2µs     p(95)=2µs
     http_req_connecting............: avg=136.6µs  min=0s     med=0s      max=17.67ms  p(90)=0s      p(95)=0s
     http_req_duration..............: avg=11.38ms  min=7.68ms med=10.68ms max=100.96ms p(90)=12.78ms p(95)=14.33ms
       { expected_response:true }...: avg=11.38ms  min=7.68ms med=10.68ms max=100.96ms p(90)=12.78ms p(95)=14.33ms
     http_req_failed................: 0.00%   ✓ 0         ✗ 3001
     http_req_receiving.............: avg=89.68µs  min=8µs    med=64µs    max=6.35ms   p(90)=112µs   p(95)=134µs
     http_req_sending...............: avg=152.31µs min=14µs   med=137µs   max=2.57ms   p(90)=274µs   p(95)=313µs
     http_req_tls_handshaking.......: avg=587.62µs min=0s     med=0s      max=74.46ms  p(90)=0s      p(95)=0s
     http_req_waiting...............: avg=11.14ms  min=7.62ms med=10.48ms max=100.92ms p(90)=12.47ms p(95)=13.96ms
     http_reqs......................: 3001    99.983105/s
     iteration_duration.............: avg=12.37ms  min=7.73ms med=10.88ms max=194.89ms p(90)=13.07ms p(95)=14.99ms
     iterations.....................: 3001    99.983105/s
     vus............................: 1       min=1       max=1
     vus_max........................: 50      min=50      max=50


running (0m30.0s), 00/50 VUs, 3001 complete and 0 interrupted iterations
constant_request_rate ✓ [======================================] 00/50 VUs  30s  100.00 iters/s

💡 If you want to see live how many requests the interceptor has in its queue, you can run the following commands in two terminal panes/tabs:

❯ kubectl proxy

Starting to serve on 127.0.0.1:8001

and:

❯ watch -n '1' curl --silent localhost:8001/api/v1/namespaces/keda/services/keda-add-ons-http-interceptor-admin:9090/proxy/queue

{"default/go-helloworld":0}

With the 100 RPS test, my application did not scale up because the number of pending requests in the interceptor's queue never exceeded 1. As a reminder, we configured targetPendingRequests to 10, so everything seems normal 😁

Let's x10 our RPS and see what happens:

❯ k6 run k6/script.js

          /\      |‾‾| /‾‾/   /‾‾/
     /\  /  \     |  |/  /   /  /
    /  \/    \    |     (   /   ‾‾\
   /          \   |  |\  \ |  (‾)  |
  / __________ \  |__| \__\ \_____/ .io

  execution: local
     script: k6/script.js
     output: -

  scenarios: (100.00%) 1 scenario, 50 max VUs, 1m0s max duration (incl. graceful stop):
           * constant_request_rate: 1000.00 iterations/s for 30s (maxVUs: 50, gracefulStop: 30s)

     ✗ is status 200
      ↳  99% — ✓ 11642 / ✗ 2

     checks.........................: 99.98% ✓ 11642      ✗ 2
     data_received..................: 2.6 MB 86 kB/s
     data_sent......................: 446 kB 15 kB/s
     dropped_iterations.............: 18356  611.028519/s
     http_req_blocked...............: avg=1.07ms   min=0s     med=0s      max=408.06ms p(90)=1µs      p(95)=1µs
     http_req_connecting............: avg=43.12µs  min=0s     med=0s      max=11.05ms  p(90)=0s       p(95)=0s
     http_req_duration..............: avg=120.09ms min=8.14ms med=74.77ms max=6.87s    p(90)=189.49ms p(95)=250.21ms
       { expected_response:true }...: avg=120.01ms min=8.14ms med=74.76ms max=6.87s    p(90)=189.41ms p(95)=249.97ms
     http_req_failed................: 0.01%  ✓ 2          ✗ 11642
     http_req_receiving.............: avg=377.61µs min=5µs    med=32µs    max=27.32ms  p(90)=758.1µs  p(95)=2.49ms
     http_req_sending...............: avg=61.57µs  min=9µs    med=45µs    max=9.99ms   p(90)=102µs    p(95)=141µs
     http_req_tls_handshaking.......: avg=626.79µs min=0s     med=0s      max=297.82ms p(90)=0s       p(95)=0s
     http_req_waiting...............: avg=119.65ms min=7.95ms med=74.32ms max=6.87s    p(90)=188.95ms p(95)=249.76ms
     http_reqs......................: 11644  387.60166/s
     iteration_duration.............: avg=121.26ms min=8.32ms med=74.87ms max=7.07s    p(90)=189.62ms p(95)=250.28ms
     iterations.....................: 11644  387.60166/s
     vus............................: 44     min=25       max=50
     vus_max........................: 50     min=50       max=50


running (0m30.0s), 00/50 VUs, 11644 complete and 0 interrupted iterations
constant_request_rate ✓ [======================================] 00/50 VUs  30s  1000.00 iters/s

Not that bad 🧐 I think the two failed requests are due to the application's cold start (since it started from 0 replicas) and to k6, which does not wait more than 1-2 seconds per request.

Here is the deployment history:

❯ k get deployments.apps -w
NAME            READY   UP-TO-DATE   AVAILABLE   AGE
go-helloworld   0/0     0            0           36m
go-helloworld   0/1     0            0           36m
go-helloworld   1/1     1            1           36m
go-helloworld   1/4     1            1           36m
go-helloworld   2/4     4            2           36m
go-helloworld   3/4     4            3           36m
go-helloworld   4/4     4            4           36m
go-helloworld   4/5     4            4           37m
go-helloworld   5/5     5            5           37m

As you can see, the application scaled up from 0 to 5 replicas, until the number of pending requests for the web application dropped below 10.

From what I observed, the scaling was very fast; the app quickly reached 5 replicas.

Here is a little comparison of the http_req_duration k6 metric between the 100 RPS and 1k RPS tests:

# 100 RPS
http_req_duration: avg=11.38ms  min=7.68ms med=10.68ms max=100.96ms p(90)=12.78ms p(95)=14.33ms

# 1k RPS
http_req_duration: avg=120.09ms min=8.14ms med=74.77ms max=6.87s    p(90)=189.49ms p(95)=250.21ms

Depending on our needs (SLOs, SLAs, etc.), we could fine-tune the targetPendingRequests parameter of our web application's HTTPScaledObject.

Scale to zero!

With the two examples we've covered in this article, you may already have experienced scale to zero. However, do you know how it works?

As KEDA automatically scales applications based on events, as soon as an event is received, KEDA scales the application up to its minimum replica count. For instance, in the case of the HTTP add-on, KEDA scales up to the minimum replica count at the first received request.
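The other side of the coin is the floor the application scales back down to: for a ScaledObject this is minReplicaCount, which defaults to 0, while the HTTP add-on uses spec.replicas.min, as we saw in the HTTPScaledObject. As an illustration, here is the cron example again with that floor made explicit (nothing to change, just the default spelled out):

apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: go-helloworld
spec:
  scaleTargetRef:
    name: go-helloworld
  minReplicaCount: 0    # outside working hours, the deployment scales back to zero
  triggers:
    - type: cron
      metadata:
        timezone: Europe/Paris
        start: 00 08 * * 1-5
        end: 00 18 * * 1-5
        desiredReplicas: "2"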