Solve the challenges of on-premise Kubernetes clusters

Posted on 6 October 2022, updated on 6 March 2023.

Have you ever tried to deploy on-premise Kubernetes clusters, using your own bare-metal infrastructure? Because Kubernetes is primarily built for cloud environments, it includes ways to call cloud APIs to request specific infrastructure components. If you can't do that, you will have to provide these components one way or another.

In this article, we will explore the main difficulties that arise when building on-premise clusters, and how to solve them. We will first focus on the most obvious one: cluster setup and maintenance. We will then look at the main infrastructure components we have to provide to take over the role of a cloud provider. Finally, we will build a simple but fully featured on-premise cluster implementing these solutions.

Let's get started!

The specificities of on-premise clusters

Difficult cluster setup and maintenance

Almost all cloud providers have a managed Kubernetes solution. Using these solutions is the easiest way to deploy Kubernetes workloads: creating a cluster, or upgrading it to a new version of Kubernetes, only requires a click and very little configuration.

Maintaining a production-grade Kubernetes control plane yourself requires time and a different knowledge base than the one required to deploy apps on a cluster. That’s why most of the time it is recommended to use a managed solution.

The official way to deploy on-premise Kubernetes clusters is using kubeadm, which is not as simple as using cloud-managed solutions. The process contains many steps and prerequisites, and although it's a good way to learn about the components of Kubernetes, it is quite error-prone and not easy to automate.

Furthermore, if you want to set up an on-premise cluster with high availability (three or more control plane nodes), it adds yet another level of complexity. Then you will need to deal with the day-to-day tasks of maintaining the cluster, such as renewing certificates, adding nodes, fixing various bugs, upgrading the Kubernetes version...

All of this represents a lot of work that you probably don't want to deal with when you just want to deploy your application to Kubernetes. However, I still think you should try setting up a cluster with kubeadm at least once. Read through the official documentation if you encounter any issues.

Load-balancing and accessing your cluster

The second pain point of deploying on-premise clusters is about accessing your applications from outside the cluster. It seems obvious that, if you deploy applications to Kubernetes, you want to be able to access them from the Internet, or even just your local network. But as we'll see, this seemingly easy-to-solve requirement doesn't have an obvious solution, and I personally have lost quite some time understanding why.

Kubernetes provides 2 main ways to access your services from outside the cluster: NodePort Services, and LoadBalancer Services. The NodePort Service opens a high-number port (taken from the 30000-32767 range by default) on each node of the cluster. It then routes traffic from this port to the selected pod(s) that serve the app you want to access.

This is helpful for testing purposes, but you and your browser would typically expect web servers to serve on port 80/443 for example, which is not possible with this kind of Service.
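As a minimal sketch, a NodePort Service could look like the manifest written by the snippet below. The names my-app and my-app-nodeport, the container port 8080 and the node port 30080 are assumptions chosen for illustration; if nodePort is left out, Kubernetes picks a free port in the default range itself.

```shell
# Write a hypothetical NodePort Service manifest to a local file.
cat > nodeport-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-app-nodeport   # hypothetical name
spec:
  type: NodePort
  selector:
    app: my-app           # must match the labels of the pods serving the app
  ports:
  - port: 80              # port of the Service inside the cluster
    targetPort: 8080      # port the container listens on (assumption)
    nodePort: 30080       # port opened on every node (optional)
EOF
```

After a kubectl apply -f nodeport-service.yaml, the app should answer on port 30080 of any node's IP, which illustrates why this type of Service is not enough to serve on port 80 or 443.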

A Service of the type LoadBalancer first opens up a high-number port on the nodes of the cluster, as a NodePort Service would. But then it also interacts with the cloud provider's API to configure a load balancer that is external to Kubernetes.

The external load balancer can be configured to listen on any port you choose, and will route traffic to one of the nodes on the selected high-number port. It also provides you with a single IP to access your cluster applications, and it is responsible for knowing the node(s) to which it should forward your request.
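For comparison, here is a sketch of the same kind of Service declared with the LoadBalancer type. The names my-app and my-app-lb and the container port 8080 are assumptions for illustration only.

```shell
# Write a hypothetical LoadBalancer Service manifest to a local file.
cat > loadbalancer-service.yaml <<'EOF'
apiVersion: v1
kind: Service
metadata:
  name: my-app-lb         # hypothetical name
spec:
  type: LoadBalancer
  selector:
    app: my-app           # must match the labels of the pods serving the app
  ports:
  - port: 80              # port exposed by the external load balancer
    targetPort: 8080      # port the container listens on (assumption)
EOF
```

Once applied with kubectl apply -f loadbalancer-service.yaml, kubectl get service my-app-lb shows the address handed out by the load balancer in the EXTERNAL-IP column. On a bare-metal cluster with no load-balancer implementation, that column stays at <pending>.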

There is a clear problem here: if my on-premise deployment doesn't have a dedicated node to be used as a load-balancer, how do I know which node/IP should serve my request? And if it does, what happens when this dedicated node fails?

Dynamic Volume Provisioning

Since Pods, like containers, are ephemeral and can't persist data on their own, Kubernetes defines objects to help provision volumes for stateful Pods that need to persist data. However, as with LoadBalancer Services, Kubernetes does not handle volume provisioning end-to-end: you still need a physical storage backend on which to write data. What Kubernetes does provide is a way to interact with the multitude of existing storage backends (NFS, Ceph, cloud provider storage...).

Here's a recap of the different objects that handle storage in Kubernetes:

PersistentVolumeClaim (PVC): represents a request for a volume. Pods that need to persist data reference a PersistentVolumeClaim, which Kubernetes then binds to a PersistentVolume if it finds one that meets the specified requirements.
PersistentVolume (PV): represents a piece of actual storage. It has a size, a type, and an access mode (which defines how many pods can access it simultaneously and whether it is read-only). This object needs to be backed by an actual storage backend and is not ephemeral: if the volume is no longer in use, you could define a new PVC that reuses and rewrites it.
StorageClass (SC): used for dynamic provisioning, to define the types of storage available and which provisioner handles them.
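To make the relationship between a PV and a PVC more concrete, here is a minimal sketch of a hand-written PersistentVolume and a claim that would bind to it. Everything specific in it is an assumption: the NFS server address and export path, the object names, and the manual class name, which is only used so that the claim matches this volume.

```shell
# Write a hypothetical PV backed by NFS, and a PVC that can bind to it.
cat > static-volume.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolume
metadata:
  name: data-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: manual    # hypothetical class name, used for matching only
  nfs:
    server: 192.168.10.50     # hypothetical NFS server
    path: /exports/data       # hypothetical export
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: data-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: manual    # same class as the PV above
  resources:
    requests:
      storage: 5Gi            # must fit within the PV capacity
EOF
```

After kubectl apply -f static-volume.yaml, kubectl get pvc data-pvc should report the claim as Bound to data-pv, since the class, access mode and size all match.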

PersistentVolume creation is the part that is not directly handled by Kubernetes, and there are 2 ways to get around it:

Manual provisioning: the cluster administrator manually creates PersistentVolume objects, after having provisioned a storage backend: NFS, separate partition on a local drive, cloud provider volumes...
Dynamic provisioning: the administrator creates a StorageClass and installs a volume provisioner on the cluster. The provisioner is responsible for calling the storage backend's API (for example the cloud provider's) to create volumes on demand to match created PVCs.

For cloud deployments, dynamic provisioning is fairly easy, since Kubernetes comes bundled with ways to interact with all major cloud providers' volume backends, such as AWS EBS, so you just need to define the storage class. However, for on-premise deployments, the primary solution is to have an external backend such as a Ceph cluster or an NFS server, which is not managed by Kubernetes, and to install a provisioner for it in your cluster. We will explore another solution in this article, but you can also use the NFS provisioner, depending on your needs.
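With dynamic provisioning, the administrator no longer writes PersistentVolume objects by hand. The sketch below shows the two objects involved, assuming a provisioner is already installed. The provisioner name example.com/my-provisioner is a placeholder, not a real one: use the name documented by the provisioner you install. The other names and sizes are also illustrative.

```shell
# Write a hypothetical StorageClass and a PVC that requests a volume from it.
cat > dynamic-volume.yaml <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-storage                      # hypothetical class name
provisioner: example.com/my-provisioner   # placeholder, depends on your provisioner
reclaimPolicy: Delete                     # delete the volume when the PVC is deleted
volumeBindingMode: WaitForFirstConsumer   # create the volume once a pod uses the PVC
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: app-data
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast-storage          # triggers dynamic provisioning
  resources:
    requests:
      storage: 2Gi
EOF
```

Here the provisioner creates a matching PersistentVolume on demand, which is the part we have to bring ourselves on an on-premise cluster.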

Solutions

K3s

We will use K3s to set up an on-premise Kubernetes cluster, but as usual, there are other solutions.

K3s is a lightweight Kubernetes distribution: it is fully compliant with Kubernetes (meaning you can deploy the same objects on both) but packages all its components (apiserver, scheduler, even containerd...) in a single binary, which runs as a server on control-plane nodes and as an agent on worker nodes. Below is an overview of K3s' architecture:

[Figure: K3s architecture overview]

K3s is adapted for small clusters and on-premise deployments by removing cloud-specific components (such as support for all cloud providers' volume provisioners) and adding helpful components such as the Traefik Ingress Controller, a service load-balancer, the local-path volume provisioner...

As we'll see in the last part of this article, K3s installation is as easy as it gets while remaining highly configurable: the install script accepts environment variables to enable, disable and configure components. To update the K3s version or change install parameters, simply re-run the install script with the desired INSTALL_K3S_VERSION or other parameters, starting with the control plane node.
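As a rough sketch of such an upgrade, the helper below re-runs the installer with a new version. The function name upgrade_k3s_server is my own, and I assume the node was installed with the same flags as the install command used later in this article. It is wrapped in a function so that nothing runs until you call it.

```shell
# Re-run the K3s installer with a new version, keeping the original flags.
# Defined as a function: nothing is downloaded or executed until it is called.
upgrade_k3s_server() {
  new_version="$1"   # K3s release tag to upgrade to (you choose it)
  curl -sfL https://get.k3s.io | \
    INSTALL_K3S_VERSION="$new_version" \
    INSTALL_K3S_EXEC="--disable=servicelb --cluster-init" \
    sh -
}
```

On the control plane node, you would then call upgrade_k3s_server "<newer-k3s-version>" with a tag taken from the K3s releases page, and repeat the equivalent re-install on the worker nodes afterwards.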

K3s can also be used to deploy on-premise high-availability clusters. This formerly relied on an external datastore that you had to provision yourself (for example an external etcd cluster, or even a PostgreSQL cluster). Although this solution is still supported, K3s can now also handle a distributed etcd cluster itself.
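For reference, here is a sketch of how an additional control plane node can join a cluster that uses the embedded etcd, based on the form given in the K3s documentation. The function name join_k3s_server is my own, and I assume the first node was started with --cluster-init and that you have its IP and token at hand.

```shell
# Join an additional control plane (server) node to an existing K3s cluster
# that uses embedded etcd. Nothing runs until the function is called.
join_k3s_server() {
  first_server_ip="$1"   # IP of the node that was started with --cluster-init
  token="$2"             # token generated by that first server
  curl -sfL https://get.k3s.io | \
    K3S_TOKEN="$token" \
    INSTALL_K3S_VERSION="v1.25.0+k3s1" \
    sh -s - server --server "https://${first_server_ip}:6443" --disable=servicelb
}
```

You would call it on each extra server, for example join_k3s_server 192.168.10.10 "$K3S_TOKEN" (the IP is a placeholder). Keep an odd number of server nodes so that etcd can maintain quorum.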

MetalLB

To solve the problem of not having a cloud load-balancer, we can use MetalLB, which is a load-balancer implementation for bare-metal clusters. It has 2 operating modes, Layer 2 and BGP, but we will only use the basic "Layer 2" mode.

In layer 2 mode, we give MetalLB a set of IP addresses behind which it will make our service(s) available. Ideally, you would only give 1 IP to MetalLB, and use an ingress controller to route requests to your specific services. When a machine on the network looks up one of these IPs, a single node of the cluster answers the ARP request for it, so that all traffic for this IP is directed to this node. There is an automatic failover in case this specific node fails.

One drawback of MetalLB is that it cannot be used to load-balance the Kubernetes apiserver IP in a high-availability cluster. There is an open issue on this subject. Other solutions exist for this use case, such as kube-vip, but you would need to test how they can co-exist with MetalLB.

Longhorn

One solution that enables us to dynamically provision volumes for on-premise deployments and that is fully managed inside Kubernetes is Longhorn. It basically creates replicas of your volumes on multiple nodes' disks, so that they can be recreated in case of node failure. The following diagram explains how Longhorn works, but reading its documentation would definitely help get a better understanding of it.

[Figure: Longhorn architecture diagram]

Longhorn also provides a quite helpful web UI and other features such as pushing backups to S3. Recent versions also introduced a killer feature: Longhorn can now provision ReadWriteMany type volumes, using an embedded NFS server. This means you can use Longhorn to provision volumes that can be mounted and modified by multiple pods simultaneously!

One drawback I found to all this awesomeness is that Longhorn can end up consuming a lot of resources in your cluster, specifically the CPU. This is understandable given the complex nature of its mission, but you'll have to test and decide if its features are worth the cost in resources. One way I've found to mitigate this issue is using Linux distributions that use the most up-to-date Linux kernel, for example using Ubuntu server instead of Debian server.

Longhorn can also obviously use quite a lot of disk space, so I'd recommend using the local-path provisioner deployed by K3s instead of Longhorn when deploying stateful applications that already handle replication themselves, such as database clusters.

Practice: Setting up an on-premise cluster

Install and configure K3s

The first step of deploying our on-premise cluster is to install K3s by running the install script, which you can find at https://get.k3s.io, on each machine. We pass a few environment variables to the script to configure our installation; you can find the full list of configurable variables in the documentation. We first create the control plane node using the following command:

curl -sfL https://get.k3s.io | INSTALL_K3S_VERSION="v1.25.0+k3s1" INSTALL_K3S_EXEC="--disable=servicelb --cluster-init" sh -

You can disable specific components of K3s using the --disable flag. For example, in our install, we disable servicelb because we will use MetalLB as our load-balancer. The --cluster-init flag tells K3s to initialize an embedded etcd cluster as its backend.

To add nodes to our cluster, we need 2 pieces of information:

An authentication token. It was created by the first control plane node, and is located in /var/lib/rancher/k3s/server/node-token. Set it on the worker node in a K3S_TOKEN variable.
The IP of the control plane node. The worker node needs to be able to join the control plane node using this IP. Store it in a CP_IP variable on the worker node.
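Concretely, preparing these two variables on the worker node could look like the sketch below. Both values are placeholders to replace with your own: the token is the content of the node-token file, and 192.168.10.10 is a made-up control plane IP.

```shell
# On the control plane node (requires root), print the join token:
#   sudo cat /var/lib/rancher/k3s/server/node-token
#
# On the worker node, store the token and the control plane IP.
# Both values below are placeholders.
export K3S_TOKEN="paste-the-node-token-here"
export CP_IP="192.168.10.10"

# Quick sanity check of the URL the worker will use to join the cluster.
echo "This node will join https://${CP_IP}:6443"
```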

We can then add a worker node using this command:

curl -sfL https://get.k3s.io | K3S_URL="https://${CP_IP}:6443" K3S_TOKEN="${K3S_TOKEN}" INSTALL_K3S_VERSION="v1.25.0+k3s1" sh -

Install and configure MetalLB

There are multiple ways to install MetalLB; we will use Helm to do so. Simply add the repository and install MetalLB:

helm repo add metallb https://metallb.github.io/metallb
helm install metallb metallb/metallb -n metallb-system --create-namespace

You should see a few pods running in the metallb-system namespace.

You then need to configure MetalLB with the IP addresses that it can manage and assign to your services. These IPs should belong to the same network as your cluster nodes, but should be in a range that is excluded from the DHCP server's pool, so that no other machine on the network can be given one of them.

To configure MetalLB, apply the following objects using kubectl:

apiVersion: metallb.io/v1beta1
kind: IPAddressPool
metadata:
  name: my-ip-pool
  namespace: metallb-system
spec:
  addresses:
  - 192.168.10.0/24         # Configure your chosen IPs here
  - 192.168.9.1-192.168.9.5 # This is another way to write an IP range
---
apiVersion: metallb.io/v1beta1
kind: L2Advertisement
metadata:
  name: my-l2-config
  namespace: metallb-system
spec:
  ipAddressPools:
  - my-ip-pool

Install and configure Longhorn

There are a few requirements that all the nodes in your cluster need to meet before setting up Longhorn. You can run the environment check script, which will notify you if one of the nodes in your cluster doesn't meet the requirements. From a machine that has access to your cluster, run:

curl -sSfL https://raw.githubusercontent.com/longhorn/longhorn/v1.3.1/scripts/environment_check.sh | bash

Once the requirements are satisfied, Longhorn can be installed very easily using Helm:

helm repo add longhorn https://charts.longhorn.io
helm install longhorn longhorn/longhorn --namespace longhorn-system --create-namespace

Check that the pods are running in the longhorn-system namespace.

Since there is no authentication on the UI, the ingress is not deployed by default. You can access the UI by enabling ingress deployment in the Helm values, or by port-forwarding. You can also try deploying PVCs and Pods to use the newly created longhorn StorageClass!
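As a quick smoke test, the sketch below writes a PVC that uses the longhorn StorageClass and a Pod that writes a file to the resulting volume. The object names, the 1Gi size and the busybox image are arbitrary choices of mine.

```shell
# Write a hypothetical PVC using the longhorn StorageClass, and a Pod using it.
cat > longhorn-test.yaml <<'EOF'
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: longhorn-test-pvc
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: longhorn      # StorageClass created by the Longhorn chart
  resources:
    requests:
      storage: 1Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: longhorn-test-pod
spec:
  containers:
  - name: writer
    image: busybox
    # Write a file to the volume, then stay alive so the pod can be inspected.
    command: ["sh", "-c", "date > /data/hello.txt && sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: longhorn-test-pvc
EOF
```

After kubectl apply -f longhorn-test.yaml, kubectl get pvc longhorn-test-pvc should end up Bound, and the new volume should appear in the Longhorn UI. To reach the UI without an ingress, something like kubectl -n longhorn-system port-forward svc/longhorn-frontend 8080:80 should work, assuming the default service name and port of the chart.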

With these 3 simple parts, we have created a fully functional on-premise Kubernetes cluster with minimal difficulty.

Conclusion

Since Kubernetes was originally built for the cloud, deploying on-premise clusters presents a lot of challenges. Fortunately, given the modular nature of Kubernetes and the thriving ecosystem surrounding it, most of the problems already have one or even multiple solutions! Feel free to experiment with the components presented and replace them if you find a solution that better suits your needs.

I hope that this guide was helpful and that you won't struggle as much as I did when setting up your first on-premise cluster, or that it gave you ideas to improve an already existing cluster!