28 July 2022

Let’s say that the business you’re building needs some IT infrastructure, to run a website for example. The pragmatic solution today is to build this infrastructure in the cloud, using AWS, GCP, Azure, or any other cloud provider.

Let’s say that for this task, your DevOps team wants to use infrastructure-as-code, as it’s the most efficient way to provision cloud infrastructure. They also love Kubernetes (and so do we ❤️).

With all of this in mind, you and your DevOps team should definitely consider Crossplane, the tool that combines all of the above! In this article, we’ll cover what Crossplane is along with a basic usage example, how it compares with Terraform in some aspects, and feedback from how we used it at Padok.

What is Crossplane?

Crossplane is a tool created by Upbound and released in December 2018. It was accepted into the CNCF (Cloud Native Computing Foundation) as a sandbox project in 2020 and became an incubating project in 2021.

One of the main catchphrases on their website at the time, which still holds true today, was: “Provision and manage cloud infrastructure and services using kubectl”. In other words, Crossplane lets you use Kubernetes to control all of your cloud infrastructure. kubectl is the command-line tool you use to interact with a Kubernetes cluster.

To use Crossplane, you therefore need a Kubernetes cluster at your disposal. That cluster must be able to reach the internet, both to pull provider packages and to call your cloud provider’s APIs. Also pay attention to the resource (CPU, memory) sizing of your control plane if you do not use a managed control plane service such as Amazon EKS or Google GKE.

You can install Crossplane by following their documentation. It’s packaged as a Helm chart and is simple to install:

kubectl create namespace crossplane-system
helm repo add crossplane-stable https://charts.crossplane.io/stable
helm repo update
helm install crossplane --namespace crossplane-system crossplane-stable/crossplane

Once you’ve done this, though, you can’t do much yet! Let’s look at what installing the chart created in your Kubernetes cluster.

Crossplane can be considered a Kubernetes add-on: it relies on custom resources to provide all of its functionality. There are four kinds of resources, which we describe below:

  • Providers
    • They are the first kind of package in Crossplane’s terminology
    • A package is simply an OCI image, like a Docker image for example
    • They install CustomResourceDefinitions (CRDs) that allow resources to be provisioned on an external service, such as a cloud provider
    • As of today, providers exist for AWS, GCP, Azure, Datadog, Alibaba, GitLab, GitHub, Kubernetes, and many more
  • Managed resources
    • They are installed by Providers
    • They represent infrastructure resources
  • Configurations
    • They are the second kind of package in Crossplane’s terminology
    • They leverage the CompositeResourceDefinition and Composition features of Crossplane
  • Composite resources
    • They are defined using Crossplane Configurations, as described above
    • They group managed resources together to create more complex, business-oriented infrastructure resources

In this article, we will focus on the first two kinds of resources, as Configurations and composite resources are mostly useful for advanced use cases.

Let’s walk through a simple example that sets up a basic network configuration in AWS.

Basic end-to-end example

To provision AWS resources with Crossplane, we first need a few things. We’ll assume that you already have a Kubernetes cluster with Crossplane installed, as well as an AWS account. Here is what we’ll do to bootstrap our example:

  • Create access keys for an IAM user with permissions on the right AWS services
  • Install the AWS Crossplane provider in our Kubernetes cluster
  • Create a ProviderConfig that allows Crossplane to use the IAM user’s credentials to manage AWS infrastructure

For the first step, go to the AWS IAM console and create an administrator group with the AdministratorAccess AWS managed policy. This allows your Crossplane user to manage any kind of AWS resource for you. Next, create a crossplane IAM user that belongs to your administrator group (we could use an IAM role instead, but it would require more setup). Create an access key for this user and use it to create a crossplane-aws-credentials Kubernetes secret in the namespace where Crossplane is installed. The credentials key of the secret should contain the access key in the form of an AWS shared credentials file.

Here is an example of the secret manifest:

apiVersion: v1
kind: Secret
metadata:
  name: crossplane-aws-credentials
  namespace: crossplane-system
type: Opaque
data:
  credentials: <crossplane_user_credentials_base64_encoded>

The AWS part is done; now let’s write some YAML. First, install the AWS provider into your cluster by applying this manifest:

apiVersion: pkg.crossplane.io/v1
kind: Provider
metadata:
  name: provider-aws
spec:
  package: crossplane/provider-aws:v0.28.0

After a moment, you should be able to see something like this in your cluster:

❯ kubectl get Provider
NAME               INSTALLED   HEALTHY   PACKAGE                                AGE
provider-aws       True        True      crossplane/provider-aws:v0.28.0        1h
❯ kubectl get pod -n crossplane-system
NAME                                            READY   STATUS    RESTARTS   AGE
crossplane-8547dd8dcd-tfpwq                     1/1     Running   0          2h
crossplane-rbac-manager-798d9cf5cf-w9rfc        1/1     Running   0          2h
provider-aws-600a696e071d-8444cb57d4-html5      1/1     Running   0          1h

Now we need to tell our new provider to use the credentials of the crossplane IAM user we created earlier, which are stored in the crossplane-aws-credentials secret. To do this, we create a ProviderConfig:

apiVersion: aws.crossplane.io/v1beta1
kind: ProviderConfig
metadata:
  name: aws-provider-config
spec:
  credentials:
    source: Secret
    secretRef:
      namespace: crossplane-system
      name: crossplane-aws-credentials
      key: credentials

Now, we are set to create our AWS resources using Crossplane! In this example, we will create a simple network composed of a VPC and a subnet. Here are the manifests for the VPC and the subnet respectively:

apiVersion: ec2.aws.crossplane.io/v1beta1
kind: VPC
metadata:
  name: sandbox-vpc
  labels:
    name: sandbox-vpc
spec:
  forProvider:
    region: eu-west-3
    cidrBlock: 10.10.0.0/16
    enableDnsSupport: true
    enableDnsHostNames: true
  providerConfigRef:
    name: aws-provider-config
---
apiVersion: ec2.aws.crossplane.io/v1beta1
kind: Subnet
metadata:
  name: sandbox-subnet
  labels:
    name: sandbox-subnet
spec:
  forProvider:
    region: eu-west-3
    availabilityZone: eu-west-3a
    vpcIdSelector:
      matchLabels:
        name: sandbox-vpc
    cidrBlock: 10.10.0.0/17
  providerConfigRef:
    name: aws-provider-config

Let’s take a closer look at them. The apiVersion shows that we are using the EC2 service of AWS, and the kind shows which kind of resource we want to create. We then define the metadata of the Kubernetes resources: a name, and a label that reuses that name.

The spec part is the most interesting. We have two blocks:

  • The providerConfigRef block defines which ProviderConfig we want to use to manage this infrastructure resource
  • The forProvider block defines parameters specific to this kind of managed resource

The forProvider block is where you will spend most of your time, because it defines the configuration of the external resource. In our example, that means the CIDR blocks of the networks we want to create. The available parameters are listed in the provider’s documentation.
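AWS requires a subnet’s IPv4 CIDR block to lie within its VPC’s CIDR block and to be between /16 and /28 in size. A quick local check can therefore catch mistakes before you apply a manifest. Here is a minimal Python sketch using the standard ipaddress module. The helper name is hypothetical.

```python
import ipaddress


def subnet_fits_vpc(subnet_cidr: str, vpc_cidr: str) -> bool:
    """True if the subnet block lies inside the VPC block and is between /16 and /28."""
    subnet = ipaddress.ip_network(subnet_cidr)
    vpc = ipaddress.ip_network(vpc_cidr)
    return subnet.subnet_of(vpc) and 16 <= subnet.prefixlen <= 28


if __name__ == "__main__":
    vpc_cidr = "10.10.0.0/16"  # cidrBlock of sandbox-vpc
    for candidate in ("10.10.0.0/17", "10.0.0.0/17"):
        # 10.10.0.0/17 fits inside 10.10.0.0/16; 10.0.0.0/17 does not.
        print(candidate, subnet_fits_vpc(candidate, vpc_cidr))
```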

As you may also have spotted, the subnet resource is linked to the VPC resource through a selector on Kubernetes resource labels. This is how you can build fully interlinked infrastructure with Crossplane.
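To make the selector mechanism concrete, here is a simplified Python model of matchLabels semantics. A resource matches when every key/value pair of the selector appears in its labels. The model only illustrates which VPC the subnet’s vpcIdSelector would pick. Crossplane performs the actual resolution, and the helper names here are hypothetical.

```python
from __future__ import annotations


def matches_labels(labels: dict[str, str], match_labels: dict[str, str]) -> bool:
    """matchLabels semantics: every key/value pair of the selector must be present."""
    return all(labels.get(key) == value for key, value in match_labels.items())


def select(resources: list[dict], match_labels: dict[str, str]) -> list[str]:
    """Return the names of the resources whose metadata.labels satisfy the selector."""
    return [
        res["metadata"]["name"]
        for res in resources
        if matches_labels(res["metadata"].get("labels", {}), match_labels)
    ]


if __name__ == "__main__":
    # Hypothetical cluster contents: the sandbox VPC plus an unrelated one.
    vpcs = [
        {"metadata": {"name": "sandbox-vpc", "labels": {"name": "sandbox-vpc"}}},
        {"metadata": {"name": "other-vpc", "labels": {"name": "other-vpc"}}},
    ]
    # Same selector as the Subnet's spec.forProvider.vpcIdSelector.matchLabels
    print(select(vpcs, {"name": "sandbox-vpc"}))  # ['sandbox-vpc']
```

Giving each resource a distinct label, as the example manifests do, keeps every selector pointing at exactly one target.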

After applying those manifests to your cluster, you should be able to see the resources in both Kubernetes and AWS. To see them in Kubernetes, run kubectl get with the resource kind you want to inspect, such as kubectl get VPC.

How does it compare with Terraform?

After learning all of this, you may ask yourself: why bother with Crossplane when I can use Terraform to provision my infrastructure? It’s a fair question. Both tools share the same goal: provisioning infrastructure from code. However, the way they go from code to cloud is very different! Let’s analyze a few key differences between them.

YAML or HCL?

The main difference between the two technologies is the language you use to operate them. Because Crossplane is Kubernetes-native, you drive it with YAML manifests. Terraform has its own language, HCL (HashiCorp Configuration Language), which you need to learn before you can use it efficiently.

Because it uses YAML, Crossplane can be combined with powerful templating tools such as Helm. YAML is also much simpler to read than HCL. HCL is a specialized language built for HashiCorp tools, so it may confuse your dev team if they ever dabble in your infrastructure code.

State management

A big part of Terraform is state management. In Terraform terms, the state is the mapping between your code and the real cloud infrastructure your code is supposed to create. Terraform uses it to compute the differences between the two and decide whether changes are needed. Concretely, the state is a JSON file that you store locally or remotely (in an S3 bucket, for example).

With Crossplane, the state of your infrastructure is represented by all the Kubernetes resources you created. To consult your Crossplane state, you can just query the Kubernetes API using kubectl. For example, if you are using the AWS provider and manage RDS instances, you can just run:

❯ kubectl get RDSInstance
NAME             READY   SYNCED   STATE       ENGINE     VERSION   AGE
my-db            True    True     available   postgres   13.6      18d
my-other-db      True    True     available   postgres   14.2      7d6h
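Because the state lives in the Kubernetes API, you can also check it programmatically. Managed resources report Ready and Synced conditions under status.conditions, and those are what the READY and SYNCED columns above display. The sketch below uses hypothetical helper names. Given JSON such as the output of kubectl get rdsinstance -o json, it lists the resources where either condition is not True.

```python
from __future__ import annotations

import json


def condition_status(obj: dict, cond_type: str) -> str:
    """Status ("True", "False" or "Unknown") of one condition; "Unknown" if absent."""
    for cond in obj.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status", "Unknown")
    return "Unknown"


def not_ready_or_synced(list_json: str) -> list[str]:
    """Names of managed resources whose Ready or Synced condition is not "True"."""
    items = json.loads(list_json).get("items", [])
    return [
        item["metadata"]["name"]
        for item in items
        if condition_status(item, "Ready") != "True"
        or condition_status(item, "Synced") != "True"
    ]


if __name__ == "__main__":
    # Hypothetical, trimmed-down output of `kubectl get rdsinstance -o json`.
    sample = json.dumps({"items": [
        {"metadata": {"name": "my-db"},
         "status": {"conditions": [{"type": "Ready", "status": "True"},
                                   {"type": "Synced", "status": "True"}]}},
        {"metadata": {"name": "my-broken-db"},
         "status": {"conditions": [{"type": "Ready", "status": "False"},
                                   {"type": "Synced", "status": "True"}]}},
    ]})
    print(not_ready_or_synced(sample))  # ['my-broken-db']
```

Against a real cluster, you could pass it the stdout of subprocess.run(["kubectl", "get", "rdsinstance", "-o", "json"], capture_output=True, text=True, check=True).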



Reconciliation loop

A reconciliation loop can be explained with this simple diagram:

[Diagram: the Crossplane reconciliation loop]

It’s a very common concept in Kubernetes, because it’s how operators work. Crossplane can be considered an operator, as it relies on custom resources to provide its functionality. Concretely, Crossplane regularly monitors its resources to watch for changes. For example, if you change a parameter on a managed resource, Crossplane picks up the change and applies it to the real infrastructure.
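The loop can be sketched in a few lines of Python. This is a toy model, not Crossplane’s code. FakeCloud is a hypothetical in-memory stand-in for the AWS API. Each call to reconcile performs one observe, compare, act pass, which a real controller repeats whenever the resource changes and at regular intervals.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Hypothetical in-memory stand-in for a cloud API, keyed by resource name."""
    resources: dict[str, dict] = field(default_factory=dict)

    def observe(self, name: str) -> dict | None:
        return self.resources.get(name)

    def create(self, name: str, spec: dict) -> None:
        self.resources[name] = dict(spec)

    def update(self, name: str, changes: dict) -> None:
        self.resources[name].update(changes)


def reconcile(name: str, desired: dict, cloud: FakeCloud) -> str:
    """One observe -> compare -> act pass; a controller repeats this continuously."""
    actual = cloud.observe(name)
    if actual is None:
        cloud.create(name, desired)
        return "created"
    # Fields whose real-world value differs from the desired one.
    drift = {key: value for key, value in desired.items() if actual.get(key) != value}
    if drift:
        cloud.update(name, drift)
        return "updated"
    return "in sync"


if __name__ == "__main__":
    cloud = FakeCloud()
    desired = {"cidrBlock": "10.10.0.0/16", "enableDnsSupport": True}
    print(reconcile("sandbox-vpc", desired, cloud))  # created
    cloud.resources["sandbox-vpc"]["enableDnsSupport"] = False  # change made outside Kubernetes
    print(reconcile("sandbox-vpc", desired, cloud))  # updated
    print(reconcile("sandbox-vpc", desired, cloud))  # in sync
```

The second pass shows drift correction in this model: a change made directly in the (fake) cloud is reverted to the desired state on the next pass, with no human involved.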

This is a key difference from Terraform, which works in a one-shot, on-demand way. The only time Terraform reconciles the real world with its state is when you run the terraform apply command. It then computes all the differences between the two and lets the user choose whether to apply those changes to the existing infrastructure.

Immutable fields management

Whenever you start to manage cloud infrastructure, you quickly learn about immutable parameters. For example, the primary CIDR block of an AWS VPC is an immutable field: it cannot be changed during the lifecycle of the resource. With Terraform, if you created a VPC with your code and then change its CIDR block, Terraform will plan to replace the resource entirely. It destroys the former VPC and creates a new one with the same characteristics, except for the new CIDR block.

Crossplane takes the opposite approach. When an immutable property of a managed resource is changed, Crossplane never takes it upon itself to destroy the resource and recreate it with the new property. This is partly because Kubernetes does not support defining immutable fields for custom resources.
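To contrast the two behaviors, here is a simplified Python model. We pass the set of immutable fields in explicitly as an assumption (cidrBlock here). The "error" result is a stand-in for however a given provider surfaces the problem, and real providers differ in the details.

```python
from __future__ import annotations


def changed_fields(current: dict, desired: dict) -> set[str]:
    """Fields whose desired value differs from the observed one."""
    return {key for key, value in desired.items() if current.get(key) != value}


def terraform_style(current: dict, desired: dict, immutable: set[str]) -> str:
    """Plan as Terraform does: an immutable change forces destroy-and-recreate."""
    changed = changed_fields(current, desired)
    if changed & immutable:
        return "replace"
    return "update" if changed else "no-op"


def crossplane_style(current: dict, desired: dict, immutable: set[str]) -> str:
    """Simplified model of the behavior described above: never delete, report instead."""
    changed = changed_fields(current, desired)
    if changed & immutable:
        return "error: immutable field(s) changed: " + ", ".join(sorted(changed & immutable))
    return "update" if changed else "no-op"


if __name__ == "__main__":
    current = {"cidrBlock": "10.10.0.0/16", "enableDnsSupport": True}
    desired = {"cidrBlock": "10.20.0.0/16", "enableDnsSupport": True}
    immutable = {"cidrBlock"}  # assumption for this sketch
    print(terraform_style(current, desired, immutable))   # replace
    print(crossplane_style(current, desired, immutable))  # error: immutable field(s) changed: cidrBlock
```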

GitOps approach

In short, Crossplane is a natural fit for GitOps practices. If you already manage your Kubernetes resources with a tool like ArgoCD or FluxCD, managing Crossplane and its dependent manifests should be very easy. With such a setup, simply pushing commits to a Git repository applies changes to your external infrastructure. It’s really convenient, and it’s how working on infrastructure as a team should be.

Implementing GitOps using Terraform is a tad more tedious because you have to rely on some sort of CI/CD machinery like a custom pipeline or tools such as Atlantis.

What we’ve learned

At Padok, we like to experiment with promising technologies like Crossplane. As we have been using Terraform for our infrastructure projects for a while, we keep a close eye on how Crossplane could help us tackle use cases that Terraform struggles with. The reverse is also true.

Easier private resource management

One of the first things we realized with Crossplane is that it makes managing network-isolated resources easy. For example, let’s say that you need to provision a database for your application. You can easily create an AWS RDS instance with either Terraform or Crossplane, but you then need to add some users and create some databases to make it usable by your backend application.

This RDS instance should be in a private network to guarantee that it’s not accessible from the internet. In practice, your Kubernetes cluster must be able to reach the network of your RDS instance, since your backend pods need to use it. This means that Crossplane, running in that same cluster, can reach the RDS instance over the network out of the box. With a provider like the SQL provider, you can then create the users and databases that your application needs.

To do the same with Terraform, you have to set up network access from your computer to the private resources (with an SSH tunnel, for example). Another solution is to run your Terraform code inside a network connected to your database network, which is a big task in itself.

Infrastructure overview

With Terraform, getting an overview of your infrastructure is not easy: you cannot really be sure that your code matches your real-world infrastructure until you run a plan. With Crossplane, you benefit from the reconciliation loop and the availability of the Kubernetes API, so you can check at any time whether your infrastructure is in sync with your desired state.

With ArgoCD, it’s even easier to see the entire state of your infrastructure via its UI. Here’s what a very small set of Crossplane resources looks like:

[Screenshot: a small set of Crossplane resources in the ArgoCD UI]

With this, you can check in the blink of an eye whether your infrastructure is healthy or not. This is a big positive compared to the effort needed to do such a check with Terraform.

Provider maturity

Crossplane is a young technology, so all of its features are still quite cutting-edge, including the providers. Compared with Terraform, Crossplane lags well behind on this front. The AWS provider, for example, is still fairly incomplete: you cannot yet create read replicas for an RDS instance or add SSH key pairs for an EC2 instance. These small gaps can add a lot of complexity to your code if you need those specific resources.

One workaround the Crossplane team is working on is Terrajet, a tool that processes Terraform providers to generate Crossplane providers. Could this be the solution to our previous problem? Not entirely.

First, you cannot link resources across providers. For example, you may have built all of your infrastructure with the classic AWS provider. If a resource is missing from that provider but can be created with the Terrajet provider, you won’t be able to link them together as we did in our example.

Second, installing a Terrajet provider can greatly hinder the performance of your Kubernetes API server. For example, the AWS Terrajet provider installs more than 700 CRDs, compared with about 160 for the classic one. This slows down every request to the Kubernetes API considerably, adding anywhere from a few seconds to dozens of seconds depending on what your cluster is running. An issue is open about adding a feature to install only a subset of the CRDs bundled with a provider.
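To measure how many CRDs a provider has added to your cluster, you can group the output of kubectl get crd -o json by API group. Here is a minimal Python sketch. The group suffixes aws.crossplane.io (classic provider) and aws.jet.crossplane.io (Terrajet provider) are assumptions, so adjust them to the providers you actually run.

```python
from __future__ import annotations

import json
from collections import Counter


def crds_per_group_suffix(crd_list_json: str, suffixes: list[str]) -> Counter:
    """Count CRDs whose spec.group equals a suffix or ends with "." + suffix."""
    counts: Counter = Counter()
    for item in json.loads(crd_list_json).get("items", []):
        group = item.get("spec", {}).get("group", "")
        for suffix in suffixes:
            # Matching on "." + suffix keeps "ec2.aws.jet.crossplane.io"
            # out of the "aws.crossplane.io" bucket.
            if group == suffix or group.endswith("." + suffix):
                counts[suffix] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical, trimmed-down output of `kubectl get crd -o json`.
    sample = json.dumps({"items": [
        {"spec": {"group": "ec2.aws.crossplane.io"}},
        {"spec": {"group": "rds.aws.crossplane.io"}},
        {"spec": {"group": "ec2.aws.jet.crossplane.io"}},
        {"spec": {"group": "apps.example.com"}},
    ]})
    # Assumed group suffixes for the classic and Terrajet AWS providers.
    print(crds_per_group_suffix(sample, ["aws.crossplane.io", "aws.jet.crossplane.io"]))
```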

Kubernetes ⇒ SPOF

With Crossplane, you need to make sure that your Kubernetes cluster is in working order at all times. If, for any reason, your cluster becomes unavailable, you won’t be able to change your infrastructure without risking a divergence between the Crossplane state and the real world.

Also be very careful with the permissions you give to your Kubernetes users. Deleting a Crossplane managed resource in Kubernetes may destroy the corresponding resource in the real world, because the default deletionPolicy of a managed resource is Delete. Take the time to educate your team on how to interact with these resources.

Finally, you will have to answer this question: how will you manage the Kubernetes cluster on which Crossplane runs? Would you be brave enough to manage it with Crossplane itself after provisioning it? Or will you keep written documentation or Terraform code to bootstrap the required cloud infrastructure? This is an open question!

Conclusion

We’ve covered the basics of Crossplane, how it compares with Terraform, and some feedback I’ve gathered while experimenting with this technology. My final opinion is that Crossplane has a very interesting future, but that it’s still a bit too young to handle very complex business use cases. I hope this article gave you a good overview of this promising technology and that it helps you if you need to make a choice.