Tech Radar Cloud - Resilient

Adopt

Kyverno

Open source, easy to use, and efficient, Kyverno ensures that good security practices are respected at runtime in Kubernetes.

While most workload security parameters need to be checked as soon as possible, especially in CI/CD, policy enforcement directly integrated into the Kubernetes validation engine ensures compliance with best practices.

As Kubernetes is a critical part of the infrastructure, it is necessary to enforce good security practices. Natively, the Kubernetes control plane doesn't offer a way to fine-tune custom security policies (Pod Security Admission, stable since 1.25, only enables basic management).

Kyverno is a policy engine for Kubernetes. The power of Kyverno lies in the simplicity of writing policies in YAML. A CNCF project since 2020 and promoted to Incubating status in 2022, Kyverno has experienced strong traction compared to its main competitor, OPA Gatekeeper.

Based on Kubernetes validating and mutating admission webhooks, the tool offers a wide range of pre-written policies. For example, you can require resource requests and limits on all pods or forbid the creation of privileged pods.
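To make the two example rules concrete, here is a minimal Python sketch of the checks they perform on a Pod manifest. It is not Kyverno's policy syntax (real Kyverno policies are written in YAML); the function name `validate_pod` and the sample manifests are assumptions made for illustration.

```python
def validate_pod(pod: dict) -> list:
    """Return the list of violations for a Pod manifest (an empty list means admitted)."""
    violations = []
    for container in pod.get("spec", {}).get("containers", []):
        name = container.get("name", "<unnamed>")
        resources = container.get("resources", {})
        # Rule 1: every container must declare both requests and limits.
        if not resources.get("requests") or not resources.get("limits"):
            violations.append(f"{name}: resource requests and limits are required")
        # Rule 2: privileged containers are forbidden.
        if container.get("securityContext", {}).get("privileged", False):
            violations.append(f"{name}: privileged mode is not allowed")
    return violations


# Hypothetical manifests, reduced to the fields the two rules look at.
compliant_pod = {
    "spec": {"containers": [{
        "name": "app",
        "resources": {"requests": {"cpu": "100m"}, "limits": {"cpu": "500m"}},
    }]}
}
privileged_pod = {
    "spec": {"containers": [{
        "name": "debug",
        "securityContext": {"privileged": True},
    }]}
}

for pod in (compliant_pod, privileged_pod):
    print(validate_pod(pod) or "admitted")
```

A validating webhook would reject the request whenever the returned list is not empty.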

Kyverno also offers an elegant way of adding security configurations on the fly via mutating webhooks:

Add an HTTP proxy as a pod environment variable
Create RBAC rights dynamically when creating a namespace

However, we advise you to limit the use of mutation policies, which can make incidents harder to understand. As the deployed resource no longer matches what is defined as code, the person in charge of debugging must be aware of this mechanism.

Warning: Like all tools that rely on Kubernetes admission webhooks, Kyverno can be a SPOF if its failurePolicy is set to Fail: if Kyverno is down, the API server cannot validate requests and therefore rejects every action. A good practice is to exclude critical namespaces such as kube-system from the webhook to avoid any problems.
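The failure mode described in this warning can be sketched as a small decision function. This is a simplified model of the API server's behavior, not its actual code; the `admit` function and the `EXCLUDED_NAMESPACES` set are hypothetical names.

```python
# Assumption: namespaces for which the webhook is never called.
EXCLUDED_NAMESPACES = {"kube-system"}


def admit(namespace: str, webhook_up: bool, policy_ok: bool,
          failure_policy: str = "Fail") -> bool:
    """Simplified model of the admission decision for one request."""
    if namespace in EXCLUDED_NAMESPACES:
        return True  # excluded namespaces bypass the webhook entirely
    if not webhook_up:
        # Webhook unreachable: "Fail" rejects the request, "Ignore" lets it through.
        return failure_policy == "Ignore"
    return policy_ok


# With the policy engine down and failurePolicy set to Fail,
# only the excluded namespace keeps working.
print(admit("default", webhook_up=False, policy_ok=True))
print(admit("kube-system", webhook_up=False, policy_ok=True))
```

Setting the policy to Ignore removes the SPOF but lets unvalidated requests through during an outage, which is why excluding only a few critical namespaces is usually the better trade-off.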

Vault

Vault is an essential secret management tool for all organizations.

HashiCorp Vault is already over 8 years old and is still the community's most popular secrets management tool. This is thanks to the countless features and improvements that Vault has brought to this sector:

Dynamic secret generation
Fine-tuned secret access management following an RBAC pattern applicable to any type of identity (users, groups, machines)
Integration of different authentication sources (e.g., Kubernetes)
Certificate management directly via an API

Two major differences set Vault apart from its competitors. Firstly, its integration with different identity sources (e.g., AWS, GCP, Kubernetes) enables your workloads to authenticate themselves and be authorized to retrieve their various secrets automatically. Secondly, its various dynamic secret engines (Database, AWS, GCP) enable secrets to be rotated with each use.
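The idea behind dynamic secrets can be illustrated with a toy model: each request receives its own credential, which stops being valid once its lease expires. The sketch below does not call Vault or any Vault client library; the class `DynamicSecretEngine`, its methods, and the TTL are hypothetical.

```python
import secrets
import time


class DynamicSecretEngine:
    """Toy model of a dynamic secret engine: one short-lived credential per request."""

    def __init__(self, ttl_seconds: float):
        self.ttl_seconds = ttl_seconds
        self._leases = {}  # username -> expiry timestamp

    def issue(self, now=None):
        """Create a fresh credential and record when its lease expires."""
        now = time.time() if now is None else now
        username = f"app-{secrets.token_hex(8)}"
        password = secrets.token_urlsafe(16)
        self._leases[username] = now + self.ttl_seconds
        return username, password

    def is_valid(self, username, now=None):
        """A credential is only usable until its lease expires."""
        now = time.time() if now is None else now
        return now < self._leases.get(username, 0.0)


engine = DynamicSecretEngine(ttl_seconds=60)
user_a, _ = engine.issue()
user_b, _ = engine.issue()
print(user_a != user_b)         # each caller gets its own credential
print(engine.is_valid(user_a))  # still valid within the lease
```

Because no credential is shared or long-lived, a leaked secret is only useful until its lease ends.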

We have just one caveat to its adoption in self-hosted mode: Vault is a complex tool and may not be suitable for a small organization. Like any technology, it adds an operational cost, and mismanaging your backups or failing to anticipate an update could lead you straight into a corner.

As secret management is a central feature of any platform, don't underestimate this operational cost. If your team is already overloaded, consider using the SaaS version of Vault or your Cloud Provider's secret management services.

We can only recommend it today, as it has enabled us to solve complex problems (e.g., mTLS on-premise) without adding prohibitive maintenance costs.

Hub and Spoke

Hub and Spoke is an organizational and network model for managing cloud accounts. It isolates the resources used by applications or business units in dedicated accounts (Spokes), alongside a central account (Hub) that centralizes network communications and shares tools with all Spokes.

It's easy to create resources in the cloud, but infinitely more complex to organize them on a large scale and guarantee their security. Companies with many applications and a strong need for governance mainly adopt the Hub and Spoke model.

The Hub account is the central account. It centralizes communications (incoming and outgoing), monitors them, and reinforces isolation between the networks of different environments or applications. It also centralizes security policies, which are then inherited by Spoke accounts.

Spokes contain everything you need to host workloads. These accounts are provisioned according to the needs of your applications, inheriting your organization's best practices and policies, and connected to the Hub to take advantage of centralized network management. The level of team autonomy on the Hub will be defined according to your organization and security constraints.

In conclusion, a Hub and Spoke infrastructure guarantees a high level of security on a large-scale infrastructure. The model is mostly used by large organizations, but it is also worth implementing early in small organizations to ensure a solid, scalable foundation.

SonarQube

SonarQube is an open-source analysis tool for ensuring the quality and security of your code. It supports a large number of different programming languages.

The quality and security of application code must be ensured throughout the development process, particularly by using tools integrated with CI/CD. SonarQube relies on static code analysis to search for bugs, code smells, and vulnerabilities, ensuring code security, reliability, and maintainability.

SonarQube has many rule sets available (e.g., OWASP Top 10) in over twenty programming languages. What's more, you can add your own rules.

A severity level is also assigned to each finding, giving an overall score for an analysis divided into 4 categories. It is thus possible to define blocking thresholds for deployment pipelines and impose a certain level of code quality or security throughout the development process.
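A blocking threshold of this kind boils down to comparing each category's rating with the worst rating you tolerate. The sketch below is an illustration only: it does not use SonarQube's API, and the category names, the A to E scale, and the thresholds are assumptions.

```python
# Hypothetical gate: worst rating tolerated per category ("A" is best, "E" is worst).
QUALITY_GATE = {"reliability": "B", "security": "A", "maintainability": "C"}


def failed_conditions(ratings: dict) -> list:
    """Return the categories whose rating is worse than the gate allows."""
    # Letters compare alphabetically, so "C" > "B" means "C is worse than B".
    # A missing rating is treated as the worst one ("E").
    return sorted(
        category
        for category, worst_allowed in QUALITY_GATE.items()
        if ratings.get(category, "E") > worst_allowed
    )


def pipeline_exit_code(ratings: dict) -> int:
    """0 lets the deployment continue, 1 blocks it."""
    return 1 if failed_conditions(ratings) else 0


analysis = {"reliability": "C", "security": "B", "maintainability": "A"}
print(failed_conditions(analysis), pipeline_exit_code(analysis))
```

Returning a non-zero exit code is what lets a CI/CD job stop the pipeline when the gate fails.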

An intuitive administration console is available, listing the various problems encountered in the code. It provides an overview of the various projects scanned.

Beware, however: the management console contains a great deal of sensitive information, and access to it must be adequately protected. SonarQube's RBAC rights management system is simple but effective.

Price: SonarQube is available as SaaS (SonarCloud) or Open Source. We recommend starting with SaaS, which remains affordable from the outset: €10 per month for less than 100k lines of code.

Trial

Checkov

Checkov is a static code analysis tool for your Infrastructure as Code, enabling you to reinforce and verify the application of best practices, particularly regarding security.

Automating development, deployment, and infrastructure management workflows makes it easier for security teams to monitor the application of best practices.

Checkov is a free, open-source static code analysis tool that allows you to check the conformity of your Infrastructure as Code (IaC).

Checkov supports several backends, such as:

Terraform, Ansible, CloudFormation for managing your infrastructure
Dockerfile, Helm, Kubernetes for deploying your applications and tools
Argo Workflows, Azure Pipelines, Bitbucket Pipelines, CircleCI Pipelines, GitHub Actions, and GitLab CI for your CI/CD workflows
Serverless Framework

Ideally, the tool can be easily integrated into a CI/CD pipeline before applying your changes (or merging them into your main branch). Automating the security check will take some of the load off your code review. You'll be able to add new rules linked to your constraints (regulatory, for example) while deactivating some that may not be relevant to your context. It's also an excellent tool for auditing controls in a regulatory context, such as ISO 27001, PCI DSS, or SOC2.

Like any static code analysis tool with default rule sets, Checkov will require a rule adjustment (exclusion) period to adapt it to your needs and avoid false positives. It will also require a governance process (control, audit, adjustment) to prevent manual bypasses from proliferating in your code, rendering it useless.

Checkov could become an essential tool for security teams by automating some of the checks. It's an interesting, agnostic option if you don't already have an equivalent in a paid security tool suite.

gVisor

gVisor is a technology that secures container execution by providing a layer of isolation between containerized applications and the host system. It is simple and effective, but it does not support all Linux system calls.

Running containerized applications offers a clear advantage over virtual machines regarding operability and cost. Nevertheless, the isolation levels of a VM and a container are not identical. A VM has its own kernel, whereas containers share the host kernel. There are many ways of escaping from a container: misconfiguration, vulnerable kernel, dangerous volume mounts, etc.

gVisor implements a virtual environment that enables containers to be run securely and in isolation. It uses a "sandboxed" environment in which containers are executed. This isolation enables the filtering of system calls (syscalls) made by the application to the host kernel.

A system call is a request made by a program to an operating system to perform a specific operation, such as reading or writing data to disk. gVisor can intercept system calls made by containerized applications. It acts as a filtering intermediary between the application and the operating system kernel. By filtering out dangerous calls, gVisor helps strengthen your environment's security.
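The filtering role described above can be pictured with a toy model in Python. It is not gVisor's implementation: the `intercept` function and both syscall sets are hypothetical, and the real compatibility list is the one in the gVisor documentation.

```python
import errno

# Hypothetical sets, chosen only for illustration.
IMPLEMENTED = {"read", "write", "openat", "close", "mmap"}
BLOCKED = {"ptrace", "kexec_load"}


def intercept(syscall: str) -> int:
    """Return 0 when the sandbox serves the call, or an errno value when it refuses."""
    if syscall in BLOCKED:
        return errno.EPERM   # dangerous call filtered out before reaching the host kernel
    if syscall not in IMPLEMENTED:
        return errno.ENOSYS  # call not supported by the sandbox
    return 0                 # call handled inside the sandbox


def unsupported_calls(trace):
    """List the calls of an application trace that the sandbox would refuse."""
    return [name for name in trace if intercept(name) != 0]


# A made-up application trace: one call is outside the supported set.
print(unsupported_calls(["openat", "read", "io_uring_setup", "close"]))
```

Running an application's real syscall trace through this kind of check is, in spirit, what testing an application against a sandboxed runtime amounts to.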

gVisor, therefore, acts as an isolation layer for the execution of your containers. It is similar in operation to Kata Containers, which we have also included in this tech radar, but its implementation is not based on virtualization. gVisor is lighter because it implements a simple syscall control sandbox, not full virtualization. This enables faster execution without the additional resources that virtualization requires.

However, gVisor does not yet support all Linux system calls for the amd64 architecture (the list can be found in the documentation). We, therefore, advise you to test each application before migrating it to a gVisor runtime.

We liked gVisor for its ease of operation and implementation. gVisor can be implemented with a single click on GKE. However, as mentioned above, it is not compatible with all syscalls, which can be a hindrance to its implementation.

Cilium

Cilium is an extremely powerful open-source network plugin for Kubernetes that offers advanced security and observability features with ease of access. It is currently considered the market benchmark in its field.

Cilium is an open-source CNI (Container Network Interface) project for Kubernetes, which has been part of the CNCF since October 2021. It offers advanced features such as transparent traffic encryption between pods, advanced network filtering, enhanced traffic observability, and multi-cluster communication.

Cilium's great strength is that this CNI is based on eBPF (Extended Berkeley Packet Filter). eBPF is a Linux kernel technology that can run isolated programs in kernel space. It is used to safely and efficiently extend kernel capabilities without modifying the kernel source code or loading additional kernel modules.

This results in several benefits: performance is significantly higher than with the iptables rules used by kube-proxy. Thanks to eBPF, Cilium does not require any kernel module to be installed, so it can run without any prerequisites on any server with a sufficiently recent Linux kernel (4.19 or later).

Cilium offers many essential features when your Kubernetes cluster becomes your infrastructure's central and critical point, particularly in terms of security and observability. For example, it's possible to perform Layer 7 network filtering with the CiliumNetworkPolicy CRD or gain in-depth visibility into Layer 7 traffic using metrics exposed by Cilium.
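Layer 7 filtering means the decision depends on the content of the request (here, the HTTP method and path) rather than only on IP addresses and ports. The Python sketch below illustrates that logic; it is not Cilium code or CiliumNetworkPolicy syntax, and the rules, the `is_allowed` function, and the full-match semantics are simplifying assumptions.

```python
import re

# Hypothetical allow-list: anything not matched by a rule is denied.
HTTP_RULES = [
    {"method": "GET", "path": "/public/.*"},
    {"method": "POST", "path": "/orders"},
]


def is_allowed(method: str, path: str) -> bool:
    """Allow the request only if one rule matches both its method and its path."""
    return any(
        rule["method"] == method and re.fullmatch(rule["path"], path) is not None
        for rule in HTTP_RULES
    )


print(is_allowed("GET", "/public/index.html"))
print(is_allowed("DELETE", "/orders"))
```

A Layer 3/4 policy could only say "this pod may reach that pod on port 80"; the rules above additionally restrict what can be done once connected.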

Cilium also makes it easy to encrypt traffic between pods. For those who use a Service Mesh solely for mTLS encryption, Cilium offers a simpler solution.

Today, Cilium has become the CNI par excellence in Kubernetes, so much so that it is now used by Datadog and in GKE Dataplane V2 (Google Kubernetes Engine). However, this technology remains complex to use and requires good technical skills within your teams.

Falco Security

Falco is an intrusion detection tool for Kubernetes. It relies on dynamic container analysis to raise security alerts in the event of suspicious behavior.

Falco is an open-source tool originally developed by Sysdig and part of the CNCF since 2018. It detects suspicious behavior in containerized environments and, therefore, integrates perfectly with Kubernetes.

Falco uses dynamic analysis rules and monitors syscalls. A major advantage of Falco over other HIDS (host-based intrusion detection systems) is its ability to listen to Kubernetes API server events. When a potential threat is detected, Falco sends an alert providing detailed information on the event that triggered it. In particular, it can detect modified binaries, privilege escalation, suspicious use of SSH services, etc.
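The principle of rule-based detection on a stream of events can be sketched in a few lines of Python. This is not Falco's rule language or engine; the event fields, rule names, and priorities below are assumptions made for illustration.

```python
SHELLS = {"bash", "sh", "zsh"}

# Each rule is (name, priority, condition). All values are illustrative.
RULES = [
    ("Shell spawned in a container", "WARNING",
     lambda e: e["syscall"] == "execve"
     and e["container"] != "host"
     and e["process"] in SHELLS),
    ("Write below /etc", "ERROR",
     lambda e: e["syscall"] == "openat"
     and e.get("write", False)
     and e.get("path", "").startswith("/etc/")),
]


def alerts_for(event: dict) -> list:
    """Return one alert line per rule whose condition matches the event."""
    return [
        f"{priority}: {name} (process={event['process']}, container={event['container']})"
        for name, priority, condition in RULES
        if condition(event)
    ]


event = {"syscall": "execve", "process": "bash", "container": "a1b2c3"}
for alert in alerts_for(event):
    print(alert)
```

Tuning an intrusion detection tool largely consists of adjusting conditions like these until the alert volume is manageable.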

Falco is highly customizable, enabling users to create their own rules and alerts to meet their specific security needs. It can be easily integrated with other security tools and services to provide a complete security solution.

Falco is an essential element in ensuring the security of a Kubernetes cluster. We recommend setting it up as long as your teams have the time to process and configure alerts. As with all intrusion detection tools, the main difficulty is fine-tuning the configuration to produce the right volume of alerts. You need to have both qualitative alerts on the most critical rules and quantitative alerts to detect noisier attacks.

It should also be noted that problems persist with Falco's interconnection with the kernel: for example, in an EKS cluster, Falco kernel modules are generally published 2 weeks after the publication of a new AMI by AWS, which delays the update of nodes.

We recommend this tool if your team has the bandwidth to process alerts daily. Otherwise, you'll just be feeding "alert fatigue."

Linkerd

Linkerd is an up-and-coming Service Mesh technology that makes it easy to implement encryption and authorization of communications in Kubernetes clusters.

Linkerd is a Service Mesh for Kubernetes and has been a CNCF "graduated" project since July 2021. Linkerd promises to offer the community an extremely simple and fast Service Mesh.

A Service Mesh is an infrastructure component that facilitates communication between microservices. It enables a number of advanced functionalities, including mTLS encryption of inter-service communications, strong observability, fault injection, etc.

Compared to other Service Meshes such as Istio, Linkerd's architecture is straightforward. The main reason is that the number of Linkerd functionalities is deliberately limited and can be extended thanks to several plugins: tracing thanks to Jaeger and canary releases thanks to Flagger. Linkerd still lacks certain functionalities, notably JWT header-based routing.

Linkerd's performance is very good, thanks to its minimalist proxy written in Rust. The various benchmarks available on the Internet indicate memory and CPU consumption on the order of 10% of Istio's.

Finally, a very important point concerning the cost of setting up this Service Mesh is that Linkerd does not contain an ingress controller. This means you can add it when needed without having to change the ingressClass of all services.

We recommend you try Linkerd, which has the advantage of being much simpler to install and maintain than its better-known competitors. This Service Mesh offers most of the expected functionalities, notably via the plugin system.

Assess

BuildKit

BuildKit is a container image build engine that works with low privileges while fully covering the possible instructions for a Dockerfile.

Today, building container images is a central CI/CD issue for containerized applications. But it's also often a costly operation involving compiling binaries, and thus a heavy RAM and CPU load.

So, in order to adapt to a load that is heavy but varies greatly with the developments in progress, it's convenient to run these builds in Kubernetes clusters, often shared with other tools.

To ensure the security of these clusters, it is essential that the tools they host, and therefore in particular container image build tools, are run without privileges.

Historically, container image builds have required a privileged process. As a result, a developer with the right to launch a build pipeline could take control of all containers running on the same node.

Fortunately, a number of Linux kernel upgrades over the last few years have made it possible to launch containers with virtually no privileges, and build engines have adapted accordingly. Tools such as BuildKit, Buildah, and Kaniko all take advantage of these new possibilities in their own way.

Among these tools, we currently favor BuildKit for our projects, as it seems to be the only one among them to really manage all the possible instructions of a Dockerfile. What's more, it's built on a client-server model, which means we can take advantage of a common cache and deliver excellent performance, thus reducing the time needed to build.

A security feature that speeds up development - it's possible!

Zero-trust architecture

Zero Trust is an architecture model that no longer bases the security of an IT system solely on the defense of its perimeter but on the total absence of trust between parties. It's a highly secure and difficult model to implement, in which every connection must be authorized.

The basic principle of Zero Trust architecture is to trust no user or device by default, even if they're inside the network, and to constantly check the identity and permissions of everything that attempts to access corporate resources. No implicit trust is placed in a user or service based on where the request originates.
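In code, this principle translates into an authorization check that verifies identity and permissions on every request and never looks at where the request comes from. The sketch below is a minimal illustration under stated assumptions: the signing key, the permission table, and the `authorize` function are hypothetical, and a real system would rely on an identity provider rather than a shared key.

```python
import hashlib
import hmac

SIGNING_KEY = b"demo-signing-key"          # assumption: stand-in for a real identity provider
PERMISSIONS = {"alice": {"billing-db"}}    # assumption: who may access which resource


def sign(user: str) -> str:
    """Issue a token proving the user's identity (toy scheme for the example)."""
    return hmac.new(SIGNING_KEY, user.encode(), hashlib.sha256).hexdigest()


def authorize(user: str, token: str, resource: str, source_ip: str) -> bool:
    """Check identity and permissions on every request; source_ip is deliberately ignored."""
    identity_ok = hmac.compare_digest(token, sign(user))
    return identity_ok and resource in PERMISSIONS.get(user, set())


# A valid identity is accepted from the Internet...
print(authorize("alice", sign("alice"), "billing-db", source_ip="203.0.113.7"))
# ...while a forged token is rejected even from inside the network.
print(authorize("alice", "forged", "billing-db", source_ip="10.0.0.5"))
```

The `source_ip` parameter is kept in the signature only to make the point explicit: being inside the network grants nothing.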

Zero trust doesn't mean you should stop restricting access to resources via network filtering! The Zero Trust model assumes that the network will necessarily be compromised and the perimeter will fail, forcing users and devices to prove their legitimacy.

Zero Trust is particularly well-suited to Cloud infrastructures, which often have a complex network topology made up of several interconnected virtual networks exposed on the Internet, where the very notion of a perimeter has less and less meaning, especially in serverless infrastructures.

This approach is attractive but complex to implement. Each user and application must use a centralized authentication system (e.g., MFA for users, TLS certificates for servers). Typically, an identity broker (e.g., Boundary) links external devices (user workstations) to the services hosted on the infrastructure. Services such as Cloudflare Zero Trust greatly simplify the implementation of this type of architecture.

The Zero-Trust concept is, therefore, particularly well suited to companies with a heightened need to protect their data but requires a major investment to set up.

Boundary

Boundary is a tool designed to replace existing systems for accessing your internal networks.

Boundary is a tool developed by HashiCorp and open-sourced in 2020. It is designed to replace your Bastion or VPN solutions, improving their management, security, and user experience.

Boundary provides the same user experience as Kubernetes' port-forwarding feature but across your entire infrastructure (e.g., DBs, VMs, web services, etc.). It does so without neglecting the auditability and access segregation such a tool requires, thanks to audit logs, the ability to revoke sessions in progress, and full RBAC access management based on users or groups.

It consists of 2 parts:

A central controller that contains the configuration of the various targets to which your employees need access, as well as all rights management
Workers that have access to your various internal networks and act as gateways

Boundary's great strength lies in its ease of installation and operation. Its configuration may be complex, but its integration with Terraform means it can be automated like the rest of your infrastructure's configuration.

Boundary is still in its infancy, and we are not yet fully satisfied with the user experience it offers. In particular, the CLI requires the user to be familiar with Boundary resource identifiers, which are not intuitive. Nevertheless, we're confident that the project will become a benchmark for this type of tool.

Therefore, we place it in "Assess" and continue to evaluate its main competitors (e.g., Teleport).

Kata containers

Kata Containers is a virtualization technology for running containers. It provides strong isolation for containerized applications but can be complex to install.

Kata Containers uses lightweight virtual machines to provide hardware isolation for containers, thus protecting host systems from potential vulnerabilities brought about by containers. Each container operates as if it were running on a separate VM while benefiting from the advantages of lightweight virtualization, such as rapid start-up and low resource utilization.

However, Kata is only a runtime, i.e., it limits itself to creating and running the container, whereas Docker provides additional functionalities, such as image management, data storage, networking, or orchestration.

Like gVisor, Kata Containers intercepts system calls made by containerized applications. Unlike gVisor, however, it uses a lightweight hypervisor-based virtualization approach to provide this isolation. This provides additional security by preventing applications from running directly on the host. Nevertheless, this comes at the cost of additional resources and a slower start-up. On the other hand, benchmarks show that containers run faster on Kata Containers than on gVisor.

Unlike gVisor, which has a built-in implementation in GKE, Kata Containers is not natively implemented by any Cloud Provider (GCP, Azure, AWS). This makes the tool more difficult to set up than gVisor, requiring configuration changes to nodes in managed clusters.

To sum up: in our use cases, we prefer to use gVisor for its simplicity of implementation. However, Kata Containers provides stronger isolation than gVisor, has better compatibility with different syscalls, and is faster in executing its containers, albeit requiring more resources.

Hold

Istio

Istio is a complex tool that allows you to control the various communications between your applications precisely.

Istio is what's commonly known as a "Service Mesh." Built on the same patterns as Kubernetes, it enables you to control all communications within your cluster as well as all external communications.

Today, it's a must-have, and everyone knows it exists, but it's very complex to set up and operate, which puts many people off using it. We've already suffered many setbacks in our projects because of it, notably the transition to version 1.6, which left its mark on many users.

We have also observed that the main feature driving many of our customers' use of a Service Mesh is mTLS (mutual TLS), mainly to meet security requirements. This feature is supported by Istio but also by its simpler alternatives (e.g., Linkerd, Consul). We prefer to install these tools rather than Istio.

Another differentiating feature that has already led us to use Istio is the ability to control outbound traffic, notably through the use of Egress Gateways. This feature can make all the difference in environments requiring a higher level of security.

Istio should only be adopted after careful analysis, so as not to add significant maintenance costs to teams that are sometimes already overloaded.

Secure