discovery_hashicorp_boundary

28 June 2022

HashiCorp first released Boundary on 14 October 2020. It has been around for more than a year and a half, yet few articles talk about its day-to-day usage. So let's be critical for a bit and see how it compares to other solutions.

 

HashiCorp Boundary is advertised as a solution for: "Accessing any system from anywhere based on user identity."

 

The common solutions for that particular use case are mostly Bastions and VPNs, built on open-source software (e.g. OpenVPN, WireGuard) or commercial solutions (e.g. Fortinet, Zscaler).

 

Can HashiCorp Boundary be a cloud-agnostic alternative to cloud-based solutions like AWS SSM?

 

Does Boundary bring enough to the table to be considered as an alternative to those giants? Let's find out.

General Concept

If you already understand Boundary's concepts, you can skip this part: there's nothing new apart from my amazing drawings.

HashiCorp's Boundary is a 2-component system made of controllers and workers. To understand their relationship a bit better, here's a drawing explaining the interaction between a user and the 2 components.

[Figure: interactions between a user, the controller, and the worker]

Summing up this drawing, the controller handles ACLs (Access Control Lists) and the worker handles all the network magic happening behind the scenes.

From that, we can infer the killer feature of Boundary: centralized ACL handling.

Boundary shines by making the teams in charge of security able to define global access rules based on the user's identity. And by using groups instead of people, you could achieve an instantaneous onboarding process for your remote access solution.

A replacement for Bastions and VPNs?

Ok, so this new tool seems promising, but does it really compare in terms of features?

Bastion vs Boundary

Port-forward

The most common setup is to use a standard Virtual Machine to serve as a Bastion, enforcing a strict SSH setup and exposing it through a Public IP on any port you'd like. If the Bastion has access to the internal network on another interface, you'll be able to access it too.

Through standard SSH tunneling, you can achieve port-forwarding, which is exactly what Boundary provides. For those familiar with the kubectl port-forward command, I like to say that Boundary is the ultimate port-forwarder, bringing the port-forward capability to non-Kubernetes workloads (e.g. GCP Cloud SQL, AWS RDS, etc.) with the same UX as the kubectl CLI.
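To make the idea of port-forwarding concrete, here is a minimal sketch in Python of what any port-forwarder does at its core: listen on a local port and relay bytes to a remote host and port. It is only an illustration, not how SSH or Boundary implement it (there is no authentication, encryption, or ACL here), and the function names pipe, forward, and start_forwarder are made up for this example.

```python
import socket
import threading


def pipe(src: socket.socket, dst: socket.socket) -> None:
    """Copy bytes from src to dst until src stops sending."""
    try:
        while True:
            data = src.recv(4096)
            if not data:
                break
            dst.sendall(data)
    except OSError:
        pass  # one side went away; stop relaying
    finally:
        try:
            dst.shutdown(socket.SHUT_WR)  # tell the other side no more data is coming
        except OSError:
            pass


def forward(listener: socket.socket, target: tuple) -> None:
    """Accept local connections and relay each one to the remote target."""
    while True:
        try:
            client, _ = listener.accept()
        except OSError:
            return  # the listening socket is no longer usable
        try:
            upstream = socket.create_connection(target)
        except OSError:
            client.close()  # target unreachable: drop this client, keep listening
            continue
        # One thread per direction so traffic flows both ways.
        threading.Thread(target=pipe, args=(client, upstream), daemon=True).start()
        threading.Thread(target=pipe, args=(upstream, client), daemon=True).start()


def start_forwarder(target: tuple, listen_port: int = 0) -> int:
    """Listen on 127.0.0.1 and return the local port (0 lets the OS pick one)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", listen_port))
    listener.listen()
    threading.Thread(target=forward, args=(listener, target), daemon=True).start()
    return listener.getsockname()[1]
```

Run on a machine that can reach the internal network, start_forwarder(("10.0.0.5", 5432), 5432) would expose a hypothetical internal database at 127.0.0.1:5432. What a Bastion or Boundary adds on top of this relay is the secure transport and the decision about who is allowed to open it.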

 

Internal network routing

Something that Boundary cannot do, though, is route a whole network range through a Bastion. For those unfamiliar with it, sshuttle, or as it likes to be called, the "poor man's VPN", supercharges your Bastion to make it act like a VPN, and it's a client-side tool only.

If the following quote (taken from the sshuttle README) resonates with you, Boundary won't solve this:
"You don't want to create an ssh port forward for every single host/port on the remote network."

Access Segregation

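To my knowledge, a standard Bastion using sshd doesn't segregate network access based on the user's identity out of the box: having your SSH public key on a Bastion means you can port-forward or route to any network or workload the Bastion instance has access to. sshd options such as PermitOpen can restrict forwarding per user, but those rules have to be written and maintained in the Bastion's own configuration.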

As I said earlier, this is where Boundary shines: it enables access segregation based on the user's identity. A developer might therefore be able to access their own service's database but not another team's database. It enables fine-grained control that would have required a lot of work and maintenance to achieve without Boundary.
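The difference between the two models can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Boundary's actual data model: the users, group names, and target names below are all hypothetical.

```python
# Hypothetical users, groups, and targets, for illustration only.
BASTION_AUTHORIZED_KEYS = {"alice@example.com", "bob@example.com"}
BASTION_REACHABLE = {"payment-db", "billing-db"}

GROUP_MEMBERS = {
    "payment-team": {"alice@example.com"},
    "billing-team": {"bob@example.com"},
}
GROUP_GRANTS = {
    "payment-team": {"payment-db"},
    "billing-team": {"billing-db"},
}


def bastion_allows(user: str, target: str) -> bool:
    """Plain Bastion: any user with a key reaches everything the Bastion reaches."""
    return user in BASTION_AUTHORIZED_KEYS and target in BASTION_REACHABLE


def identity_based_allows(user: str, target: str) -> bool:
    """Identity-based model: access follows the grants of the user's groups."""
    return any(
        user in members and target in GROUP_GRANTS.get(group, set())
        for group, members in GROUP_MEMBERS.items()
    )
```

In this toy model, bastion_allows("alice@example.com", "billing-db") is True while identity_based_allows("alice@example.com", "billing-db") is False. Because the grants are attached to groups rather than people, onboarding someone only means adding them to the right group.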

VPN vs Boundary

Port-forward

A VPN doesn't provide port-forwarding: it routes whole networks instead of exposing a single remote port locally.

Internal Network Routing

Well, that's the VPN use case, so it works with a VPN but doesn't work with Boundary.

Access Segregation

Access segregation with a VPN can be achieved but is complex to set up, and organization-wide rules can be hard to create and maintain.

 

Still reading? Let's get technical

The first part was kind of a product review from an architect's point of view, but I know that deep down what you crave are the technicalities. Let's explore Boundary's internal concepts and see, from an operator's point of view, whether it's as promising as every other HashiCorp product.

Deployment

I like to deploy a tool in a somewhat production-like manner before even starting to use it, to see whether, in its current state, simply deploying and updating it will add an operating cost.

When I said Boundary was only a 2-component system, I forgot one: Boundary requires a PostgreSQL database to store all its entities.

I thought about the setup I'd like to achieve in some ongoing projects and came up with this architecture: 

[Figure: target deployment architecture]

Let's dive into how it works:

  • Controller
    • Exposes an API and a UI
    • Exposes an endpoint for worker <-> controller communication
  • Worker
    • Exposes an endpoint for a user to go through

Both controllers and workers are set up using the same Go binary, and the configuration tells the binary which role it should play. This eases deployment a lot, as you can use the same deployment model for both and just configure them differently! The only difference you'll end up with is that the controller needs to expose 2 different ports.

I wanted to achieve this setup on Kubernetes because at Padok we've been dealing with Kubernetes extensively in a lot of different projects, and it has become our main use case nowadays.

For this particular example, I'll deploy Boundary on GCP. It will also help in comparing HashiCorp Boundary with an IAP-based Bastion.

It turned out that support for deploying Boundary on Kubernetes wasn't as extensive as I had anticipated, and I ended up writing my own Helm chart to deploy it to my "playground" cluster.

In the end, deploying Boundary was a breeze if you don't count the Helm chart creation part. My process ended up being:

  • Create a GCP Cloud SQL Instance dedicated to Boundary using Terraform
  • Give access to this Cloud SQL instance in K8s using external-secret
  • Use a pre-upgrade / pre-install Helm hook to launch, respectively:
    • boundary database migrate
    • boundary database init
  • Let ArgoCD deploy the helm chart

This process is pretty much "standard" across Padok, so it isn't too much of a hassle. If I had to rate deployment ease for Boundary, 4.5/5 would be a fair rating.

Configuration

Our stack could now be deployed easily. We then had to configure each component accordingly to get the previously depicted architecture!

Boundary being a HashiCorp product, if you've used their tools before, you know what's coming next.

HCL

Joking aside, HCL (HashiCorp Configuration Language) is the DSL shared across HashiCorp products, and it is best known because Terraform uses it.

Controller

To get a functional controller, you have to set up and configure HCL blocks in a config file that you'll pass to the boundary binary:

controller {
	name = "{{ include "boundary-controller.name" . }}"
	description = "{{ .Values.config.description | default "A controller for Boundary"}}"
	# Address workers use to reach this controller's "cluster" listener
	public_cluster_addr = "X.X.X.X:80"
	database {
		# env:// makes Boundary read the connection URL from the named environment variable
		url = "env://{{ .Values.config.db.standardEnvironmentVariable | default "BOUNDARY_PG_URL" }}"
		migration_url = "env://{{ .Values.config.db.migrationEnvironmentVariable | default "MIGRATION_PG_URL" }}"
		max_open_connections = {{ .Values.config.db.maxOpenConnections }}
	}
}

This block will tell Boundary to start a controller. The only thing I'm not a fan of is the public_cluster_addr requirement. Since I'm in a Kubernetes cluster, I don't really want to hardcode the public IP used for the "cluster" endpoint.

Next, we will need to set up 2 listener blocks in order to expose the API/UI and the "controller cluster" endpoint:

# Serves the API and the UI
listener "tcp" {
	address = "0.0.0.0"
	purpose = "api"
	tls_disable = {{ .Values.config.api.tlsDisable }}
	cors_enabled = true
	cors_allowed_origins = ["*"]
}

# Serves worker <-> controller communication
listener "tcp" {
	address = "0.0.0.0"
	purpose = "cluster"
}

Last but not least, Boundary uses cryptographic keys for different purposes:

  • Database encryption
  • Worker Authentication
  • Recovery

I chose to go with GCP KMS keys in this setup since I'm on GCP, so here's the config you'll end up with:

 kms "gcpckms" {
	purpose = "root"
	project = "{{ .Values.config.kms.root.gcp.project }}"
	region = "{{ .Values.config.kms.root.gcp.region | default "global" }}"
	key_ring = "{{ .Values.config.kms.root.gcp.keyRing }}"
	crypto_key = "{{ .Values.config.kms.root.gcp.key }}"
}

  

kms "gcpckms" {
	purpose = "worker-auth"
	project = "{{ .Values.config.kms.worker.gcp.project }}"
	region = "{{ .Values.config.kms.worker.gcp.region | default "global" }}"
	key_ring = "{{ .Values.config.kms.worker.gcp.keyRing }}"
	crypto_key = "{{ .Values.config.kms.worker.gcp.key }}"
}

  

kms "gcpckms" {
	purpose = "recovery"
	project = "{{ .Values.config.kms.recovery.gcp.project }}"
	region = "{{ .Values.config.kms.recovery.gcp.region | default "global" }}"
	key_ring = "{{ .Values.config.kms.recovery.gcp.keyRing }}"
	crypto_key = "{{ .Values.config.kms.recovery.gcp.key }}"
}

Worker

The worker configuration is a bit lighter and follows the same pattern:

A worker block:

worker {
	name = "{{ include "boundary-worker.name" . }}"
	description = "{{ .Values.config.description | default "A worker for Boundary"}}"
	# Controller "cluster" endpoint(s) this worker registers with
	controllers = ["{{ .Values.config.controller.ips }}"]
	# Address users' clients use to reach this worker
	public_addr = "{{ .Values.config.hostname }}:80"
}

The worker has the same issue: we'll need to hardcode the controller IP! (I tried putting a hostname, but this doesn't seem to work.)

A listener block:

listener "tcp" {
	purpose = "proxy"
	# Careful: Helm's default treats false as empty, so an explicit false also renders as true here
	tls_disable = {{ .Values.config.tlsDisable | default "true"}}
	address = "0.0.0.0"
}

 

A kms block:

 kms "gcpckms" {

	purpose = "worker-auth"
	project = "{{ .Values.config.kms.gcp.project }}"
	region = "{{ .Values.config.kms.gcp.region | default "global" }}"
	key_ring = "{{ .Values.config.kms.gcp.keyRing }}"
	crypto_key = "{{ .Values.config.kms.gcp.key }}"

} 

The key used here must be the same as the one set up for worker-auth in the controller!

Time for rating this bit of configuration. It's not perfect, but it's pretty straightforward to do, and I didn't have a difficult time setting this up, so let's say 4/5.

Operator Experience

Now that our stack is configured as intended, I have at least 1 controller and 1 worker. But how do I interact with Boundary to achieve, for example, the following:

  • bob@padok.fr should be able to connect to the Boundary database using Boundary
  • alice@padok.fr should be able to connect to her application database using Boundary

Alice and Bob are fictional employees of Padok. Please don't send them unsolicited commercial emails: they might get angry.

Here, we will have to dive into a lot of concepts to achieve this in a "production" fashion. Boundary has a lot of different entity types; I'll try to define the ones we will use:

  • Scope: This entity could be compared to a Kubernetes namespace and comes in 3 types:
    • Global: The root of our scope tree; every other scope is contained in this one
    • Organization: Type of scope that sits just below the Global scope
    • Project: Type of scope contained in an Organization scope
  • User: Identity associated with an Account
  • Account: Credentials used to connect; multiple accounts can be linked to the same user
  • Groups/Managed Groups: Boundary can define local and externally sourced groups to mirror your organization
  • Hosts/Host Catalog/Host Sets: These entities let you build a catalog by grouping workloads together, and are useful to list everything your users will need access to
  • Targets: Association of a host, a port, and a connection limit
  • Roles: Entity used to represent ACLs; roles can be given to a group or a user
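The scope hierarchy is the part that trips people up the most, so here is a small Python sketch of the nesting rule described in the list (Global contains Organizations, Organizations contain Projects). It is a mental model only, not Boundary's API: the Scope class and the REQUIRED_PARENT table are invented for this example.

```python
from dataclasses import dataclass
from typing import Optional

# Which parent type each scope type requires, following the list above.
REQUIRED_PARENT = {"global": None, "org": "global", "project": "org"}


@dataclass
class Scope:
    name: str
    type: str  # "global", "org" or "project"
    parent: Optional["Scope"] = None

    def __post_init__(self) -> None:
        expected = REQUIRED_PARENT[self.type]
        actual = self.parent.type if self.parent else None
        if actual != expected:
            raise ValueError(
                f"a {self.type} scope needs a parent of type {expected}, got {actual}"
            )

    def path(self) -> str:
        """Return the chain of scope names from the root down to this scope."""
        prefix = self.parent.path() + "/" if self.parent else ""
        return prefix + self.name
```

For instance, Scope("Playground", "project", Scope("Padok", "org", Scope("global", "global"))).path() gives "global/Padok/Playground", while attaching a project directly to the global scope raises a ValueError.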

I won't go too much into the details, but here's what I ended up with for my little "production" MVP:

[Figure: scopes and entities of the production MVP]

Each dotted line separates the different scopes, so each object defined between those lines is in the defined scope.

To give a bit more context, the Playground project scope is a representation of a GCP Project in Boundary. I designed it like that, but you could also have a project representing multiple GCP projects, or even decide to split it by environment (e.g. staging-playground, production-playground). For that particular use case, host sets might be a good fit too.

Achieving this example was easy, but the example is simple in itself. So I tried to come up with a way to define a multi-catalog/multi-host setup in which it is easy to set ACLs for multiple groups at different levels of the hierarchy (e.g. I want a given group to have access to a whole catalog).

Well, to say the least, this was a hell of a ride. ACLs are hard and lack a bit of flexibility, which makes this work somewhat painful, but with some help from Terraform and the creation of a dedicated module for a Boundary organization, I completed it in a somewhat satisfying manner. The Terraform module I created is not ready to be published on Padok's GitHub, but I will show the input interface I ended up with:

  name = "Padok"
  description = "Padok Infrastructure"
  controller_hostname = "https://boundary.playground.padok.cloud"
  hosts = {
    playground = {
      databases = {
        boundary = {
          addresses = ["10.110.32.3"]
          type      = "tcp"
          default_port = "5432"
        }
        payment-service = {
          addresses = ["10.110.32.4"]
          type      = "tcp"
          default_port = "5432"
        }
      }
    }
  }
  auth_methods = {
    google = {
      description = "Padok Google provider"
      oidc = {
        client_id = "XXX.apps.googleusercontent.com"
        client_secret = "XXX"
        issuer    = "https://accounts.google.com"
        primary   = true
        claims_scopes = ["email", "profile"]
      }
    }
  }
  managed_groups = {
    google = {
      bob = {
        filter = "\"/token/email\" == \"bob@padok.fr\""
        role = "playground/databases/boundary"
      }
      alice = {
        filter = "\"/token/email\" == \"alice@padok.fr\""
        role = "playground/databases/payment-service"
      }
    }
  } 

The hosts input is used to create the whole hierarchy of hosts and targets including the host sets:

  hosts = {
    playground = { // <= This level represents a Host Catalog
      databases = { // <= This level represents a Host Set
        boundary = { // <= This level represents a Host
          addresses = ["10.110.32.3"] // <= Those inputs represent a Target
          type      = "tcp"
          default_port = "5432"
        }
      }
    }
  } 

I also had to simplify ACL management since it was a pain, so I included that logic in the managed_groups input. As you can see, the role value doesn't map to any native Boundary concept. The idea was to be able to grant access at any level (Catalog/Host Set/Host) and let the module create the ACLs in Boundary that make this work:

  • role = "playground" gives you access to all targets linked to Hosts inside the playground Host Catalog
  • role = "playground/databases" gives you access to all targets linked to Hosts inside the databases Host Set. It would give you access to both databases (boundary and payment-service)
  • role = "playground/databases/boundary" gives you access to the single target linked to boundary Host
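The three rules above boil down to prefix matching on a catalog/host-set/host path. The module itself is written in Terraform and isn't published, so the following is only a Python sketch of that matching logic; the targets_for_role function is hypothetical, and HOSTS mirrors the shape of the hosts input shown earlier.

```python
# Same shape as the hosts input: catalog -> host set -> host -> settings.
HOSTS = {
    "playground": {
        "databases": {
            "boundary": {"addresses": ["10.110.32.3"], "default_port": "5432"},
            "payment-service": {"addresses": ["10.110.32.4"], "default_port": "5432"},
        }
    }
}


def targets_for_role(hosts: dict, role: str) -> list:
    """Return every 'catalog/host_set/host' path covered by a role path."""
    parts = role.split("/")
    if not 1 <= len(parts) <= 3 or "" in parts:
        raise ValueError(f"invalid role path: {role!r}")
    granted = []
    for catalog, host_sets in hosts.items():
        for host_set, host_map in host_sets.items():
            for host in host_map:
                path = [catalog, host_set, host]
                # A role grants a target when it is a prefix of the target's path.
                if path[: len(parts)] == parts:
                    granted.append("/".join(path))
    return granted
```

With these inputs, targets_for_role(HOSTS, "playground/databases") covers both databases, while targets_for_role(HOSTS, "playground/databases/boundary") covers only the boundary one. In the real module, each matched path would then be turned into the corresponding Boundary ACL.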

As usual, let's rate the Operator Experience. From what you've just read, you know it won't get a perfect rating. I think 2.5/5 is fair, and I wouldn't go any lower: the model works and is highly customizable, but the overhead you'll endure when trying to grasp those ACL concepts really drives the rating down.

User Experience

Boundary was hard to set up, but now we need to think about Alice and Bob and how they will be using it.

Both users will want to access one database: so, which commands will they need to launch in order to achieve that? It depends on whether or not I'm a good operator and have already written documentation about it. 

But let's assume I'm not, so we can also rate the "discoverability" of Boundary.

First off, let's launch boundary and see the subcommands:

Commands:
    accounts                  Manage Boundary accounts
    auth-methods              Manage Boundary auth methods
    auth-tokens               Manage Boundary auth tokens
    authenticate              Authenticate the Boundary command-line client
    config                    Manage resources related to Boundary's local configuration
    connect                   Connect to a target through a Boundary worker
    credential-libraries      Manage Boundary credential libraries
    credential-stores         Manage Boundary credential stores
    database                  Manage Boundary's database
    dev                       Start a Boundary dev environment
    groups                    Manage Boundary groups
    host-catalogs             Manage Boundary host catalogs
    host-sets                 Manage Boundary host sets
    hosts                     Manage Boundary hosts
    logout                    Delete the current token within Boundary and forget it locally
    managed-groups            Manage Boundary managed groups
    roles                     Manage Boundary roles
    scopes                    Manage Boundary scopes
    server                    Start a Boundary server
    sessions                  Manage Boundary sessions
    targets                   Manage Boundary targets
    users                     Manage Boundary users

Looking at this it seems Bob or Alice will only need to launch authenticate and connect.

Bob and Alice remembered I had told them they could use Google OIDC to connect to Boundary, so they tried to launch boundary authenticate oidc, but the CLI kept asking for an auth-method-id parameter. They had no idea how to retrieve it and ended up asking me for the value directly, which led to context-switching and made me late on the release of a CI/CD optimization.

(That was my fault: you don't hand a tool to anyone without writing documentation.)

Let's see how they could have done it without calling me:

boundary scopes list
# This command lists all scopes available in the Global scope
# From that list they have to infer which scope holds the auth method
boundary auth-methods list -scope-id=o_PAKsJSZkw7
# This command lists all auth methods available in a scope
# There was only 1 auth method, so this was OK
boundary authenticate oidc -auth-method-id=amoidc_1LNVr8iYqG
# Now their token is in their preferred keyring, and they can use the connect command
# Since the targets are not contained in the organization scope, they now have to find the project scope
boundary scopes list -scope-id=o_PAKsJSZkw7
# There's only one
# They now need to find the right target, since connect expects a target name/ID and a scope
boundary targets list -scope-id=p_qLRnP15V4r
# They found their database!
boundary connect -target-id=ttcp_jghB6rj2Je

I think we can all agree this is not ideal in terms of "discoverability" and it can be quite a pain on first use (I know I should have made documentation).

Let's get to the rating part! Overall, I find Boundary's UX really lacking. Something I'd like to see (maybe it's not a good idea) is a way to store the scope you're currently in, like kubectl does with namespaces, so you don't have to add it to every command you type. I think Boundary deserves a 2.5/5 rating for its user experience: I've seen both worse and better in the past.
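To illustrate the kind of "current scope" helper I have in mind, here is a rough Python sketch of a wrapper around the CLI. Everything in it is hypothetical (the context file location, set_scope, build_command, run); it is not a Boundary feature, and it only makes sense for subcommands that accept -scope-id, such as the list commands used above.

```python
import json
import subprocess
from pathlib import Path

# Hypothetical location for the remembered scope.
CONTEXT_FILE = Path.home() / ".boundary-context.json"


def set_scope(scope_id: str, path: Path = CONTEXT_FILE) -> None:
    """Remember the scope to use for subsequent commands."""
    path.write_text(json.dumps({"scope_id": scope_id}))


def build_command(args: list, path: Path = CONTEXT_FILE) -> list:
    """Build the boundary command line, adding the stored scope if none was given."""
    cmd = ["boundary", *args]
    has_scope = any(a == "-scope-id" or a.startswith("-scope-id=") for a in args)
    if not has_scope and path.exists():
        scope_id = json.loads(path.read_text()).get("scope_id")
        if scope_id:
            cmd.append(f"-scope-id={scope_id}")
    return cmd


def run(args: list) -> int:
    """Run the boundary CLI (must be installed) and return its exit code."""
    return subprocess.run(build_command(args)).returncode
```

After set_scope("p_qLRnP15V4r"), calling run(["targets", "list"]) would execute boundary targets list -scope-id=p_qLRnP15V4r, and an explicit -scope-id on the command line still wins over the stored one.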

Let's wrap it up

Overall, Boundary is a really good tool that enables fine-grained control over who can access internal resources. It also has some impressive features that I didn't talk about (e.g. the ability to match targets and workers based on tags, which enhances the user experience). But its user and operator experience really drive its rating down, and I wouldn't be surprised if some of the next milestones focused on those 2 topics.

However, for a tool that's been around for less than 2 years, achieving this level of quality is insane!

Topic               | Rating
Deployment          | 4.5
Configuration       | 4
Operator Experience | 2.5 (1)
User Experience     | 2.5 (1)
Final               | 3.5