How to bring zero-trust security to microservices

Why we must use a zero-trust security model in microservices and how to implement it using the Kuma universal service mesh.

Transitioning to microservices has many advantages for teams building large applications, particularly those that must accelerate the pace of innovation, deployments, and time to market. Microservices also provide technology teams the opportunity to secure their applications and services better than they did with monolithic code bases.

Zero-trust security gives these teams a scalable way to enforce security consistently while managing a growing number of microservices and greater complexity. Although it seems counterintuitive at first, microservices allow us to secure our applications and all of their services better than we ever could with monolithic code bases. Failing to seize that opportunity will result in insecure, exploitable, and non-compliant architectures that will only become more difficult to secure in the future.

Let’s understand why we need zero-trust security in microservices. We will also review a real-world zero-trust security example by leveraging the Cloud Native Computing Foundation’s Kuma project, a universal service mesh built on top of the Envoy proxy.

Security before microservices

In a monolithic application, every resource that we create can be accessed indiscriminately from every other resource via function calls because they are all part of the same code base. Typically, resources are going to be encapsulated into objects (if we use OOP) that will expose initializers and functions that we can invoke to interact with them and change their state.

For example, if we are building a marketplace application (like Amazon.com), there will be resources that identify users and the items for sale, and that generate invoices when items are sold:

A simple marketplace monolithic application.

Typically, this means we will have objects that we can use to create, delete, or update these resources via function calls that can be made from anywhere in the monolithic code base. While there are ways to restrict access to certain objects and functions (e.g., public, private, and protected access-level modifiers and package-level visibility), teams usually do not enforce these practices strictly, and our security should not depend on them.

A monolithic application is potentially easy to exploit, because resources can be accessed from anywhere in the code base.
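
To make the point concrete, here is a minimal sketch in Python, assuming a hypothetical `Invoice` class and a hypothetical `render_item_page` function (neither comes from a real code base). In Python, a leading underscore marks a field as private by convention only, so unrelated code in the same process can still change it:

```python
class Invoice:
    """A resource in a hypothetical marketplace monolith."""

    def __init__(self, user: str, item: str, amount: float) -> None:
        self.user = user
        self.item = item
        # The leading underscore marks the field "private" by convention only.
        self._amount = amount

    def amount(self) -> float:
        return self._amount


def render_item_page(invoice: Invoice) -> str:
    """Code that should only read item data, yet it can rewrite the invoice."""
    invoice._amount = 0.0  # nothing at runtime prevents this write
    return f"Item page for {invoice.item}"


invoice = Invoice(user="alice", item="book", amount=25.0)
render_item_page(invoice)
print(invoice.amount())  # the amount was changed by unrelated code
```

Nothing in the process boundary separates the item-rendering code from the invoice data, which is why access conventions alone are a weak basis for security.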

Security with microservices

With microservices, instead of having every resource in the same code base, we will have those resources decoupled and assigned to individual services, with each service exposing an API that can be used by another service. Instead of executing a function call to access or change the state of a resource, we can execute a network request.

With microservices, our resources interact with each other via service requests over the network, as opposed to function calls within the same monolithic code base. The APIs can be RPC-based, REST, or anything else.

By default, this doesn’t change our situation: Without proper barriers in place, every service could theoretically consume the exposed APIs of another service to change the state of every resource. But because the communication medium has changed and it is now the network, we can use technologies and patterns that operate on the network connectivity itself to set up our barriers and determine the access levels that every service should have in the big picture.

Understanding zero-trust security

To implement security rules over the network connectivity among services, we need to set up permissions, and then check those permissions on every incoming request.

For example, we may want to allow the Invoices and Users services to consume each other (an invoice is always associated with a user, and a user can have many invoices), but only allow the Invoices service to consume the Items service (since an invoice is always associated with an item), as in the following scenario:

A graphical illustration of connectivity permissions between services. The arrows and their direction determine whether services can make requests (green) or not (red). For example, the Items service cannot consume any other service, but it can be consumed by the Invoices service.

After setting up permissions (we will explore shortly how a service mesh can be used to do this), we then need to check them. The component that will check our permissions will have to determine if the incoming requests are being sent by a service that has been allowed to consume the current service. We will implement a check somewhere along the execution path, something like this:

// Reject requests whose caller identity is the Items service; allow the rest.
if (incoming_service == "items") {
  deny();
} else {
  allow();
}

This check can be done by our services themselves or by anything else on the execution path of the requests, but ultimately it has to happen somewhere.
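
As a rough illustration, the permissions from the Invoices, Users, and Items scenario can be written as an explicit allow-list with a default-deny check. This is only a sketch: the service names and the `is_allowed` helper are hypothetical, and a real deployment would take the caller's name from a verified identity rather than a plain string.

```python
# Allowed (source, destination) pairs from the marketplace scenario.
ALLOWED = frozenset({
    ("invoices", "users"),
    ("users", "invoices"),
    ("invoices", "items"),
})


def is_allowed(source: str, destination: str) -> bool:
    """Default deny: only explicitly listed pairs may communicate."""
    return (source, destination) in ALLOWED


print(is_allowed("invoices", "items"))  # True
print(is_allowed("items", "invoices"))  # False
print(is_allowed("users", "items"))     # False
```

Listing what is allowed, rather than what is denied, means that a newly added service has no access until someone grants it.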

The biggest problem to solve before enforcing these permissions is having a reliable way to assign an identity to each service so that when we identify the services in our checks, they are who they claim to be.

Identity is essential. Without identity, there is no security. Whenever we travel and enter a new country, we show a passport that associates our persona with the document, and by doing so, we certify our identity. Likewise, our services also must present a “virtual passport” that validates their identities.

Since the concept of trust is exploitable, we must remove all forms of trust from our systems, and hence we must implement “zero-trust” security.

The identity of the caller is sent on every request via mTLS.

In order for zero-trust to be implemented, we must assign an identity to every service instance that will be used for every outgoing request. The identity will act as the virtual passport for that request, confirming that the originating service is indeed who it claims to be. Mutual Transport Layer Security (mTLS) can be adopted to provide both identities and encryption on the transport layer. Since every request now provides an identity that can be verified, we can then enforce the permissions checks.

The identity of a service is typically assigned as a SAN (Subject Alternative Name) of the originating TLS certificate associated with the request, as in the case of zero-trust security enabled by a Kuma service mesh, which we will explore shortly.

SAN is an extension to X.509 (the standard that defines the format of public key certificates) that allows us to attach additional identity values to a certificate. In the case of zero-trust, the service name will be one of the values passed along with the certificate in a SAN field. When a service receives a request, we can extract the SAN from the TLS certificate, read the service name (the identity of the service) from it, and then implement the permission checks knowing that the originating service really is who it claims to be.
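
The extraction step might look like the following sketch. It assumes the certificate has already been verified and is available in the dictionary layout that Python's `ssl.SSLSocket.getpeercert()` returns, and it assumes the identity is carried as a URI SAN of the form `spiffe://<mesh>/<service>`. That URI layout, the `service_from_cert` helper, and the sample values are assumptions made for illustration.

```python
from typing import Optional
from urllib.parse import urlparse


def service_from_cert(peer_cert: dict) -> Optional[str]:
    """Return the service name from the first spiffe:// URI SAN, else None.

    `peer_cert` follows the dict layout of ssl.SSLSocket.getpeercert().
    The spiffe://<mesh>/<service> layout is an assumption of this sketch.
    """
    for kind, value in peer_cert.get("subjectAltName", ()):
        if kind != "URI":
            continue  # skip DNS names, IP addresses, and other SAN types
        uri = urlparse(value)
        service = uri.path.strip("/")
        if uri.scheme == "spiffe" and service:
            return service
    return None


# Hypothetical certificate data for a caller named "invoices".
cert = {"subjectAltName": (("URI", "spiffe://default/invoices"),)}
print(service_from_cert(cert))  # invoices
```

The returned name is only trustworthy because the TLS handshake has already validated the certificate chain; the same parsing applied to an unverified certificate would prove nothing.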

The SAN (Subject Alternative Name) is very commonly used in TLS certificates and can also be explored by our browser. In the picture above, we can see some of the SAN values belonging to the TLS certificate for Google.com.

Now that we have explored the importance of having identities for our services and we understand how we can leverage mTLS as the virtual passport that is included in every request our services make, we are still left with three important problems that we need to address:

  1. Assigning TLS certificates and identities on every instance of every service.
  2. Validating the identities and checking permissions on every request.
  3. Rotating certificates over time to improve security and prevent impersonation.

These problems are hard to solve, and solving them correctly matters because they form the backbone of our zero-trust security implementation. If they are not solved correctly, our zero-trust security model will be flawed, and therefore insecure.
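
To give a feel for the third problem, here is a minimal sketch of a rotation decision: renew a certificate once a fixed fraction of its lifetime has elapsed, well before it expires. The `needs_rotation` helper and the 80% threshold are hypothetical choices for illustration, not a rule taken from any particular product.

```python
from datetime import datetime, timedelta, timezone


def needs_rotation(not_before: datetime, not_after: datetime,
                   now: datetime, threshold: float = 0.8) -> bool:
    """Return True once `threshold` of the certificate lifetime has elapsed."""
    lifetime = not_after - not_before
    return now >= not_before + lifetime * threshold


# Hypothetical certificate that is valid for 24 hours.
issued = datetime(2024, 1, 1, tzinfo=timezone.utc)
expires = issued + timedelta(hours=24)

print(needs_rotation(issued, expires, issued + timedelta(hours=12)))  # False
print(needs_rotation(issued, expires, issued + timedelta(hours=20)))  # True
```

Short lifetimes combined with early renewal limit how long a stolen certificate remains useful, but they only work if rotation is automated for every service instance.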

Moreover, the above tasks must be implemented for every instance of every service that our application teams are creating. In a typical organization, these service instances will include both containerized and VM-based workloads running across one or more cloud providers, perhaps even in our physical data center.

The biggest mistake any organization could make is asking its teams to build these features from scratch every time they create a new application. The resulting fragmentation in the security implementations will create unreliability in how the security model is implemented, making the entire system insecure.

Service mesh to the rescue

Service mesh is a pattern that implements modern service connectivity functionality in a way that does not require us to update our applications to take advantage of it. Service mesh is typically delivered by deploying data plane proxies next to every instance (or Kubernetes Pod) of our services and a control plane that is the source of truth for configuring those data plane proxies.

In a service mesh, all the outgoing and incoming requests are automatically intercepted by the data plane proxies (Envoy) that are deployed next to each instance of each service. The control plane (Kuma) is in charge of propagating the policies we want to set up (like zero-trust) to the proxies. The control plane is never on the execution path of the service-to-service requests; only the data plane proxies live on the execution path.
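
The division of labor between the two planes can be sketched as a toy model. The `ControlPlane` and `DataPlaneProxy` classes below are hypothetical and greatly simplified: they are not Kuma or Envoy APIs, and the caller's name is passed as a plain string where a real proxy would read it from the mTLS certificate.

```python
from typing import Dict, FrozenSet, Iterable, List


class DataPlaneProxy:
    """Sits next to one service instance and enforces policy locally."""

    def __init__(self, service: str) -> None:
        self.service = service
        # Deny everything until the control plane delivers a policy.
        self.allowed_sources: FrozenSet[str] = frozenset()

    def apply_policy(self, allowed_sources: Iterable[str]) -> None:
        self.allowed_sources = frozenset(allowed_sources)

    def handle_inbound(self, source: str) -> bool:
        # Decided by the proxy alone; no control plane call per request.
        return source in self.allowed_sources


class ControlPlane:
    """Source of truth for policy; pushes it to every registered proxy."""

    def __init__(self) -> None:
        self.proxies: List[DataPlaneProxy] = []

    def register(self, proxy: DataPlaneProxy) -> None:
        self.proxies.append(proxy)

    def publish(self, permissions: Dict[str, Iterable[str]]) -> None:
        for proxy in self.proxies:
            proxy.apply_policy(permissions.get(proxy.service, ()))


control_plane = ControlPlane()
items_proxy = DataPlaneProxy("items")
control_plane.register(items_proxy)
control_plane.publish({"items": ["invoices"]})

print(items_proxy.handle_inbound("invoices"))  # True
print(items_proxy.handle_inbound("users"))     # False
```

Because each proxy holds its own copy of the policy, requests keep being checked even when the control plane is briefly unreachable in this model.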

The service mesh pattern is based on the idea that our services should not be in charge of managing the inbound or outbound connectivity. Over time, services written in different technologies will inevitably end up having various implementations. Therefore, a fragmented way to manage that connectivity ultimately will result in unreliability. Plus, the application teams should focus on the application itself, not on managing connectivity, which ideally should be provisioned by the underlying infrastructure. For these reasons, service mesh not only gives us all sorts of service connectivity functionality out of the box, like zero-trust security, but also makes the application teams more efficient while giving the infrastructure architects complete control over the connectivity that is being generated within the organization.

Just as we didn’t ask our application teams to walk into a physical data center and manually connect the networking cables to a router/switch for L1-L3 connectivity, today we don’t want them to build their own network management software for L4-L7 connectivity. Instead, we want to use patterns like service mesh to provide that to them out of the box.

Zero-trust security via Kuma

Kuma is an open source service mesh (first created by Kong and then donated to the CNCF) that supports multi-cluster, multi-region, and multi-cloud deployments across both Kubernetes and virtual machines (VMs). Kuma provides more than 10 policies that we can apply to service connectivity (like zero-trust, routing, fault injection, discovery, multi-mesh, etc.) and has been engineered to scale in large distributed enterprise deployments. Kuma natively supports the Envoy proxy as its data plane proxy technology. Ease of use has been a focus of the project since day one.

Kuma can run a distributed service mesh across clouds and clusters — including hybrid Kubernetes plus VMs — via its multi-zone deployment mode.

With Kuma, we can deploy a service mesh that can deliver zero-trust security across both containerized and VM workloads in a single or multiple cluster setup. To do so, we need to follow these steps:

1. Download and install Kuma at kuma.io/install.
2. Start our services and start `kuma-dp` next to them (in Kubernetes, `kuma-dp` is automatically injected). We can follow the getting started instructions on the installation page to do this for both Kubernetes and VMs.

Then, once our control plane is running and the data plane proxies are successfully connecting to it from each instance of our services, we can execute the final step:

3. Enable the mTLS and Traffic Permission policies on our service mesh via the Mesh and TrafficPermission Kuma resources.

In Kuma, we can create multiple isolated virtual meshes on top of the same deployment of service mesh, which is typically used to support multiple applications and teams on the same service mesh infrastructure. To enable zero-trust security, we first need to enable mTLS on the Mesh resource of choice by enabling the mtls property.

In Kuma, we can either let the system generate its own certificate authority (CA) for the Mesh or supply our own root certificate and key. The CA certificate and key are then used to automatically provision a TLS certificate, carrying an identity, for every data plane proxy, and Kuma automatically rotates those certificates at a configurable interval. In Kong Mesh, we can also use a third-party PKI (like HashiCorp Vault) to provision a CA in Kuma.
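
As a hedged sketch of what these resources might contain, the Python snippet below builds a Mesh with a builtin CA and three TrafficPermission resources for the marketplace scenario, then prints them as JSON. The field names follow the Kuma resource format as best understood at the time of writing and should be checked against the Kuma documentation for the version in use. The mesh name, backend name, rotation interval, and service names are all hypothetical.

```python
import json

MESH_NAME = "default"  # hypothetical mesh name

# Mesh with mTLS enabled through a builtin CA (field names are assumptions
# to verify against the Kuma docs for your version).
mesh = {
    "type": "Mesh",
    "name": MESH_NAME,
    "mtls": {
        "enabledBackend": "ca-1",
        "backends": [
            {
                "name": "ca-1",
                "type": "builtin",
                # Assumed knob for how often proxy certificates are rotated.
                "dpCert": {"rotation": {"expiration": "1d"}},
            }
        ],
    },
}


def traffic_permission(source: str, destination: str) -> dict:
    """Build a TrafficPermission allowing `source` to call `destination`."""
    return {
        "type": "TrafficPermission",
        "name": f"{source}-to-{destination}",
        "mesh": MESH_NAME,
        "sources": [{"match": {"kuma.io/service": source}}],
        "destinations": [{"match": {"kuma.io/service": destination}}],
    }


resources = [
    mesh,
    traffic_permission("invoices", "users"),
    traffic_permission("users", "invoices"),
    traffic_permission("invoices", "items"),
]

for resource in resources:
    print(json.dumps(resource, indent=2))
```

Kuma resources are normally written as YAML and applied with its own tooling; the dictionaries here only illustrate the shape of the data, not a supported way of submitting it.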
