One simple question has confounded countless developers working on Confidential Containers: how do we know we are connecting to the correct KBS? For context, KBS is short for Key Broker Service, the trusted entity that conditionally grants access to client secrets. The term relying party could also be used to describe the KBS. Inside the guest, there is a Key Broker Client (KBC) built into the Attestation Agent (AA). The KBC talks to the KBS to get container decryption keys, among other things.
The connection between the KBC and the KBS is secured with public key cryptography. The KBC generates a random keypair and sends the public key to the KBS when requesting confidential resources. Since the KBC has the lifespan of one VM, it makes sense for it to have an ephemeral keypair. The hash of the public key is included in the hardware evidence, which is also sent to the KBS. With this evidence, the KBS (with the help of an Attestation Service) can verify that the public key it receives from the KBC was generated inside a real TEE with a certain initial TCB. This is precisely what the KBS needs to validate before releasing client secrets to the KBC.
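The key binding described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `HW_KEY`, `make_evidence`, and `kbs_verify` are hypothetical names, random bytes stand in for the ephemeral public key, and an HMAC stands in for the hardware signature, which on a real platform is asymmetric and chains back to the vendor.

```python
import hashlib
import hmac
import os

# ASSUMPTION: stand-in for the hardware signing key. A real TEE signs evidence
# with an asymmetric key; HMAC is used only to keep the sketch self-contained.
HW_KEY = os.urandom(32)


def make_evidence(measurement: bytes, kbc_public_key: bytes) -> dict:
    """Guest side: place the hash of the ephemeral public key in the evidence."""
    report_data = hashlib.sha256(kbc_public_key).digest()
    signature = hmac.new(HW_KEY, measurement + report_data, hashlib.sha256).digest()
    return {
        "measurement": measurement,
        "report_data": report_data,
        "signature": signature,
    }


def kbs_verify(evidence: dict, received_key: bytes, expected_measurement: bytes) -> bool:
    """KBS side: check the signature, the initial TCB, and the key binding."""
    body = evidence["measurement"] + evidence["report_data"]
    expected_sig = hmac.new(HW_KEY, body, hashlib.sha256).digest()
    if not hmac.compare_digest(expected_sig, evidence["signature"]):
        return False  # evidence was not produced by the (simulated) hardware
    if evidence["measurement"] != expected_measurement:
        return False  # initial TCB is not the one the guest owner expects
    # The public key sent alongside the evidence must match the hash inside it.
    key_hash = hashlib.sha256(received_key).digest()
    return hmac.compare_digest(evidence["report_data"], key_hash)


# Hypothetical usage: random bytes stand in for a real ephemeral public key.
measurement = hashlib.sha256(b"initial TCB").digest()
kbc_key = os.urandom(32)
evidence = make_evidence(measurement, kbc_key)
accepted = kbs_verify(evidence, kbc_key, measurement)
swapped = kbs_verify(evidence, os.urandom(32), measurement)
```

Here `accepted` is true because the key sent to the KBS is the one hashed into the evidence, while `swapped` is false because a substituted key no longer matches that hash.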
While it's no problem to send the public key of the KBC to the KBS along with the hardware evidence, it's not as clear how we should get the public key of the KBS to the KBC. Some KBCs sidestep this question by setting up the secure channel via RSA, but even if RSA does not require the public key of the KBS to be known to the KBC, it might still seem like we should verify that we are talking to the right party.
The KBS is a long-running process with a fixed public key known to the guest owner. Since verifying the KBS public key would be part of setting up the secure channel between the KBC and the KBS, we can't use that channel to provision the key. The public key is not confidential, so instead of secret injection we could provide the key as part of the measured boot state of the guest. For SEV-SNP this could be done by putting the public key or a hash of the public key in the host data field. There is a corresponding field for TDX, and we could even do this with SEV by injecting the key into the firmware binary. Unfortunately, there are two issues with these approaches.
First, Confidential Containers is designed to use a two-stage measurement system. The first stage is the measurement of the initial TCB by the confidential hardware. The second stage is the measurement/decryption of the workload container images. These stages are decoupled so that any container can run in the same confidential environment without adjustments to the initial TCB. The host data is a first-class entity in the SNP Attestation Report, so changes to it do not affect other parts of the measurement. Even so, continuously updating the host data is in tension with the goal of having a generic guest measurement and a relatively simple verification process. For platforms like SEV that do not have a host data field, measuring the KBS public key would complicate verifying the attestation evidence even more.
Even if the public key of the KBS were in the host data field, however, it's not clear that this would provide an additional security guarantee. The host data is validated by the KBS. Let's imagine that a malicious CSP tampers with the network to connect the KBC to a malicious KBS. The CSP could also change the host data field to point to the public key of that KBS. Since the KBS is malicious it can validate the hardware evidence any way it chooses and establish a secure channel with the KBC. The presence of the KBS public key in the host data field does absolutely nothing to guarantee that we are talking to the correct KBS. The KBS is essentially being asked to validate itself, something that a malicious KBS can do.
Fortunately, there is a much simpler way to know that we've connected to the correct KBS. Only the correct KBS will have our secrets. If we can run an encrypted container image, then we must have connected to the KBS with the keys to decrypt this image. More specifically, the execution of a confidential workload should be gated on the receipt of a secret. This could mean that the container image itself is encrypted and contains some identifying information, such as the credentials for a database. Since workloads can make requests to the Attestation Agent, we could also use a signed container image that requests secrets from the AA.
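To make the gating idea concrete, here is a toy sketch in Python. It is not how Confidential Containers encrypts images: `seal`, `unseal`, and `start_workload` are hypothetical names, and the cipher is a throwaway SHA-256 keystream with an HMAC tag, chosen only so that the example runs with the standard library. The point it illustrates is that a key from the wrong KBS cannot start the workload.

```python
import hashlib
import hmac
import os


def _subkeys(key: bytes) -> tuple:
    """Derive separate encryption and MAC keys from one secret (toy KDF)."""
    return hashlib.sha256(b"enc" + key).digest(), hashlib.sha256(b"mac" + key).digest()


def _keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode. Not for production use."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def seal(key: bytes, plaintext: bytes) -> bytes:
    """Guest owner side: encrypt, then prepend a 32-byte authentication tag."""
    enc_key, mac_key = _subkeys(key)
    stream = _keystream(enc_key, len(plaintext))
    ciphertext = bytes(p ^ s for p, s in zip(plaintext, stream))
    return hmac.new(mac_key, ciphertext, hashlib.sha256).digest() + ciphertext


def unseal(key: bytes, blob: bytes) -> bytes:
    """Guest side: refuse to decrypt unless the tag verifies under this key."""
    enc_key, mac_key = _subkeys(key)
    tag, ciphertext = blob[:32], blob[32:]
    expected = hmac.new(mac_key, ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("key does not open this image: wrong KBS")
    stream = _keystream(enc_key, len(ciphertext))
    return bytes(c ^ s for c, s in zip(ciphertext, stream))


def start_workload(key_from_kbs: bytes, encrypted_image: bytes) -> bytes:
    """The workload runs only if the key received from the KBS opens the image."""
    return unseal(key_from_kbs, encrypted_image)


# Hypothetical usage: the owner's KBS holds owner_key; any other KBS does not.
owner_key = os.urandom(32)
image = seal(owner_key, b"database credentials")
contents = start_workload(owner_key, image)
```

Calling `start_workload` with any other key raises `ValueError`, which is the sketch's version of a pod that never starts because it reached a KBS without our secrets.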
Unfortunately, not all resources that the KBS provides are confidential. The KBS can also be used to provision the policy and key information for validating image signatures. Image signature validation requires public keys. Since these aren't secret, they don't confirm the identity of the KBS. A malicious KBS could provide policies and keys to validate just about any image. As a result, workloads cannot rely on signature validation alone. Doing so would break the edict mentioned above: workloads must be gated on the receipt of a secret. We can use signatures in combination with secrets, as many pods likely will.
The above is the response that I have typically given to people asking about how the KBS public key is provisioned. Recently, however, I have begun to look at the issue from a different perspective that I think is much more intuitive. It turns out that even the simplest questions posed here rely on some assumptions that lead to confusion. Let's start from the basics.
Imagine that you have just inherited the Klopman Diamond. You want to keep it in a safe place, so you visit your local bank. You decide to rent a safety deposit box. Before depositing your diamond, you inspect the box to make sure that it is sturdy and that it is empty. This is akin to an attestation. You are validating the isolation mechanism and initial TCB of the box. Once you are satisfied, you use the key to lock up the box with the diamond inside. There is only one key. This is your secure connection to the box.
While the bank probably asked you for some money (much like a CSP), at no point did the safety deposit box try to inspect you. The safety deposit box is an inanimate container. It does not care who is renting it. What matters is that the renter is satisfied with the guarantees of the box. This maps directly onto Confidential Containers and perhaps confidential computing more generally. In the previous section we were conceptualizing the flow from the perspective of the KBC. Instead, we should think about it from the perspective of the KBS, because the KBS operates on behalf of the client. In short, we don't need to validate the identity of the KBS because we are the KBS.
In Confidential Computing we usually talk about the hardware as the root of trust. The hardware is the root of trust of an enclave, but in Confidential Containers, it's really the KBS that is the root of trust of a workload. Through attestation this trust is extended to the enclave. It's tempting to think about this process the other way around, but this causes confusion. If we start from the KBS, most of the questions evaporate, including the question that started everything off.
Let’s say that the CSP spins up a VM for us, but then manipulates the network such that the KBC connects to a KBS that does not belong to us. This is no different from a bank renting out a safety deposit box to someone that is not us. There might be some orchestration snafus if the resources are misallocated, but this is not a security issue. The boxes are generic. I will use any one that meets my standards. There is a lurking concern, however. Surely it would be an issue if a user becomes convinced that they are communicating with a workload that does not actually belong to them. It’s important that only one guest can have privileged communication with an enclave. In other words, it’s important that an enclave connects only with one KBS. Otherwise, a pod might end up with a mix of containers representing different entities, some of which could be hostile to each other. Fortunately, this situation is relatively easy to avoid. The Attestation Agent must connect with only one KBS, just like a safety deposit box should have only one key.
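The one-KBS rule can be pictured as a small guard inside the agent. The sketch below is hypothetical: the `AttestationAgent` class, its `get_resource` method, and the string identifiers are illustrative names rather than the real Attestation Agent API, and an actual implementation would presumably bind to the attested secure channel rather than to an address.

```python
class AttestationAgent:
    """Hypothetical agent that binds itself to the first KBS it talks to."""

    def __init__(self) -> None:
        self._kbs_id = None  # no KBS chosen yet

    def get_resource(self, kbs_id: str, resource_id: str) -> str:
        if self._kbs_id is None:
            # First request: this KBS becomes the only one for this enclave.
            self._kbs_id = kbs_id
        elif kbs_id != self._kbs_id:
            # A second KBS would mix secrets from different owners in one pod.
            raise PermissionError("enclave is already bound to another KBS")
        # Placeholder for the real request over the secure channel.
        return f"{self._kbs_id}/{resource_id}"


# Hypothetical usage: the first KBS wins, later ones are refused.
agent = AttestationAgent()
first = agent.get_resource("kbs-a", "key/1")
```

Any later call that names a different KBS raises `PermissionError`, much like a safety deposit box that accepts only the key it was locked with.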
There is one final scenario to consider. Let’s say that after you leave the bank one of the workers rearranges the labels on all the safety deposit boxes. This would be annoying, but it wouldn’t compromise confidentiality. You are still the only one who can access your box. It might take a while to find it again, but you’ll know you’ve got the right one when you use your key to open it and you find all your stuff inside. If you didn’t have anything in your safety deposit box or if you only deposited some non-identifying pocket lint, then you might not be as confident that you found your box again. This highlights the importance of having a non-generic workload. In Confidential Containers enclaves are generic. The secrets provided by the KBS are the identity of a guest.
These aren’t new conclusions. In fact, these are the same conclusions we reached in the first half of this article. To me, though, approaching the problem from the perspective of the KBS rather than the KBC makes it a lot easier to think about.
@fitzthum @mingweishih @anakrish @jepio @dubek
I was thinking that if we had the following mechanisms, we might be able to solve our difficulties more gracefully and avoid the attacks discussed above:
The Attestation Agent (AA) requests a "Token" resource from the KBS/AS in the startup phase. After the attestation evidence is verified, the AA receives the Token. The Token contains the following contents:
After that, subsequent secret resource requests need to carry this token.
We can add more content to this token over time, which may solve many problems in the future.