Matthew Tyson
Software Architect

Intro to MongoDB’s queryable encryption

Analysis
Sep 01, 2022 · 7 mins
Encryption, Security

MongoDB 6.0 introduces a preview feature that pulls off the quasi-magical feat of allowing encrypted data to be used as the target of searches, without ever transmitting the keys to the database.

Encrypted blocks of multicolored data cubes rolling out.
Credit: Matejmo / Getty Images

Queryable encryption was the main attraction at MongoDB World 2022, for understandable reasons.  It introduces a unique capability to reduce the attack surface for confidential data in several use cases.  In particular, data remains encrypted at insert, storage, and query.  Both queries and their responses are encrypted over the wire and randomized for resistance to frequency analysis.

The upshot is that applications can support use cases that require searching against confidential data without ever exposing it as plaintext in the data store infrastructure.  Data stores that hold private information are a prime target for attackers.  MongoDB’s encrypted fields mean that this information is cryptographically protected at all times in the database, yet still usable for searching.  In fact, the database does not hold the keys for decrypting the data at all.  That means that even a complete breach of the DB servers will not expose private information in plaintext.

Several prominent and sophisticated attack vectors are eliminated.  For example:

  • An unethical or compromised DB admin account.
  • Access to on-disk files.
  • Access to in-memory data.

This is something like hashing passwords.  We hash passwords in the DB for the same reason: so that neither an attacker nor even the DB admin can view the password.  The big difference, of course, is that hashing is a one-way affair.  You can verify whether a password is correct, but that’s it.  There’s no querying such a field and no way to recover the plaintext.  Queryable encryption retains the ability to work with the field.
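To make the one-way nature of password hashing concrete, here is a minimal sketch using only Python's standard library.  The function names, the iteration count, and the sample passwords are illustrative assumptions, not anything MongoDB prescribes.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # assumed work factor, chosen only for illustration


def hash_password(password, salt=None):
    """Return (salt, digest); the digest cannot be turned back into the password."""
    salt = secrets.token_bytes(16) if salt is None else salt
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the candidate and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, expected_digest)


salt, stored = hash_password("correct horse")
ok = verify_password("correct horse", salt, stored)
bad = verify_password("wrong guess", salt, stored)
```

The only operation the stored digest supports is the yes/no check in `verify_password`.  There is no function that takes `stored` and returns the original string, which is exactly the limitation queryable encryption avoids.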

Another interesting characteristic of the system is that fields are encrypted in a randomized fashion, so the same value will output different ciphertext on different runs.  This means the system is resistant to frequency analysis attacks as well.  The system allows for a rigorous distinction between clients that have view privileges for the search results and those that don’t, by controlling which clients have access to the keys. 
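The frequency analysis point can be illustrated with a toy cipher.  The sketch below is not MongoDB's scheme and is not safe for real use: it builds a simple stream cipher from HMAC-SHA256 purely to contrast a fixed nonce (deterministic output) with a fresh random nonce (randomized output).  All names and sample values are hypothetical.

```python
import hashlib
import hmac
import secrets
from collections import Counter


def keystream(key, nonce, length):
    """Derive `length` pseudo-random bytes from key and nonce (toy construction)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key, plaintext, nonce=None):
    """XOR the plaintext with the keystream; a random nonce is used unless one is given."""
    nonce = secrets.token_bytes(16) if nonce is None else nonce
    stream = keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def decrypt(key, ciphertext):
    """Split off the 16-byte nonce and reverse the XOR."""
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


key = secrets.token_bytes(32)
values = [b"visa", b"visa", b"visa", b"amex"]

# Fixed nonce: equal plaintexts yield equal ciphertexts, so counts leak.
deterministic = [encrypt(key, v, nonce=bytes(16)) for v in values]
# Fresh nonce per value: equal plaintexts yield unrelated ciphertexts.
randomized = [encrypt(key, v) for v in values]

deterministic_counts = Counter(deterministic)
randomized_counts = Counter(randomized)
```

With the fixed nonce, an observer who sees only ciphertext can still tell that one value occurs three times.  With a random nonce per encryption, every ciphertext is distinct, yet each one still decrypts to the original value for a client that holds the key.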

For example, an application might store confidential information like a credit card number alongside less sensitive information like a username.  A non-privileged client that has not been provisioned with the cryptographic keys could see the username but not the credit card number.  A client with access to the keys could see the credit card number and use it in searches, while the number stays encrypted as it is sent, searched, stored, and retrieved.

Tradeoffs of queryable encryption

Of course, all this comes at a cost.  Specifically, queries involving encrypted fields require more space and time.  (MongoDB’s guidance is to expect around 2-3 times extra storage for encrypted data, but that is expected to come down in the future.)

Querying the encrypted data is handled by MongoDB incorporating metadata in the encrypted collections themselves, as well as separate collections with further metadata.  These account for the increase in storage and time requirements when working with those data sets, along with the work of actual encryption and decryption.

Moreover, there is architectural complexity to support: a key management service (KMS) must be added to the system, the application must be coded to use it, and encryption and decryption add work of their own.

How queryable encryption works

At the highest level, it looks like Figure 1.

Figure 1. High-level architecture of queryable encryption (Credit: Matthew Tyson)

Figure 1 illustrates that the system adds an architectural component: the KMS.  The other change to the typical flow of events is that the data and queries are encrypted and decrypted via the MongoDB driver.  The KMS provides the keys for this process.

Automatic and manual encryption

There are two basic modes for queryable encryption: automatic and manual.  In automatic mode, the MongoDB driver itself handles encryption and decryption.  In manual mode, the application developer does more hands-on work using the keys from the KMS.
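In automatic mode, the driver has to be told which fields to encrypt and which of them should be queryable.  The sketch below shows the general shape of such a field map as a plain Python dictionary.  The namespace `shop.customers`, the field name `creditCard`, and the placeholder key id are hypothetical; in a real deployment the key id is the identifier the driver returns when the data encryption key is created, and the map is handed to the driver's automatic encryption options.

```python
import uuid

# Hypothetical placeholder for a data encryption key id.  A real id comes
# from the driver when the key is created in the key vault collection.
card_key_id = uuid.uuid4()

encrypted_fields_map = {
    # Hypothetical "database.collection" namespace.
    "shop.customers": {
        "fields": [
            {
                "path": "creditCard",      # field to encrypt
                "bsonType": "string",
                "keyId": card_key_id,
                # Equality is the only query type in the 6.0 preview.
                "queries": {"queryType": "equality"},
            }
        ]
    }
}

# Fields not listed (for example, a username) are stored as ordinary plaintext.
queryable_paths = [
    field["path"]
    for field in encrypted_fields_map["shop.customers"]["fields"]
    if "queries" in field
]
```

Keeping the list of encrypted fields in one declarative map is what lets the driver encrypt and decrypt transparently, so application queries look the same as they would against unencrypted data.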

Key types: customer master keys (CMK) and data encryption keys (DEK)

In the queryable encryption system there are two types of keys in play: customer master keys (CMK) and data encryption keys (DEK).  The DEK is the actual working key for encrypting the data.  The CMK is used to encrypt the DEK, which provides extra security.  The client application can make use of the DEK (and the data encrypted with it) only after the DEK has been decrypted with the CMK.

Therefore, even if the DEK is exposed in its encrypted form, it is useless to an attacker without access to the CMK.  The architecture can be arranged such that the client application never holds the CMK itself, as described next with a key management service.  The bottom line is that the dual-key arrangement adds an extra layer of protection for your secret keys.

Data encryption keys (DEK) are stored in a separate key vault collection, as described below.

Key vaults

Data is encrypted with symmetric secret keys.  Those keys belong to the app developer and are never sent to MongoDB in unencrypted form.  They are stored in a key vault.  There are three basic scenarios for managing the keys, listed below in ascending order of security.

  1. Local file key provider
    • Suitable only for development.
    • Keys are stored on the local system alongside the app.
  2. KMIP (Key Management Interoperability Protocol) provider
    • Suitable for production, but less secure than using a KMS provider.
    • Customer master keys (CMK) are transmitted to the client application.
  3. Full KMS (key management service) provider
    • Suitable for production.
    • Supported cloud KMS providers are AWS, Azure, and GCP.
    • On-premises HSM (hardware security module) and KMS are supported.
    • Only data encryption keys (DEK) are transmitted to the client application.

Local key provider for development

At development time, the application developer can generate keys (say, with OpenSSL) and store them locally.  Those keys are then used for encrypting and decrypting the information sent to and from the MongoDB instance.  This is for development only, because keeping the secret keys next to the application exposes them in a way that negates much of the advantage of queryable encryption.
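As a rough sketch of the local setup, the snippet below generates a master key in Python rather than with OpenSSL, writes it to a file, and loads it back into the provider dictionary that MongoDB drivers accept for a local key provider.  The 96-byte length is what the local provider expects; the helper names and the file name are hypothetical, and a temporary directory stands in for wherever a developer would actually keep the file.

```python
import secrets
import tempfile
from pathlib import Path


def create_local_master_key(path):
    """Generate a 96-byte master key and store it on the local file system."""
    key = secrets.token_bytes(96)
    path.write_bytes(key)
    return key


def load_kms_providers(path):
    """Build the provider settings for a local key from the stored file."""
    return {"local": {"key": path.read_bytes()}}


with tempfile.TemporaryDirectory() as tmp:
    key_file = Path(tmp) / "master-key.bin"  # hypothetical file name
    created_key = create_local_master_key(key_file)
    kms_providers = load_kms_providers(key_file)
```

Anyone who can read that file can unwrap every data encryption key, which is why this arrangement belongs on a developer workstation and nowhere else.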

KMIP provider

There are a number of KMIP implementations, including open source ones and commercial services.  In this scenario, the CMK is stored at the KMIP provider and transmitted to the client app whenever the DEK needs to be encrypted or decrypted.  If the key vault collection is breached, the data remains safe, because the DEKs stored there are encrypted with the CMK.  This arrangement is shown in Figure 2.

Figure 2. KMIP architecture outline (Credit: Matthew Tyson)

KMS provider

By using a KMS provider (like AWS, Azure, or GCP), the customer master key is never exposed to the network or the client app.  Instead, the KMS provides the service of encrypting the DEK.  The DEK itself is sent to the KMS, encrypted, and returned as ciphertext, which is then stored in a special key vault collection in MongoDB.

The stored DEK can then be retrieved and decrypted with the KMS in a similar fashion, again preventing exposure of the CMK itself.  As in KMIP, if the key vault collection is breached, the data remains safe.
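The wrap-and-unwrap flow described above can be sketched with a stand-in KMS object.  Everything here is a simplification under stated assumptions: `ToyKms` is a hypothetical in-process class rather than a real cloud service, the key vault is a plain dictionary, and the cipher is a toy HMAC-based stream cipher used only so the example runs with the standard library.

```python
import hashlib
import hmac
import secrets


def _xor_stream(key, nonce, data):
    """Toy stream cipher: XOR data with an HMAC-SHA256 keystream (not for real use)."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256).digest()
        counter += 1
    return bytes(d ^ s for d, s in zip(data, stream))


class ToyKms:
    """Stand-in for a KMS: the CMK is created inside and never handed out."""

    def __init__(self):
        self._cmk = secrets.token_bytes(32)

    def encrypt(self, dek):
        """Wrap a DEK with the CMK and return nonce + ciphertext."""
        nonce = secrets.token_bytes(16)
        return nonce + _xor_stream(self._cmk, nonce, dek)

    def decrypt(self, wrapped_dek):
        """Unwrap a previously wrapped DEK."""
        nonce, body = wrapped_dek[:16], wrapped_dek[16:]
        return _xor_stream(self._cmk, nonce, body)


kms = ToyKms()
key_vault = {}  # stand-in for the key vault collection in MongoDB

# 1. The client creates a DEK and asks the KMS to wrap it.
dek = secrets.token_bytes(32)
key_vault["dek-1"] = kms.encrypt(dek)

# 2. Later, the client fetches the wrapped DEK and asks the KMS to unwrap it.
recovered_dek = kms.decrypt(key_vault["dek-1"])
```

The key vault only ever holds the wrapped form of the DEK, and the client only ever sees the DEK, never the CMK.  Reading `key_vault` alone yields nothing usable, which mirrors the claim that a breached key vault collection leaves the data safe.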

You can see this layout in Figure 3.

Figure 3. KMS architecture outline (Credit: Matthew Tyson)

Conclusion

Queryable encryption is a preview feature, and at the moment, only equality queries are supported.  More query types, like ranges, are on the roadmap.

Although it requires extra setup, queryable encryption delivers a critical capability for use cases that require searching against confidential data, one that cannot be achieved in any other way.  It is a compelling and distinctive feature.