MongoDB Offers Field Level Encryption

MongoDB now has the ability to encrypt data by field:

MongoDB calls the new feature Field Level Encryption. It works kind of like end-to-end encrypted messaging, which scrambles data as it moves across the internet, revealing it only to the sender and the recipient. In such a “client-side” encryption scheme, databases utilizing Field Level Encryption will not only require a system login, but will additionally require specific keys to process and decrypt specific chunks of data locally on a user’s device as needed. That means MongoDB itself and cloud providers won’t be able to access customer data, and a database’s administrators or remote managers don’t need to have access to everything either.

For regular users, not much will be visibly different. If their credentials are stolen and they aren’t using multifactor authentication, an attacker will still be able to access everything the victim could. But the new feature is meant to eliminate single points of failure. With Field Level Encryption in place, a hacker who steals an administrative username and password, or finds a software vulnerability that gives them system access, still won’t be able to use these holes to access readable data.

Tags: authentication, cryptography, encryption, hacking, keys

Posted on June 26, 2019 at 1:03 PM • 22 Comments

Comments

Alejandro • June 26, 2019 1:40 PM

Sounds wonderful.

What’s the catch?

Steven Clark • June 26, 2019 1:45 PM

I’m not getting the part where the database should have any say in this. Shouldn’t every database be capable of field level encryption if you choose the right raw field types and encrypt at the application?

Jeremy • June 26, 2019 2:32 PM

@Steven Clark – Agreed.

@Alejandro – I don’t have technical details for this particular implementation, but the typical drawbacks to having the client end-to-end encrypt data that’s going to be stored on a server are:

The server cannot search, filter, index, or otherwise process the data in any way.
The server cannot compress or deduplicate the data, and thus needs more storage space (and possibly more bandwidth). Clients can potentially compress before encrypting, but cannot deduplicate data across multiple accounts.
The client can only read the data from a device where their key is stored.
If the client loses their key, the data is unrecoverable.

Gweihir • June 26, 2019 2:41 PM

On the other hand, an Attacker that has time and collects data that is being accessed still gets pretty much everything of value.

Gweihir • June 26, 2019 2:44 PM

Ah, correction, the attacker will get everything that was used to search and select this data. That still has to be done or the whole benefit of using a DB goes out the window. Sounds excessively slow as it is, though.

Sed Contra • June 26, 2019 3:00 PM

So we will never have to hear it said that “MongoDB pwned in game of life”.

https://youtube.com/watch?v=SKRma7PDW10

David Leppik • June 26, 2019 5:08 PM

Looking at the release notes, they give an example of a record containing personal information. The person’s name is public, but Social Security Number, Phone, and Email are each encrypted.

It’s somewhat more convenient and obvious than storing encrypted fields in a database where the database doesn’t know anything about the encryption. They’ve taken a simple programming task and turned it into a DB administration task. I suspect they wouldn’t have bothered, except that as a primarily cloud-hosted database they can offer it as a service.
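A minimal sketch of the kind of record described above, assuming the encryption happens in application code before the document reaches the database. The helper names (`toy_encrypt`, `toy_decrypt`, `protect`, `reveal`), the field names, and the sample values are all hypothetical. The HMAC-based keystream is only a stand-in built from the standard library; it is not MongoDB's scheme, and real code would use a vetted authenticated cipher.

```python
import hashlib
import hmac
import os

ENCRYPTED_FIELDS = {"ssn", "phone", "email"}  # hypothetical field names


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Derive `length` pseudo-random bytes from HMAC-SHA256 in counter mode."""
    out = b""
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return out[:length]


def toy_encrypt(key: bytes, plaintext: str) -> bytes:
    """Stand-in cipher: XOR with a keystream, fresh nonce per value, no integrity tag."""
    data = plaintext.encode("utf-8")
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def toy_decrypt(key: bytes, blob: bytes) -> str:
    """Reverse toy_encrypt: split off the nonce, regenerate the keystream, XOR."""
    nonce, body = blob[:16], blob[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream)).decode("utf-8")


def protect(record: dict, key: bytes) -> dict:
    """Encrypt only the sensitive fields; the rest stays readable to the server."""
    return {
        name: toy_encrypt(key, value) if name in ENCRYPTED_FIELDS else value
        for name, value in record.items()
    }


def reveal(stored: dict, key: bytes) -> dict:
    """Decrypt the sensitive fields on the client, where the key lives."""
    return {
        name: toy_decrypt(key, value) if name in ENCRYPTED_FIELDS else value
        for name, value in stored.items()
    }


key = os.urandom(32)  # held by the client, never sent to the database
record = {"name": "Pat Example", "ssn": "000-00-0000",
          "phone": "555-0100", "email": "pat@example.com"}
stored = protect(record, key)  # this is what the server would hold
```

Here `stored["name"]` is still plain text the server can index, while the other three fields are opaque bytes that only `reveal(stored, key)` turns back into the original record.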

Doug • June 26, 2019 5:52 PM

Available in Lotus/IBM/HCL Notes since 1993. Glad the rest of the world is catching up!

This is a brief description that is for version 8.5 but applies back to the original rev.

https://www.ibm.com/support/knowledgecenter/en/SSVRGU_8.5.3/com.ibm.designer.domino.main.doc/H_DOCUMENT_AND_FIELD_ENCRYPTION_OVERVIEW.html

Mike D. • June 26, 2019 7:50 PM

@Jeremy Couldn’t the database deduplicate on a full-field basis, in the specific case where the same user using the same key stores the same object using, say, different devices? It would seem the ciphertexts would match in this case. The server could use hashes to detect candidate duplicates and full compares to be sure.

Evan • June 26, 2019 8:37 PM

@Mike D.

I haven’t looked at this implementation, but it would be reasonable to use a salt value (even the row key would do) to prevent two fields with the same data hashing to the same value. The fact that two encrypted rows are the same is actually pretty useful for correlation (and possibly frequency analysis).
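A small sketch of that trade-off, assuming a keyed hash (HMAC-SHA256) as a stand-in for any deterministic transformation of the field. The key, the `tag` helper, and the sample rows are hypothetical and say nothing about what MongoDB actually does.

```python
import hashlib
import hmac
from collections import Counter

KEY = b"demo-key-not-for-real-use"  # hypothetical client-held key


def tag(value: str, salt: str = "") -> str:
    """Keyed hash of a field value, optionally mixed with a per-row salt."""
    message = (salt + "|" + value).encode("utf-8")
    return hmac.new(KEY, message, hashlib.sha256).hexdigest()


# (row key, field value) pairs; the value "O+" appears three times
rows = [(1, "O+"), (2, "A-"), (3, "O+"), (4, "O+"), (5, "B+")]

unsalted = [tag(value) for _, value in rows]
salted = [tag(value, salt=str(row_key)) for row_key, value in rows]

# Without a salt, equal plaintexts give equal tags: the server can
# deduplicate them, but it can also count how often each value occurs.
unsalted_counts = sorted(Counter(unsalted).values(), reverse=True)

# With the row key as salt, every tag is distinct: no frequency signal,
# and no way for the server to spot duplicates either.
salted_counts = sorted(Counter(salted).values(), reverse=True)
```

With these rows, `unsalted_counts` is `[3, 1, 1]` and `salted_counts` is `[1, 1, 1, 1, 1]`, which is the correlation signal disappearing along with the ability to deduplicate.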

IIRC, MongoDB is mostly key-value rather than offering rich indexes and aggregation like SQL. This means that you don’t lose much expressiveness if you encrypt the value part.

Ismar • June 26, 2019 11:04 PM

“One reason that no one did this before was because they didn’t perceive customer demand the way that it’s easy to perceive today,” says Davi Ottenheimer, MongoDB’s vice president of trust and digital ethics. All those high-profile database breaches have finally started to make companies aware of what solid encryption is worth.”

Encouraging sign

Clive Robinson • June 27, 2019 3:15 AM

@ Alejandro,

Sounds wonderful. What’s the catch?

The article is far from clear on what the mechanics of this scheme are, so any answer will have to be made on assumptions.

It also helps if you have Dorothy Denning’s book on database security via encryption.

As a basic proviso, a database consists of a large number of records, which are made up of fields usually laid out in a consistent form. That is, one field will be some kind of index called the “Primary Key”, or there will be a combination of two or more fields that is unique to each record.

Whereas a primary key can be just a sequential integer serial number for each record, secondary keys can be like house addresses, where a house number, street name, and local region identifier can be strings of alpha chars.

An integer primary key field generally contains no useful information unless you know how the records were ordered. Secondary key fields almost by definition carry useful information. Therefore a primary key field does not of necessity need to be encrypted, whereas the secondary key fields do.

Fields that are not encrypted can be searched for by the database engine on the cloud host, whilst encrypted fields can only be searched where they have been decrypted back to plaintext.

The decryption can be done at either the DB Server or the Client application.

One way a search can be done securely at a server is with the equivalent of an HSM. The HSM contains the field keys, so you could arrange for an HSM to take a value, range of values, or search string to check, plus the appropriate field identifier, as the base input. The DB then sends each record to the HSM and gets a boolean “Yes/No” response for that record from the HSM. As you can appreciate, having to check tens of thousands of records is going to be a slow process… However, even slower would be to do the same thing on a distant client, where you have all the network delays etc. as well.

The great hope to break this deadlock was a new type of encryption that would allow you to do mathematics on encrypted data without having to decrypt first.

In some respects we can already do so. For instance, a stream cipher using additive encryption rather than XOR allows, with Chinese Clock Mathematics, numbers to be added or subtracted. But you have no idea without decrypting whether you have got to zero, so you cannot do comparison operations, which are the fundamental operation of searching…
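A minimal sketch of the additive idea, assuming integer fields of N = 32 bits and arithmetic modulo 2^N. The random pads stand in for keystream words a real stream cipher would generate, and the names and amounts are made up for illustration.

```python
import secrets

N = 32
MOD = 1 << N  # all arithmetic wraps around at 2^N ("clock" arithmetic)


def encrypt(value: int, pad: int) -> int:
    """Additive masking: add the secret pad instead of XORing it."""
    return (value + pad) % MOD


def decrypt(cipher: int, pad: int) -> int:
    """Subtract the pad to recover the value."""
    return (cipher - pad) % MOD


pad_a = secrets.randbelow(MOD)  # stand-ins for stream cipher output
pad_b = secrets.randbelow(MOD)

balance = encrypt(1000, pad_a)
deposit = encrypt(250, pad_b)

# The server can add two ciphertexts without holding any key...
total_cipher = (balance + deposit) % MOD
# ...and can add a public constant as well.
total_plus_fee = (total_cipher + 5) % MOD

# Only the key holder, who knows both pads, can read the results.
combined_pad = (pad_a + pad_b) % MOD
total = decrypt(total_cipher, combined_pad)
with_fee = decrypt(total_plus_fee, combined_pad)
```

Here `total` comes out as 1250 and `with_fee` as 1255. The server, however, sees only `total_cipher`, which is uniformly distributed over the 2^N possible values, so it cannot tell whether the underlying sum is zero, negative, or larger than some threshold. That is the missing comparison operation.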

No One / Ex Cathedra • June 27, 2019 4:47 AM

But the companies involved must still have access to the information in the databases–or did I miss something? How will the companies involved process the plaintext data if they cannot access it?

I also like that the end-users will be able to keep an eye on the integrity of their data.

Folks who are crypto savvy might end up sleeping better at night.

GregW • June 27, 2019 6:13 AM

@Jeremy,
Good list of reasons field level encryption makes sense.

I’d call out two more as someone frustrated that my current database doesn’t have field level encryption.

First, field level encryption allows data in your database that you don’t want the DBA and/or DBA/app support personnel to be able to view when carrying out their daily duties. Payroll-related, sensitive PII data, credit cards etc. (Now it’s really better not to store that data at all if you can possibly figure out an alternative (data is a liability!), especially credit cards which can be stored by your payment processor, but that won’t work in every situation.)

The only people with a need to know the plaintext are the service account ingesting the data and the service account or individual authorized to read the data. And the database application itself. Perhaps the DBA has access to a recovery key of some kind whose use/retrieval can be governed by a security policy, stored in a two-key safe, etc.

Second, if your database is not for an application, but instead used for analytics and reporting (data warehouse/business intelligence/data lake/etc), the many off the shelf reporting and visualization clients have no client side encryption as there is no standard SQL-level support for specifying public/private keys.

Once your database has to support even a single use case with sensitive data ( say mashing up payroll-related sales commissions with a billion rows of transactional sales data), you are pushed into a trio of choices:

1) Don’t use or allow the business to use traditional reporting tools. Build custom apps for ingesting and viewing the data and do encryption/decryption there.
2) Bring the sensitive data into a completely separate database from your main datastore, one which is much more tightly segregated/controlled. Create jobs to move/copy all the transactional data needed for mashup into this database. This may be expensive if there’s a lot of data. This may either require the data to arrive late or require additional care to ensure it doesn’t get out of sync with your primary datastore, depending on how you implement the data movement.
3) Bring the sensitive data into its own database, leave the transactional data elsewhere and do the mashup/JOIN operations in the reporting tool’s memory… which tends to be slower than an in-database join as provided in options 1 or 2.

Field level encryption would allow some mitigation of these risks/costs.

It’s true you are increasing the surface area of code (ie a huge database app) with access to your particularly sensitive data but the ability to reduce insider access to that data would seem to outweigh that.

I can understand why field level encryption might be best done at the application level whenever that’s possible, but I don’t really understand why field level encryption wouldn’t be a useful security feature particularly in columnar (analytics-oriented) databases. Am I wrong?

Me • June 27, 2019 12:27 PM

So, does this mean that the encrypted fields cannot be used to search?

Or does this mean that searches that include filtering on encrypted fields will pull everything local, decrypt, and then filter, which would be very slow and wasteful?

We already saw the article about the practical attack on the theoretical encrypted database that could recreate the data based solely on statistics and knowing which rows were returned based on queries that couldn’t be read (except for the field names). So I assume this isn’t that.

fragile • June 27, 2019 2:06 PM

Client calls helpdesk : “my query doesn’t work, I get strange results”
Helpdesk : “sorry, can’t help you, we can’t see a thing”

Clive Robinson • June 27, 2019 4:42 PM

@ Me,

So, does this mean that the encrypted fields cannot be used to search?

With current encryption unless you have the key or known encrypted value for zero then no.

As I mentioned above there are various things you can do with stream encrypted data provided you know the size (2^N) of the field used for integers in a record field[1]. Likewise there are other things you can do with other forms of encryption. However the problem with searching is “finding zero” or “within a range” when comparing, and this is a whole different ball game.

So for now at least, to search you have to do it at the server end or at the client end. Doing it at the client end, as you note, “would be very slow and wasteful”.

But worse, it leaves you open to all sorts of risk that most would not at first realise. An obvious short cut would be for the client to encrypt the desired search value and send that in the query to the Database Engine. Whilst it works and plaintext is not revealed, it allows for easy statistical analysis, which is bad at the best of times. There are techniques that use different keys on the same record field and a secondary MAC field. If the Database Engine has a match on the encrypted field value the client has sent, a client lacking the proper key will, on decrypting the record MAC field, not get a correct result (within certain limits). Such tricks do make improvements in reducing the records downloaded, but they only make the statistical problem slightly more difficult.

All in all, searching encrypted database record fields without first decrypting is something that has engaged some of the finest minds for something like seven decades. What we do know is that nobody has proved you cannot do it, so people keep thinking on the problem. In times past I wasted enough time to write a dissertation on the subject; then I wised up and realised it needs something that, several decades later, is still sitting like a mirage on the distant horizon.

But technology has moved on and now it’s no longer a theoretical activity. Those who have invested large amounts in Cloud Infrastructure are starting to see reluctance increasing as more and more stories emerge about Silicon Valley companies having “special relationships” with various US and other Five Eyes intelligence agencies and even law enforcement agencies. Their “Trust us we’re the good guys” image is not so much tarnished as completely shattered. And it’s not just Databases in the traditional sense, it’s “Software as a Service” as well. There are some quite creepy rumours surfacing about those in education, where not just schools but colleges and universities have handed over students’ PII to Google and Microsoft, both of whom are using it for all sorts of purposes that would horrify the students’ parents, if not the students…
However, drawing the attention of the academic institutions’ hierarchy to what is happening is rebuffed, poo-pooed, or in less polite ways denigrated, and refusal to use these systems is treated as, if not directly threatened with, termination of the student’s education…

[1] One of the joys of a language with a limited number of words is that sometimes words get reused in different domains of endeavour and thus can, even when used in context, be confusing. Hence field in a record has a very different meaning to the mathematical use of field.

RealFakeNews • June 28, 2019 9:32 PM

Wow.

I’ve been doing this for decades already. Why does the DB engine need to get involved?

RealFakeNews • June 28, 2019 9:36 PM

I’ve not needed to do it, but if you insist on certain encrypted data being searchable, store only specific fields as a hash value in addition to the encrypted value.

You double the database size, but you wanted to search it, right?
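A rough sketch of that layout, assuming SHA-256 as the hash (the comment does not name one) and using opaque placeholder bytes where the real ciphertext would sit. The table, column names, and addresses are hypothetical.

```python
import hashlib


def search_hash(value: str) -> str:
    """Hash stored next to the ciphertext so the server can match on equality."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Each row keeps the encrypted value (placeholder bytes here) plus its hash.
table = [
    {"id": 1, "email_enc": b"<ciphertext-1>", "email_hash": search_hash("ann@example.com")},
    {"id": 2, "email_enc": b"<ciphertext-2>", "email_hash": search_hash("bob@example.com")},
    {"id": 3, "email_enc": b"<ciphertext-3>", "email_hash": search_hash("ann@example.com")},
]


def find_by_email(email: str) -> list:
    """The client hashes the search term; the server compares hashes only."""
    wanted = search_hash(email)
    return [row["id"] for row in table if row["email_hash"] == wanted]


matches = find_by_email("ann@example.com")
misses = find_by_email("carol@example.com")
```

The lookup returns rows 1 and 3 for the first address and nothing for the second, without the server ever decrypting `email_enc`. It supports exact matches only, and anyone who can read the hash column learns which rows share a value.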

Clive Robinson • June 29, 2019 12:43 PM

@ RealFakeNews,

if you insist on certain encrypted data being searchable, store only specific fields as a hash value in addition to the encrypted value.

That opens up a whole bunch of statistical attacks.

Hashing a value smaller than the hash size is in effect a simple “substitution cipher” with large alphabet. It also only works for “exact matches” not “range matches”.

Further a problem with hashes is “Dictionary Attacks” especially with precomputed rainbow tables.
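To illustrate the dictionary attack, here is a small sketch assuming the hashed field is a 4-digit PIN and the hash is plain, unkeyed SHA-256. Both are assumptions chosen to keep the example tiny, and the lookup is an ordinary precomputed dictionary rather than a true rainbow table.

```python
import hashlib


def h(value: str) -> str:
    """Plain, unkeyed hash as it might be stored in the search column."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# What an attacker might copy out of the hash column (hypothetical values).
stored_hashes = [h("4821"), h("0007"), h("4821")]

# Precompute the hash of every possible value once: only 10,000 candidates.
lookup = {h(f"{n:04d}"): f"{n:04d}" for n in range(10_000)}

# Reversing the column is then a dictionary lookup per row.
recovered = [lookup[digest] for digest in stored_hashes]

# Hashes of neighbouring values bear no relation to each other, so the
# column cannot answer "between 4800 and 4900" without hashing each candidate.
neighbours_differ = h("4821") != h("4822")
```

`recovered` comes back as the original three PINs. The same approach scales to any field whose set of possible values is small enough to enumerate, such as phone numbers or dates of birth.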

Davi Ottenheimer • July 2, 2019 11:16 AM

We did it! Looking forward in the coming months to discussing why I drove this feature into the database and how it ultimately was designed.

Anonymous • October 7, 2020 12:54 AM

Indeed a great feature.
What is lacking is the ability to search over the encrypted data.
Moreover, and maybe the biggest pitfall, it is very tightly coupled with MongoDB.

Organizations above a certain size have multiple data sources, all containing PI data.
They are trying to avoid custom solutions as much as possible and are looking for a complete, holistic solution that will work with all their databases on all clouds and on-prem environments.

The only one which does that (and with the search capability) is http://www.kindite.com

Schneier on Security
