John Mello Jr.
Contributor

Google adds Python to its differential privacy repertoire

News Analysis
Feb 03, 2022
3 mins
Data Privacy

Company hopes the move will make it easier for developers to use differential privacy to help improve privacy on the internet.

Google has announced it’s adding Python to the languages supported by one of its open-source projects designed to bolster privacy on the internet. The project includes a library and tools for using differential privacy, a technology designed to preserve an individual’s privacy in large data sets.

“Previously, our differential privacy library was available in three programming languages,” Miguel Guevara, a product manager in Google’s Privacy and Data Protection Office, wrote in the company’s developers blog. “Now, we’re making it available in Python, reaching nearly half of the developers worldwide. This means millions more developers, researchers and companies will be able to build applications with industry-leading privacy technology, enabling them to obtain insights and observe trends from their data sets while protecting and respecting the privacy of individuals.”

What differential privacy does

Christopher W. Clifton, a professor of computer science at Purdue University, explains that many people doing data science projects are moving to Python, so adding support for the language to Google’s framework will broaden the community interested in differential privacy. “It’s targeting the people who are spending a lot of time looking at data and releasing information on data,” he says.

One problem with releasing information about large data sets, even when the data in those sets is anonymized, is that there are ways to figure out information about individuals in that data. “Even if I know you’re in a data set, differential privacy protects you from me knowing information about you through analysis of that data set,” Clifton explains. “Differential privacy ensures that from a statistical analysis of the data, you can’t figure out with absolute certainty anything about any one person in the data.”
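To make Clifton's point concrete, here is a minimal sketch of one common way to achieve differential privacy for a counting query: the Laplace mechanism, which adds random noise scaled to 1/epsilon. It is an illustration only and does not use Google's library. The function names, the sample data and the epsilon value are all assumptions made for this example, and a naive floating-point implementation like this one is not hardened for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise.

    The difference of two independent exponential draws with mean `scale`
    follows a Laplace distribution centered at zero with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Return a noisy count of the records that satisfy `predicate`.

    Adding or removing one person changes a count by at most 1, so the
    query's sensitivity is 1 and the Laplace scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical data: ages of 1,000 people in a data set.
rng = random.Random(42)
ages = [rng.randint(18, 90) for _ in range(1000)]
noisy = private_count(ages, lambda age: age >= 65, epsilon=0.5, rng=rng)
print(f"Noisy count of people aged 65 or older: {noisy:.1f}")
```

Because the noise is about the same size whether or not any one person is in the data, the published count reveals very little about a single individual while staying close to the true total.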

Libraries make it easier for developers to offer summaries of data

For the average developer, though, using differential privacy can be difficult, notes Jason I. Hong, a computer science professor at Carnegie Mellon University’s CyLab Security and Privacy Institute. “These libraries make it much easier for developers to offer useful summaries of data while protecting the privacy of individuals,” he says.

“Using Google’s libraries still requires that you have a good understanding of differential privacy and how to use it,” Clifton cautions. “It’s not like we have a sudden solution to all our privacy problems, but for people who understand how to work with differential privacy and have an understanding of it, it makes it easier for them to produce things that get it right.”

U.S. Census is using differential privacy

Hong notes that differential privacy is useful for analyzing large data sets. “The U.S. Census is using differential privacy techniques for 2020 census data to make it possible to offer summary statistics while making it hard for attackers to reverse-engineer those results and uncover data about specific individuals,” he adds.

In addition to adding Python support to its differential privacy framework, Google introduced a new tool that lets developers fine-tune their deployments more easily. The tool makes it easier to adjust epsilon, the parameter that sets the balance between privacy loss and the noise introduced into a data set to prevent that loss.

“To get the right level of epsilon, you often need to run your pipeline many times. That’s time-consuming,” Guevara says in an interview. “The tool we released allows you to simulate with one pass the difference in utility that you get from different values of epsilon. So, you only have to run your pipeline once.”
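The trade-off behind epsilon can be sketched in a few lines. This is not the tool Guevara describes, whose interface the article does not show. It is a hypothetical illustration that assumes Laplace noise on a query with sensitivity 1, where the expected absolute error equals the noise scale, sensitivity / epsilon. The function name and the epsilon values are assumptions chosen for the example.

```python
import math


def expected_error_by_epsilon(epsilons, sensitivity=1.0):
    """Map each epsilon to the error expected from Laplace noise.

    For Laplace noise with scale b = sensitivity / epsilon, the expected
    absolute error is b and the standard deviation is sqrt(2) * b.
    """
    report = {}
    for epsilon in epsilons:
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        scale = sensitivity / epsilon
        report[epsilon] = {
            "expected_abs_error": scale,
            "std_dev": math.sqrt(2) * scale,
        }
    return report


# Compare several candidate epsilon values without rerunning any query.
for epsilon, stats in expected_error_by_epsilon([0.1, 0.5, 1.0, 2.0]).items():
    print(
        f"epsilon={epsilon}: expected error {stats['expected_abs_error']:.2f}, "
        f"std dev {stats['std_dev']:.2f}"
    )
```

Smaller epsilon values mean stronger privacy but noisier results, which is why comparing candidates side by side helps when choosing a setting.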