The Race to Hide Your Voice

Voice recognition—and data collection—have boomed in recent years. Researchers are figuring out how to protect your privacy.
Illustration: Elena Lacey

Your voice reveals more about you than you realize. To the human ear, your voice can instantly give away your mood, for example—it’s easy to tell if you’re excited or upset. But machines can learn a lot more: inferring your age, gender, ethnicity, socio-economic status, health conditions, and beyond. Researchers have even been able to generate images of faces based on the information contained in individuals’ voice data.

As machines become better at understanding you through your voice, companies are cashing in. Voice recognition systems—from Siri and Alexa to those using your voice as your password—have proliferated in recent years as artificial intelligence and machine learning have unlocked the ability to understand not just what you are saying but who you are. Big Voice may be a $20 billion industry within a few years. And as the market grows, privacy-focused researchers are increasingly searching for ways to protect people from having their voice data used against them.

Vocal Threats

Both the words you say and how you say them can be used to identify you, says Emmanuel Vincent, a senior research scientist specializing in voice technologies at France’s National Institute for Research in Digital Science and Technology (Inria), but this is only the beginning. “You will also find other pieces of information about your emotions or your medical condition,” Vincent says.

“These additional pieces of information help build a more complete profile; then this would be used for all sorts of targeted advertisements,” Vincent says. As well as your voice data potentially feeding into the vast realm of data used to show you online ads, there’s also the risk that hackers could access the location where your voice data is stored and use it to impersonate you. A small number of these cloning incidents have already happened, proving the value your voice holds. Simple robocall scams have also recorded people saying “yes” and then reused that recording as confirmation in payment fraud.

Last year, TikTok changed its privacy policies and started collecting the voiceprints—a loose term for the data your voice contains—of people in the US alongside other biometric data, such as your faceprint. More broadly, call centers are using AI to analyze people’s “behavior and emotion” during phone calls and evaluate the “tone, pace, and pitch of every single word” to develop profiles of people and increase sales. “We’re almost in a situation where the systems to recognize who you are and link everything together exist, but the protection is not there—and it’s still quite far away from being readily usable,” says Henry Turner, who researched the security of voice systems at the University of Oxford.

Hidden Meaning

Your voice is produced through a complex process involving the lungs and your voice box, throat, nose, mouth, and sinuses. More than a hundred muscles are activated when you speak, says Rébecca Kleinberger, a voice researcher at the MIT Media Lab. “It’s also very much the brain,” Kleinberger says.

Researchers are experimenting with four ways to enhance privacy for your voice, says Natalia Tomashenko, a researcher at Avignon University, France, who has been studying voice and is the first author of a research paper on the results of a voice privacy engineering challenge. None of the methods are perfect, but they are being explored as possible ways to boost privacy in the infrastructure processing your voice data.

First is obfuscation, which tries to completely hide who the speaker is. Think of a Hollywood depiction of a hacker totally distorting their voice over a phone call as they explain a devilish plot or ransom (or hacktivist collective Anonymous’s promotional videos). Simple voice-changing hardware allows anyone to quickly change the sound of their voice. More advanced speech-to-text-to-speech systems can transcribe what you’re saying and then reverse the process and say it in a new voice.

Second, Tomashenko says, researchers are looking at distributed and federated learning, where your data doesn’t leave your device but machine learning models still learn to recognize speech by sharing their training with a bigger system. A third approach involves building encrypted infrastructure to protect people’s voices from snooping. Most efforts, however, are focused on the fourth: voice anonymization.
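To make the federated idea concrete, here is a minimal sketch in Python. It is an illustration under loud assumptions, not how any real speech product works: the "model" is a toy two-number linear fit rather than a speech recognizer, and the function names (`local_update`, `federated_average`, `train`) are hypothetical. What it shows is the shape of the exchange: each device trains on data that never leaves it, and only the resulting model weights are averaged by the central system.

```python
def local_update(weights, private_data, lr=0.1):
    """Runs on the device: one pass of gradient descent for y = w*x + b.

    private_data is a list of (x, y) pairs that stays on the device.
    """
    w, b = weights
    for x, y in private_data:
        err = (w * x + b) - y
        w -= lr * err * x
        b -= lr * err
    return (w, b)


def federated_average(updates, sizes):
    """Runs on the server: average device weights, weighted by data size."""
    total = sum(sizes)
    w = sum(u[0] * n for u, n in zip(updates, sizes)) / total
    b = sum(u[1] * n for u, n in zip(updates, sizes)) / total
    return (w, b)


def train(device_datasets, rounds=200):
    """Toy training loop: the server only ever sees weights, never raw data."""
    global_weights = (0.0, 0.0)
    for _ in range(rounds):
        updates = [local_update(global_weights, d) for d in device_datasets]
        sizes = [len(d) for d in device_datasets]
        global_weights = federated_average(updates, sizes)
    return global_weights
```

In this sketch, two devices that each hold a couple of points from the same underlying line end up with a shared model close to that line, even though neither device's points are ever sent anywhere. Real systems add further safeguards, since model updates themselves can leak information.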

Anonymization attempts to keep your voice sounding human while stripping out as much of the information that could be used to identify you as possible. Speech anonymization efforts currently involve two separate strands: anonymizing the content of what someone is saying, by deleting or replacing any sensitive words in files before they are saved, and anonymizing the voice itself. Most voice anonymization efforts at the moment involve passing someone’s voice through experimental software that will change some of the parameters in the voice signal to make it sound different. This can involve altering the pitch, replacing segments of speech with information from other voices, and synthesizing the final output.
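Both strands can be sketched in a few lines of Python, with heavy caveats. Everything here is an assumption made for illustration: the patterns in `SENSITIVE_PATTERNS` are placeholders, the function names are hypothetical, and the pitch change is a naive resampling trick that also shortens the clip. Research anonymizers, such as those entered in the Voice Privacy Challenge, are far more elaborate.

```python
import re

# Placeholder patterns (assumptions): a real system would need a much
# richer detector for names, account numbers, addresses, and so on.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{4}(?:[ -]?\d{4}){3}\b"),  # 16-digit card-like numbers
    re.compile(r"\b(?:Alice|Bob)\b"),           # stand-in personal names
]


def redact_transcript(text, placeholder="[REDACTED]"):
    """Content strand: replace sensitive words before a transcript is saved."""
    for pattern in SENSITIVE_PATTERNS:
        text = pattern.sub(placeholder, text)
    return text


def shift_pitch(samples, factor):
    """Voice strand (naive): resample so the pitch is multiplied by `factor`.

    Uses linear interpolation. A factor above 1 raises the pitch and
    shortens the clip; a factor below 1 lowers it and lengthens the clip.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    last = len(samples) - 1
    shifted = []
    for i in range(int(len(samples) / factor)):
        pos = i * factor
        lo = min(int(pos), last)
        hi = min(lo + 1, last)
        frac = pos - lo
        shifted.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return shifted


def estimate_frequency(samples, sample_rate):
    """Rough pitch estimate: count upward zero crossings per second."""
    crossings = sum(1 for a, b in zip(samples, samples[1:]) if a < 0 <= b)
    return crossings * sample_rate / len(samples)
```

Feeding a pure 200 Hz tone through `shift_pitch` with a factor of 1.25 yields a tone near 250 Hz, which `estimate_frequency` can confirm. A shift this simple would do little against a machine listener, which is the concern the researchers raise about anonymization in general.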

Does anonymization technology work? Male and female voice clips that were anonymized as part of the Voice Privacy Challenge in 2020 definitely do sound different. They’re more robotic, sound slightly pained and could—to some listeners at least—be from a different person than the original voice clips. “I think it can already guarantee a much higher level of protection than doing nothing, which is the current status,” says Vincent, who has been able to reduce how easy it is to identify people in anonymization research. However, humans aren’t the only listeners. Rita Singh, an associate professor in Carnegie Mellon University’s Language Technologies Institute, says that total de-identification of the voice signal is not possible, as machines will always have the potential to make links between attributes and individuals, even connections that aren’t clear to humans. “Is the anonymization with respect to a human listener or is it with respect to a machine listener?” says Shri Narayanan, a professor of electrical and computer engineering at the University of Southern California.

“True anonymization is not possible without completely changing the voice,” Singh says. “When you completely change the voice, then it’s not the same voice.” Despite this, it is still worth developing voice-privacy technology, Singh adds, as no privacy or security system is totally secure. Fingerprints and face identification systems on iPhones have been spoofed in the past, but overall, they’re still an effective method of protecting people’s privacy.

Bye, Alexa

Your voice is increasingly being used as a way to verify your identity. For example, a growing number of banks and other companies are analyzing your voiceprints, with your permission, to replace your password. There’s also the potential for voice analysis to detect illness before other signs are obvious. But the technology to clone or fake someone’s voice is advancing quickly.

If you have a few minutes of someone’s voice recorded, or in some instances a few seconds, it’s possible to recreate that voice using machine learning. The Simpsons’ voice actors could be replaced by deepfake voice clones, for instance. And commercial tools for recreating voices are readily available online. “There’s definitely more work in speaker identification and producing speech to text and text to speech than there is in protecting people from any of those technologies,” Turner says.

Many of the voice anonymization techniques being developed at the moment are still a long way from being used in the real world. When they are ready, it’s likely that companies will have to implement the tools themselves to protect their customers’ privacy; there’s currently little individuals can do to protect their own voice. Avoiding calls with call centers or companies that use voice analysis, and not using voice assistants, could limit how much your voice is recorded and reduce possible attack opportunities.

But the strongest safeguards may come from legal cases and regulations. Europe’s GDPR covers biometric data, including people’s voices, in its privacy protections. Guidelines say people should be told how their data is being used and should give consent if they’re being identified, and that some restrictions should be placed on personalization. Meanwhile, in the US, courts in Illinois, home to some of the strongest biometric laws in the country, are increasingly inspecting cases involving people’s voice data. McDonald’s, Amazon, and Google are all facing judicial scrutiny over how they use people’s voice data. The decisions in these cases could lay down new rules for the protection of people’s voices.