Noise-canceling headphones use AI to isolate and amplify a single voice.
Modern life is noisy, but noise-canceling headphones can help reduce unwanted sounds. However, they typically muffle all noises indiscriminately, meaning you might miss something you actually want to hear.
A new prototype AI system aims to address this issue. Known as Target Speech Hearing, this system allows users to select a specific person whose voice will remain audible while all other sounds are canceled out.
The technology is currently a proof of concept, but its creators are in discussions to integrate it into popular brands of noise-canceling earbuds and hearing aids.
“Listening to specific people is such a fundamental aspect of how we communicate and interact in the world with other humans,” says Shyam Gollakota, a professor at the University of Washington involved in the project. “But it can get really challenging, even if you don’t have any hearing loss issues, to focus on specific people in noisy situations.”
Previously, the researchers trained a neural network to recognize and filter out certain sounds, like babies crying, birds tweeting, or alarms ringing. However, isolating human voices presents a tougher challenge, requiring more complex neural networks. These networks must operate in real time on headphones with limited computing power and battery life. To meet these constraints, the team used an AI compression technique called knowledge distillation. This involves training a large AI model (the “teacher”) on millions of voices and then having it train a smaller model (the “student”) to perform similarly.
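To make the teacher-student idea concrete, here is a minimal sketch of one distillation step in PyTorch. The architectures, dimensions, and loss below are illustrative placeholders, not the team's actual models: the key point is that the small student is trained to reproduce the large teacher's output on the same noisy input, rather than being trained from scratch on labeled data.

```python
import torch
import torch.nn as nn

# Placeholder teacher/student: a large speech model and a small one
# meant to run on constrained hardware. These are not the researchers'
# actual architectures.
teacher = nn.Sequential(nn.Linear(512, 1024), nn.ReLU(), nn.Linear(1024, 512))
student = nn.Sequential(nn.Linear(512, 128), nn.ReLU(), nn.Linear(128, 512))

optimizer = torch.optim.Adam(student.parameters(), lr=1e-4)
loss_fn = nn.MSELoss()

def distill_step(noisy_batch: torch.Tensor) -> float:
    """One distillation step: the student learns to mimic the teacher's
    output (here, a cleaned-up feature frame) on the same input."""
    with torch.no_grad():
        target = teacher(noisy_batch)   # teacher's output is the target
    prediction = student(noisy_batch)   # student's attempt
    loss = loss_fn(prediction, target)  # match the teacher, not raw labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Example: one step on a random batch of 32 feature frames.
print(distill_step(torch.randn(32, 512)))
```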
The student model was then taught to extract specific vocal patterns from the surrounding noise captured by microphones on commercially available noise-canceling headphones.
To activate the Target Speech Hearing system, the user holds a button on the headphones for several seconds while facing the person they want to focus on. During this “enrollment” process, the system records an audio sample from both headphones and uses it to extract the speaker’s vocal characteristics, even amidst other speakers and noises.
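A rough sketch of what enrollment might look like in code, assuming a generic speaker-embedding network. The `encoder` below is a stand-in for whatever model the system actually uses, and the mono mixdown is a simplification of the two-microphone capture the article describes.

```python
import numpy as np

def enroll_speaker(left_mic: np.ndarray, right_mic: np.ndarray, encoder):
    """Sketch of enrollment: mix the two headphone channels, embed the
    short clip, and unit-normalize the resulting voice signature.
    `encoder` is a placeholder for a speaker-embedding network."""
    clip = 0.5 * (left_mic + right_mic)           # crude mono mixdown
    embedding = encoder(clip)                     # fixed-size voice signature
    return embedding / np.linalg.norm(embedding)  # normalize for comparison

# Example with a dummy encoder standing in for the real model,
# on three seconds of silence at a 16 kHz sample rate:
dummy_encoder = lambda clip: np.ones(256)
signature = enroll_speaker(np.zeros(16000 * 3), np.zeros(16000 * 3),
                           dummy_encoder)
```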
These characteristics are fed into a second neural network running on a microcontroller connected to the headphones via USB. This network continuously prioritizes the chosen voice over others, even if the wearer turns away. The more data the system gathers on a speaker’s voice, the better it becomes at isolating it.
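A sketch of that on-device stage, again with placeholder components: a separation model conditioned on the enrolled voice signature, processed frame by frame. The running-average refinement is an assumption added here to mirror the article's point that isolation improves as the system gathers more of the speaker's voice; the actual mechanism is not described.

```python
import numpy as np

class TargetVoiceFilter:
    """Sketch of the on-device stage. `separator` is a placeholder for
    the distilled student model; it is assumed to return the cleaned
    frame plus an embedding of whatever target speech it found."""

    def __init__(self, separator, embedding: np.ndarray, alpha: float = 0.05):
        self.separator = separator   # stand-in for the student model
        self.embedding = embedding   # voice signature from enrollment
        self.alpha = alpha           # how quickly the signature is refined

    def process_frame(self, frame: np.ndarray) -> np.ndarray:
        clean, heard = self.separator(frame, self.embedding)
        # Refine the stored signature with what was just heard (assumed
        # update rule, not from the article).
        self.embedding = (1 - self.alpha) * self.embedding + self.alpha * heard
        return clean

# Example with a dummy separator standing in for the real model:
dummy_separator = lambda frame, emb: (frame, emb)
filt = TargetVoiceFilter(dummy_separator, np.ones(256) / 16.0)
_ = filt.process_frame(np.zeros(160))
```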
Currently, the system can successfully enroll a targeted speaker only when theirs is the only loud voice present. The team aims to improve it to work even when the loudest voice isn't the target speaker.
Singling out a voice in a noisy environment is very challenging, says Sefik Emre Eskimez, a senior researcher at Microsoft specializing in speech and AI. “I know that companies want to do this,” he says. “If they can achieve it, it opens up lots of applications, particularly in a meeting scenario.”
While much speech separation research is theoretical, this work has clear real-world applications, notes Samuele Cornell, a researcher at Carnegie Mellon University’s Language Technologies Institute. “I think it’s a step in the right direction,” Cornell says. “It’s a breath of fresh air.”