Neuroengineers have crafted a breakthrough system that uses deep neural networks to read brain activity and translate it into speech.
An article published Tuesday in the journal Scientific Reports details how a team at Columbia University's Zuckerman Mind Brain Behavior Institute used deep-learning algorithms, the same type of technology that powers devices like Apple's Siri and the Amazon Echo, to turn brain activity into "accurate and intelligible reconstructed speech." The research was reported earlier this month, but the journal article goes into far greater depth.
The human-computer framework could eventually give patients who have lost the ability to speak a way to communicate verbally via a synthesized robotic voice.
"We've shown that, with the right technology, these people's thoughts could be decoded and understood by any listener," Nima Mesgarani, principal investigator on the project, said in a statement.
When we speak, our brains light up, sending electrical signals zipping around the old thought box. If scientists can decode those signals and understand how they relate to forming or hearing words, they get one step closer to translating them into speech. With enough understanding and ample processing power, that knowledge could underpin a device that translates thinking directly into speaking.
And that's what the team has managed to do, pairing a "vocoder", an algorithm that synthesizes speech from a compact set of parameters, with neural networks to turn brain signals into speech.
To do this, the research team asked five epilepsy patients who were already undergoing brain surgery to help out. They attached electrodes to different exposed surfaces of the patients' brains, then had the patients listen to 40 seconds' worth of spoken sentences, repeated randomly six times. Listening to these recordings helped train the vocoder.
Next, the patients listened to speakers counting from zero to nine while their brain signals were fed into the system. The vocoder algorithm, known as WORLD, then spat out its own sounds, which were cleaned up by a neural network, eventually resulting in robotic speech mimicking the counting. You can hear the result in the audio samples released alongside the study. It's not perfect, but it's certainly understandable.
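To make that pipeline concrete, here's a minimal sketch of the same general recipe in Python: a neural network learns to map electrode recordings to vocoder parameters, and the WORLD vocoder (via the pyworld library) renders those parameters as audio. The random stand-in data, array shapes, electrode count and scikit-learn regressor are all illustrative assumptions, not the team's actual code.

```python
# Minimal sketch, assuming random stand-in data in place of real recordings.
import numpy as np
import pyworld  # Python bindings for the WORLD vocoder: pip install pyworld
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
fs = 16000          # audio sample rate in Hz (assumed)
n_frames = 2000     # 5 ms vocoder frames, roughly 10 s of audio
n_electrodes = 128  # hypothetical electrode count
n_env = 513         # WORLD spectral-envelope bins for a 1024-point FFT

# Stand-ins for brain activity (one feature vector per frame) and for the
# ground-truth vocoder parameters extracted from the speech the patients heard.
X = rng.standard_normal((n_frames, n_electrodes))
f0_true = rng.uniform(80.0, 200.0, n_frames)           # pitch track
env_true = rng.uniform(1e-6, 1e-3, (n_frames, n_env))  # spectral envelope

# Train a neural network to map neural features to vocoder parameters,
# holding out the final 20 percent of frames for testing.
split = int(0.8 * n_frames)
Y = np.hstack([f0_true[:, None], env_true])
model = MLPRegressor(hidden_layer_sizes=(256, 256), max_iter=100)
model.fit(X[:split], Y[:split])
Y_pred = model.predict(X[split:])

# Split the predictions back into WORLD parameters and synthesize audio.
f0 = np.clip(Y_pred[:, 0], 0.0, None).astype(np.float64)
env = np.ascontiguousarray(np.clip(Y_pred[:, 1:], 1e-12, None), dtype=np.float64)
ap = np.full_like(env, 0.5)  # fixed aperiodicity, a simplifying assumption
audio = pyworld.synthesize(f0, env, ap, fs, frame_period=5.0)
print(f"reconstructed {audio.size / fs:.1f} s of audio")
```

On random data the output is noise, of course; the point is the shape of the pipeline: regress from electrodes to vocoder parameters, predict on held-out frames, then vocode.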
"We found that people could understand and repeat the sounds about 75 percent of the time, which is well above and beyond any previous attempts," Mesgarani said.
The researchers concluded that the accuracy of the reconstruction depends on how many electrodes were placed on the patients' brains and how long the vocoder was trained for. As expected, increasing the number of electrodes and the length of training gives the vocoder more data and results in a better reconstruction, a trend sketched below.
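That relationship is straightforward to demonstrate on synthetic data. The sketch below, again an illustration rather than the study's analysis, trains a simple ridge-regression decoder on fake recordings and reports how the reconstruction's correlation with ground truth improves as more electrodes and more training frames are used.

```python
# Illustrative sweep on synthetic data: more electrodes and more training
# frames both improve reconstruction quality. The grid values and the
# correlation metric are assumptions made for demonstration purposes.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(1)
n_frames, n_electrodes, n_targets = 4000, 128, 64
W = rng.standard_normal((n_electrodes, n_targets))
X = rng.standard_normal((n_frames, n_electrodes))             # "brain signals"
Y = X @ W + 0.5 * rng.standard_normal((n_frames, n_targets))  # "speech" features

X_test, Y_test = X[-500:], Y[-500:]  # held-out frames
for n_elec in (16, 32, 64, 128):
    for n_train in (500, 1000, 2000, 3500):
        model = Ridge().fit(X[:n_train, :n_elec], Y[:n_train])
        pred = model.predict(X_test[:, :n_elec])
        r = np.corrcoef(pred.ravel(), Y_test.ravel())[0, 1]
        print(f"electrodes={n_elec:3d}  train_frames={n_train:4d}  r={r:.3f}")
```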
Looking forward, the team wants to test what kind of signals the brain emits when a person merely imagines speaking, as opposed to listening to speech. They also hope to test a more complex set of words and sentences. Improving the algorithms with more data could eventually lead to a brain implant that bypasses speech entirely, turning a person's thoughts directly into words.
That would be a monumental step forward for many.
"It would give anyone who has lost their ability to speak, whether through injury or disease, the renewed chance to connect to the world around them," Mesgarani said.