Overview: A newly developed machine learning model can predict the words a person will speak based on their neural activity recorded by a minimally invasive neuroprosthetic device.
Researchers from HSE University and Moscow State University of Medicine and Dentistry have developed a machine learning model that can predict what word a subject will utter based on their neural activity recorded with a small set of minimally invasive electrodes.
The article ‘Speech decoding from a small set of spatially segregated minimally invasive intracranial EEG electrodes with a compact and interpretable neural network’ is published in the Journal of Neural Engineering. The research was funded by a grant from the Russian government as part of the National Science and Universities Project.
Millions of people around the world are affected by speech disorders that limit their ability to communicate. Causes of speech loss can vary and include stroke and certain congenital conditions.
Technology is now available to restore communication function to such patients, including “silent speech” interfaces that recognize speech by tracking the movement of the muscles of articulation as the person pronounces words without making a sound. However, such devices do not help some patients, such as those with facial muscle paralysis.
Speech neuroprostheses – brain-computer interfaces that can decode speech based on brain activity – could provide an accessible and reliable way to restore communication for such patients.
Unlike personal computers, devices with a brain-computer interface (BCI) are controlled directly by the brain without the need for a keyboard or microphone.
A major barrier to wider use of BCIs in speech prostheses is that this technology requires highly invasive surgery to implant electrodes into brain tissue.
The most accurate speech recognition is achieved by neuroprostheses with electrodes covering a large part of the cortical surface. However, these solutions for reading brain activity are not intended for long-term use and pose significant risks to the patients.
Researchers from the HSE Center for Bioelectrical Interfaces and Moscow State University of Medicine and Dentistry have studied the possibility of creating a functioning neuroprosthesis that can decode speech with acceptable accuracy by reading brain activity from a small set of electrodes implanted in a limited cortical area.
The authors suggest that this minimally invasive procedure could even be performed under local anesthesia in the future. In the current study, the researchers collected data from two patients with epilepsy who already had intracranial electrodes implanted for presurgical mapping to locate seizure zones.
The first patient was implanted bilaterally with a total of five sEEG shafts with six contacts each, and the second patient was implanted with nine electrocorticographic (ECoG) strips with eight contacts each.
Unlike ECoG, electrodes for sEEG can be implanted without a complete craniotomy through a burr hole in the skull. In this study, only the six contacts of a single sEEG shaft in one patient and the eight contacts of one ECoG strip in the other were used to decode neural activity.
The subjects were asked to read aloud six sentences, each presented 30 to 60 times in random order. The sentences varied in structure, and most words within a single sentence began with the same letter. The sentences contained a total of 26 different words. While the subjects were reading, the electrodes recorded their brain activity.
This data was then matched with the audio signals to form 27 classes, including 26 words and one class of silence. The resulting training dataset (with signals recorded in the first 40 minutes of the experiment) was fed into a machine learning model with a neural network-based architecture.
The learning task for the neural network was to predict the next spoken word (class) based on the neural activity data prior to its utterance.
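The causal setup described above — labeling a fixed window of multichannel activity that ends at each word's onset — can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `make_dataset`, the one-second window, and the toy sampling rate are assumptions for the example.

```python
import numpy as np

def make_dataset(neural, onsets, labels, sfreq, win_s=1.0):
    """Cut a fixed window of multichannel neural activity ending at each
    word onset, so the decoder only ever sees pre-utterance data.

    neural : (n_channels, n_samples) array of iEEG signals
    onsets : sample indices where each word's utterance begins
    labels : integer class per onset (0..25 for words, 26 for silence)
    """
    win = int(win_s * sfreq)
    X, y = [], []
    for t, lab in zip(onsets, labels):
        if t - win < 0:
            continue  # skip onsets too close to the recording start
        X.append(neural[:, t - win:t])  # strictly causal: window ends at onset
        y.append(lab)
    return np.stack(X), np.array(y)

# toy example: 6 sEEG contacts, 10 s of signal at 1 kHz
rng = np.random.default_rng(0)
neural = rng.standard_normal((6, 10_000))
X, y = make_dataset(neural, onsets=[2000, 5000, 9000],
                    labels=[3, 12, 26], sfreq=1000)
print(X.shape)  # (3 examples, 6 channels, 1000 samples)
```

Because each window stops exactly at the onset, the classifier cannot peek at activity evoked by the word once it is actually spoken — the property the authors stress later in the article.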
When designing the architecture of the neural network, the researchers wanted to make it simple, compact, and easy to interpret. They devised a two-stage architecture that first extracted internal speech representations from the recorded brain activity data, in the form of log-mel spectral coefficients, and then predicted a specific class, i.e. a word or silence.
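The two-stage idea — learnable spatial and temporal filters that produce speech-like log-power features, followed by a classifier over 26 words plus silence — can be caricatured in a few lines of numpy. This is a simplified sketch, not the published model: all weights here are random placeholders (in the real network they are learned), and the intermediate features only loosely stand in for the log-mel spectral coefficients the study describes.

```python
import numpy as np

rng = np.random.default_rng(1)

n_ch, n_t, n_classes = 6, 1000, 27   # 6 sEEG contacts, 1 s window, 26 words + silence
n_branch, k_temp = 4, 65             # feature branches, temporal filter length

# Stage 1: compact, interpretable feature extractor.
W_spatial = rng.standard_normal((n_branch, n_ch))     # spatial unmixing weights
w_temporal = rng.standard_normal((n_branch, k_temp))  # per-branch FIR filters

x = rng.standard_normal((n_ch, n_t))                  # one pre-utterance window

s = W_spatial @ x                                     # (n_branch, n_t) virtual channels
f = np.stack([np.convolve(s[i], w_temporal[i], mode="valid")
              for i in range(n_branch)])              # band-limited temporal filtering
features = np.log(np.abs(f) + 1e-6)                   # log-power, speech-like features

# Stage 2: pool over time and classify into a word or silence.
pooled = features.mean(axis=1)                        # (n_branch,)
W_out = rng.standard_normal((n_classes, n_branch))
logits = W_out @ pooled
pred = int(np.argmax(logits))                         # predicted class index, 0..26
```

The appeal of this factorized form is exactly what the article highlights: a spatial weight vector can be read as "which contacts (and hence which neuronal populations) matter", and a temporal filter as "which frequency band matters", which is what makes the decision rule interpretable.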
Thus trained, the neural network achieved 55% accuracy using only six channels of data recorded by a single sEEG electrode in the first patient and 70% accuracy using only eight channels of data recorded by a single ECoG strip in the second patient. Such accuracy is comparable to that demonstrated in other studies using devices that required electrodes to be implanted over the entire cortical surface.
The resulting interpretable model makes it possible to explain in neurophysiological terms which neural information contributes most to predicting a word about to be spoken.
The researchers examined signals coming from different neuronal populations to determine which of them were critical to the downstream task.
Their findings were consistent with the speech mapping results, suggesting that the model relies on neural signals that are genuinely speech-related and could therefore be used to decode imagined speech.
Another advantage of this solution is that it does not require manual feature engineering. The model has learned to extract speech representations directly from the brain activity data.
The interpretability of the results also indicates that the network decodes signals from the brain rather than any additional activity, such as electrical signals from the articulatory muscles or those resulting from a microphone effect.
The researchers emphasize that the prediction was always based on the neural activity data prior to the utterance. This, they argue, ensures that the decision rule does not exploit the auditory cortex’s response to already spoken speech.
“The use of such interfaces entails minimal risks for the patient. If all goes well, it could be possible to decode imagined speech from neural activity recorded by a small number of minimally invasive electrodes implanted under local anesthesia in an outpatient setting,” says Alexey Ossadtchi, study lead author and director of the Center for Bioelectrical Interfaces at the HSE Institute for Cognitive Neuroscience.
About this neurotech research news
Writer: Ksenia Bregadze
Contact: Ksenia Bregadze – HSE
Image: The image is in the public domain
Original research: Closed access.
“Speech decoding from a small set of spatially segregated minimally invasive intracranial EEG electrodes with a compact and interpretable neural network” by Alexey Ossadtchi et al. Journal of Neural Engineering
Speech decoding from a small set of spatially segregated minimally invasive intracranial EEG electrodes with a compact and interpretable neural network
Objective. Speech decoding, one of the most intriguing brain-computer interface applications, opens up numerous possibilities, from patient rehabilitation to direct and seamless communication between humans. Typical solutions rely on invasive recordings with a large number of distributed electrodes implanted via craniotomy. Here we explored the possibility of creating a speech prosthesis in a minimally invasive setting with a small number of spatially segregated intracranial electrodes.
Approach. We collected one hour of data (from two sessions) in two patients implanted with invasive electrodes. We then used only the contacts of a single stereotactic electroencephalography (sEEG) shaft or a single electrocorticographic (ECoG) strip to decode neural activity into 26 words and one class of silence. We used a compact convolutional network-based architecture whose spatial and temporal filter weights permit a physiologically plausible interpretation.
Main results. We achieved an average accuracy of 55% using only six channels of data recorded with a single minimally invasive sEEG electrode in the first patient and an accuracy of 70% using only eight channels of data recorded on a single ECoG strip in the second patient when classifying 26+1 overtly pronounced words. Our compact architecture required no pre-engineered features, learned quickly, and resulted in a stable, interpretable, and physiologically meaningful decision rule that worked successfully on a contiguous dataset collected during a time interval different from that used for training. The spatial characteristics of the pivotal neuronal populations are confirmed by active and passive speech-mapping results, and the temporal patterns show the inverse space-frequency relationship characteristic of neural activity. Compared to other architectures, our compact solution performed on par with or better than those recently reported in the neural speech decoding literature.
Significance. We demonstrate the possibility of building a speech prosthesis with a small number of electrodes, based on a compact, feature-engineering-free decoder derived from a small amount of training data.