TY - JOUR
T1 - Keyword spotting using human electrocorticographic recordings
AU - Milsap, Griffin
AU - Collard, Maxwell
AU - Coogan, Christopher
AU - Rabbani, Qinwan
AU - Wang, Yujing
AU - Crone, Nathan E.
N1 - Funding Information:
This project has been supported by the National Institutes of Health (R01 NS088606, R01 NS091139).
Publisher Copyright:
Copyright © 2019 Milsap, Collard, Coogan, Rabbani, Wang and Crone.
PY - 2019
Y1 - 2019
AB - Neural keyword spotting could form the basis of a speech brain-computer-interface for menu-navigation if it can be done with low latency and high specificity comparable to the “wake-word” functionality of modern voice-activated AI assistant technologies. This study investigated neural keyword spotting using motor representations of speech via invasively-recorded electrocorticographic signals as a proof-of-concept. Neural matched filters were created from monosyllabic consonant-vowel utterances: one keyword utterance, and 11 similar non-keyword utterances. These filters were used in an analog to the acoustic keyword spotting problem, applied for the first time to neural data. The filter templates were cross-correlated with the neural signal, capturing temporal dynamics of neural activation across cortical sites. Neural vocal activity detection (VAD) was used to identify utterance times and a discriminative classifier was used to determine if these utterances were the keyword or non-keyword speech. Model performance appeared to be highly related to electrode placement and spatial density. Vowel height (/a/ vs /i/) was poorly discriminated in recordings from sensorimotor cortex, but was highly discriminable using neural features from superior temporal gyrus during self-monitoring. The best performing neural keyword detection (5 keyword detections with two false-positives across 60 utterances) and neural VAD (100% sensitivity, ~1 false detection per 10 utterances) came from high-density (2 mm electrode diameter and 5 mm pitch) recordings from ventral sensorimotor cortex, suggesting the spatial fidelity and extent of high-density ECoG arrays may be sufficient for the purpose of speech brain-computer-interfaces.
KW - Articulation
KW - Automatic speech recognition (ASR)
KW - Brain computer interface (BCI)
KW - Electrocorticography (ECoG)
KW - Keyword spotting (KWS)
KW - Sensorimotor cortex (SMC)
KW - Speech
KW - Superior temporal gyrus (STG)
UR - http://www.scopus.com/inward/record.url?scp=85059694901&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85059694901&partnerID=8YFLogxK
U2 - 10.3389/fnins.2019.00060
DO - 10.3389/fnins.2019.00060
M3 - Article
C2 - 30837823
AN - SCOPUS:85059694901
SN - 1662-4548
VL - 13
JO - Frontiers in Neuroscience
JF - Frontiers in Neuroscience
IS - FEB
M1 - 60
ER -