
Joe Toscano
PhD Student
Cognition & Perception
Dept. of Psychology
University of Iowa

Office: E420G Seashore Hall
Lab: 131 Spence Labs

Phone: 319-335-0692

Email:
joseph-toscano at uiowa.edu



Neural measures of perceptual processing

What do listeners do with seemingly irrelevant variation in the speech signal? For example, differences between people's voices, speaking rate, and dialect can all lead to differences in how speech is produced, even when speakers are trying to say the same thing. Thus, although we may treat all words with a "b" as having the same sound, there is actually a set of sounds that make up the "b" category. This variability complicates speech perception, making it difficult to find simple mappings between speech sounds and language. Nonetheless, people can understand spoken language quickly and accurately, treating variants of "b" as members of the same category.

A number of researchers have recently suggested that these differences are actually informative (even if they look like noise on the surface). Critically, this suggests that listeners should be sensitive to differences that do not signal different categories. Recent work has shown that listeners are sensitive to within-category acoustic differences at the level of phonological categories or lexical representations. However, we do not know whether perceptual encoding itself is influenced by category information. Behavioral responses are filtered through phonological categories, making it difficult to answer this question.

N1 component

(A) ERP waveform as a function of VOT. (B) Mean N1 amplitude as a function of VOT and stimulus continuum. (C) Mean N1 amplitude as a function of VOT and target voicing category (target-response trials). The size of each data point is proportional to the number of trials for that condition.
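The sketch below shows one rough way to summarize per-condition points like those in panels (B) and (C). It groups single-trial N1 amplitudes by condition and reports each condition's mean along with its trial count, which is the quantity that sets the marker size. The trial record layout (a VOT, a condition label such as the target voicing category, and an amplitude) and the function name are hypothetical. This is not the analysis code used to make the figure.

```python
from collections import defaultdict


def condition_means(trials):
    """Group single-trial amplitudes by (VOT, condition label).

    trials: iterable of (vot_ms, condition_label, amplitude) tuples
            (hypothetical layout, assumed for illustration).
    Returns {(vot_ms, condition_label): (mean_amplitude, n_trials)}.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for vot_ms, label, amplitude in trials:
        key = (vot_ms, label)
        sums[key] += amplitude
        counts[key] += 1
    # n_trials could then scale the size of each plotted point
    return {key: (sums[key] / counts[key], counts[key]) for key in sums}
```

Reporting the count with each mean lets conditions with few trials be de-emphasized visually, as in the figure.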

The event-related brain potential (ERP) technique is well suited to studying this problem. ERPs are a measure of brain activity that can be obtained using non-invasive electrodes attached to the head. Electrical activity produced by the brain can be detected at the scalp and recorded by these electrodes in real time. Because ERPs have high temporal precision, we can track speech processing as spoken word recognition unfolds and identify components associated with different processes. We conducted an ERP experiment designed to examine the effects of changes in a continuous acoustic cue with respect to perceptual encoding (using the auditory N1; ca. 150 ms post-stimulus) and categorization (using the P3; ca. 450 ms).
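Component measures like these usually come down to averaging the voltage inside a latency window after subtracting a pre-stimulus baseline. The sketch below assumes one channel's epoch is a list of voltages sampled at a fixed rate. The function names, the baseline interval, and the 125 to 175 ms window in the usage lines are illustrative assumptions. They are not the parameters used in this study, and the toy epoch is synthetic.

```python
def ms_to_index(t_ms, srate_hz, epoch_start_ms):
    """Convert a latency (ms, relative to stimulus onset) to a sample index."""
    return round((t_ms - epoch_start_ms) * srate_hz / 1000.0)


def mean_window_amplitude(samples, srate_hz, epoch_start_ms, win_start_ms, win_end_ms):
    """Mean voltage over the half-open window [win_start_ms, win_end_ms)."""
    first = ms_to_index(win_start_ms, srate_hz, epoch_start_ms)
    last = ms_to_index(win_end_ms, srate_hz, epoch_start_ms)
    if first < 0 or last > len(samples) or first >= last:
        raise ValueError("window falls outside the epoch")
    window = samples[first:last]
    return sum(window) / len(window)


def baseline_correct(samples, srate_hz, epoch_start_ms, base_start_ms, base_end_ms):
    """Subtract the mean of a baseline window from every sample."""
    offset = mean_window_amplitude(samples, srate_hz, epoch_start_ms, base_start_ms, base_end_ms)
    return [s - offset for s in samples]


# Synthetic epoch (illustration only): 1000 Hz sampling, -100 to 500 ms,
# a constant +2 uV offset, and a -5 uV deflection from 125 to 175 ms.
epoch = [2.0 + (-5.0 if 125 <= t < 175 else 0.0) for t in range(-100, 500)]
clean = baseline_correct(epoch, 1000, -100, -100, 0)
n1 = mean_window_amplitude(clean, 1000, -100, 125, 175)  # -5.0
```

A wider window dilutes the same deflection. For example, 100 to 200 ms on this toy epoch averages to -2.5, which is one reason window choice matters when comparing conditions.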

Subjects were presented with a series of sounds that varied from one phonetic category to another. For example, one continuum varied in voice onset time (VOT; the time between the release of the consonant and the start of vocal fold vibration) from the word dart to the word tart, in nine VOT steps from 0 ms (a good dart) to 40 ms (a good tart).
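The text gives only the endpoints and the number of steps. If the nine steps are equally spaced (an assumption not stated above), they fall 5 ms apart. The short sketch below computes them. The function name is hypothetical.

```python
def vot_continuum(start_ms, end_ms, n_steps):
    """Equally spaced VOT values (ms) from start_ms to end_ms inclusive.

    Equal spacing is an assumption made for illustration.
    """
    if n_steps < 2:
        raise ValueError("need at least two steps")
    step = (end_ms - start_ms) / (n_steps - 1)
    return [start_ms + i * step for i in range(n_steps)]


steps = vot_continuum(0, 40, 9)
# [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
```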

In this experiment, subjects were presented with stimuli like these and asked whether or not the word they heard was the same as a target word. We found that the amplitude of the auditory N1 varied linearly with changes in VOT and was not influenced by the phonological category the subject was monitoring for, nor by how they categorized the stimuli. P3 amplitude also varied with VOT, but depended on which category the subject was monitoring for. This suggests that perception is continuous with respect to changes in the speech signal and that the effects of categories observed in behavioral responses are the result of later-occurring processes that use phonological information.
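One simple way to check a linear relationship like the one described for the N1 is an ordinary least-squares fit of mean amplitude on VOT. The fit can be run separately for each target condition so that the slopes can be compared. The sketch below assumes this approach for illustration. It is not the statistical model used in this study, which the page does not describe, and the function name is hypothetical.

```python
def ols_fit(x, y):
    """Ordinary least-squares line y = intercept + slope * x.

    Returns (slope, intercept).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

Similar per-condition slopes would be consistent with the N1 tracking VOT regardless of which category was being monitored. A fuller analysis would test the VOT by condition interaction within a single model rather than comparing separate fits by eye.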

P3 component

(A) ERP waveform as a function of VOT adjusted for each subject's category boundary (relative VOT). (B) Mean P3 amplitude as a function of relative VOT and target voicing condition.
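Relative VOT re-expresses each stimulus's VOT as a distance from that subject's category boundary. The page does not say how the boundary was estimated. A logistic fit to the identification data is a common choice, but the sketch below swaps in a simpler stand-in: linear interpolation of the point where the proportion of voiceless ("t") responses crosses 0.5. The function names, the data layout, and the identification proportions in the usage lines are hypothetical.

```python
def category_boundary(vots, prop_voiceless):
    """VOT (ms) where the identification function first reaches 0.5,
    found by linear interpolation between adjacent continuum steps.

    vots: increasing VOT values (ms)
    prop_voiceless: proportion of voiceless responses at each VOT
    """
    points = list(zip(vots, prop_voiceless))
    if points and points[0][1] == 0.5:
        return points[0][0]
    for (v0, p0), (v1, p1) in zip(points, points[1:]):
        if p0 < 0.5 <= p1:
            return v0 + (0.5 - p0) * (v1 - v0) / (p1 - p0)
    raise ValueError("identification function never crosses 0.5 from below")


def relative_vot(vots, boundary_ms):
    """Shift VOTs so that 0 ms falls at the subject's category boundary."""
    return [v - boundary_ms for v in vots]


# Hypothetical identification data for one subject
vots = [0, 5, 10, 15, 20, 25, 30, 35, 40]
prop = [0.0, 0.0, 0.1, 0.3, 0.7, 0.9, 1.0, 1.0, 1.0]
boundary = category_boundary(vots, prop)  # approximately 17.5
rel = relative_vot(vots, boundary)
```

Aligning trials on relative VOT means that a given value (for example, 5 ms past the boundary) refers to the same perceptual position for every subject, even if their boundaries differ.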

These results may also have practical implications. This methodology provides a unique window into perceptual processing of speech. Many models of language and reading impairment are centered on the notion that normal language users ignore small differences in sounds, and that a failure to do so results in impairment. These results challenge that premise: perceiving fine-grained detail may actually be necessary for successful language use.

We are now looking at responses to acoustic cues for other phonological contrasts (fricatives and vowels), and future work will extend this approach to examine other questions about the nature of early perceptual processing of speech.




