Another major research focus in our lab is understanding the mechanisms underlying the development of speech perception. Different languages carve up speech sounds into different categories, and infants must learn the properties of their native language. How do they do this? How do infants categorize speech sounds early in development? What does this tell us about how speech is learned and perceived?
One method we use to answer these questions is a variation of the habituation procedure that tests infants' discrimination abilities. In these experiments, infants are seated on their parent's lap in front of a large display. To test auditory discrimination, a video is played with a repeating auditory stimulus. For example, the infant might hear "b" words over and over ("beach", "bear", "ball", etc.). Then, infants are presented with a visual target and a repeating auditory stimulus that is either the same as one they heard before (e.g. "beach") or different ("peach"). When the infant looks away from the display, the auditory stimulus stops, and when they look back, another one is played. Infants quickly learn that they can control whether the stimulus plays with their head movements. We can then measure the amount of time the infant spends looking at the display (and hearing the stimulus) depending on what they heard. If the infant can discriminate between the two sounds, we expect them to look longer for the sound that is different from the one they heard before, since infants generally prefer novel stimuli over familiar ones.
We have used this method to assess how infants categorize individual speech sounds, such as those that distinguish the words "beach" and "peach". Do they categorize these sounds the same way an adult English speaker would? We have also looked at how they categorize sounds that are different acoustically, but correspond to the same category in English. For example, the "d"s in the sounds "ad" and "da" are very different acoustically, yet English speakers treat these as the same speech sound category. Do infants also categorize sounds in this way? How do they learn to group them together?
The newest method we use involves specially designed equipment that allows us to track infants' eye movements. This setup uses a remote sensor attached to a baby cap, which tells a computer where in space the subject's head is. This information is then used to direct a small camera, mounted on a servo motor in front of the subject, to point at the subject's face. This way, whichever way the infant moves, the camera is always able to keep the face in view. Once this technical hurdle is cleared, the rest of the system works just like equipment for adult eye tracking: the camera emits infrared light, which bounces off the back of the infant's eye, allowing the computer to track two points of light, the pupil and the corneal reflection.
Several of our experiments currently use this method with various tweaks to the design. To test how infants categorize allophonic variation, we are using a design called Anticipatory Eye Movements (AEM). In this design, a small, round, face-like object appears at the bottom of the screen, travels upward, disappears behind an opaque occluder, and then reappears on either the left or right side of the screen.
This project examines how infants categorize basic speech sounds (e.g. ba, pa, ta and da), and what those categories might look like. We are interested in whether these categories are abstract and discrete, or whether they are continuously sensitive to the properties of the acoustic signal. We are also interested in how well infants are able to represent the range of possible acoustic inputs: do their primitive categories cover many possible tokens, or are there gaps in which specific sounds are not assigned a category?
Students: Cheyenne Munson, Kristine Kovack
Collaborators: Richard Aslin (University of Rochester)
Allophones are members of the same phonemic category that differ acoustically from each other. For example, in English the /b/ sound at the beginning of a word like ‘Badger’ is actually very different from the /b/ sound at the end of a word like ‘Stab’. One challenge for infants is determining which acoustic variations constitute a phonemic distinction and which simply fall under the umbrella of allophonic variation. In this study we are using the head turn preference procedure to examine how 8- and 12-month-old infants categorize short speech sounds that vary either phonemically or allophonically, and how this categorization develops.
Students: Marcus Galle
Collaborators: John Kingston (University of Massachusetts-Amherst)
Language consists of a stream of sounds that must be interpreted and given meaning. Particular sound patterns form clusters of acoustic cues that vary between different languages (Lisker & Abramson, 1964). These clusters define which sounds are used in a particular language. In the process of learning language, one of the first tasks infants encounter is to determine which sound patterns are used in their native language and which are not. One way to do this is by statistical learning, that is, learning which sound patterns form clusters in their language. To do this, infants need to be sensitive to small differences in the speech signal, and they must be able to keep track of which sound patterns cluster together. Both abilities have been demonstrated (Maye et al., 2002; McMurray & Aslin, 2005).
We have modeled this process using an approach called a mixture of Gaussians. In this type of model, categories are represented by Gaussian distributions. In the case of speech categories, these distributions would vary along an acoustically relevant dimension (e.g. voicing) and the different Gaussians would represent different categories in that dimension. English, for example, would have two categories in the voicing dimension: one representing voiced sounds and one representing voiceless sounds. The model learns the structure of these categories through exposure to exemplars of voice-onset time (VOT) sampled from a generated dataset that is based on the properties of real languages, such as English. Each time the model "hears" a VOT token, it can update the parameters that define the structure of each of the Gaussians.
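As a rough illustration of the representation described above (not the lab's actual model code), the sketch below treats voicing as a mixture of two Gaussians along the VOT dimension and asks which category a given token most likely belongs to. The category means, standard deviations, and weights are hypothetical values chosen for illustration, and the names `CATEGORIES`, `posterior`, and `categorize` are assumptions of this sketch.

```python
import math

# Hypothetical VOT category parameters (in ms); illustrative values only,
# not estimates from real speech data.
CATEGORIES = {
    "voiced":    {"weight": 0.5, "mean": 0.0,  "sd": 10.0},
    "voiceless": {"weight": 0.5, "mean": 50.0, "sd": 15.0},
}

def gaussian_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def posterior(vot):
    """Probability of each category given a single VOT token."""
    likes = {name: c["weight"] * gaussian_pdf(vot, c["mean"], c["sd"])
             for name, c in CATEGORIES.items()}
    total = sum(likes.values())
    return {name: like / total for name, like in likes.items()}

def categorize(vot):
    """Return the name of the most probable category for a VOT token."""
    post = posterior(vot)
    return max(post, key=post.get)
```

Calling `posterior` on a VOT near the midpoint between the two means returns graded probabilities rather than an all-or-none label, which is one way such a model can stay sensitive to continuous acoustic detail.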
The model initially begins with more Gaussians than are needed to model the data. This reflects the fact that infants do not know ahead of time how many categories their native language will have for a particular phonetic feature. For example, if the infant is learning English, there will be two voicing categories, but there will be three categories if the infant is learning Hindi. The model determines the correct number of categories during training.
Over the course of exposure to many VOT tokens, the model adjusts its parameters to match the number of categories and properties of the distributions from the given language. The movie to the right shows the model learning. The x-axis is VOT (in ms), and the y-axis is the strength of the category's representation. The red line represents the dataset the model is being trained on, and the vertical lines represent the center points of the model's Gaussians. As the model is trained, the total number of Gaussians decreases (since it starts with more than the number of categories), and the Gaussians adjust their positions to account for the incoming data. At the end, the model has reached a stable state where it has learned the voicing categories for English.
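The training procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the published model: it uses a simple token-by-token update with a fixed learning rate, removes any Gaussian whose mixing weight falls below a threshold, and draws training tokens from a made-up two-category language. The function names, the learning rate, the pruning threshold, and the distribution parameters are all hypothetical.

```python
import math
import random

def gaussian_pdf(x, mean, sd):
    """Density of a Gaussian with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))

def sample_vot(rng):
    """Draw one VOT token (ms) from a hypothetical two-category language."""
    if rng.random() < 0.5:
        return rng.gauss(0.0, 8.0)    # assumed "voiced" distribution
    return rng.gauss(50.0, 12.0)      # assumed "voiceless" distribution

def train(n_tokens=20000, n_initial=8, rate=0.01, prune_below=0.02, seed=1):
    rng = random.Random(seed)
    # Start with more Gaussians than needed, at random positions along VOT.
    comps = [{"weight": 1.0 / n_initial,
              "mean": rng.uniform(-40.0, 100.0),
              "sd": 10.0} for _ in range(n_initial)]
    for _ in range(n_tokens):
        x = sample_vot(rng)
        likes = [c["weight"] * gaussian_pdf(x, c["mean"], c["sd"]) for c in comps]
        total = sum(likes)
        if total <= 0.0:
            continue  # token is too far from every Gaussian to be assigned
        for c, like in zip(comps, likes):
            r = like / total                      # responsibility for this token
            c["weight"] += rate * (r - c["weight"])
            delta = x - c["mean"]
            c["mean"] += rate * r * delta
            var = c["sd"] ** 2 + rate * r * (delta * delta - c["sd"] ** 2)
            c["sd"] = max(math.sqrt(var), 1.0)    # keep the width from collapsing
        # Drop Gaussians that have lost nearly all their weight, then renormalize.
        comps = [c for c in comps if c["weight"] >= prune_below]
        norm = sum(c["weight"] for c in comps)
        for c in comps:
            c["weight"] /= norm
    return comps

if __name__ == "__main__":
    for c in sorted(train(), key=lambda c: c["mean"]):
        print(f"mean={c['mean']:.1f} ms  sd={c['sd']:.1f}  weight={c['weight']:.2f}")
```

In a scheme like this, Gaussians that rarely win tokens tend to lose weight and be removed, while the survivors drift toward the clusters in the input. How many survive depends on the assumed learning rate, threshold, and starting positions.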
This model demonstrates how speech categories can be learned from statistical information, and it allows us to examine the dynamics of that development over time.
Students: Joe Toscano
Collaborators: Richard Aslin (University of Rochester)
Presentations
Toscano, J. and McMurray, B. (2005, November). Statistical learning, cross-linguistic constraints, and the acquisition of speech categories: a computational approach. Talk presented at the 11th Midcontinental Workshop in Phonology, University of Michigan, Ann Arbor, MI.
Many phonetic categories are defined by multiple acoustic cues. A number of experiments have shown that listeners weight these cues differently. In addition, cue weights change over the course of development. How are cue weights learned? One possibility is that infants and children determine cue weights based on the reliability of individual acoustic cues: more reliable cues would be weighted higher, and less reliable ones would be weighted lower. We have developed a computational model that uses this approach to determine cue weights. The model is able to learn the approximate weights for a variety of acoustic cues occurring in different contexts and different languages. Experiments are planned to test the predictions of the model by looking at how children weight particular cues differently from adults.
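To make the idea of reliability-based weighting concrete, here is a small sketch. It assumes one particular definition of reliability that is not specified above: how far apart two categories sit along a cue, relative to the variability within each category. The measure, the function names, and the example measurements are all hypothetical.

```python
import statistics

def cue_reliability(tokens_a, tokens_b):
    """Assumed reliability measure: squared difference between the category
    means along one cue, divided by the average within-category variance."""
    mean_diff = statistics.mean(tokens_a) - statistics.mean(tokens_b)
    pooled_var = (statistics.variance(tokens_a) + statistics.variance(tokens_b)) / 2.0
    return mean_diff ** 2 / pooled_var

def cue_weights(cues):
    """Normalize the reliabilities so that the weights sum to 1."""
    reliab = {name: cue_reliability(a, b) for name, (a, b) in cues.items()}
    total = sum(reliab.values())
    return {name: r / total for name, r in reliab.items()}

# Made-up measurements for two categories along two cues.
cues = {
    "vot_ms": ([0.0, 5.0, 10.0, -5.0, 8.0], [45.0, 55.0, 60.0, 50.0, 40.0]),
    "f0_hz":  ([110.0, 125.0, 118.0, 130.0, 105.0], [120.0, 135.0, 128.0, 140.0, 112.0]),
}
weights = cue_weights(cues)
```

With these made-up values, the two categories are well separated along VOT but overlap heavily in f0, so VOT receives most of the weight.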
Students: Joe Toscano
Presentations
Toscano, J. C. and McMurray, B. (2007, March). Taking statistical learning to the next level: A computational approach to the acquisition of multi-dimensional categories. Poster to be presented at the 2007 Biennial Meeting of the Society for Research in Child Development, Boston, MA.
Previous research has demonstrated that both infants and adults are able to learn and adjust acoustic-phonetic categories through short-term statistical learning. What happens to these speech categories during the learning process? We have produced two competing hypotheses based on predictions from two of our computational modeling approaches: the mixture of Gaussians and the Hebbian normalized recurrence network. Because these models represent speech categories in different ways, they make different predictions about how the learning process will unfold when listeners hear a particular pattern of acoustic cues. We are currently conducting experiments to determine which model more closely reflects human behavior during short-term statistical learning.
Students: Joe Toscano