Understanding speech in the presence of other speech: Perceptual mechanisms for auditory scene analysis in human listeners

ESRC Grant no. ES/K004905/1

Principal investigator: Prof. Brian Roberts (B.Roberts@aston.ac.uk)
Research fellow: Dr Rob Summers (R.J.Summers@aston.ac.uk)


It is unusual to hear the speech of a particular talker in isolation; speech is typically heard in the presence of interfering sounds, such as the voices of other talkers. This project aims to elucidate the mechanisms by which listeners segregate the formants constituting the speech of one talker from those constituting the speech of another.

There is extensive evidence that grouping "primitives" such as common onset time are important for the perceptual grouping of non-speech sounds, but few studies have investigated the role of such cues in the grouping of speech formants. Psychophysical experiments are being conducted using simplified speech stimuli that allow at least two possible perceptual organisations to compete with each other. These competitive configurations are used to quantify how much extraneous formants affect speech intelligibility as their acoustic properties are manipulated. This enables a detailed investigation of the extent to which across-formant grouping is determined by general-purpose grouping cues versus speech-specific grouping cues.

The relationship between primitive and high-level grouping constraints will also be explored by determining whether linguistic information presented just before a speech stimulus can increase the perceptual exclusion of a competitor formant.

Acoustic source properties, across-formant integration, and speech intelligibility under competitive conditions
  • Summary of findings: Results of these experiments support the notion that the contribution of a formant to the phonetic identity of a speech sound is governed by the type of its acoustic source, rather than by whether or not its source properties match those of the other formants.

Informational masking of monaural target speech by a single contralateral formant
  • Summary of findings: A single contralateral formant can produce substantial informational masking of target speech. The effect of an extraneous formant on intelligibility depends primarily on variation of its frequency contour.

Across-formant integration and speech intelligibility: Effects of acoustic source properties in the presence and absence of a contralateral interferer
  • Summary of findings: These findings extend those from earlier research using dichotic targets. Acoustic source type and competition, rather than acoustic similarity, govern the phonetic contribution of a formant, even when target and interfering formants are matched for loudness.

Effects of differences in F0 and F0 contour on phonetic integration in a formant ensemble 
  • Summary of findings: In the absence of interference, a mismatch in F0 contour between F2 and F1+F3 has no detrimental effect on intelligibility. Intelligibility is reduced when an interfering formant (F2C) is added whose F0 contour matches that of F1+F3. As the difference in F0 between F2 and the other formants increases, intelligibility falls further. This effect depends on the mean difference in F0 between contours rather than the difference between contour shapes per se.

Effects of masker spectro-temporal coherence on the informational masking of speech 
  • Summary of findings: The intelligibility of F1+F2+F3 is lowered when an interfering formant (F2C) with time-varying frequency (inverted F2 contour) and constant amplitude is presented in the opposite ear. Four other conditions tested manipulations of F2C in which the amplitude contour was divided into short segments (100 or 200 ms long) and the order of the segments was either retained or randomised to introduce abrupt discontinuities in the F2C frequency contour. No additional or differential effect of amplitude segmentation or segment-order randomisation was observed.