There can be little doubt, unless you are a DJ, of course, that good high-frequency performance from a sound system is essential for good intelligibility. I have been going on about this for years but in the process perhaps have not sufficiently emphasized that the mid and low frequencies are also important for speech quality.
For some reason, it is often thought that vowels are low-frequency creatures leading a lowly existence, while consonants are higher up the pecking order and only reside at high frequencies; yet this is actually not the case. Broadly speaking, vowels occupy the frequency range from around 130Hz to 3kHz. Yes, you read that correctly: 3kHz!
But surely, I hear you say, 3kHz is where the consonants hang out! Indeed they do, but consonants are not just high-frequency animals; they occur over a broad spectral range from around 450Hz up to 8kHz.
Now this may come as a bit of a shock to many because there is a common belief that consonants only occur in the range from around 1500Hz to 4kHz. This may in part be due to the historical popularity of %Alcons as a predictor and later as a measure of speech intelligibility, with only the 2kHz octave being considered. Take the vowel sound “I,” for example. The fundamental frequency for male talkers for this vowel is generally around 130Hz, with the first formant (F1) typically occurring at 260 to 270Hz, the second formant (F2) at 2300Hz and the third formant (F3) at 3000Hz. Thus, the vowels are definitely encroaching on consonant territory (or is it the other way around?). Averaging all the vowels together (what a mouthful!) would produce an average F1 of around 500Hz and an average F2 of 1420Hz. For female talkers, the average F1 would be around 575Hz and the F2 at 1700Hz.
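To make the overlap concrete, here is a minimal Python sketch that checks which of the frequencies quoted above fall inside the consonant range. It uses only the numbers given in the text; the one assumption is that F1 for “I” is taken as 265Hz, the midpoint of the quoted 260 to 270Hz. The function and variable names are purely illustrative.

```python
# Values quoted in the text, in Hz. F1 for the vowel "I" is an assumed
# midpoint (265) of the 260 to 270 Hz range given in the article.
VOWEL_I_MALE = {"F0": 130, "F1": 265, "F2": 2300, "F3": 3000}
AVERAGE_MALE = {"F1": 500, "F2": 1420}
AVERAGE_FEMALE = {"F1": 575, "F2": 1700}

# Consonant range stated in the text, and the narrower "common belief" range.
CONSONANT_RANGE_HZ = (450, 8000)
COMMON_BELIEF_RANGE_HZ = (1500, 4000)


def overlapping(components, band):
    """Return the names of the components whose frequency lies inside band."""
    low, high = band
    return [name for name, freq in components.items() if low <= freq <= high]


if __name__ == "__main__":
    for label, vowel in [
        ("vowel I (male)", VOWEL_I_MALE),
        ("average vowel (male)", AVERAGE_MALE),
        ("average vowel (female)", AVERAGE_FEMALE),
    ]:
        print(label)
        print("  inside 450Hz to 8kHz:  ", overlapping(vowel, CONSONANT_RANGE_HZ))
        print("  inside 1500Hz to 4kHz: ", overlapping(vowel, COMMON_BELIEF_RANGE_HZ))
```

On these figures, F2 and F3 of “I” sit inside even the narrower “common belief” range, and both average formants for male and female talkers sit inside the full consonant range.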
Vowels carry the power of the voice, which is why, if you look at the speech spectra I presented last month, the maximum level occurs at around 250 to 500Hz. But what about intelligibility? That is a high-frequency phenomenon surely! Well no, that is not exclusively the case, although I guess it depends on how you define “high frequency,” because this can mean rather different things to different people.
For example, some would say that high frequencies occur above 2000Hz; others might say 1000Hz or even 4000Hz…and if you were an elephant, 500Hz would be positively stellar! Surprisingly, there is as much intelligibility information below around 1800Hz as there is above it. For example, Figure 1 shows the effect of applying a low-pass filter to a speech signal on the resulting intelligibility.
Interestingly, years ago in prehistoric times, when separate horns and bass cabinets used to be kludged together to form clusters, no real attention was paid to what was happening below the crossover point and the directivity of the low-frequency element, but, instead, all the intelligibility was assumed to emanate from the horn. Yet, in fact, only 50% or less, perhaps, may have done so, depending on the crossover frequency employed.
In practice, it is more useful and convenient to break the speech intelligibility contributions into octave or 1/3 octave bands and then weight these octave or 1/3 octave signal components in accordance with their relative contributions to intelligibility. Figure 2, for example, shows the weighting applied by the Speech Transmission Index method of rating intelligibility. The importance of the 2kHz octave band can clearly be seen, but notice the contributions at lower frequencies that, again, show that speech intelligibility is not just about high frequencies. Although the highest weightings do, indeed, cover the consonant frequency range, the vowels are certainly not ignored but are also well represented. It’s an interesting thought, but if the scale were to be in terms of sound quality rather than intelligibility, then a rather different distribution plot would emerge.
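The band-weighting idea can be sketched in a few lines of Python. This is only an illustration of weighting octave-band contributions, not the STI calculation itself, which involves considerably more than a weighted sum. The seven octave-band centre frequencies are the ones commonly used for STI; the weights are made-up placeholders chosen to peak at 2kHz, not the values plotted in Figure 2, and the per-band “scores” (0 to 1) are hypothetical.

```python
# Octave-band centre frequencies in Hz.
OCTAVE_BANDS_HZ = [125, 250, 500, 1000, 2000, 4000, 8000]

# PLACEHOLDER weights for illustration only: NOT the real STI weightings.
PLACEHOLDER_WEIGHTS = [0.10, 0.10, 0.15, 0.15, 0.25, 0.15, 0.10]


def weighted_index(band_scores, weights=PLACEHOLDER_WEIGHTS):
    """Combine per-band scores (0 to 1) into one figure via a weighted mean."""
    if len(band_scores) != len(weights):
        raise ValueError("need one score per band")
    return sum(w * s for w, s in zip(weights, band_scores)) / sum(weights)


def weight_share_up_to(cutoff_hz, bands=OCTAVE_BANDS_HZ, weights=PLACEHOLDER_WEIGHTS):
    """Fraction of the total weighting carried by bands at or below cutoff_hz."""
    below = sum(w for f, w in zip(bands, weights) if f <= cutoff_hz)
    return below / sum(weights)


if __name__ == "__main__":
    # Hypothetical system that conveys only the 2kHz band and above perfectly.
    horn_only = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0]
    print("index, bands from 2kHz up only:", round(weighted_index(horn_only), 2))
    print("weight share at or below 1kHz: ", round(weight_share_up_to(1000), 2))
```

With the real weightings substituted for the placeholders, the same arithmetic shows how much of the rating depends on the bands below any given crossover point.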
But it’s not just a question of frequency: The temporal patterns of the vowels and consonants are also of crucial importance, as is the way in which a vowel transitions into the adjacent consonant in connected (i.e., real) speech. Vowel sounds typically have a duration of around 120ms, while consonants are very much shorter at around 80 to 90ms. Words, being made up of a combination of vowels and consonants, have longer durations, being perhaps 250 to 300ms for simple, short words, and longer again for multi-syllable combinations.
So, next time you are designing or working on a sound system, don’t just think consonants but ensure that your system can handle the vowels, as well, and is therefore free from IVS (Irritable Vowel Syndrome). And remember, there is more to speech than meets the ear.