Last month, I discussed the various sound level meter frequency weightings and integration times, and how these can affect the measured values of both speech (or music) signals and noise. I also demonstrated the futility of trying to read a rapidly varying signal such as speech using the “Fast” response time. The resulting 15dBA variation was shown to be tamed to a readable value either by employing the “Slow” integration time (available on nearly all sound level meters) or by measuring the LAeq (or even LZeq) of the wanted signal. The sound pressure levels I obtained for the identical PA message were 78.0 to 92.5dBA on Fast, 83.1 to 86.4dBA on Slow and 85.5 to 85.9dB LAeq.
There were several other values that I could have obtained when measuring the wanted speech signal and background noise. For example, on some meters, it is possible to measure the maximum and minimum values that occurred within the measurement period. The maximum reading is often incorrectly referred to as the “peak” value (i.e., after peak hold). However, in acoustics (and audio, come to that), “Peak” has a very distinct and defined meaning, and it is most certainly not the same as the max hold or max level facility found on a sound level meter.
The max hold facility reports the maximum RMS value, and this will have an integration time constant associated with it (e.g., 125ms for Fast and 1.0 second for Slow). Although determining the max SPL is a simple operation, it is not a particularly useful parameter when considering speech signals, because the value displayed could be a unique “one-off” occurrence that has little to do with what happens the rest of the time.
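To illustrate how the time constant shapes a max reading, here is a minimal Python sketch of an exponentially time-weighted RMS detector. It is an illustration only, not the algorithm of any particular meter. The function name `time_weighted_levels`, the 8kHz sample rate and the 50ms, 1kHz test burst are all assumptions made for the example. No frequency weighting is applied, and levels are in dB relative to a full-scale value of 1.0 rather than calibrated SPL.

```python
import math

def time_weighted_levels(samples, fs, tau):
    """Exponentially time-weighted RMS level (dB re 1.0) for each sample.

    tau is the time constant in seconds (0.125 for Fast, 1.0 for Slow).
    Hypothetical helper for illustration only.
    """
    alpha = 1.0 - math.exp(-1.0 / (fs * tau))  # one-pole smoothing coefficient
    mean_square = 0.0
    levels = []
    for x in samples:
        # Running exponential average of the squared signal
        mean_square += alpha * (x * x - mean_square)
        levels.append(10.0 * math.log10(max(mean_square, 1e-12)))
    return levels

fs = 8000  # assumed sample rate in Hz
# Synthetic test signal: 50ms burst of a 1kHz sine, then 0.5s of silence
burst = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(fs // 20)]
signal = burst + [0.0] * (fs // 2)

fast_max = max(time_weighted_levels(signal, fs, 0.125))
slow_max = max(time_weighted_levels(signal, fs, 1.0))
print(f"Max on Fast: {fast_max:.1f} dB, max on Slow: {slow_max:.1f} dB")
```

In this sketch, the same short burst produces a Fast maximum several dB above the Slow maximum, which is why a max reading means little unless its time weighting is quoted alongside it.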
A true “peak” reading instrument will have a time constant of just 50µs and so will give rise to much higher readings than the max RMS values, typically by around 20dB for a speech signal. In fact, the difference in dB between the peak value and the long-term RMS level gives us the Crest Factor of the signal. For speech, this usually is taken as being around 20 to 22dB; for pink noise, it is about 12 to 14dB, and for a sine wave, it is 3dB. The Crest Factor, therefore, gives us an idea of the headroom that a particular signal will require if it is not to be clipped or distorted by the signal chain or transducer.
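The Crest Factor arithmetic can be sketched in a few lines of Python. This is an illustration under stated assumptions: `crest_factor_db` is a hypothetical helper, it takes the largest absolute sample value as the peak rather than modeling a 50µs peak detector, and the test tones are synthetic.

```python
import math

def crest_factor_db(samples):
    """Crest Factor in dB: peak level minus RMS level of the whole signal.

    Uses the largest absolute sample as the peak (an assumption; a real
    peak-reading instrument uses a detector with a very short time constant).
    """
    peak = max(abs(x) for x in samples)
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

fs = 48000  # assumed sample rate in Hz
# Ten whole cycles of a 1kHz sine wave
sine = [math.sin(2.0 * math.pi * 1000.0 * n / fs) for n in range(480)]
# A square wave of the same length: peak and RMS are identical
square = [1.0 if (n // 24) % 2 == 0 else -1.0 for n in range(480)]

sine_cf = crest_factor_db(sine)
square_cf = crest_factor_db(square)
print(f"Sine: {sine_cf:.2f} dB")      # about 3.01 dB
print(f"Square: {square_cf:.2f} dB")  # 0.00 dB
```

The sine wave result reproduces the 3dB figure quoted above. Run on a recorded speech signal, the same calculation would indicate how far the peaks sit above the long-term RMS level.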
Traditionally, Crest Factors tend to be unweighted readings, but there is nothing to stop an “A” or “C” weighted value from being determined. (Interestingly, if you were to do this, the two values would be rather different, but that’s another story.) In public address and sound reinforcement systems, it generally is not necessary to provide 20dB of signal headroom. Indeed, in many cases, this would be prohibitively expensive. The reason is that there is little energy in a 50µs speech peak and it can be truncated (posh pseudo-technical term for clipped) quite happily to a certain degree without any audible effect because the rest of the sound system “smooths out” the signal, a bit like soft clipping.
Now, I would not advocate doing this if recording or broadcasting the signal, but for SR/PA purposes using a conventional system in a reflective environment, such massive headroom is not required. This, of course, raises the question as to what headroom is actually required in practice. There are two ways of determining this. First, the headroom can be reduced until, subjectively, the limit of acceptability is reached. Alternatively, the amplitude statistics of typical speech signals can be studied and a potential value derived.
Both approaches require the long-term average value of the speech signal to be determined in order to provide a steady baseline (i.e., as though a leveler or slow AGC circuit were being used). As I noted earlier, there is virtually no energy in the real peaks; so, instead, the statistics of the RMS signal levels have to be analyzed. A way to do this would be to consider the percentage of the time that a speech signal exceeds a given value and relate this back to the long-term average level. Typical percentiles could be, say, the 10%, 5% and 1% exceedances (i.e., the RMS level that is exceeded for a given percentage of the time). A while ago, I ran such a series of tests using a range of typical speech signals. The results are presented in Table 1.
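The percentile analysis described above can be sketched as follows. This is a minimal Python illustration, not the procedure used to produce Table 1. The helper names `leq` and `exceedance_level` are hypothetical, the input is assumed to be a list of RMS levels in dB sampled at equal time intervals, and the ramp used to exercise the functions is synthetic placeholder data rather than a speech measurement.

```python
import math

def leq(levels_db):
    """Energy-average (Leq-style) level of equally spaced dB readings."""
    mean_energy = sum(10.0 ** (level / 10.0) for level in levels_db) / len(levels_db)
    return 10.0 * math.log10(mean_energy)

def exceedance_level(levels_db, percent):
    """Level exceeded for `percent` of the readings (e.g., 10, 5 or 1)."""
    ordered = sorted(levels_db, reverse=True)
    # Index of the reading that has `percent` of the readings above it
    k = int(len(ordered) * percent / 100.0)
    k = min(max(k, 0), len(ordered) - 1)
    return ordered[k]

# Synthetic placeholder data: 100 readings from 1dB to 100dB in 1dB steps
levels = [float(v) for v in range(1, 101)]
average = leq(levels)
for pct in (10, 5, 1):
    level = exceedance_level(levels, pct)
    print(f"{pct}% exceedance: {level:.1f} dB, {level - average:+.1f} dB re average")
```

With real data, `levels` would be the RMS time history of the speech signal, and the printed differences would be the margins above the long-term average level.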
For example, from the table, it can be seen that, for 1% of the time, the RMS value of the speech signal was 7.1dBA higher than the long-term average RMS level or that, on average, the LAmax was 8.9dB higher than the long-term speech level. How you want to interpret the data is up to you, but it is the long-term average level (the LAeq or average Slow integration value) that is the underlying parameter of importance and the one that determines the real level of the speech.
To put this in perspective, Figure 1 shows the RMS time history of a typical speech signal with the average (LAeq) value superimposed (no wonder it is impossible to read on a fast-responding meter!). Figure 2 compares the RMS and true peak values of the same signal, thereby showing the signal and its dynamic Crest Factor. Although the difference (i.e., the Crest Factor) can be seen to be around 20dB, there are noticeable differences between the individual synchronized sample characteristics. Analyzing the amplitude characteristics of a typical pink noise signal shows this to be radically different to speech, and in no way can the two signals be thought to be similar or interchangeable for test purposes.