The VQT detailed analysis includes the following measurements:

  • Jitter
  • Clipping
  • Level
  • PESQ/Utterance
  • Delay/Utterance


Jitter data is obtained from the time alignment process, which must determine the offset of each utterance accurately before a speech quality score can be computed. Jitter is the variation in this time offset between the reference and degraded utterances. The GL VQT reports the utterance offsets as minimum, maximum, and standard deviation values; these three values measure the jitter in the speech as delivered to the listener. GL also reports the average offset.
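As an illustration, the summary statistics can be sketched as follows, assuming the alignment step has already produced one offset per utterance in milliseconds (degraded minus reference). The function name `offset_stats` and the use of the population standard deviation are assumptions for this sketch, not GL's documented method.

```python
import statistics

def offset_stats(offsets_ms):
    """Summarise per-utterance time offsets (degraded minus reference), in ms.

    Min, max and standard deviation describe the jitter; the mean is the
    average offset. The population standard deviation is an assumption.
    """
    return {
        "min": min(offsets_ms),
        "max": max(offsets_ms),
        "std": statistics.pstdev(offsets_ms),
        "mean": statistics.fmean(offsets_ms),
    }

# Illustrative offsets only, not measured data.
print(offset_stats([120.0, 124.0, 118.0, 126.0]))
```

With these values the offsets stay between 118 and 126 ms around an average of 122 ms; a larger standard deviation would indicate more jitter.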


Performance Examiner provides a number of diagnostic outputs that relate to the use of muting algorithms and discontinuous transmission. These outputs are generated by comparing the degraded signal to the reference signal.

Muting typically occurs when the error concealment algorithm at a receiver has insufficient information to replace missing or corrupted data. The muting estimate is expressed as the proportion of signal frames that have been muted by the system under test.
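The muting algorithm itself is not described here, so the following is only a minimal sketch of the idea under stated assumptions: the reference and degraded signals are already time-aligned lists of normalised floats, frames are 160 samples (20 ms at an assumed 8 kHz rate), and a frame counts as muted when the reference frame is active but the aligned degraded frame is near-silent. The helper names and both dB thresholds are hypothetical, and using active reference frames as the denominator is an interpretation of "signal frames".

```python
import math

def frame_energy_db(frame):
    """Mean-square energy of a frame of normalised samples, in dB (floored at -120)."""
    ms = sum(x * x for x in frame) / len(frame)
    return 10 * math.log10(ms) if ms > 0 else -120.0

def muted_proportion(ref, deg, frame_len=160, active_db=-50.0, mute_db=-80.0):
    """Fraction of active reference frames whose aligned degraded frame is near-silent.

    Hypothetical thresholds: a reference frame above active_db is "active";
    a degraded frame below mute_db is "muted".
    """
    n_frames = min(len(ref), len(deg)) // frame_len
    active = muted = 0
    for i in range(n_frames):
        s = slice(i * frame_len, (i + 1) * frame_len)
        if frame_energy_db(ref[s]) > active_db:
            active += 1
            if frame_energy_db(deg[s]) < mute_db:
                muted += 1
    return muted / active if active else 0.0

ref = [0.1] * 800                   # five active frames
deg = [0.1] * 480 + [0.0] * 320     # last two frames muted
print(muted_proportion(ref, deg))   # 0.4
```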

Discontinuous transmission (DTX) schemes aim to increase transmission efficiency by ceasing transmission during periods of talker inactivity. Temporal clipping occurs when the voice activity detection (VAD) algorithm in a DTX system misclassifies part of a speech utterance as noise, so that part is replaced with comfort noise at the receiver.

Front-end clipping refers to the case where the start of an utterance has been clipped. Back-end clipping refers to the case where the end of an utterance has been clipped.

Hangover is the period after the end of an utterance during which a discontinuous transmission scheme continues to transmit as normal, rather than generating comfort noise.
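Front-end and back-end clipping can be illustrated with a minimal sketch, assuming speech-activity boundaries (in ms) are already known for each reference utterance and its degraded counterpart, along with that utterance's delay. The `Segment` type and `clipping_ms` function are hypothetical; hangover is not modelled here, because distinguishing transmitted background noise from comfort noise needs information this sketch does not have.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    """Start and end of detected speech activity, in ms (hypothetical type)."""
    start: float
    end: float

def clipping_ms(ref: Segment, deg: Segment, delay_ms: float):
    """Return (front_end_clip, back_end_clip) in ms for one utterance.

    The degraded segment is shifted back by delay_ms so it lines up with the
    reference; speech missing at its start or end counts as clipping.
    """
    deg_start = deg.start - delay_ms
    deg_end = deg.end - delay_ms
    front = max(0.0, deg_start - ref.start)
    back = max(0.0, ref.end - deg_end)
    return front, back

print(clipping_ms(Segment(1000.0, 2500.0), Segment(1180.0, 2600.0), delay_ms=120.0))
# (60.0, 20.0): 60 ms lost at the start, 20 ms lost at the end
```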


For each measurement, levels are calculated for both the reference and degraded files. These levels are described below:

Measurement                        Definition
--------------------------------   ------------------------------------------
Active Speech Level (ASL) (dBov)   RMS power level during periods of speech
Mean Noise Level (MNL) (dBov)      RMS power level during periods of silence
RMS Mean Level (dBov)              RMS power level of the entire sample
DC Offset (PCM units)              DC offset of the input sample
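The four levels can be sketched for 16-bit PCM as follows. This is a simplified illustration, not GL's implementation: it assumes a per-sample speech/silence marking is already available (a real active speech level per ITU-T P.56 uses a more elaborate activity measure), and it takes 0 dBov as the RMS of a full-scale square wave (32768), which is an assumption to check against your own tool. The function names are hypothetical.

```python
import math

def rms_dbov(samples, full_scale=32768.0):
    """RMS level in dBov, taking 0 dBov as a full-scale square wave (assumption)."""
    if not samples:
        return float("-inf")
    ms = sum(x * x for x in samples) / len(samples)
    return 10 * math.log10(ms / full_scale ** 2) if ms > 0 else float("-inf")

def levels(samples, active_flags):
    """ASL, MNL, RMS mean level and DC offset for a list of 16-bit PCM samples.

    active_flags marks each sample as speech (True) or silence (False); how the
    marking is produced (e.g. by a VAD) is outside this sketch.
    """
    if not samples:
        raise ValueError("empty sample")
    speech = [x for x, a in zip(samples, active_flags) if a]
    silence = [x for x, a in zip(samples, active_flags) if not a]
    return {
        "asl_dbov": rms_dbov(speech),
        "mnl_dbov": rms_dbov(silence),
        "rms_mean_dbov": rms_dbov(samples),
        "dc_offset_pcm": sum(samples) / len(samples),
    }

print(levels([1000, -1000, 1000, -1000, 10, -10, 10, -10], [True] * 4 + [False] * 4))
```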

The following results are derived from the levels above:

Measurement              Definition
-----------------------  ------------------------------------------------------
Speech Level Gain (dB)   Speech level gain of the system under test, calculated
                         as (ASL of degraded signal) minus (ASL of reference
                         signal).
Noise Level Gain (dB)    Gain for noise in silent periods, calculated as (MNL of
                         degraded signal) minus (MNL of reference signal). May
                         differ from the system gain if noise is added or
                         suppressed.
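Because both levels in each pair are expressed in dBov, each gain is a simple difference in dB. The function names below are hypothetical and the input values are illustrative, not measured data.

```python
def speech_level_gain(asl_deg_dbov: float, asl_ref_dbov: float) -> float:
    """Speech Level Gain (dB): ASL of degraded minus ASL of reference."""
    return asl_deg_dbov - asl_ref_dbov

def noise_level_gain(mnl_deg_dbov: float, mnl_ref_dbov: float) -> float:
    """Noise Level Gain (dB): MNL of degraded minus MNL of reference."""
    return mnl_deg_dbov - mnl_ref_dbov

print(speech_level_gain(-29.5, -26.0))  # -3.5
print(noise_level_gain(-62.0, -70.0))   # 8.0
```

In this example the system attenuates speech by 3.5 dB while the noise floor rises by 8 dB, so the noise gain differs from the speech gain; this is the pattern the table describes when noise is added by the system under test.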


A PESQ, PESQ LQ, or PESQ LQO score is available on a per-utterance basis. Each sample is divided into distinct utterances, and GL provides an ITU score for each utterance.

For example, front-end clipping that affects only the first utterance could cause the overall score to be lower than expected. The PESQ/Utterance results reveal this cause, because the first utterance scores noticeably lower than the others.
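One way to spot such an outlier is to flag utterances that score well below the median of the per-utterance scores. This is a minimal sketch; the function name, the median-based rule, and the 0.5 margin are arbitrary illustrative choices, and the scores are made up.

```python
import statistics

def flag_low_utterances(scores, margin=0.5):
    """Return indices of utterances scoring more than `margin` below the median.

    scores: per-utterance PESQ (or PESQ LQ / LQO) values, in utterance order.
    """
    median = statistics.median(scores)
    return [i for i, s in enumerate(scores) if s < median - margin]

scores = [2.1, 3.9, 4.0, 3.8, 4.1]
print(flag_low_utterances(scores))  # [0]
```

Only the first utterance is flagged here, which is the pattern front-end clipping of the first utterance would produce.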


The delay per utterance results are obtained by comparing the beginning of each utterance in the reference file with the beginning of the corresponding utterance in the degraded file.
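A minimal sketch of that comparison, assuming utterance start positions are given as sample indices, the sample rate is 8 kHz, and the i-th reference utterance pairs with the i-th degraded utterance (so no utterance has been lost or split). The function name and these assumptions are illustrative, not GL's documented behaviour.

```python
def utterance_delays_ms(ref_starts, deg_starts, sample_rate=8000):
    """Per-utterance delay in ms from utterance start positions given in samples."""
    if len(ref_starts) != len(deg_starts):
        raise ValueError("reference and degraded utterance counts differ")
    return [1000.0 * (d - r) / sample_rate for r, d in zip(ref_starts, deg_starts)]

print(utterance_delays_ms([800, 24000, 48000], [1760, 25040, 48880]))
# [120.0, 130.0, 110.0]
```

The spread of these per-utterance delays (110 to 130 ms here) is the kind of variation the jitter statistics summarise.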

