GL Communications Inc.
 
 
 
 









Description:

The GL Voice Quality Testing (VQT) product uses several industry-standard algorithms to measure the speech quality of a transmitted voice file. VQT compares the original, unprocessed signal with the degraded version using PESQ (ITU-T P.862 and P.862.1), PAMS (scored on the opinion scales of ITU-T P.800) and PSQM/PSQM+ (ITU-T P.861). The GL VQT can be installed and operated on a stand-alone system or can reside, as an optional feature, on other GL products.

  • Perceptual Evaluation of Speech Quality (PESQ)
  • Operations Performed by PESQ
  • Results Provided by PESQ
  • Perceptual Analysis Measurement System (PAMS)
  • Operations Performed by PAMS
  • Results Provided by PAMS
  • Perceptual Speech Quality Measure (PSQM)
  • Results Provided by PSQM
  • ITU-T P.56 Measurements

Perceptual Evaluation of Speech Quality (PESQ)

Modern communications networks include elements (low bit-rate coding, error-prone channels and voice activity detection) that cannot be assessed reliably by conventional engineering metrics such as signal-to-noise ratio. One way to measure customers' perception of the quality of these systems is to conduct a subjective test involving panels of human subjects. However, these tests are expensive and unsuitable for applications such as real-time monitoring.

PESQ provides an objective measure that predicts the results of subjective listening tests on telephony systems. To measure speech quality, PESQ uses a sensory model to compare the original, unprocessed signal with the degraded version at the output of the communications system. The result of comparing the reference and degraded signals is a quality score. This score is analogous to the subjective Mean Opinion Score (MOS) measured using panel tests according to ITU-T P.800.

PESQ incorporates many new developments that distinguish it from earlier models for assessing codecs. These innovations allow PESQ to be used with confidence to assess end-to-end speech quality as well as the effect of such individual elements as codecs.

In addition to the standard PESQ score, the GL VQT also provides the PESQ-LQ and PESQ-LQO (P.862.1) scores. These revised scores exhibit better correlation with subjective listening quality test scores.


Operations Performed by PESQ

The processing carried out by the PESQ algorithm includes the following stages.

Level Alignment

In order to compare the signals, the reference speech signal and the degraded signal must be at the same, constant power level. This step is necessary because the reference signal does not have to be at a defined level and because the gain of the system under test is unknown before testing. PESQ assumes that the subjective listening level is a constant 79 dB SPL at the ear reference point [ITU-T P.830, section 8.1.2]. A gain is applied to both the reference and degraded signals to bring them to this level.
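The idea of bringing both signals to a common level can be sketched as below. This is a minimal illustration, not the PESQ implementation: the function names `rms` and `align_level` and the value of `TARGET_RMS` are assumptions made for the example, and the actual algorithm measures level on filtered speech rather than on raw samples.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a sequence of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def align_level(samples, target_rms):
    """Scale samples so that their RMS equals target_rms (hypothetical target)."""
    current = rms(samples)
    if current == 0.0:
        return list(samples)  # silent input: there is nothing to scale
    gain = target_rms / current
    return [s * gain for s in samples]

reference = [0.5, -0.5, 0.25, -0.25]
degraded = [0.1, -0.1, 0.05, -0.05]   # same waveform after an unknown system gain
TARGET_RMS = 0.2                      # illustrative value only

ref_aligned = align_level(reference, TARGET_RMS)
deg_aligned = align_level(degraded, TARGET_RMS)
```

After alignment the two example signals have the same power, so any remaining difference between them would come from distortion rather than from the gain of the system under test.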

Input Filtering

PESQ models the receive path of the telephone handset using an input filter, which takes account of the effect of the electrical and acoustic components of the handset. The filter used is similar to the standard "modified IRS receive characteristic" [ITU-T P.830]. Analog connections in the network also often introduce some degree of filtering. It is generally accepted that such filtering has less effect on quality than coding distortions do, so PESQ compensates for any filtering that has taken place in the network.

Time Alignment

The system under test may introduce a delay, which may vary over time. To compare the reference and degraded signals, they must be lined up with each other. PESQ applies voice activity detection to identify the parts of each signal that are speech, ignoring noise, and then aligns the signals in the three stages listed below. The PESQ time offset measurements do not take account of the delay of the test equipment that generates or records the signal, so the time offset that PESQ reports for a collected file depends on how the test process is executed.

  • First, PESQ aligns the overall speech signals (utterances). An utterance is a continuous speech burst identified by the voice activity detector that does not contain pauses longer than a pre-determined threshold (200 ms). This process detects delay over major sections of the degraded signal compared to the reference signal.
  • Second, PESQ aligns overlapping sections of the speech (frames). This process detects delay that varies over the length of an utterance, which can be significant in packet-based networks.
  • Third, PESQ realigns "bad intervals" (sections of the speech with very large disturbance). This stage does not follow the second stage immediately; it is performed after the auditory transform has been calculated. It improves the model's accuracy for the small number of files in which delay changes are not correctly identified by the initial time alignment process.
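A common way to estimate a bulk delay between two signals is to find the lag that maximizes their cross-correlation. The sketch below shows only that general idea on a toy signal; the function name `estimate_delay` and the `max_lag` parameter are assumptions for the example, and it does not reproduce the utterance, frame or bad-interval stages of PESQ.

```python
def estimate_delay(reference, degraded, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation.

    A positive lag means the degraded signal is delayed relative to
    the reference.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(reference):
            j = i + lag
            if 0 <= j < len(degraded):
                score += r * degraded[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

reference = [0.0, 1.0, 0.5, -0.5, -1.0, 0.0, 0.0, 0.0]
degraded = [0.0, 0.0, 0.0] + reference[:5]   # same signal, delayed by 3 samples
delay = estimate_delay(reference, degraded, max_lag=4)
```

Applying the same search to short overlapping frames instead of the whole signal would expose delay that changes within an utterance, which is the situation the second stage addresses.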

Auditory Transform

In order to compare the reference and degraded signals, taking account of how a listener would have heard them, each is passed through an auditory transform that mimics certain key properties of human hearing. This gives a representation in time and frequency of the perceived loudness of the signal, known as the sensation surface.

Equalization

Part of the auditory transformation equalizes certain processes that have little subjective effect. First, the transfer function of the system is estimated and used to equalize the reference to the degraded signal in the auditory transform domain. This takes account of filtering in analog components of the network, such as telephone handsets. Second, the frame-by-frame amplitude gain of the system is estimated and used to equalize the auditory transform of the degraded file to the reference. In both cases the equalization is partial: large amounts of filtering or gain variation are not cancelled, and therefore result in errors being measured.

Disturbance Processing

The difference between the sensation surfaces for the reference and degraded files is known as the error surface; it shows any audible differences introduced by the system under test. The error surface is analyzed by a process that takes account of masking, the effect whereby small distortions in a signal are inaudible in the presence of loud signals.

From the positive and negative errors, two disturbance parameters are calculated. They are calculated as non-linear averages over specific areas of the error surface. These disturbance parameters are:

  • The absolute (symmetric) disturbance - a measure of absolute audible error
  • The additive (asymmetric) disturbance - a measure of audible errors that are significantly louder than the reference

These two disturbance parameters summarize the amount of each type of audible error. Finally, they are converted to a quality score, which is a linear combination of the average symmetric disturbance value and the average asymmetric disturbance value.
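The final linear combination can be sketched as follows. The coefficients shown are the ones published in ITU-T P.862 as we understand them and should be checked against the Recommendation; the function name `pesq_raw_score` and the example disturbance values are assumptions made for illustration.

```python
def pesq_raw_score(d_sym, d_asym):
    """Combine the average symmetric and asymmetric disturbances linearly.

    Coefficients assumed from ITU-T P.862; verify before relying on them.
    """
    return 4.5 - 0.1 * d_sym - 0.0309 * d_asym

no_distortion = pesq_raw_score(0.0, 0.0)   # both disturbances zero
example_score = pesq_raw_score(10.0, 20.0) # made-up disturbance values
```

With both disturbances at zero the expression yields the top of the PESQ scale, and the score falls as either disturbance grows.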


Results Provided by PESQ

PESQ (P.862)

PESQ returns a quality score, known as the PESQ score, which conforms to ITU-T P.862. The PESQ score lies on a scale from -0.5 to 4.5, though in most cases it is between 1 and 4.5. The PESQ score correlates with subjective quality scores. However, it tends to be optimistic for poor-quality speech and pessimistic for good-quality speech. Alternative mappings of the PESQ score have been developed that exhibit a better correlation with subjective test scores. These are referred to as the PESQ-LQ and PESQ-LQO scores.

PESQ-LQ

PESQ-LQ scores are closer to the listening quality subjective opinion scale, which is standard in the industry and is defined in ITU-T P.800. Listening quality scores lie between 1 and 5, whereas PESQ-LQ scores lie between 1.0 and 4.5, because 4.5 is usually the maximum obtained in a subjective test.

Listening Quality Scale:

Score   Quality of the speech
5       Excellent
4       Good
3       Fair
2       Poor
1       Bad

The score gives a measure of customers' perception of quality. The highest score, 4.5, means that no distortion is measured. As the amount of distortion increases the quality falls.

PESQ-LQO (P.862.1)

A separate recommendation, ITU-T P.862.1, provides a single mapping from the raw P.862 score to the Listening Quality Objective Mean Opinion Score (MOS-LQO). This mapping improves on the raw PESQ (P.862) score by correlating better with subjective test results.
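The mapping can be sketched as below. The logistic form and constants are those published in ITU-T P.862.1 as we understand them and should be checked against the Recommendation; the function name `pesq_to_mos_lqo` is an assumption for the example.

```python
import math

def pesq_to_mos_lqo(raw_score):
    """Map a raw P.862 score to the listening quality objective scale.

    Constants assumed from ITU-T P.862.1; verify before relying on them.
    """
    return 0.999 + (4.999 - 0.999) / (1.0 + math.exp(-1.4945 * raw_score + 4.6607))

clean_isdn = pesq_to_mos_lqo(4.3)   # raw score of a clean ISDN network
very_poor = pesq_to_mos_lqo(2.2)    # raw score of a very poor mobile network
```

Rounded to one decimal place, these two inputs give 4.4 and 1.8, which agrees with the PESQ-LQO column of the typical score comparison for the clean ISDN and very poor GSM-EFR rows.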

Typical PESQ Score Comparisons

Based on simulations and real measurements, the table below shows typical scores for a number of networks and codecs with no errors or packet loss. It also gives the scores that can be expected in some mobile network conditions where errors are significant.

Network Condition                                   PESQ         PESQ-LQ      PESQ-LQO
Clean ISDN network                                  4.3          4.4          4.4
Analog network (G.711)                              4.1          4.2          4.2
G.728 codec (16kbit/s)                              3.8          3.9          3.9
G.729 codec (8kbit/s)                               3.6          3.7          3.7
G.723.1 codec (6.3kbit/s)                           3.5          3.4          3.5
GSM EFR codec (12.2kbit/s)                          3.9          4.0          4.0
GSM FR codec (13kbit/s)                             3.5          3.5          3.5
GSM-EFR mobile network in typical operating range   3.6 to 3.1   3.6 to 2.9   3.7 to 3.0
GSM-EFR mobile network in very poor conditions      2.2          1.6          1.8


Perceptual Analysis Measurement System (PAMS)

Traditionally, the only way to measure customers' perception of the quality of modern communications was to conduct a subjective test, but these tests are expensive and unsuitable for applications such as real-time monitoring. PAMS provides an objective measure that predicts the results of subjective listening tests on a telephony system. To measure speech quality, PAMS uses a sensory model to compare the original, unprocessed signal with the degraded version at the output of the communications system. PAMS parameterizes different classes of errors and maps them to predictions of subjective listening quality and listening effort. The mappings are calibrated using a large database of subjective tests. Other diagnostics are also returned.

PAMS incorporates many new developments that distinguish it from earlier codec assessment models such as those given in ITU-T P.861. These innovations allow PAMS to be used with confidence to assess end-to-end speech quality as well as the effect of individual elements such as codecs.


Operations Performed by PAMS

The processing carried out by the PAMS algorithm includes the following stages.

Time Alignment

PAMS is a listening model and has no knowledge of the delay of the system. In order to compare the reference and degraded signals, however, they need to be lined up with each other. Time alignment enables the analysis to cancel any bulk delay and also most delay changes that might be caused by, for example, packet-based transmission.

Equalization

Analog connections often introduce some degree of filtering. PAMS identifies any filtering that has taken place in the network and cancels its effect.

Auditory Transform

In order to compare the reference and degraded signals in a meaningful way, they are passed through an auditory transform that mimics certain key properties of human hearing.

Error Parameterization

Analysis of the transformed reference and degraded signals gives a number of error parameters that summarize the amount of each type of audible error.

Regression

Finally the error parameters are mapped onto predictions of perceived listening quality and listening effort. These mappings are calculated and verified using a very large database of subjective tests to ensure that PAMS is able to predict quality for a wide range of distortion types.


Results Provided by PAMS

PAMS returns quality scores on two different opinion scales, listening quality and listening effort. These scales are standard and are defined in ITU-T Rec. P.800. Both scales range from 1 to 5, and scores are usually quoted to two decimal places.

Listening Quality Scale:

Score   Quality of the speech
5       Excellent
4       Good
3       Fair
2       Poor
1       Bad

Listening Effort Scale:

Score   Effort required to understand the meaning of sentences
5       Complete relaxation possible; no effort required
4       Attention necessary; no appreciable effort required
3       Moderate effort required
2       Considerable effort required
1       No meaning understood with any feasible effort

The scores give a measure of customers' perception of quality. A PAMS score of 5 indicates that no distortion is measured. As the amount of distortion increases, the quality falls. Because they relate to different aspects of subjectivity, the listening effort and listening quality scores normally differ if there is perceived distortion; listening effort is usually higher than listening quality.


Perceptual Speech Quality Measure (PSQM)

Subjective quality assessment of speech codecs is one of the key technologies in designing digital telecommunication networks. ITU Recommendation P.830 defines subjective testing methodologies for speech codecs. Because subjective quality assessment is time-consuming and expensive, it is desirable to develop an objective quality assessment methodology that estimates the subjective quality of speech codecs with less subjective testing.

The most widely used objective measure of speech codec performance is the Signal-to-Noise Ratio (SNR = S/N). However, SNR does not adequately predict subjective quality for modern network components, especially recent low bit-rate codecs. Therefore, a variety of more sophisticated objective quality measures were developed, such as the LPC Cepstrum Distance Measure, Information Index (II), Coherence Function (CHF), Expert Pattern Recognition (EPR), and Perceptual Speech Quality Measure (PSQM). The ability of these measures to give accurate estimates of subjective quality has been investigated in ITU-T since the 1980s.

After careful comparisons among these objective quality measures, it was concluded that the PSQM best correlated with the subjective quality of coded speech.


Results Provided by PSQM

The VQT performs the PSQM measurement if the algorithm is licensed and the option is selected. The implementation of PSQM is based upon ITU-T Rec. P.861. The algorithm and functionality are described in P.861 and are not repeated here.

The mapping of the PSQM value to Mean Opinion Score (MOS) is described in P.861. A PSQM score of 0 is equivalent to excellent, and 6.5 is very poor, on the Listening Quality Scale defined in ITU-T Rec. P.800. For simplicity, we suggest a linear rescaling: MOS Listening Quality = 5 - (4 * PSQM / 6.5). Other mappings may be more appropriate. Both scoring ranges are available within the GL VQT application.
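The suggested linear rescaling can be written as a short function. The function name `psqm_to_mos` is an assumption for the example, and clamping the input to the 0 to 6.5 range is an added safeguard that the text above does not specify.

```python
def psqm_to_mos(psqm, psqm_max=6.5):
    """Linear rescaling: MOS Listening Quality = 5 - (4 * PSQM / 6.5).

    Clamping to [0, psqm_max] is an assumption added for this sketch so
    that the result always stays on the 1 to 5 listening quality scale.
    """
    clamped = min(max(psqm, 0.0), psqm_max)
    return 5.0 - 4.0 * clamped / psqm_max

best = psqm_to_mos(0.0)      # PSQM 0 maps to 5 (excellent)
worst = psqm_to_mos(6.5)     # PSQM 6.5 maps to 1 (bad)
middle = psqm_to_mos(3.25)   # halfway point maps to 3 (fair)
```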


ITU-T P.56 Measurements

The VQT always performs the ITU-T P.56 algorithm (ITU-T Recommendation P.56, Method B) on the reference data and the degraded data, and calculates the mean active speech level, activity factor and peak value for each input.
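To make the three reported quantities concrete, the sketch below computes a peak value, an activity factor and a mean active level using a single fixed amplitude threshold. This is a deliberately simplified stand-in and is not Method B of P.56, which works on a smoothed envelope and derives the threshold from the signal itself; the function name `speech_level_stats` and the `threshold` parameter are assumptions for the example.

```python
import math

def speech_level_stats(samples, threshold):
    """Return (peak, activity_factor, active_level_db) for a signal.

    Simplified illustration only: a sample counts as active when its
    magnitude reaches a fixed threshold. Levels are in dB relative to
    an amplitude of 1.0.
    """
    peak = max(abs(s) for s in samples)
    active = [s for s in samples if abs(s) >= threshold]
    activity_factor = len(active) / len(samples)
    if not active:
        return peak, activity_factor, float("-inf")
    mean_power = sum(s * s for s in active) / len(active)
    return peak, activity_factor, 10.0 * math.log10(mean_power)

samples = [0.0, 0.0, 0.5, -0.5, 0.5, -0.5, 0.0, 0.0]
peak, activity, level_db = speech_level_stats(samples, threshold=0.1)
```

In this toy input half of the samples are active, so the activity factor is 0.5, and the mean active level is computed over those active samples only rather than over the whole file.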


Buyer's Guide:

Item No.   VQuad™ Network Options
VQT010     VQuad™ Software (Stand Alone)
VQT020     VQuad™ Wireless Phone Call Control
VQT012     VQuad™ Analog FXO 4-port Call Control
VQT013     VQuad™ with SIP (VoIP) Call Control
VQT015     VQuad™ with T1 E1 Call Control

Item No.   VQuad™ Miscellaneous
VQT202     VQuad™ GPS Location and Timing Option (per node - including GPS receiver)
VQT030     Network Command Center (Multi-Node Command and Control Center for VQuad™ Systems)
VQT240     Universal Telephony Adapter (UTA) with Round Trip Delay Measurement
VQT250     High Quality USB Audio Capture Unit
VQT300     Portable Wireless Enclosure Kit

Item No.   VQT
VQT002     Voice Quality Testing (PESQ only)
VQT004     Voice Quality Testing (PAMS, PSQM, PESQ)
VBA032     Near Real-time Voice-band Analyzer

* Specifications are subject to change without notice.
