Speech-to-Text Conversion Utility to Test Interactive Voice Response (IVR) and Voice Mail (VM) Systems

4th May 2018

Welcome to the May 2018 issue of GL Communications' Newsletter, which introduces our new speech-to-text conversion utility, the Speech Transcription Server.

Speech Transcription Server in IVR System


GL’s Speech Transcription Server (STS) is a speech-to-text conversion application that converts spoken language into text and analyzes the transcribed speech. Transcription is performed on captured audio files in PCM or WAV format. The application can be used to confirm voice prompts, to test Interactive Voice Response (IVR) and Voice Mail (VM) systems, and to verify voice transmission over any network.

GL’s Speech Transcription Server is an automated, PC-based speech-to-text conversion utility. It can be used standalone or integrated with other GL test tools for automation, precise call control, and quality analysis. STS supports a REST API, which allows the utility to be used with GL’s intrusive test tools such as MAPS™ and VQuad™. Through the API, clients can submit transcription requests and retrieve transcription results from the database.

MAPS™ provides a unique architecture for multi-interface, multi-protocol simulation, which makes it suitable for testing any core network, access network, and interoperability function. VQuad™ Probe HD is an all-in-one, self-contained hardware unit that supports multiple physical interfaces for connecting to practically any wired or wireless network while automatically performing end-to-end voice and data testing.

By incorporating GL’s STS within the MAPS™ and VQuad™ emulator platforms, users can automate testing of IVR tree traversal for pass/fail conditions with great precision. The platforms record each prompt (IVR menu) automatically and forward the recorded audio files for speech-to-text transcription and analysis. Both MAPS™ and VQuad™ allow the STS utility to be used over various networks such as 2-wire (FXO, FXS), TDM, IP, and wireless (GSM, UMTS, VoLTE, ...).

STS can also be used with GL’s Voice Quality Testing solution to measure network voice quality, the effect of different codecs on speech transcription quality, and the effect of noise, echo, and bit error rate. STS supports various industry-standard voice codecs; refer to the Voice Codec webpage for more details.


The utility continuously monitors one or more folders for audio files. If the specified directory already contains audio files, transcription starts immediately. If the directory is empty, the application waits for new PCM or WAV files. Each file is placed in the speech transcription queue until it is transcribed. Results are displayed in the order in which the files are transcribed, and each result includes a certainty score from 0 to 1, where 0 is the lowest confidence in the transcribed text and 1 is the highest.
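The monitoring behaviour described above can be illustrated with a minimal Python sketch. It is not GL's implementation: the function names `scan_once` and `demo`, the polling approach, and the file names are assumptions made for illustration only. Each scan queues audio files that have not been seen before, in first-in, first-out order.

```python
import queue
import tempfile
from pathlib import Path

AUDIO_SUFFIXES = {".pcm", ".wav"}  # the two formats named in the text


def scan_once(folders, seen, pending):
    """Queue every audio file not seen in an earlier scan; return the count added."""
    added = 0
    for folder in folders:
        for path in sorted(Path(folder).iterdir()):
            is_audio = path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
            if is_audio and path not in seen:
                seen.add(path)
                pending.put(path)  # FIFO queue: files wait here for transcription
                added += 1
    return added


def demo():
    """Scan a temporary folder twice and return the queued file names in order."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "prompt1.wav").write_bytes(b"")
        (folder / "notes.txt").write_bytes(b"")   # ignored: not an audio file
        seen, pending = set(), queue.Queue()
        scan_once([folder], seen, pending)        # queues prompt1.wav
        (folder / "prompt2.pcm").write_bytes(b"")
        scan_once([folder], seen, pending)        # queues only the new file
        names = []
        while not pending.empty():
            names.append(pending.get().name)
        return names


if __name__ == "__main__":
    print(demo())
```

A real monitor would call `scan_once` in a loop with a short sleep, or use operating system file notifications, and would hand each queued path to the transcription engine.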

Monitoring for Auto Transcription

Transcription results from monitored directories are automatically stored in a database. Clients can access the database directly or retrieve results through the REST API.

Transcription Results; Color-coded by directory name

STS with VQuad™

The Speech Transcription Server works alongside GL’s VQuad™ solution, which provides an intrusive method for voice quality measurement. Using VQuad™, the customer sends a predefined voice sample from the near side to the far side, records it at the far side, and analyzes the recording against the file sent from the near side. This method is very accurate but requires equipment at both ends of the call.

Alternatively, if access to both sides of the call is not available, the Speech-to-Text analysis feature in VQuad™ lets the user simply specify the expected sentence at the far end and obtain a certainty score for the recorded file. The speech-to-text conversion can confirm whether the received audio (for instance, a customer speaking into their phone) matches what was expected, which in turn confirms that there was discernible audio over the network.

The Speech-to-Text analysis feature in VQuad™ supports two methods: Word analysis (looking for exact word matches throughout the file) and Text matching (analysis based on k-shingles). Text matching allows better matching because the transcribed text is broken into two- or three-letter shingles and analyzed per shingle rather than per word. Speech analysis results can be sent to the VQuad™ Central Database and viewed through GL WebViewer™.
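To make the difference between the two methods concrete, here is a minimal Python sketch. The source does not state how VQuad™ scores a match, so the scoring rules are assumptions: word analysis is shown as the fraction of expected words found in the transcript, and text matching as the Jaccard similarity of three-letter shingle sets. All function names are hypothetical.

```python
import re


def normalize(text):
    """Lowercase, drop punctuation other than apostrophes, collapse whitespace."""
    return " ".join(re.findall(r"[a-z0-9']+", text.lower()))


def word_match(expected, transcript):
    """Fraction of expected words that appear anywhere in the transcript."""
    expected_words = normalize(expected).split()
    transcript_words = set(normalize(transcript).split())
    if not expected_words:
        return 0.0
    hits = sum(1 for word in expected_words if word in transcript_words)
    return hits / len(expected_words)


def shingles(text, k=3):
    """Set of overlapping k-character substrings of the normalized text."""
    s = normalize(text)
    if len(s) < k:
        return {s} if s else set()
    return {s[i:i + k] for i in range(len(s) - k + 1)}


def shingle_match(expected, transcript, k=3):
    """Jaccard similarity of the two shingle sets (an assumed scoring rule)."""
    a, b = shingles(expected, k), shingles(transcript, k)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    print(word_match("voicemail", "voice mail"))     # 0.0
    print(shingle_match("voicemail", "voice mail"))  # 0.5
```

In this example the transcription engine has split "voicemail" into two words. Exact word matching finds nothing, while the shingle comparison still reports substantial overlap, which is the kind of tolerance the text attributes to the Text matching method.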

Speech to Text using Manual Mode with Speech Analysis

Testing can be automated directly from VQuad™ scripts, using the Speech-to-Text commands to transcribe recorded audio and analyze it against the reference text. The REST API allows users (clients) to request transcription of specific voice files, and to retrieve the results, on demand.


Users can leverage GL’s MAPS™ platforms to automate the testing of any IVR system. MAPS™ provides the base needed to emulate different IVR call flows and user profiles with complete automation. MAPS™ can transmit and record voice audio files over any telecommunication network interface, such as FXO/FXS, 4-Wire, ISDN, SS7, GSM, UMTS, and VoLTE, and can be controlled and automated via scripting and API.

Each voice prompt can be verified by performing the following steps:

  • Upon call establishment, the GL test platform records audio for a predetermined amount of time
  • The recorded file is transferred to the Speech Transcription Server
  • Speech-to-text conversion happens automatically, or on demand from the GL test platform via the REST API
  • Transcription results are stored in the database and sent back to the GL test platform on request via the REST API
  • The GL test platform analyzes the transcription result against the expected prompt
  • Depending on the analysis result, the GL test platform responds to the prompt by transmitting DTMF or a voice file
  • The same steps are repeated to test the next prompt
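The steps above can be sketched as a simple loop. This is an illustrative Python outline under stated assumptions, not MAPS™ or VQuad™ scripting: `Step`, `run_ivr_test`, and the three callables `record_prompt`, `transcribe`, and `send_dtmf` are hypothetical stand-ins for the test platform's recording, STS request, and DTMF functions. Stopping at the first failed prompt and the 0.5 certainty threshold are also assumptions.

```python
from dataclasses import dataclass


@dataclass
class Step:
    expected_phrase: str  # text the prompt is expected to contain
    response: str         # DTMF digits to send when the prompt matches


def run_ivr_test(steps, record_prompt, transcribe, send_dtmf, min_certainty=0.5):
    """Walk the IVR tree step by step; return a list of (phrase, passed) pairs."""
    results = []
    for step in steps:
        audio = record_prompt()              # record for a predetermined time
        text, certainty = transcribe(audio)  # request and fetch the transcription
        passed = (certainty >= min_certainty
                  and step.expected_phrase.lower() in text.lower())
        results.append((step.expected_phrase, passed))
        if not passed:
            break                            # assumed policy: stop at first failure
        send_dtmf(step.response)             # answer the prompt, go to the next one
    return results


def demo(min_certainty=0.5):
    """Drive the loop with canned prompts in place of real call and STS interfaces."""
    prompts = iter([
        ("Welcome. Press 1 for sales.", 0.93),
        ("Sales. Press 2 for pricing.", 0.88),
    ])
    sent = []
    steps = [Step("press 1 for sales", "1"), Step("press 2 for pricing", "2")]
    results = run_ivr_test(
        steps,
        record_prompt=lambda: next(prompts),
        transcribe=lambda audio: audio,      # stub: "audio" is already (text, certainty)
        send_dtmf=sent.append,
        min_certainty=min_certainty,
    )
    return results, sent


if __name__ == "__main__":
    print(demo())
```

Passing the platform functions in as arguments keeps the traversal logic independent of any particular network interface, so the same loop could be exercised with canned data, as in `demo`, before being connected to real equipment.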

This procedure is not limited to IVR testing and can also be applied to any announcement verification and voicemail testing, where the voice prompt is recorded, transcribed, and compared with expected text.

For each transcribed prompt, the detailed transcription view provides additional information such as the duration of the recorded prompt, the certainty of the transcription, and the validation result for each expected phrase.

IVR Testing of GL's Phone System using MAPS™ APS and Speech-to-Text

Main Features

  • Converts PCM or WAV files into text
  • Supports multiple languages, including U.S./U.K. English, French, German, Italian, Japanese, and many more
  • Cloud-based processing provides accurate transcriptions (requires Internet connectivity)
  • Monitors single or multiple folders containing audio files for automatic transcription
  • Each monitored folder can be configured for language and audio format
  • Full automation using VQuad™ scripting
  • Accurate analysis of transcribed text with quality (Pass/Fail) scores
  • Transcribes speech files of up to 30 seconds into text
  • Concurrent transcription of up to 30 voice files
  • Easy access to transcribed text via API or database
  • REST API support for transcription requests and transcript retrieval
  • Base software includes 100,000 file transcriptions per year; validity can be extended with an annual support contract
  • Support for Windows® 7 and above


  • Out-of-the-box integration with existing GL test platforms such as VQuad™ and MAPS™
  • REST API support for fast and easy integration with third-party testing platforms
  • REST API server allows one Speech Transcription Server instance to serve multiple clients
  • Speech-to-Text analysis support available for full test automation
  • Supplements GL’s Voice Quality Testing solution with a passive method to verify good audio quality
  • Fast speech-to-text transcription: transcribes a 30-second voice file in less than 4 seconds
    (Transcription speed depends on Internet connection quality)
