GL’s Speech Transcription Server is a PC-based application for automated speech-to-text conversion. Among its many uses, the Speech Transcription Server can confirm voice prompts (announcements) and aid in testing Interactive Voice Response (IVR) systems as well as voice transport over any network. Network providers use the application to record the voice prompts associated with an IVR and perform a speech-to-text conversion on each recording to confirm that the prompt matches what it should be, thereby confirming that the IVR is functioning. The application can also be used to verify network quality and the effect of different codecs on speech quality.

This application works with GL’s File Conversion Utility, which supports various industry-standard codecs. It is a good companion for GL’s MAPS™ and VQuad™ applications, which allow transmission and recording of voice to and from audio files over many industry-standard telecommunication network interfaces such as FXO/FXS, 4-Wire, ISDN, SS7, GSM, UMTS, and VoLTE.

The Speech-to-Text application can also supplement GL’s Voice Quality Testing (VQT) solution by providing a more passive method of verifying good-quality audio. With the intrusive VQT method, the customer must send a pre-defined voice sample from the near side to the far side, record it at the far side, and analyze the recording against the file sent from the near side. This is a very accurate method for generating a voice quality measurement, but it requires equipment at both ends of the call. Alternatively, if access to both sides of the call is not available, the Speech-to-Text method can simply record a pre-specified sentence at the far end and provide a Certainty score for the transcribed text based on what was expected. The speech-to-text conversion can confirm whether the received audio, for instance a customer speaking into their phone, matches what was expected, which would confirm that audio quality on the network was relatively good.
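The comparison of a transcript against the expected sentence can be illustrated with a short sketch. This is not the product's own scoring method: it is a minimal word-level comparison using Python's standard `difflib`, and the function names and the 0.9 threshold are assumptions chosen for illustration.

```python
import difflib
import re


def normalize(text: str) -> list:
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def prompt_matches(expected: str, transcript: str, min_ratio: float = 0.9) -> bool:
    """Return True when the transcript is close enough to the expected prompt.

    min_ratio is an illustrative threshold, not a value taken from the product.
    """
    matcher = difflib.SequenceMatcher(None, normalize(expected), normalize(transcript))
    return matcher.ratio() >= min_ratio
```

Comparing word lists rather than raw strings keeps differences in punctuation and capitalization from counting as mismatches, which matters because a transcription service may format text differently from the prompt script.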

Main Features

  • Ability to convert PCM or WAV files into text format.
  • Single or Multiple folders containing audio files can be converted to text easily.
  • Transcribe short recorded speech files into text; any single utterance is limited to a maximum of 20 seconds.
  • Cloud-based processing provides accurate transcriptions; requires internet connectivity for cloud-based processing (HTTPS, port 443)
  • Supports multiple languages, such as U.S. English, French, German, Italian, Japanese, and more.
  • Easy to access transcribed text via file, API or database
  • Base software includes 100,000 file transcriptions per year; validity can be extended with an annual support contract.
  • REST APIs for transcription request and transcript retrieval.
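
Because a single utterance is limited to 20 seconds, it can be useful to check a recording's length before placing it in a monitored folder. The sketch below is an illustration only, not part of the product: it reads the duration from a WAV header with Python's standard `wave` module, and the function names are assumptions.

```python
import wave

MAX_UTTERANCE_SECONDS = 20.0  # limit stated in the feature list


def wav_duration_seconds(path: str) -> float:
    """Return the playing time of a WAV file, computed from its header."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def within_limit(path: str, limit: float = MAX_UTTERANCE_SECONDS) -> bool:
    """Return True when the recording is no longer than the limit."""
    return wav_duration_seconds(path) <= limit
```

The `wave` module reads uncompressed PCM WAV files. A raw *.PCM file has no header, so its duration would have to be computed from the file size together with a sample rate and sample width known in advance.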

Working Principle

The Speech Transcription Server converts recorded audio (*.PCM or *.WAV) files into text format. Single or multiple folders are monitored continuously for short audio files; once files are detected, they are placed in the speech transcription queue. The queued files are then sent to the cloud-based transcription service and converted to text.
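
The monitor-and-queue flow described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the server's actual implementation: it polls the folders rather than using file-system notifications, and `scan_folders`, `seen`, and `pending` are hypothetical names.

```python
import queue
from pathlib import Path

AUDIO_SUFFIXES = {".pcm", ".wav"}


def scan_folders(folders, seen, pending):
    """Queue every audio file in the folders that has not been seen before.

    folders: iterable of directory paths to check
    seen:    set of paths already queued (updated in place)
    pending: queue.Queue that receives newly detected files
    Returns the number of files queued by this pass.
    """
    added = 0
    for folder in folders:
        for path in sorted(Path(folder).iterdir()):
            is_audio = path.is_file() and path.suffix.lower() in AUDIO_SUFFIXES
            if is_audio and path not in seen:
                seen.add(path)
                pending.put(path)
                added += 1
    return added
```

Calling `scan_folders` in a loop with a short `time.sleep` between passes approximates continuous monitoring, while a separate worker takes paths from `pending` and submits them for transcription. A production monitor would also need to avoid picking up a file that is still being written.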

Transcription Results

Results are listed in the order in which they are transcribed and include source file information (such as File Name, Directory Path, Length of the File, Modified Date and Time) along with transcription information such as Transcribed Date and Time, Transcription Output Text, Certainty score, and the Time Taken in seconds to transcribe the file. The Certainty score ranges from 0 to 1, where 0 indicates the lowest confidence and 1 the highest confidence in the transcribed text. A score of 0.8 or above is considered a good confidence score.
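
As an illustration of how the Certainty score might be used downstream, the sketch below flags results that fall under the 0.8 threshold. The `TranscriptionResult` record and its field names are assumptions made for this example and do not reflect the product's actual output schema.

```python
from dataclasses import dataclass

GOOD_CERTAINTY = 0.8  # threshold stated in the text


@dataclass
class TranscriptionResult:
    file_name: str
    text: str
    certainty: float  # 0.0 (lowest confidence) to 1.0 (highest confidence)


def is_good(result: TranscriptionResult, threshold: float = GOOD_CERTAINTY) -> bool:
    """Return True when the certainty is at or above the threshold."""
    if not 0.0 <= result.certainty <= 1.0:
        raise ValueError(f"certainty out of range: {result.certainty}")
    return result.certainty >= threshold


def needs_review(results):
    """Return the results whose certainty falls below the threshold."""
    return [r for r in results if not is_good(r)]
```

A score of exactly 0.8 counts as good here, matching the "0.8 or above" rule.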

REST API Server

The REST API Server lets a remote PC submit transcription requests and retrieve the transcribed results using HTTP requests on the configured REST API port number.
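
A remote client could prepare such a request along the following lines. The endpoint paths and request fields are not given here, so this sketch deliberately leaves them as caller-supplied values: `build_request`, the JSON request body, and every argument shown are assumptions, and the real paths and payload format must be taken from the product documentation.

```python
import json
import urllib.request


def build_request(host: str, port: int, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an HTTP POST to the configured REST API port.

    path and payload are placeholders supplied by the caller; a JSON body
    is an assumption, not a documented requirement of the server.
    """
    url = f"http://{host}:{port}{path}"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request would be sent with `urllib.request.urlopen(request)` once the host, port, path, and payload match the server's actual configuration.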


