Multi-Band Excitation
Technology Abstract


The success of the IMBE™ and AMBE® Voice Compression Hardware and Software is attributed to the simple fact that they use a fundamentally different technology from standard speech coders. This technology is the outgrowth of research that started at the Massachusetts Institute of Technology in the early 1980s. The original goal of this work was to develop a robust speech model that would outperform the linear prediction speech model used in traditional speech coders. The outcome of this research was the Multi-Band Excitation (MBE) speech model. This model provides a speech coding framework with a number of advantages over linear-prediction-based speech coders such as CELP, RELP, VSELP and LPC-10.

IMBE™ and AMBE® Voice Compression Hardware and Software are undefeated in 8 consecutive independent performance evaluations.

Most CELP (or CELP-based) speech coders make a single determination of whether each speech segment is a periodic (voiced) signal or a noise-like (unvoiced) signal. One major advantage of the MBE speech coder is that it divides each segment of speech into distinct frequency bands and makes a voiced/unvoiced (V/UV) decision for each frequency band. This allows the excitation signal for a particular speech segment to be a mixture of periodic (voiced) and noise-like (unvoiced) energy. This added degree of freedom in the modeling of the excitation signal allows the MBE speech model to generate higher quality speech than conventional speech models. In addition, it allows the MBE speech model to be robust in the presence of background noise.
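The idea of a mixed excitation can be illustrated with a minimal Python sketch. It is not the IMBE or AMBE algorithm: the function name `mixed_excitation`, the fixed grouping of three harmonics per band, the unit amplitudes, and the use of random-frequency, random-phase sinusoids as a crude stand-in for band-limited noise are all assumptions made for illustration. Harmonics that fall in a band marked voiced are placed exactly at multiples of the fundamental frequency; harmonics in a band marked unvoiced are replaced by an inharmonic, random-phase component.

```python
import math
import random


def mixed_excitation(f0_hz, band_voiced, sample_rate=8000, n_samples=160,
                     harmonics_per_band=3, seed=0):
    """Synthesize one frame of excitation from per-band V/UV decisions.

    band_voiced is a list of booleans, one per frequency band (hypothetical
    layout: each band spans `harmonics_per_band` harmonics of f0_hz). If the
    list is shorter than the number of bands, its last entry is reused.
    """
    rng = random.Random(seed)
    # Harmonics strictly below the Nyquist frequency.
    n_harmonics = int((sample_rate / 2 - 1e-9) // f0_hz)
    frame = [0.0] * n_samples
    for k in range(1, n_harmonics + 1):
        band = min((k - 1) // harmonics_per_band, len(band_voiced) - 1)
        if band_voiced[band]:
            # Voiced band: periodic energy at an exact harmonic of f0.
            freq, phase = k * f0_hz, 0.0
        else:
            # Unvoiced band: noise-like energy, approximated here by a
            # sinusoid with a random frequency offset and random phase.
            freq = (k + rng.uniform(-0.5, 0.5)) * f0_hz
            phase = rng.uniform(0.0, 2.0 * math.pi)
        w = 2.0 * math.pi * freq / sample_rate
        for n in range(n_samples):
            frame[n] += math.cos(w * n + phase)
    return frame


# A 200 Hz pitch with the two lowest bands voiced and the rest unvoiced.
frame = mixed_excitation(200.0, [True, True, False])
```

A coder that makes a single V/UV determination per segment corresponds to passing `[True]` or `[False]`; the per-band list is the added degree of freedom described above.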

The inherent problem with speech coders based on the linear prediction model is that they do not provide high quality speech (or robustness to background noise) without the addition of a prediction residual. This prediction residual may be viewed as an error signal that corrects for inaccuracies in the linear prediction model. Elimination of this residual, as is done in the government standard 2.4 kbps LPC-10 system, results in a harsh, mechanical quality in the speech. Consequently, all high quality linear predictive speech coders transmit a residual. The primary difference between these systems is the manner in which they accomplish this task. The favored method used in linear predictive speech coding at rates below 8 kbps is to divide the residual into small pieces, or vectors, and then to search through a codebook to find the code vector that is the closest match. Unfortunately, searching through a reasonably sized codebook is computationally expensive. Furthermore, a particular codebook is designed to operate at a fixed data rate and is not easily scalable to other data rates.
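The cost of the codebook approach can be seen in a minimal Python sketch of an exhaustive nearest-neighbor search. This is a generic vector quantization search under a plain squared-error criterion, not the search used by any particular CELP standard (practical coders typically add a synthesis filter and perceptual weighting); the function names and the toy codebook are assumptions made for illustration.

```python
import math


def nearest_code_vector(residual, codebook):
    """Return (index, squared_error) of the code vector closest to residual.

    Every entry is compared against the residual, so the work grows with
    codebook size times vector length, and it is repeated for each vector.
    """
    best_index, best_error = -1, math.inf
    for index, code_vector in enumerate(codebook):
        error = sum((r - c) ** 2 for r, c in zip(residual, code_vector))
        if error < best_error:
            best_index, best_error = index, error
    return best_index, best_error


def bits_per_vector(codebook_size):
    """Bits needed to transmit one codebook index."""
    return math.ceil(math.log2(codebook_size))


# Toy 4-entry codebook of length-4 vectors (hypothetical values).
codebook = [
    [0.0, 0.0, 0.0, 0.0],
    [1.0, 0.0, -1.0, 0.0],
    [1.0, 1.0, 1.0, 1.0],
    [-1.0, 1.0, -1.0, 1.0],
]
index, error = nearest_code_vector([0.9, 0.1, -0.8, 0.2], codebook)
# index is 1, the entry with the smallest squared error (about 0.10)
```

Because the index width is fixed by the codebook size (`bits_per_vector(1024)` is 10, for example), changing the data rate in this scheme means designing a different codebook, which is the scaling limitation noted above.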

IMBE™ and AMBE® Voice Compression Hardware and Software avoid the fixed-data-rate and codebook-search problems because they are not based upon linear prediction. Instead, they use the Multi-Band Excitation speech model to produce superior quality speech without the need for a residual signal. Because no residual has to be coded, these vocoders are less complex and require fewer computations than CELP or VSELP. The MBE model maintains speech intelligibility and naturalness at rates as low as 2 kbps. Finally, the Multi-Band Excitation speech coders are easily scalable to virtually any data rate above 2 kbps.

Digital Voice Systems Inc.’s speech coders have been selected as the standard for many types of international mobile communications, including satellite communications, commercial aircraft telephony, and digital mobile radio. They are widely accepted as the vocoder of choice in many other bandwidth-limited applications such as secure communications, digital telephone answering devices, voice storage, and voice mail.