Speech recognition

Mr Dave Fish

Jomega Ltd, Austrey, Warwickshire

The number of automotive manufacturers making use of speech recognition systems within their products is steadily increasing, along with the number of applications. The interfaces to many of the ancillary controls could be based around speech recognition, provided the speech engines were accurate enough to give acceptable levels of reliability. Climate control, telephone, navigation system, radio and PDA (personal digital assistant) could be, and will be, voice controlled. However, in many vehicles, the background noise levels associated with vehicle operation significantly reduce the reliability of the systems, producing unacceptable levels of incorrect identification.

In order to test speech engines, both for algorithm development and product selection, it is important to be able to test the systems with representative levels of background noise. This noise must represent a wide variety of sources, from engine and wind noise to rain drumming and passing trucks, under a similarly diverse range of operating conditions. As a result, the process of testing a system rigorously under actual measured conditions becomes extremely onerous.

The purpose of this project is to identify a suitably compact, artificial signal that is short enough to allow extensive testing of the speech engines but remains representative of the in-vehicle operating environment encountered. To achieve this, the signal must contain all the significant characteristics of the vehicle environment, including level, transience and spectral balance, such that, statistically, it will represent the actual noise floor presented to the speech recognition algorithm over the vehicle's operating envelope.
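As an illustration only, the three characteristics named above could each be reduced to a simple number for a short noise segment. The sketch below is not the project's method. It assumes RMS level in dB for "level", crest factor (peak-to-RMS ratio) as a crude proxy for "transience", and the fraction of energy in a frequency band for "spectral balance". The function names, the reference level and the band edges are all hypothetical, and the direct DFT is used only to keep the example self-contained; it is too slow for long recordings.

```python
import math

def rms(x):
    """Root-mean-square value of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def level_db(x, ref=1.0):
    """RMS level in dB relative to `ref` (assumed reference, e.g. full scale)."""
    return 20.0 * math.log10(rms(x) / ref)

def crest_factor(x):
    """Peak-to-RMS ratio; higher values suggest a more transient signal."""
    return max(abs(v) for v in x) / rms(x)

def band_energy_fraction(x, fs, f_lo, f_hi):
    """Fraction of energy with f_lo <= f < f_hi (Hz), using a direct DFT.

    Only bins strictly between DC and Nyquist are considered.
    """
    n = len(x)
    total = 0.0
    band = 0.0
    for k in range(1, n // 2):
        re = sum(x[i] * math.cos(2 * math.pi * k * i / n) for i in range(n))
        im = sum(x[i] * math.sin(2 * math.pi * k * i / n) for i in range(n))
        power = re * re + im * im
        total += power
        if f_lo <= k * fs / n < f_hi:
            band += power
    return band / total

# Example: a 100 Hz tone sampled at 800 Hz stands in for a measured segment.
fs = 800
segment = [math.sin(2 * math.pi * 100 * i / fs) for i in range(160)]
features = {
    "level_db": level_db(segment),
    "crest_factor": crest_factor(segment),
    "low_band_fraction": band_energy_fraction(segment, fs, 50, 200),
}
```

Comparing such figures between measured in-vehicle recordings and a candidate artificial signal would be one possible first check of representativeness, although it says nothing about how the statistics vary over the operating envelope.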

The question is how to identify the significant features in the background noise and how they should be incorporated into the artificial signal to ensure a statistically valid test signal is achieved.