Arabic Speech Recognition Using Hidden Markov Models

Iman Abu El Maaly, Abd El Rahman
The study of speech recognition is part of a quest for “artificially intelligent” machines that can “hear”, “understand”, and “act upon” spoken information, and “speak” in completing the information exchange. As the objective of a robust, intelligent, fluent Arabic machine remains a distant goal, we introduce an Arabic speech recognition system using Hidden Markov Models (HMMs). It is a phoneme recognition system, which is one of the important steps towards a continuous speech recognition system with a large vocabulary size for Arabic. Signal modeling in this recognition system is accomplished in four main steps. First, spectral shaping is accomplished by recording the speech signal at a sampling frequency of 10 kHz, and then emphasizing important middle frequency components in the signal by pre-emphasis filtering using a first-order differentiator. Second, spectral analysis techniques are applied after segmenting the signal into frames of 20 ms duration each, with an overlap of 5 ms. Different sets of parameters are extracted. Most of these parameters are based on the Linear Prediction Coding (LPC) technique. Another parameter (power) is extracted directly from the signal. Third, parameter transformation is accomplished by the differentiation process, to better characterize temporal variations in the signal, and by the liftering process, to enhance those portions of the cepstrum representing vocal tract information. The final sets of parameters obtained are the following: (1) prediction coefficients; (2) reflection coefficients; (3) LP-derived cepstral coefficients; (4) liftered cepstral coefficients; (5) delta cepstral coefficients; (6) delta-delta cepstral coefficients; (7) area function parameters; (8) log area ratio parameters; (9) power; (10) delta power; (11) delta-delta power. Fourth, the prewhitening transformation is applied to some combinations of the above sets of parameters to remove correlation between their elements.
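The first two signal-modeling steps can be illustrated with a minimal sketch. It is not the thesis code: the 10 kHz sampling rate, 20 ms frame length, and 5 ms overlap come from the abstract, while the pre-emphasis coefficient of 0.95 and the function names `pre_emphasize` and `split_into_frames` are assumptions made only for illustration.

```python
SAMPLE_RATE_HZ = 10_000  # stated in the abstract
FRAME_MS = 20            # stated in the abstract
OVERLAP_MS = 5           # stated in the abstract
PREEMPH_COEFF = 0.95     # assumption: the abstract does not give a value


def pre_emphasize(signal, coeff=PREEMPH_COEFF):
    """First-order differencing filter: y[n] = x[n] - coeff * x[n-1]."""
    if not signal:
        return []
    out = [signal[0]]  # the first sample has no predecessor
    for n in range(1, len(signal)):
        out.append(signal[n] - coeff * signal[n - 1])
    return out


def split_into_frames(signal, sample_rate=SAMPLE_RATE_HZ,
                      frame_ms=FRAME_MS, overlap_ms=OVERLAP_MS):
    """Cut the signal into fixed-length frames that overlap by overlap_ms."""
    frame_len = sample_rate * frame_ms // 1000   # 200 samples at 10 kHz
    overlap = sample_rate * overlap_ms // 1000   # 50 samples at 10 kHz
    hop = frame_len - overlap                    # 150 samples between frame starts
    frames = []
    start = 0
    # Trailing samples that do not fill a whole frame are dropped.
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop
    return frames


if __name__ == "__main__":
    samples = [float(i % 7) for i in range(1000)]  # synthetic stand-in for speech
    frames = split_into_frames(pre_emphasize(samples))
    print(len(frames), len(frames[0]))
```

With these settings each frame holds 200 samples and consecutive frames start 150 samples apart; the LPC analysis described in the abstract would then be run on each frame.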
Then, Vector Quantization (VQ) is applied as a compression technique to reduce the computational complexity. The output of this stage is a set of observation vectors representing the signal. These observation vectors are used in the HMM recognition system. The speech recognition system adopted in this thesis is based on HMMs. A left-to-right HMM with three states is constructed for each of the 31 Arabic phonemes. The Viterbi reestimation method is used to estimate the model parameters in the training phase of the recognition system. The Viterbi decoding algorithm is implemented to solve the problem of choosing an optimal state sequence corresponding to the observation sequence and the model. In the recognition phase, the probability of the observation sequence, given the model, is computed using the Viterbi decoding algorithm. At the final recognition stage, performance tests are carried out on different sets of parameters. The system is built using Object-Oriented Programming, which makes management of the computer memory easier and makes code reuse practical. The results of the performance tests confirm the superiority of the cepstral coefficient representation, with a recognition performance of 74%, over the other representations. Combinations of cepstral coefficients and each of the other parameter sets are tested. The results show that supplementing the cepstral coefficients with delta power and delta-delta power improves the performance to 81%. These results are evaluated and compared to similar results on the recognition of English vowels given in [12]. In the light of the results obtained, some characteristics of Arabic phonemes are observed. A new classification of Arabic phonemes, based on the positions of the front, back, and root of the tongue during articulation, is presented.
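The recognition step described above can be sketched as follows, assuming discrete HMMs whose observations are VQ codebook indices. This is an illustrative sketch rather than the thesis implementation: the function names `viterbi` and `recognize`, the two-symbol codebook, and all probability values in the toy models are hypothetical.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi(obs, start_prob, trans_prob, emit_prob):
    """Return (log probability of the best path, best state path)."""
    n_states = len(start_prob)
    delta = [safe_log(start_prob[s]) + safe_log(emit_prob[s][obs[0]])
             for s in range(n_states)]
    backpointers = []
    for symbol in obs[1:]:
        new_delta, pointers = [], []
        for s in range(n_states):
            # Best predecessor of state s at this time step.
            best_prev = max(range(n_states),
                            key=lambda r: delta[r] + safe_log(trans_prob[r][s]))
            score = (delta[best_prev] + safe_log(trans_prob[best_prev][s])
                     + safe_log(emit_prob[s][symbol]))
            new_delta.append(score)
            pointers.append(best_prev)
        delta = new_delta
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state.
    last = max(range(n_states), key=lambda s: delta[s])
    best_score = delta[last]
    path = [last]
    for pointers in reversed(backpointers):
        last = pointers[last]
        path.append(last)
    path.reverse()
    return best_score, path


def recognize(obs, models):
    """Pick the model name whose Viterbi score for obs is highest."""
    return max(models, key=lambda name: viterbi(obs, *models[name])[0])


# Toy three-state left-to-right models (all values hypothetical).
START = [1.0, 0.0, 0.0]
TRANS = [[0.5, 0.5, 0.0],
         [0.0, 0.5, 0.5],
         [0.0, 0.0, 1.0]]
MODELS = {
    "A": (START, TRANS, [[0.9, 0.1], [0.1, 0.9], [0.9, 0.1]]),
    "B": (START, TRANS, [[0.1, 0.9], [0.9, 0.1], [0.1, 0.9]]),
}

if __name__ == "__main__":
    print(recognize([0, 1, 0], MODELS))
```

Working in the log domain avoids numerical underflow on long observation sequences. In the system described in the abstract there would be 31 such models, one per Arabic phoneme, with parameters estimated by Viterbi reestimation rather than set by hand.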
Three new parameters are derived to resolve the confusability encountered between Arabic phonemes and to improve the performance of the system. They are used in a second level of the recognizer (section 5.6). Performance results of these parameters are 44% for the “Emphatic/nonemphatic” parameter, 40% for the “Root” parameter, and 90% for the “Hamza” parameter.
Arabic Speech Recognition, Hidden Markov Models