A NOVEL DBN MODEL FOR LARGE VOCABULARY CONTINUOUS SPEECH RECOGNITION AND PHONE SEGMENTATION
Host Publication: Finds and Results from the Swedish Cyprus Expedition: A Gender Perspective at the Medelhavsmuseet
Authors: G. Lv, D. Jiang, H. Sahli, R. Zhao and W. Verhelst
Publisher: ISRST
Publication Date: Jul. 2007
Number of Pages: 6
ISBN: 978-0-9727412-3-1
Abstract: In this paper we propose a new single-stream, multi-state Dynamic Bayesian Network (DBN) model for continuous speech recognition and phone segmentation. The proposed model augments the word-phone DBN (WP-DBN) model of Bilmes and Bartels [5] with an extra level of hidden nodes, the states, resulting in a new word-phone-state DBN (WPS-DBN) model. In this model, a word is composed of its corresponding phones, a phone is composed of a fixed number of states, and each state is associated with the observation features. Hence, in the WPS-DBN model the basic modeling units are phones, and the dynamic pronunciation process of a phone is modeled by the Gaussian Mixture Model (GMM) parameters of its states and the transition probabilities between those states. This emulates the modeling principle of Hidden Markov Models (HMMs) in continuous speech recognition, and it yields both word and phone segmentations with timing boundaries. Results are reported on both a continuously spoken digit database and a large vocabulary continuous speech database. A quantitative evaluation compares the performance of the WP-DBN and WPS-DBN models: besides word recognition rates in different noisy environments, the phone segmentation accuracies (PSA) are compared, covering both the correctness and the timing boundary accuracy of the phone segmentation results.
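The word-phone-state hierarchy described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy lexicon, the choice of three states per phone, and the helper names `expand_word` and `phone_segments` are all assumptions made for illustration. The sketch only shows how a word expands into phone states and how a frame-level decoding path could be collapsed into phone segments with frame boundaries; it does not model the GMM observation densities or the state transition probabilities.

```python
from itertools import groupby

# Hypothetical toy lexicon; the paper's dictionary and phone set are not given in the abstract.
LEXICON = {"one": ["w", "ah", "n"], "two": ["t", "uw"]}

# Assumed value; the abstract only says "a fixed number of states" per phone.
STATES_PER_PHONE = 3


def expand_word(word, lexicon=LEXICON, n_states=STATES_PER_PHONE):
    """Expand a word into its ordered (phone, state_index) sequence."""
    return [(phone, s) for phone in lexicon[word] for s in range(n_states)]


def phone_segments(frame_path):
    """Collapse a frame-level path into (phone, start_frame, end_frame) segments.

    Each element of frame_path is (phone_position, phone, state_index), one per
    frame. Grouping on (phone_position, phone) keeps two adjacent occurrences of
    the same phone apart. end_frame is exclusive.
    """
    segments = []
    start = 0
    for (_, phone), run in groupby(frame_path, key=lambda f: (f[0], f[1])):
        length = len(list(run))
        segments.append((phone, start, start + length))
        start += length
    return segments


if __name__ == "__main__":
    print(expand_word("two"))
    # A made-up 7-frame path for the word "two"; the last state of "t" lasts two frames.
    path = [(0, "t", 0), (0, "t", 1), (0, "t", 2), (0, "t", 2),
            (1, "uw", 0), (1, "uw", 1), (1, "uw", 2)]
    print(phone_segments(path))
```

Multiplying the frame indices by the feature frame shift (for example 10 ms, again an assumption) would turn the segment boundaries into timing boundaries of the kind the abstract evaluates.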