| US 7,587,321 B2 | ||
| Method, apparatus, and system for building context dependent models for a large vocabulary continuous speech recognition (LVCSR) system | ||
| Xiaoxing Liu, Beijing (China); Baosheng Yuan, Singapore (Singapore); and Yonghong Yan, Beaverton, Oreg. (US) | ||
| Assigned to Intel Corporation, Santa Clara, Calif. (US) | ||
| Appl. No. 10/332,652 PCT Filed May 08, 2001, PCT No. PCT/CN01/00684 § 371(c)(1), (2), (4) Date May 05, 2005, PCT Pub. No. WO02/091357, PCT Pub. Date Nov. 14, 2002. | ||
| Prior Publication US 2005/0228666 A1, Oct. 13, 2005 | ||
| Int. Cl. G10L 15/14 (2006.01) | ||
| U.S. Cl. 704/256.3 [704/256.4] | 30 Claims |

| 1. A method comprising:
receiving, by an analog-to-digital converter, an input signal representing input speech;
converting the input speech to digital input speech by the analog-to-digital converter;
generating a set of multiple-mixture monophone models via a set of single-mixture monophone models;
training the set of multiple-mixture monophone models with a set of single-mixture monophone transcripts to generate a set of multiple-mixture context independent models;
generating a set of single-mixture triphone models;
training the set of single-mixture triphone models with a set of triphone transcripts to generate a set of context dependent models;
estimating parameters of the set of context dependent models for the digitized input speech via a data dependent maximum a posteriori (MAP) adaptation method, wherein parameters of tied states of the set of context dependent models are derived by adapting corresponding parameters of the set of multiple-mixture context independent models through the MAP adaptation method via training data associated with the corresponding tied states; and
outputting recognized speech, of the input speech, based on the estimated parameters.
|
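
The MAP adaptation step recited in claim 1 can be illustrated with a minimal sketch. The claim does not state an update formula, so the code below assumes the commonly used MAP update for a Gaussian mean, in which the context independent (monophone) mean acts as the prior and the frames aligned to a tied state pull the estimate toward the data. The function name `map_adapt_mean`, the prior weight `tau`, and the per-frame occupancies are hypothetical and are not taken from the patent.

```python
def map_adapt_mean(prior_mean, frames, occupancies, tau=10.0):
    """Sketch of a MAP update for one Gaussian mean (assumed formula).

    prior_mean  : mean vector of the context independent Gaussian (the prior)
    frames      : feature vectors aligned to the tied state
    occupancies : per-frame occupation probability of this Gaussian
    tau         : hypothetical prior weight, must be positive
    """
    if len(frames) != len(occupancies):
        raise ValueError("frames and occupancies must have the same length")
    if tau <= 0:
        raise ValueError("tau must be positive")

    dim = len(prior_mean)
    total_occupancy = sum(occupancies)

    # Occupancy-weighted sum of the training frames for this tied state.
    weighted_sum = [0.0] * dim
    for frame, gamma in zip(frames, occupancies):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between prior and data; more data moves the
    # estimate further away from the prior mean.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + total_occupancy)
        for d in range(dim)
    ]


# With no aligned frames the estimate stays at the prior mean.
unchanged = map_adapt_mean([1.0, -1.0], [], [], tau=5.0)

# Two frames with full occupancy and tau=2.0 land halfway to the data mean.
adapted = map_adapt_mean([0.0, 0.0], [[2.0, 4.0], [2.0, 4.0]], [1.0, 1.0], tau=2.0)
```

Under this assumed formula the estimate is data dependent in the sense that a tied state with little training data stays close to the context independent parameters, while a well-populated state approaches its own sample mean.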