US 7,587,321 B2
Method, apparatus, and system for building context dependent models for a large vocabulary continuous speech recognition (LVCSR) system
Xiaoxing Liu, Beijing (China); Baosheng Yuan, Singapore (Singapore); and Yonghong Yan, Beaverton, Oreg. (US)
Assigned to Intel Corporation, Santa Clara, Calif. (US)
Appl. No. 10/332,652
PCT Filed May 08, 2001, PCT No. PCT/CN01/00684
§ 371(c)(1), (2), (4) Date May 05, 2005,
PCT Pub. No. WO02/091357, PCT Pub. Date Nov. 14, 2002.
Prior Publication US 2005/0228666 A1, Oct. 13, 2005
Int. Cl. G10L 15/14 (2006.01)
U.S. Cl. 704—256.3 [704/256.4]
30 Claims
1. A method comprising:
receiving, by an analog-to-digital converter, an input signal representing input speech;
converting the input speech to digital input speech by the analog-to-digital converter;
generating a set of multiple-mixture monophone models via a set of single-mixture monophone models;
training the set of multiple-mixture monophone models with a set of single-mixture monophone transcripts to generate a set of multiple-mixture context independent models;
generating a set of single-mixture triphone models;
training the set of single-mixture triphone models with a set of triphone transcripts to generate a set of context dependent models;
estimating parameters of the set of context dependent models for the digitized input speech via a data dependent maximum a posteriori (MAP) adaptation method, wherein parameters of tied states of the set of context dependent models are derived by adapting corresponding parameters of the set of multiple-mixture context independent models through the MAP adaptation method via training data associated with the corresponding tied states; and
outputting recognized speech, of the input speech, based on the estimated parameters.
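
The MAP adaptation step recited above can be illustrated with a minimal sketch. The claim does not state an update formula, so the code below assumes the conventional MAP re-estimate of a single Gaussian mean: the context independent mean serves as the prior, and the frames aligned to a tied state pull the estimate toward the data in proportion to their total occupancy. The function name `map_adapt_mean`, the prior weight `tau`, and the list-based data layout are hypothetical and chosen only for illustration; they are not taken from the patent.

```python
from typing import List, Sequence


def map_adapt_mean(
    prior_mean: Sequence[float],
    frames: Sequence[Sequence[float]],
    posteriors: Sequence[float],
    tau: float = 10.0,
) -> List[float]:
    """Assumed MAP update of one Gaussian mean vector.

    prior_mean -- mean taken from the context independent model (the prior)
    frames     -- feature vectors aligned to the tied state
    posteriors -- occupation probability of the Gaussian for each frame
    tau        -- hypothetical prior weight; must be positive
    """
    if len(frames) != len(posteriors):
        raise ValueError("frames and posteriors must have the same length")
    if tau <= 0:
        raise ValueError("tau must be positive")

    dim = len(prior_mean)
    occupancy = sum(posteriors)      # total soft count of training data
    weighted_sum = [0.0] * dim       # posterior-weighted sum of frames
    for frame, gamma in zip(frames, posteriors):
        for d in range(dim):
            weighted_sum[d] += gamma * frame[d]

    # Interpolate between prior and data; more data means less prior influence.
    return [
        (tau * prior_mean[d] + weighted_sum[d]) / (tau + occupancy)
        for d in range(dim)
    ]
```

Under this assumption, a tied state with no aligned training data keeps the context independent mean unchanged, while a state with abundant data moves toward the weighted average of its own frames. That is one way to read "data dependent" in the claim, offered here as an interpretation rather than as the patented procedure.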