Talk Abstract:
Switching Dynamic-System Models for Speech Articulation and Acoustics
Li Deng
Microsoft Research
deng@microsoft.com
A statistical generative model for the speech process is described
that embeds a substantially richer structure than the HMM currently
in predominant use for speech recognition. This (multi-level)
switching dynamic-system model generalizes and integrates the
HMM and the (piece-wise) stationary linear dynamic system (state-space)
model. Depending on the level and the nature of the switching
in the model design, various key properties of the speech dynamics
can be naturally represented in the model. Such properties include
the temporal structure of the speech acoustics, its causal articulatory
movements, and the control of such movements by the multidimensional
targets correlated with the phonological (symbolic) units of
speech in terms of overlapping articulatory features.
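As a rough illustration of the structure described above, a single-level switching linear dynamic system with target-directed state dynamics can be written as follows. This is a sketch in one common textbook notation, not necessarily the formulation used in the talk; all symbols (the discrete phonological state s_t, the continuous articulatory state x_t, the acoustic observation o_t, the targets u_s, and the matrices A_s, C_s, Q_s, R_s) are assumptions introduced here.

```latex
\begin{align}
P(s_t = j \mid s_{t-1} = i) &= \pi_{ij}, \\
x_t &= A_{s_t}\, x_{t-1} + (I - A_{s_t})\, u_{s_t} + w_t,
      & w_t &\sim \mathcal{N}(0, Q_{s_t}), \\
o_t &= C_{s_t}\, x_t + v_t,
      & v_t &\sim \mathcal{N}(0, R_{s_t}).
\end{align}
```

In this form, holding s_t fixed leaves an ordinary linear state-space model, while letting o_t depend on s_t alone (no continuous state) leaves an HMM, which is the sense in which the switching model contains both.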
Certain simplifications of this model reduce it to several models
widely used in the control, econometrics, signal processing
(target tracking), and neural computation literatures. One main
challenge of using the multi-level switching dynamic-system
model for successful speech recognition, especially for unconstrained,
conversational speech recognition, is the computationally intractable
inference (decoding) of the posterior probabilities of the hidden
states, both the discrete phonological states and the continuous
articulatory states. This in turn makes EM-based parameter learning
(training) computationally intractable. Some research on approximate,
computationally tractable inference and learning algorithms
will be discussed in this lecture.
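The source of the intractability can be made concrete with a small simulation. The following is a minimal sketch, assuming a single-level switching linear dynamic system with K discrete regimes, linear-Gaussian dynamics pulled toward a regime-specific target, and a linear-Gaussian observation map; the function name, parameter names, and numerical values are all hypothetical and chosen only for illustration. Sampling from such a model is cheap, but the exact posterior over the continuous state given T observations is, in general, a Gaussian mixture with one component per discrete state sequence, that is, K**T components.

```python
import numpy as np


def sample_switching_lds(T, trans, A, targets, C, q_std, r_std, rng):
    """Draw one sequence from a target-directed switching linear dynamic system.

    trans   : (K, K) row-stochastic regime transition matrix (hypothetical values)
    A       : (K, n, n) per-regime state transition matrices
    targets : (K, n) per-regime target vectors the state is pulled toward
    C       : (K, m, n) per-regime observation matrices
    q_std, r_std : isotropic state and observation noise standard deviations
    """
    K, n = targets.shape
    m = C.shape[1]
    s = np.zeros(T, dtype=int)   # discrete (phonological-like) regime path
    x = np.zeros((T, n))         # continuous (articulatory-like) hidden state
    o = np.zeros((T, m))         # observations (acoustic-like)
    s_prev = rng.integers(K)
    x_prev = np.zeros(n)
    eye = np.eye(n)
    for t in range(T):
        # Regime follows a first-order Markov chain.
        s[t] = rng.choice(K, p=trans[s_prev])
        k = s[t]
        # State relaxes toward the current regime's target, plus Gaussian noise.
        x[t] = A[k] @ x_prev + (eye - A[k]) @ targets[k] + q_std * rng.standard_normal(n)
        # Observation is a noisy linear function of the state.
        o[t] = C[k] @ x[t] + r_std * rng.standard_normal(m)
        s_prev, x_prev = k, x[t]
    return s, x, o


rng = np.random.default_rng(0)
K, n, m, T = 2, 2, 3, 50
trans = np.array([[0.95, 0.05], [0.05, 0.95]])
A = np.stack([0.9 * np.eye(n), 0.7 * np.eye(n)])
targets = np.array([[1.0, -1.0], [-2.0, 0.5]])
C = rng.standard_normal((K, m, n))

s, x, o = sample_switching_lds(T, trans, A, targets, C, 0.05, 0.1, rng)

# Number of Gaussian components in the exact state posterior: one per regime path.
num_exact_components = K ** T
print(s[:10], x.shape, o.shape, num_exact_components)
```

Even with only two regimes and fifty frames, the component count is 2**50, which is why approximate schemes that merge or prune components, or that factorize the posterior, are needed in practice.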
Mathematical Foundations of Speech Processing and Recognition
2000-2001 Program: Mathematics in Multimedia