Fall 2000
IMA Workshop
Mathematical Foundations of Natural Language Modeling
October 30-November 3, 2000


with partial support from the Office of Naval Research

Organizers:

Roni Rosenfeld
Carnegie Mellon University
roni@cmu.edu

Sanjeev Khudanpur
Center for Language and Speech Processing
Johns Hopkins University

khudanpur@jhu.edu

Mark Johnson
Department of Cognitive and Linguistic Sciences
Brown University
mj@cs.brown.edu

Frederick Jelinek
Department of Electrical and Computer Engineering
Johns Hopkins University

jelinek@jhu.edu

 

Registration for this workshop is now closed.

Language modeling is crucial to all applications that process human language with less than complete knowledge. These include speech recognition, machine translation, optical character recognition, handwriting recognition, and spelling and grammar correction, among others. Formal theories of grammar have so far failed to account adequately for the way language is actually used. Stochastic versions of formal grammars are still less successful (as measured by the cross entropy of their predictions) than simple Markovian models (n-grams), which are estimated from larger amounts of data. With the advent of huge textual corpora, a breakthrough in language modeling will come when we successfully integrate linguistic knowledge with statistical estimation techniques.

This workshop will bring together researchers who are working on various aspects of language modeling (stochastic grammars, clustering, maximum entropy models) and mathematicians interested in these and related problems (Bayesian methods, clustering, information theory). The first two days will consist of an overview of the field and existing techniques, followed by presentations of ongoing research. The overall objective is to encourage interaction and collaboration between mathematicians and practitioners in pursuit of the next generation of solutions to language modeling problems. The specific goals are:

  • Familiarize mathematicians with language technology, and outline the underlying fundamental problems and the currently popular or successful solutions.

  • Present novel models, ideas or approaches currently being pursued in the language modeling community.

  • Present recent advances in mathematics which may be relevant to the language modeling community.

WORKSHOP SCHEDULE

MONDAY, OCTOBER 30
STATISTICAL LANGUAGE MODELING
All talks are in Lecture Hall EE/CS 3-180 unless otherwise noted.
8:30 am Coffee and Registration

Reception Room EE/CS 3-176

9:10 am Willard Miller, Fred Dulles, and Sanjeev Khudanpur Introduction
9:30 am Harry Printz
IBM T.J. Watson Research Center
and Roni Rosenfeld
Carnegie Mellon University

Overview of SLM Applications
10:30 am Break Reception Room EE/CS 3-176
11:00 am-12:00 pm Roni Rosenfeld
Carnegie Mellon University

An Accelerated Introduction to Statistical Language Modeling

2:00 pm Jerome Bellegarda
Apple Computer, Inc.

Data-Driven Semantic Language Modeling

Slides   pdf    postscript

"Exploiting Latent Semantic Information in Statistical Language Modeling," Proceedings of the IEEE, Vol. 88, No. 8, pp. 1279-1296, August 2000.

3:00 pm Break Reception Room EE/CS 3-176
3:30-4:30 pm James Baker (Dragon Systems), Dietrich Klakow (Philips), Harry Printz (IBM), and Alejandro Murua (U. of Washington)

Panel Discussion on "The State of the Art in SLM"

Harry Printz Talk: Acoustic Confusability (pdf, postscript); associated paper (pdf, postscript)

4:30 pm IMA Tea
A variety of appetizers and beverages will be served.
IMA East, 400 Lind Hall
TUESDAY, OCTOBER 31
STATISTICAL COMPUTATIONAL LINGUISTICS
All talks are in Lecture Hall EE/CS 3-180 unless otherwise noted.
9:15 am Coffee Reception Room EE/CS 3-176
9:30 am Christopher Manning
Stanford University

Probabilistic Models in Computational Linguistics

Talk    pdf

10:30 am Break Reception Room EE/CS 3-176
11:00 am-12:00 pm Mark Johnson
Brown University

An Introduction to Probabilistic Grammars and their Applications

Talk   pdf    postscript

2:00 pm Michael Collins
AT&T Labs-Research
Statistical Models for Natural Language Parsing
3:00 pm Break Reception Room EE/CS 3-176
3:30-4:30 pm Fernando Pereira (AT&T Labs), Jason Eisner (University of Rochester), and Chuanshu Ji (University of North Carolina) Panel Discussion on "From Parsing to Text Understanding; What are the Real Challenges?"
WEDNESDAY, NOVEMBER 1
MAXIMUM ENTROPY AND EM TECHNIQUES
All talks are in Lecture Hall EE/CS 3-180 unless otherwise noted.
9:15 am Coffee Reception Room EE/CS 3-176
9:30 am Sanjeev Khudanpur
Johns Hopkins University
Maximum Entropy Techniques and Exponential Models in SLM/SCL
10:30 am Break Reception Room EE/CS 3-176
11:00 am-12:00 pm Andreas Stolcke (SRI International), Stefan Riezler (Universitat Stuttgart), Dietrich Klakow (Philips), and Zhiyi Chi (University of Chicago)

Panel Discussion on "Modeling Techniques for Combining Multiple Information Sources"

Andreas Stolcke Talk   pdf    postscript

2:00 pm Frederick Jelinek
Johns Hopkins University

Parser-Driven Language Modeling

Slides

3:00 pm Break Reception Room EE/CS 3-176
3:30-4:30 pm Eugene Charniak (Brown University), Lillian Lee (Cornell University), Larry Gillick (Dragon Systems), and Peter Bickel (UC-Berkeley) Panel Discussion on "Applications of EM Techniques"
THURSDAY, NOVEMBER 2
BAYESIAN METHODS AND MCMC
All talks are in Lecture Hall EE/CS 3-180 unless otherwise noted.
9:15 am Coffee Reception Room EE/CS 3-176
9:30 am Julian E. Besag
University of Washington
Markov Chain Monte Carlo and Bayesian Computation
10:30 am Break Reception Room EE/CS 3-176
11:00 am-12:00 pm Roni Rosenfeld (Carnegie Mellon University), Olivier Catoni (CNRS), and Jean-Philippe Vert (Ecole Normale Superieure) Panel Discussion on "Bayesian Techniques in Computational Models of Natural Language"

Adaptive Context Tree and Text Categorization

2:00 pm Mehryar Mohri
AT&T Labs - Research
Finite-State Language Modeling
3:00 pm Break Reception Room EE/CS 3-176
3:30-4:30 pm Steven Abney (AT&T), Ya'acov Ritov (Hebrew University of Jerusalem), and Stu Geman (Brown) Panel Discussion on "Connections between Weighted Finite State Techniques and More Traditional Statistical Models"
6:00 pm Workshop Dinner Caspian Restaurant
FRIDAY, NOVEMBER 3
FUTURE DIRECTIONS
All talks are in Lecture Hall EE/CS 3-180 unless otherwise noted.
9:15 am Coffee Reception Room EE/CS 3-176
9:30-10:30 am New Multidisciplinary Research Proposals from Workshop Participants  
10:30 am Break Reception Room EE/CS 3-176
11:00 am New Multidisciplinary Research Proposals from Workshop Participants (Cont'd.)  
12:00-12:15 pm Roni Rosenfeld (Carnegie Mellon University), Sanjeev Khudanpur (Johns Hopkins University), and Mark Johnson (Brown) Closing Remarks

LIST OF CONFIRMED PARTICIPANTS

as of 11/2/2000
Name | Department | Affiliation
Steven Abney | | AT&T
Joan Bachenko | | Linguistech Technologies
James Baker | |
Marian Barry | |
Jerome Bellegarda | Advanced Technology Group | Apple Computer, Inc.
Julian E. Besag | Statistics | University of Washington
Peter Bickel | Statistics | University of California-Berkeley
Dan Boley | Computer Science | University of Minnesota
Jamylle Carter | | Institute for Mathematics and its Applications
Olivier Catoni | Probabilites et Modeles Aleatoires | C.N.R.S.
Eugene Charniak | Computer Science | Brown University
Li-Tien Cheng | | Institute for Mathematics and its Applications
Zhiyi Chi | Statistics | University of Chicago
Michael Collins | | AT&T Labs-Research
Akira Date | Advanced Brain Signal Processing Lab. | Riken Brain Science Institute
Mukund Deshpande | Computer Science | University of Minnesota
Fred Dulles | | Institute for Mathematics and its Applications
Jason Eisner | |
Selim Esedoglu | | Institute for Mathematics and its Applications
Paul Garrett | Mathematics | University of Minnesota
Stu Geman | Applied Mathematics | Brown University
Larry Gillick | Vice President of Research | Dragon Systems, Inc.
Marcia Gini | CSCI | University of Minnesota
Frederick Jelinek | Electrical and Computer Engineering | Johns Hopkins University
Chuanshu Ji | Statistics | University of North Carolina - Chapel Hill
Mark Johnson | Cognitive & Linguistic Sciences | Brown University
Sanjeev Khudanpur | Center for Language and Speech Processing | Johns Hopkins University
Dietrich Klakow | | Philips Research
Christopher Lang | | Indiana University Southeast
Lillian Lee | Computer Science | Cornell University
Elizabeth Lovance | | Linguistic Technology
Martin Maiers | Computer Science | University of Minnesota
Christopher Manning | Computer Science | Stanford University
David McKoskey | Research/Development | Linguistic Technologies, Inc.
Dan Melamed | | West Group
Willard Miller | | Institute for Mathematics and its Applications
Mehryar Mohri | | AT&T Labs - Research
Alejandro Murua | Statistics | University of Washington
Sergey Pakhomov | ILLASL (Linguistics Program) | University of Minnesota
Fernando Pereira | | Whiz Bang! Labs
Harry Printz | | IBM T.J. Watson Research Center
Kashif Riaz | Strategic Development | West Group
Stefan Riezler | Institut fur Maschinelle Sprachverarbeitung | Universitat Stuttgart
Ya'acov Ritov | Statistics | The Hebrew University of Jerusalem
Roni Rosenfeld | Computer Science | Carnegie Mellon University
Judith Schlesinger | | IDA/Center for Computing Sciences
Michael Schonwetter | |
Andreas Stolcke | | SRI International
Paul Thompson | | West Group
Jean-Philippe Vert | DMI-LMENS | Ecole Normale Superieure
Shaojun Wang | CALD | Carnegie Mellon University
Ken Williams | | University of Minnesota


Talk Abstracts

2000-2001 Program: Mathematics in Multimedia
