Talk Abstract
Prospecting for Novelty in Text Mining

Jaime Carbonell
Carnegie Mellon University

Jaime_Carbonell@lti.cs.cmu.edu

Discovering novel, relevant information in large, dynamically updated document collections is a difficult challenge. Though previously neglected, novelty is perhaps as important as relevance in text mining and information retrieval, especially with the vast growth of on-line information stores, including the web itself. The presentation focuses on multiple manifestations of novelty, such as:

1) "Give me a summary of what's new in the Microsoft trial this week,"

2) "Alert me when there is a new astronomical discovery,"

or simply an IR system that ranks documents for novelty and non-redundancy as well as relevance to the query.

Measures of novelty discussed include Maximal Marginal Relevance in multi-document summarization, dissimilarity with history for new-event detection in newswire and broadcast streams, and new research on automatically detecting linguistic indicators of novelty in documents. The latter is based on recent results with statistical machine learning methods for genre classification, extended to first-report detection.
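
To make the idea of ranking for novelty as well as relevance concrete, the following is a minimal sketch of Maximal Marginal Relevance (MMR) reranking. It greedily picks the next document that maximizes lam * sim(d, query) - (1 - lam) * max sim(d, s) over the documents s already selected. The bag-of-words cosine similarity, the whitespace tokenizer, the function names, and the default lam = 0.7 are illustrative assumptions, not details taken from the talk.

```python
import math
from collections import Counter


def bow(text):
    """Bag-of-words term counts (assumed tokenizer: lowercase, split on whitespace)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def mmr_rank(query, docs, lam=0.7, k=None):
    """Return document indices in MMR order.

    lam = 1.0 ranks purely by relevance to the query;
    smaller values penalize redundancy with already-selected documents more.
    """
    q = bow(query)
    vecs = [bow(d) for d in docs]
    remaining = list(range(len(docs)))
    selected = []
    k = len(docs) if k is None else min(k, len(docs))

    while len(selected) < k:
        def score(i):
            relevance = cosine(vecs[i], q)
            # Redundancy: similarity to the closest already-selected document.
            redundancy = max(
                (cosine(vecs[i], vecs[j]) for j in selected), default=0.0
            )
            return lam * relevance - (1 - lam) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    docs = [
        "microsoft trial ruling announced",
        "microsoft trial ruling announced",    # exact duplicate of the first
        "microsoft trial appeal filed today",  # relevant and adds new information
    ]
    print(mmr_rank("microsoft trial", docs, lam=0.5))
    print(mmr_rank("microsoft trial", docs, lam=1.0))
```

With lam = 0.5 the duplicate report is pushed below the document that adds new information, whereas lam = 1.0 reduces to ordinary relevance ranking and keeps the duplicate in second place.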



Back to IMA "HOT TOPICS" Workshop: Text Mining
