HOME    »    PROGRAMS/ACTIVITIES    »    Annual Thematic Program
Talk Abstract
Document Clustering: Is Hierarchical Clustering Really Better?

Michael Steinbach
University of Minnesota
steinbac@cs.umn.edu

In this presentation we compare the two main approaches to document clustering: agglomerative hierarchical clustering and K-means. Hierarchical clustering is often portrayed as the better quality clustering approach, but is limited because of its quadratic time complexity. In contrast, K-means has a time complexity which is linear in the number of documents, but is thought to produce inferior clusters. Sometimes the two approaches are combined so as to "get the best of both worlds." This paper presents research that disputes the view that hierarchical clustering produces better quality document clusters than K-means or its variants. In particular, we give results which show that a simple and efficient "bisecting" K-means technique is as good or better than the pure or hybrid hierarchical approaches we tested. While our results are limited in scope, we believe that researchers interested in document clustering will find them quite interesting.


Back to Workshop Schedule

Back to Reactive Flow and Transport Phenomena

Back to IMA "HOT TOPICS" Workshops

Connect With Us:
Go