Clustering High-dimensional Data with Path Metrics: A Balance of Density and Geometry

Tuesday, October 27, 2020 - 1:25pm - 2:25pm
Anna Little (The University of Utah)

This talk discusses multiple methods for clustering high-dimensional data and explores the delicate balance between utilizing data density and data geometry. I will first present path-based spectral clustering, a novel approach that combines a density-based metric with graph-based clustering. This density-based path metric allows for fast algorithms and strong theoretical guarantees when clusters concentrate around low-dimensional sets. However, the method suffers from a loss of geometric information, which is preserved by simple linear dimension reduction methods such as classic multidimensional scaling (CMDS). The second part of the talk will explore when CMDS followed by a simple clustering algorithm can exactly recover all cluster labels with high probability. The required scaling conditions, however, become increasingly restrictive as the ambient dimension increases, and the method will fail for irregularly shaped clusters. Finally, I will discuss how a more general family of path metrics, combined with MDS, gives low-dimensional embeddings that respect both data density and data geometry. This new method exhibits promising performance on single-cell RNA sequencing data and can be computed efficiently by restriction to a sparse graph.

Anna Little received her PhD from Duke University in 2011, where she worked under Mauro Maggioni to develop a new multiscale method for estimating the intrinsic dimension of a data set. From 2012 to 2017 she was an Assistant Professor of Mathematics at Jacksonville University, a primarily undergraduate liberal arts institution where, in addition to teaching and research, she served as a statistical consultant. In 2018 she began a research postdoc in the Department of Computational Mathematics, Science, and Engineering at Michigan State University, where she worked with Yuying Xie and Matthew Hirn on statistical and geometric analysis of high-dimensional data. She recently accepted a tenure-track position in the Department of Mathematics at the University of Utah, which will begin in January 2021.