Large Sample Asymptotics of Graph-based Methods in Machine Learning: Mathematical Analysis and Implications

Monday, October 22, 2018 - 1:25pm - 2:25pm
Lind 305
Nicolas Garcia-Trillos (University of Wisconsin, Madison)
Many machine learning procedures aimed to extract information from data can be defined as precise mathematical objects that are constructed in terms of the data. It is often assumed that the data is “big” in complexity but also in quantity, and in this “large amount of data’’ setting, a basic mathematical concept that one can explore is that of closure of a given class of statistical procedures (i.e. what are the limiting procedures as the number of data points available goes to infinity.) In this talk, I will explore this notion in the context of graph-based methods. Examples of such methods include minimization of Cheeger cuts, spectral clustering, and graph-based bayesian semi-supervised learning, among others. I will introduce some of the mathematical ideas needed for the analysis, as well as show some of the implications of it: our results show statistical consistency of the methods, provide with quantitative information in the form of scaling of parameters and rates of convergence, imply qualitative properties at the discrete level, and suggest the use of appropriate algorithms.