October 2 - 4, 2013
Keywords of the presentation: Limit theorem of probability, Point processes, Gaussian random fields
My first working assumption for this tutorial will be that the participants know something about persistent homology (PH). In particular, they know that PH is associated with filtrations, the two most common being the filtrations of upper or lower level sets of smooth functions and filtrations arising from building simplicial (Rips, Čech) complexes over point cloud data.
My second assumption is that the participants would also like to understand what happens in situations in which the underlying structure, whether it be a function or a point cloud, is random. (The motivation for such a desire should come from the realisation that data analysed by TDA is typically sampled, and so random.)
Based on this, I plan to give a quick run through a number of basic topics in Probability and Stochastic Processes, all of which are necessary (and maybe even sufficient) for understanding most issues of randomness for TDA. The topics that I plan to cover will include:
1: Limit theorems in Probability (laws of large numbers and the central limit theorem)
2: Random walk and Brownian motion (and why they are not terribly relevant to TDA)
3: Poisson and other point processes (without which there is no way to understand point clouds)
4: Gaussian random fields (which provide a powerful model for random Morse functions)
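To make topic 3 concrete: a homogeneous Poisson process is the simplest model of a random point cloud, and simulating one takes only a few lines. The sketch below is an illustration of the standard construction (draw a Poisson count, then place the points uniformly), not material from the tutorial itself; it assumes numpy.

```python
import numpy as np

def poisson_point_process(intensity, width=1.0, height=1.0, rng=None):
    """Sample a homogeneous Poisson point process on a rectangle.

    The number of points is Poisson(intensity * area); conditional on the
    count, the points are independent and uniformly distributed.
    """
    rng = np.random.default_rng() if rng is None else rng
    n = rng.poisson(intensity * width * height)
    return rng.uniform([0.0, 0.0], [width, height], size=(n, 2))

# A point cloud with roughly 100 points on the unit square.
cloud = poisson_point_process(100, rng=np.random.default_rng(0))
```

Such samples are exactly the kind of random input over which one would then build a Rips or Čech complex.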
The tutorial will be blackboard based and so topic-flexible. Thus participants should bring their favourite questions with them, or, better still, send them to me ahead of time.
Keywords of the presentation: Multivariate Statistics
Many modern datasets contain measurements on large numbers of variables. I will present an overview of methods used in standard nonparametric statistics, including preprocessing and graphical visualization.
All of these methods are available in the standard statistical language R, which I will use to present the examples.
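As a taste of the preprocessing step mentioned above, a common first move with many-variable data is to standardize each variable to z-scores so that measurements on different scales become comparable. The abstract's examples will be in R; the sketch below shows the same idea in Python with numpy, purely as an illustration.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance (z-scores),
    a routine preprocessing step before multivariate methods."""
    X = np.asarray(X, dtype=float)
    mu = X.mean(axis=0)
    sd = X.std(axis=0, ddof=1)
    return (X - mu) / sd

# Two variables measured on very different scales.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
Z = standardize(X)
```

After standardization every column has mean 0 and sample standard deviation 1, so no single variable dominates distance-based methods or plots.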
Keywords of the presentation: clustering, clustering algorithms, foundations of clustering, spectral clustering vs manifold learning
Clustering, or finding groups in data, is as old as machine learning
itself. However, as more people use clustering in a variety of
settings, the last few years have brought unprecedented
developments in this field.
This tutorial will survey the most important clustering methods in use
today from a unifying perspective, and will then present some of the
current paradigm shifts in data clustering.
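One of the methods named in the keywords, spectral clustering, can be sketched compactly: build an affinity graph over the points, form the normalized graph Laplacian, and read the group structure off its bottom eigenvectors. The following is a minimal illustration of that pipeline in numpy (my own sketch, not the tutorial's code), using the sign of the second eigenvector to split two well-separated groups.

```python
import numpy as np

def spectral_embed(X, sigma=1.0, k=2):
    """Embed points via the bottom eigenvectors of the normalized graph
    Laplacian L = I - D^{-1/2} W D^{-1/2}, the core of spectral clustering."""
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))      # Gaussian affinities
    np.fill_diagonal(W, 0.0)
    D = W.sum(axis=1)
    L = np.eye(len(X)) - W / np.sqrt(D)[:, None] / np.sqrt(D)[None, :]
    vals, vecs = np.linalg.eigh(L)          # eigenvalues in ascending order
    return vecs[:, :k]

# Two tight groups far apart; the second eigenvector separates them by sign.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (5, 2)), rng.normal(5.0, 0.1, (5, 2))])
labels = spectral_embed(X)[:, 1] > 0
```

For more than two clusters one would instead run k-means on the rows of the k-dimensional embedding, which is where the connection to manifold learning in the keywords comes from.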
Keywords of the presentation: learning theory, Bayesian inference, topological summaries, persistent homology
The research areas of statistical machine learning and Bayesian inference in the context of topological data analysis will be explained. We begin with an explanation of two foundational frameworks for inference: statistical machine learning and Bayesian inference. We use a geometric problem, dimension reduction,
as the setting to provide a basis for these two frameworks.
We then describe the work in topological data analysis that falls under these settings. This includes learning the topology of a manifold or stratified space and probabilistic modeling with persistence diagrams.
We will also discuss the problem of manifold learning and the centrality of the Laplace-Beltrami operator. We will discuss generalizations of this problem that utilize the combinatorial Laplacian (Hodge operator) and the relation of this to clustering and random walks.
Keywords of the presentation: topological data analysis, statistical inference, minimaxity
Recent advances in computational geometry and computational topology have made it possible to compute topological invariants of sets and functions from sample points. These types of data summaries provide new tools for preprocessing, summarizing and visualizing complex and even high dimensional data. As a result, the number and the variety of applications of topological data-analytic methods have been growing rapidly.
Despite such increase in popularity, the statistical properties and effectiveness of the methodologies of topological data analysis, as well as their potential for offering novel tools for statistical inference, remain largely unexplored.
This tutorial will provide a broad overview of frequentist statistical inference for researchers with background in algebraic topology and computational geometry. I will first describe the key inferential tasks of parameter estimation, hypothesis testing and uncertainty assessment by confidence sets. I will then cover more advanced topics in statistical inference, such as minimaxity and nonparametric statistics. Throughout the tutorial I will rely as much as possible on examples that are directly relevant to topological data analysis.
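To illustrate the uncertainty-assessment task mentioned above, one generic frequentist tool is the percentile bootstrap: resample the data with replacement, recompute the statistic, and take empirical quantiles of the resampled values as a confidence interval. The sketch below is a standard illustration of this idea (not the tutorial's own material), assuming numpy.

```python
import numpy as np

def bootstrap_ci(sample, stat=np.mean, n_boot=2000, alpha=0.05, rng=None):
    """Percentile bootstrap confidence interval for a statistic:
    resample with replacement, recompute, take empirical quantiles."""
    rng = np.random.default_rng() if rng is None else rng
    boots = [stat(rng.choice(sample, size=len(sample), replace=True))
             for _ in range(n_boot)]
    return np.quantile(boots, [alpha / 2, 1 - alpha / 2])

rng = np.random.default_rng(0)
sample = rng.normal(loc=3.0, scale=1.0, size=200)
lo, hi = bootstrap_ci(sample, rng=rng)   # ~95% CI for the mean
```

The same resampling template is what underlies bootstrap confidence sets for topological summaries such as persistence diagrams, with the statistic swapped out accordingly.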
Keywords of the presentation: random fields, fMRI, dimension reduction
In the first lecture, we will provide an overview of the various ways that topological information
is used in signal detection problems in functional MRI (fMRI) and other
imaging applications. The principal tool used involves computing the expected
number of critical points of various types of a smooth random field under
some predetermined null hypothesis. We will describe roughly
how some of these calculations can be carried out
using the so-called Gaussian Kinematic Formula.
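For orientation, the calculation alluded to here can be summarized as follows (this is the standard statement from Adler and Taylor, not necessarily the lecture's notation): for a smooth, centered, unit-variance Gaussian field $f$ on a suitably regular compact parameter space $M$, the Gaussian Kinematic Formula yields

```latex
\mathbb{E}\{\chi(A_u)\}
  \;=\; \sum_{j=0}^{\dim M} \mathcal{L}_j(M)\,\rho_j(u),
\qquad
A_u \;=\; \{\, t \in M : f(t) \ge u \,\},
```

where the $\mathcal{L}_j(M)$ are the Lipschitz-Killing curvatures of $M$ and the $\rho_j$ are the Euler-characteristic densities of the field. For large thresholds $u$ this expected Euler characteristic closely approximates the exceedance probability $\mathbb{P}\{\sup_M f \ge u\}$, which is what calibrates signal-detection thresholds under the null hypothesis.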
In the second lecture, we will describe some typical dimension reduction
tools used in statistics and machine learning. Not surprisingly, many of these techniques
build on the SVD of some data matrix. Topics covered will include
(generalized) PCA, sparse PCA, some ICA and, time permitting, matrix completion.
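The SVD-based pattern common to these techniques is easiest to see in plain PCA: center the data matrix, take its SVD, and read the principal directions off the right singular vectors. A minimal numpy sketch of this (an illustration, not the lecture's code):

```python
import numpy as np

def pca(X, k=2):
    """PCA via the SVD of the centered data matrix: the top right singular
    vectors are the principal directions, and the squared singular values
    (scaled by n - 1) are the component variances."""
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:k].T
    return scores, Vt[:k], S ** 2 / (len(X) - 1)

# Nearly one-dimensional data: variance concentrates in the first component.
rng = np.random.default_rng(0)
t = rng.normal(size=(100, 1))
X = np.hstack([t, 2 * t]) + rng.normal(scale=0.1, size=(100, 2))
scores, components, variances = pca(X)
```

Sparse PCA, ICA and matrix completion each modify a piece of this template, e.g. by penalizing the loadings or replacing the variance objective.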