Abstracts and Talk Materials
Introduction to Statistics and Probability for Topologists
October 2-4, 2013

Robert Adler

Probability and Stochastic Processes for Persistent Homologists

Keywords of the presentation: Limit theorems of probability, Point processes, Gaussian random fields

My first working assumption for this tutorial will be that the participants know something about persistent homology (PH). In particular, they know that PH is associated with filtrations, the two most common being the filtrations of upper or lower level sets of smooth functions and filtrations arising from building simplicial (Rips, Čech) complexes over point cloud data.

My second assumption is that the participants would also like to understand what happens in situations in which the underlying structure, whether it be a function or a point cloud, is random. (The motivation for such a desire should come from the realisation that data analysed by TDA is typically sampled, and so random.)

Based on this, I plan to give a quick run through a number of basic topics in Probability and Stochastic Processes, all of which are necessary (and maybe even sufficient) for understanding most issues of randomness for TDA. The topics that I plan to cover will include:

1: Limit theorems in Probability (laws of large numbers and the central limit theorem)
2: Random walk and Brownian motion (and why they are not terribly relevant to TDA)
3: Poisson and other point processes (without which there is no way to understand point clouds)
4: Gaussian random fields (which provide a powerful model for random Morse functions)
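As a rough illustration of how topics 1 and 3 fit together, the following minimal sketch simulates a homogeneous Poisson point process on the unit square and checks that the average point count over many independent samples approaches the intensity, as the law of large numbers predicts. The function name `sample_poisson_process`, the intensity of 100 and the number of replicates are illustrative assumptions, not material from the tutorial.

```python
import numpy as np


def sample_poisson_process(intensity, rng):
    """Sample a homogeneous Poisson point process on the unit square.

    Hypothetical helper for illustration: the number of points is
    Poisson(intensity), and given that number the points are
    independent and uniform on [0, 1) x [0, 1).
    """
    n_points = rng.poisson(intensity)
    return rng.uniform(0.0, 1.0, size=(n_points, 2))


# Assumed parameters, chosen only for this sketch.
rng = np.random.default_rng(0)
intensity = 100.0
n_replicates = 2000

# One random point cloud, of the kind a Rips or Cech complex is built over.
points = sample_poisson_process(intensity, rng)

# Law of large numbers: the mean count over many independent
# replicates should be close to the intensity.
counts = np.array(
    [len(sample_poisson_process(intensity, rng)) for _ in range(n_replicates)]
)
mean_count = counts.mean()
print(f"mean number of points over {n_replicates} replicates: {mean_count:.2f}")
```

The spread of `counts` around the intensity is of order the square root of the intensity, which is the kind of fluctuation the central limit theorem describes.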

The tutorial will be blackboard based and so topic-flexible. Thus participants should bring their favourite questions with them, or, better still, send them to me ahead of time.

Susan Holmes

Multivariate Statistics

Keywords of the presentation: Multivariate Statistics

Many modern data sets contain measurements on large numbers of variables. I will present an overview of methods used in standard nonparametric statistics, including preprocessing and graphical visualizations. All of these methods are available in the standard statistical language R, which I will use to present the examples.

Marina Meila

Classic and Modern Data Clustering

Keywords of the presentation: clustering, clustering algorithms, foundations of clustering, spectral clustering vs manifold learning

Clustering, or finding groups in data, is as old as machine learning itself. However, as more people use clustering in a variety of settings, the last few years have brought unprecedented developments in this field.

This tutorial will survey the most important clustering methods in use today from a unifying perspective, and will then present some of the current paradigm shifts in data clustering.

Sayan Mukherjee

Statistical Learning and Bayesian Inference and Topology

Keywords of the presentation: learning theory, Bayesian inference, topological summaries, persistent homology

The research areas of statistical machine learning and Bayesian inference in the context of topological data analysis will be explained. We begin with an explanation of two foundational frameworks for inference: statistical machine learning and Bayesian inference. We use a geometric problem, dimension reduction, as the setting to provide a basis for these two frameworks.

We then describe the work in topological data analysis that falls under these settings. This includes learning the topology of a manifold or stratified space and probabilistic modeling with persistence diagrams.

We will also discuss the problem of manifold learning and the centrality of the Laplace-Beltrami operator. We will discuss generalizations of this problem that utilize the combinatorial Laplacian (Hodge operator) and the relation of this to clustering and random walks.

Alessandro Rinaldo

Statistical Inference for Topological Data Analysis

Keywords of the presentation: topological data analysis, statistical inference, minimaxity

Recent advances in computational geometry and computational topology have made it possible to compute topological invariants of sets and functions from sample points. These types of data summaries provide new tools for preprocessing, summarizing and visualizing complex and even high-dimensional data. As a result, the number and the variety of applications of topological data-analytic methods have been growing rapidly. Despite this increase in popularity, the statistical properties and effectiveness of the methodologies of topological data analysis, as well as their potential for offering novel tools for statistical inference, remain largely unexplored.

This tutorial will provide a broad overview of frequentist statistical inference for researchers with a background in algebraic topology and computational geometry. I will first describe the key inferential tasks of parameter estimation, hypothesis testing and uncertainty assessment by confidence sets. I will then cover more advanced topics in statistical inference, such as minimaxity and nonparametric statistics. Throughout the tutorial I will rely as much as possible on examples that are directly relevant to topological data analysis.

Jonathan Taylor

Topological Inference in fMRI / Dimension Reduction

Keywords of the presentation: random fields, fMRI, dimension reduction

In the first lecture, we will provide an overview of the various ways that topological information is used in signal detection problems in functional MRI (fMRI) and other imaging applications. The principal tool used involves computing the expected number of critical points of various types of a smooth random field under some predetermined null hypothesis. We will describe roughly how some of these calculations can be carried out using the so-called Gaussian Kinematic Formula.

In the second lecture, we will describe some typical dimension reduction tools used in statistics and machine learning. Not surprisingly, many of these techniques build on the SVD of some data-matrix. Topics covered will include (generalized) PCA, sparse PCA, some ICA and, time permitting, matrix completion.
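To make the link between dimension reduction and the SVD concrete, here is a minimal sketch of ordinary PCA computed from the SVD of a centred data matrix. It is an illustration under stated assumptions rather than material from the lecture: the function name `pca_via_svd`, the synthetic data and the choice of two components are all hypothetical, and the generalized and sparse variants mentioned above are not covered.

```python
import numpy as np


def pca_via_svd(X, k):
    """Ordinary PCA of an (n_samples, n_features) matrix via the SVD.

    Hypothetical helper for illustration. Returns the scores
    (projected data), the top-k principal directions (as rows) and
    the variance explained by each direction.
    """
    # Centre each column so the SVD reflects covariance structure.
    X_centred = X - X.mean(axis=0)
    # Thin SVD: X_centred = U @ diag(s) @ Vt, singular values descending.
    U, s, Vt = np.linalg.svd(X_centred, full_matrices=False)
    components = Vt[:k]
    scores = X_centred @ components.T
    # Squared singular values, scaled by n - 1, are the eigenvalues
    # of the sample covariance matrix.
    explained_variance = s[:k] ** 2 / (X.shape[0] - 1)
    return scores, components, explained_variance


# Synthetic data, assumed only for this sketch.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 4))
scores, components, explained_variance = pca_via_svd(X, k=2)
print(scores.shape, components.shape, explained_variance)
```

The rows of `components` are orthonormal, and `explained_variance` agrees with the leading eigenvalues of the sample covariance matrix, which is why the SVD of the data matrix is a common starting point for these techniques.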