HOME    »    SCIENTIFIC RESOURCES    »    Volumes
SCIENTIFIC RESOURCES

Abstracts and Talk Materials
High Dimensional Phenomena
September 26 - 30, 2011

We propose new convex relaxations for the maximum clique, maximum edge biclique, and clique partitioning problems. These relaxations are based on the observation that there is a one-to-one correspondence between cliques or bicliques in the input graph and rank-one blocks in its adjacency matrix. These problems may be formulated as rank minimization or rank-constrained optimization problems, which are subsequently relaxed to convex programs using the nuclear norm. In the case that the input graph contains an instance of the desired structure plus diversionary edges and nodes,our relaxation is {bf exact}: solving the relaxation recovers the optimal solution for the original hard combinatorial problem. For each problem, we provide theoretical bounds on the size of the planted subgraph and amount of noise that our algorithm can tolerate and still recover the exact solution. Our heuristic is applicable to any input graph, and we provide numerical examples that empirically demonstrate the effectiveness of our algorithm for graphs drawn from clustering and image segmentation applications.

Keywords of the presentation: Convex optimization, submodular functions

Sparse methods for supervised learning aim at ﬁnding good linear predictors from as few variables as possible, i.e., with small cardinality of their supports. This combinatorial selection problem is often turned into a convex optimization problem by replacing the cardinality function by its convex envelope (tightest convex lower bound), in this case the L1-norm. In this work, we investigate more general set-functions than the cardinality, that may incorporate prior knowledge or structural constraints which are common in many applications: namely, we show that for nondecreasing submodular set-functions, the corresponding convex envelope can be obtained from its Lovasz extension, a common tool in submodular analysis. This deﬁnes a family of polyhedral norms, for which we provide generic algorithmic tools (subgradients and proximal operators) and theoretical results (conditions for support recovery or high-dimensional inference). By selecting speciﬁc submodular functions, we can give a new interpretation to known norms, such as those based on rank-statistics or grouped norms with potentially overlapping groups; we also deﬁne new norms, in particular ones that can be used as non-factorial priors for supervised learning.

Low-dimensional linear subspace approximations to high-dimensional data are a common approach to handling problems of estimation, detection and prediction, with applications such as network monitoring, collaborative filtering, object tracking in computer vision, and environmental sensing. Corrupt and missing data are the norm in many high-dimensional situations, not only because of errors and failures, but because it may be impossible to collect all the interesting measurements or impractical to compute in real-time using all the data available. Recently developed algorithms for "Matrix Completion" and "Robust PCA" have offered tools to find low-dimensional approximations to data even when data are missing and corrupt. However, these algorithms operate on all the available data at once and assume that the underlying low-dimensional subspace remains constant. This poster describes two alternatives, a subspace tracking algorithm called GROUSE (Grassmannian Rank-one Update Subspace Estimation) and its robust counterpart GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algorithm). Both are incremental algorithms that process one incomplete and/or corrupted measurement vector at a time. Thus GROUSE and GRASTA operate with considerably less computation than the other algorithms, while at the same time allowing flexibility for real-time applications such as tracking and anomaly detection. I will present these algorithms and show their application to subspace tracking, matrix completion, robust PCA, and background and foreground separation in video surveillance.

Entropy and Convex Geometry in High Dimensions
December 31, 1969

We develop an information-theoretic perspective on some questions in high-dimensional convex geometry. First, we describe a new equipartition property for log-concave probability measures, which is a precise formulation of the intuition that log-concave distributions are effectively uniformly distributed on compact sets. Second, we develop some Gaussian comparison results for log-concave measures, including some entropic formulations of the hyperplane conjecture. Third, we establish a new reverse entropy power inequality for convex measures analogous to V. Milman's reverse Brunn-Minkowski inequality.

Keywords of the presentation: spectral algorithms, CCA, graphical models, latent trees

This talk considers the problem of learning the structure of a broad class of multivariate latent variable tree models, which include a variety of continuous and discrete models (including the widely used linear-Gaussian models, hidden Markov models, and Markov evolutionary trees). The setting is one where we only have samples from certain observed variables in the tree and our goal is to estimate the tree structure (emph{i.e.}, the graph of how the underlying hidden variables are connected to the observed variables). We provide the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure elucidate certain natural dependencies on underlying statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables, which only utilizes certain second order statistics and is based on the determinants of certain cross-covariance matrices.

This talk is based on joint work with A. Anandkumar, D. Hsu, S. Kakade, L. Song and T. Zhang.

Multiscale Geometric and Spectral Analysis of Plane Arrangements
December 31, 1969

Modeling high dimensional data by multiple low-dimensional planes is an important problem in many applications such as computer vision and pattern recognition. In the most general setting where only coordinates of the data are given, the problem asks to determine the optimal model parameters, estimate the model planes, and cluster the data accordingly. Though many algorithms have been proposed, most of them need to assume prior knowledge of the model parameters and thus address only the last two components of the problem. In this paper we propose an accurate and efficient algorithm based on multiscale SVD analysis and spectral methods to tackle the problem.

Keywords of the presentation: Harmonic Analysis , dimensional reduction , machine learning

We describe a mathematical framework for dimensional reduction, learning and organization of databases . The database could be a matrix of a linear transformation for which the goal is to reorganize the matrix so as to achieve compression and fast algorithms. Or the database could be a collection of documents and their vocabulary, an array of sensor measurements such as EEG , or financial a time series or segments of recorded music. If we view the database as a questionnaire , we organize the population into a contextual demographic diffusion geometry and the questions into a conceptual geometry, this is an iterative process in which each organization informs the other, with the goal of entropy reduction of the whole data base.

This organization being totally data agnostic applies to the other examples thereby generating automatically a data driven conceptual /contextual pairing.We will describe the basic underlying tools from Harmonic Analysis for measuring success in extracting structure , tools which enable functional regression prediction and basically signal processing methodologies.

Keywords of the presentation: Semidefinite programming.

This tutorial will briefly cover recent developments in semidefinite programming and some of their geometrical applications.

The Delsarte-Goethals frame has been proposed for deterministic compressive sensing of sparse and compressible signals. Its performance in compressive imaging applications falls short of that obtained for arbitrary sparse vectors. Prior work has proposed specially tailored signal recovery algorithms that partition the recovery of the input vector into clustered and unclustered portions. We present a formal analysis of the Delsarte-Goethals frame that shows that its hampered performance in compressive imaging is due to the presence of clustered sparse vectors in its null space. Such a clustered structure is characteristic in commonly-used raster scanning of 2-D wavelet representations. We also show that a simple randomized raster scanning of the wavelet coefficient vector can remedy these shortcomings, allowing the use of standard recovery algorithms. Additionally, we propose an alternative deterministic raster scanning that achieves similar recovery performance. Experimental results confirm the improvements in recovery performance for both the noiseless and noisy measurement regimes.

This is joint work with Sina Jafarpour and Robert Calderbank.

We consider a deviation from the mean in the problem begin{equation} D(x)=sup_{fin F}mu{f-E_{mu} fgeq x},qquad text{for} xinRlabel{abstr} end{equation} for the class $F$ of all integrable, $fin L_1(V,d,mu)$, Lipschitz functions on a probability metric space $(V,d,mu)$. Consider the following closely related isoperimetric problem. Given $t>0$, begin{equation} text{minimize} mu(A^h) text{over all} Asubset V text{with} mu(A)geq t, label{absext} end{equation} where $A^h={uin V: d(u,A)leq h}$ is an $h$-neighborhood of $A$. We solve the problem $eqref{abstr}$ for spaces $(V,d,mu)$ such that for every $t geq 0$ there exists a solution of $eqref{absext}$ which does not depend on $h$. As corollaries we get exact solutions of $eqref{abstr}$ for Euclidean unit sphere $S^{n-1}$ with a geodesic distance and a normalized Haar measure, for $R^n$ equipped with a Gaussian measure and for $n$-dimensional cube, rectangle, powers of tori, Diamond graph equipped with uniform measure and Hamming distance. We also obtain a general result for probability metric spaces. Let $F^{ast}$ be a family of distance functions $f(u)=-d(A,u)$, with $Asubset V$ measurable. For all probability metric spaces we prove that $$sup_{finF}mu{f-E_{mu} fgeq x}=sup_{finF^{ast}}mu{f-E_{mu} fgeq x},qquad xinR.$$

Matched Filtering from Limited Frequency Samples
December 31, 1969

In this paper, we study a simple correlation-based strategy for estimating the unknown delay and amplitude of a signal based on a small number of noisy, randomly chosen frequency-domain samples. We model the output of this "compressive matched filter" as a random process whose mean equals the scaled, shifted autocorrelation function of the template signal. Using tools from the theory of empirical processes, we prove that the expected maximum deviation of this process from its mean decreases sharply as the number of measurements increases, and we also derive a probabilistic tail bound on the maximum deviation. Putting all of this together, we bound the minimum number of measurements required to guarantee that the empirical maximum of this random process occurs sufficiently close to the true peak of its mean function. We conclude that for broad classes of signals, this compressive matched filter will successfully estimate the unknown delay (with high probability, and within a prescribed tolerance) using a number of random frequency-domain samples that scales inversely with the signal-to-noise ratio and only logarithmically in the in the observation bandwidth and the possible range of delays.

The theory of Distilled Sensing (DS) establishes that sequential adaptive sensing provides significant quantifiable improvements in effective signal-to-noise ratio (relative to non-adaptive sensing) in noisy high dimensional testing problems. The DS idea was recently extended to highly undersampled settings typical in compressed sensing (CS), and it was shown that a Compressive Distilled Sensing (CDS) method can produce similar performance gains in noisy CS problems. However, the results obtained in the initial work on CDS were restricted to settings where the unknown sparse signals of interest have limited dynamic range (on the order of a constant). In this work we describe a new adaptive compressive sensing approach which utilizes sequentially designed sparse measurement matrices. Our analysis of this approach shows that the quantifiable benefits of adaptivity in noisy CS problems can be achieved without restrictive conditions on the dynamic range of the signal being acquired.

This is joint work with Rich Baraniuk (Rice University), Rui Castro (Eindhoven University of Technology), and Rob Nowak (University of Wisconsin.

Entropy estimation is a common approach to empirically determining the spread of non-spherical distributions and has been applied to areas such as anomaly detection, intrinsic dimension estimation, information fusion, and structure estimation in factor graphs. k-NN graph estimators are a popular choice for estimating entropy with desirable properties including consistency and ease of implementation. However, in the high dimensional setting, the bias of k-NN graph estimators dominates the mean square error (MSE) and decays very slowly in the sample size T at a rate of order T^(-1/d) where d is the dimension. We present a weighted k-NN estimator that improves the MSE convergence rate to T^(-1) independent of dimension. We illustrate the weighted estimator for an anomaly detection problem in wireless sensor networks.

Keywords of the presentation: dependency graphs, sparse correlation models, hubs of high vertex degree, false discovery error control, local Bhattacharrya divergence

Random matrices are measured in many areas of engineering, social science, and natural science. When the rows of the matrix are random samples of a vector of dependent variables the sample correlations between the columns of the matrix specify a correlation graph that can be used to explore the dependency structure. However, as the number of variables increases screening for highly correlated variables becomes futile due to a high dimensional phase transition phenomenon: with high probability most variables will have large sample correlations even when there is no correlation present. We will present a general theory for predicting the phase transition and provide Poisson limit theorems that can be used to predict finite sample behavior of the correlation graph. We apply our framework to the problem of hub discovery in correlation and partial correlation (concentration) graphs. We illustrate our correlation screening framework in computational biology and, in particular, for discovery of influential variables in gene regulation networks. This is joint work with Bala Rajaratnam at Stanford University.

On the Power of Adaptivity in Sparse Recovery
September 27, 2011

Keywords of the presentation: sparse recovery, compressive sensing, adaptivity

The goal of sparse recovery is to recover a sparse (with few non-zeros) approximation of data from the available information. We consider the problem of recovering the (approximately) best k-sparse approximation x* of an n-dimensional vector x from linear measurements of x. It is known that this task can be accomplished using m=O(k log (n/k)) *non-adaptive* measurements and that this bound is tight.

In this talk we show that if one is allowed to perform measurements that are adaptive, then the number of measurements can be considerably (sometimes exponentially) reduced, compared to the non-adaptive case. We also discuss implications of our results to data stream computing and compressive sensing.

High Dimensional Data Re-parameterization
December 31, 1969

We present an efficient method for computing an extendable re-parameterization of high dimensional stochastic data sets generated by nonlinear mixing of independent physical parameters. Our method relies on the spectral decomposition of a particular diffusion kernel constructed from observed data. According to Sturm-Liouville oscillation theory, a subset of the kernel eigenvectors are monotonic functions of the original parameters. Thus, a re-parameterization via these eigenvectors provides an inverse map back to the space of physically meaningful parameters. We further show how the inverse map can be efficiently extended to newly observed data, and how the extension enhances the performance of data classification.

We demonstrate the use of our method for solving empirically inverse problems such as the reconstruction of geological formations from electro-magnetic measurements, and source localization in Acoustics.

The least absolute shrinkage and selection operator (Lasso) for linear regression exploits the geometric interplay of the l2-data error objective and the l1-norm constraint to arbitrarily select sparse models. Guiding this uninformed selection process with sparsity models has been precisely the center of attention over the last decade in order to improve learning performance. To this end, we alter the selection process of Lasso in this paper to explicitly leverage combinatorial sparsity models (CSMs) via the combinatorial selection and least absolute shrinkage (CLASH) algorithm. A highlight is the introduction of a new algorithmic definition of CSMs, which we dub as the Polynomial time Modular ε-Approximation Property (PMAPε). PMAPε enables us to determine the impact of approximate combinatorial projections within CLASH. We then provide concrete guidelines how to leverage sets with PMAPε within CLASH, and characterize CLASH's estimation guarantees as a function of ε as well as the set restricted isometry constants of the regression matrix. Finally, we present experimental results using both simulated and real world data to demonstrate the effectiveness of CLASH.

Small Value Probability and Metric Entropy
September 30, 2011

Keywords of the presentation: Small Value Probability, Metric Entropy, Rare Events, Gaussian Measures

Small value probabilities or small deviations study the decay probability that positive random variables behave near zero. In particular, small ball probabilities provide the asymptotic behavior of the probability measure inside a ball as the radius of the ball tends to zero. Metric entropy is defined as the logarithmic of the minimum covering number of compact set by balls of very small radius. In this talk, we will provide an overview on precise connections between the small value probability of Gaussian process/measure and the metric entropy of the associated compact operator. Interplays and applications to many related problems/areas will be given, along with various fundamental tools and techniques from high dimensional probability theory. Throughout, we use Brownian motion (integral operator) and Brownian sheets (tensored Brownian motion and operator) as illustrating examples.

The theory of Compressive Sensing (CS) has enabled the efficient acquisition of high-bandwidth (but sparse) signals via nonuniform low-rate sampling protocols. While most work in CS has focused on reconstructing the high-bandwidth signals from nonuniform low-rate samples, in this work, we consider the task of inferring the modulation of a communications signal directly in the compressed domain, without requiring signal reconstruction. We show that the Nth power nonlinear features used for Automatic Modulation Recognition (AMR) are compressible in the Fourier domain, and hence, that AMR of M-ary Phase-Shift-Keying (MPSK) modulated signals is possible by applying the same nonlinear transformation on nonuniform compressive samples. We provide analytical support for the accurate approximation of AMR features from nonuniform samples, present practical rules for classification of modulation type using these samples, and validate our proposed rules on simulated data.

Alternating Direction Methods for Sparse and Low-Rank Optimization
December 31, 1969

We show how alternating direction methods are used recently in sparse and low-rank optimization problems. We also show the iteration complexity bounds for two specific types of alternating direction methods, namely alternating linearizaton and multiple splitting methods. We prove that the basic versions of both methods require $O(1/eps)$ iterations to obtain an eps-optimal solution, and the accelerated versions of these methods require $O(1/sqrt{eps})$ iterations, while the computational effort needed at each iteration is almost unchanged. Numerical results on problems with millions of variables and constraints arising from computer vision, machine learning and image processing are reported to demonstrate the efficiency of the proposed methods.

We discuss several geometric approaches to the study of data sets in high-dimensional spaces that are assumed to have low-intrinsci dimension. On the one hand, we discuss diffusion geometry type of approaches, based on constructing proximity graphs between the data points in high dimensions, and using diffusion processes on such graphs to construct coordinates for the data and perform learning tasks. On the other hand, we discuss novel approaches based on multiscale geometric analysis, based on studying the behavior of local covariance matrices of the data at different scales. We apply this latter approach to intrinsic dimension estimation, multiscale dictionary learning, and density estimation.

Multiscale Geometric Dictionaries for Point-cloud Data
December 31, 1969

We develop a novel geometric multiresolution analysis for analyzing intrinsically low dimensional point clouds in high-dimensional spaces, modeled as samples from a d-dimensional set M (in particular, a manifold) embedded in D-dimensional Euclidean space, in the regime d<<D. This type of situation has been recognized as important in various applications, such as the analysis of sounds, images, and gene arrays. In this paper we construct data-dependent multiscale dictionaries that aim at efficient encoding and manipulating of the data. Unlike existing constructions, our construction is fast, and so are the algorithms that map data points to dictionary coefficients and vice versa. In addition, data points have a guaranteed sparsity in terms of the dictionary.

We consider supervised learning problems where the features are embedded in a directed graph, and whose solutions are known to be a priori sparse. We address the task of selecting variables corresponding to connected components of the graph, and leverage to that effect recent work on structured sparsity. We introduce convex and non-convex regularization functions achieving this goal, which involve an exponential number of groups of variables, and show that when the graph is acyclic the combinatorial problems related to these formulations can be solved with network flow optimization techniques. We present experiments on image and genomic data, demonstrating the benefits of these penalties over existing ones as well as the scalability of our approach for prediction tasks.

This is a joint work with Bin Yu (UC Berkeley).

The information content of many phenomena of practical interest is often much less than what is suggested by their actual size. As an inspiring example, one active research area in biology is to understand the relations between the genes. While the number of genes in a so-called gene network can be large, the number of contributing genes to each given gene in the network is usually small compared to the size of the network. In other words, the behavior of each gene can be expressed as a sparse combination of other genes. The purpose of this work is to develop new theory and algorithms for exploiting this type of simplicity in the analysis of high-dimensional dynamical systems with a particular focus on system identification and estimation.

We look at high-dimensional but sparse systems, time varying systems with few piecewise-constant parameter changes, and large-scale interconnected dynamical systems when the graph has a sparse flow. Inspired by the field of Compressive Sensing (CS), we call these problems Compressive System Identification (CSI), Compressive Topology Identification (CTI), etc.

Active Clustering and Ranking
September 26, 2011

Keywords of the presentation: multiple testing, sparse recovery, sequential testing

Clustering and ranking is often based on pairwise similarities (metric data) or comparisons (ordinal data). Most methods assume that the entire collection of all possible pairwise similarities or comparisons are known, but in high-dimensional settings there may be missing data and/or the costs of collecting this information may be prohibitive. The existence of clusters or other intrinsic low-dimensional structure can induce redundancy in these data, and therefore it may be possible to robustly cluster or rank based on a small subset of the data. I will discuss two results that demonstrate this phenomenom. First, I will describe a clustering procedure that generates a hierarchical clustering of N objects using only N log N similarities, instead of all N(N-1)/2 similarities. The method can recover all clusters of size larger than log N, even in the presence of a limited fraction of arbitrarily corrupted or noisy similarities. Second, if N objects are embedded in d-dimensions (d

Keywords of the presentation: low-rank matrix approximation, subset selection

I will discuss recent algorithmic developments for the classical problem of approximating a given matrix by a low-rank matrix. This is motivated by the need of faster algorithms for very large data and certain applications that want the approximating matrix to have rows living in the span of only a few rows of the original matrix, which adds a combinatorial twist to the problem. The novel algorithms are based on sampling rows randomly (but non-uniformly) and random projection, from which a low rank approximation can be computed.

Keywords of the presentation: Eigenvalues, sample covariance matrix, population matrix

I will begin by reviewing limiting properties of the eigenvalues of a class of sample covariance matrices, where the vector dimension and the sample size approach infinity, their ratio approaching a positive constant. These properties are relevant in situations in multivariate analysis where the vector dimension is large, but the number of samples needed to adequately approximate the population matrix (as prescribed in standard statistical procedures) cannot be attained. Work has been done in estimating the population eigenvalues from those of the sample covariance matrix. I will introduce a method devised by X. Mestre, and will present an extension of his method to another ensemble of random matrices important in wireless communications.

Despite the empirical success of spectral clustering, its performance under noise and in high dimensions is not well understood. We provide a precise characterization of the misclustering rate of spectral clustering under noisy similarities. Using minimax theory, we establish information theoretic lower bounds on the amount of noise any clustering algorithm can tolerate and demonstrate that spectral clustering is near-optimal. The gap relative to the optimal performance is the statistical price for computational efficiency gained by solving a relaxation of the minimum ratio-cut problem. Additionally, in high dimensional settings, the clusters can be very small and are often characterized by only a few relevant features. To enable simultaneous extraction of such a cluster and the relevant features, we consider a sparsity penalized spectral biclustering method and compare its performance to minimax limits.

Almost any kind of experimental sciences requires the analysis of the data. Depending on the specific application, the collection of data may consist of measurements, signals, or images. In mathematical framework, all of those objects are functions. One way to analyze them is by representing into a wavelet decomposition. Wavelets provide reconstruction (approximation) of the original function (the collection of data). In order to characterize the approximation class, one has to establish Bernstein type inequalitiy which helps to investigate the magnitude of the coefficients in a wavelet decomposition of the function. The coefficients tell in what way the analyzing function needs to be modified in order to reconstruct the data. We analyze wavelet coefficients for both the family of Daubechies orthonormal wavelets and the family of semiorthogonal spline wavelets in L_p space. In particular, Bernstein type inequalities associated with wavelets are presented. We obtain a sharp estimate of Bernstein type inequality for splines. We also study the asymptotic behavior of wavelet coefficients for both families mentioned above. Comparison of these two families is done by using a quantity C_{k,p} related to the wavelet generators. In addition we determine the exact rate of decay of the m-th order B-spline wavelet and thereby describe the asymptotic behavior of the wavelet. Finally we will use splines in order to get the asymptotically best constant in Ball's integral inequality.

Keywords of the presentation: Random quantum states, entanglement, partial transpose, Wishart ensemble, K-convexity

We study generic properties of high-dimensional quantum states. Specifically, for a random state on H=C^d otimes C^d obtained by partial tracing a random pure state on H otimes C^s, we consider the problem whether it is typically separable or typically entangled. We show that a threshold occurs when the ancilla dimension s is of order roughly d^3. Our approach allows to similarly analyze other properties such as for example positive partial transpose (PPT). Mathematically, each problem reduces to studying properties (albeit somewhat exotic) of high-dimensional complex Wishart matrices. The arguments rely on high-dimensional probability, classical convexity, random matrices, and geometry of Banach spaces. Based on joint work with G. Aubrun and D. Ye.

Computable performance bounds on sparsity recovery
December 31, 1969

In this work, we develop verifiable and computable performance bounds on sparsity recovery. We define a family of goodness measures for arbitrary sensing matrices as a set of optimization problems, and design algorithms based on fixed point theory to compute these goodness measures. The proposed algorithms solve a series of second-order cone programs, or linear programs. As a by-product, we implement an efficient algorithm to verify a sufficient condition for exact sparsity recovery in the noise-free case. We derive performance bounds on the recovery errors in terms of these goodness measures. We also analytically demonstrate that the developed goodness measures are non-degenerate for a large class of random sensing matrices, as long as the number of measurements is relatively large. Numerical experiments show that, compared with the restricted isometry based performance bounds, our error bounds apply to a wider range of problems and are tighter, when the sparsity levels of the signals are relatively low.

Keywords of the presentation: Greedy approximation, Compressed sensing, relaxation, thresholding

While the ℓ1 minimization technique plays an important role in designing computationally tractable recovery methods in compressed sensing, its complexity is still impractical for many applications. An attractive alternative to the ℓ1 minimization is a family of greedy algorithms. We will discuss several greedy algorithms from the point of view of their practical applicability and theoretical performance.

Keywords of the presentation: random matrix, large deviations

We introduce a new methodology for studying the maximum eigenvalue of a sum of independent, symmetric random matrices. This approach results in a complete set of extensions to the classical tail bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Freedman, Hoeffding, and McDiarmid. Results for rectangular random matrices follow as a corollary. This research is inspired by the work of Ahlswede--Winter and Rudelson--Vershynin, but the new methods yield essential improvements over earlier results. We believe that these techniques have the potential to simplify the study a large class of random matrices.

Graphical models compactly capture stochastic dependencies amongst a collection of random variables using a graph. Inference over graphical models corresponds to finding marginal probability distributions given joint probability distributions. Several inference algorithms rely on iterative message passing between nodes. Although these algorithms can be generalized so that the message passing occurs between clusters of nodes, there are limited frameworks for finding such clusters. Moreover, current frameworks rely on finding clusters that are overlapping. This increases the computational complexity of finding clusters since the edges over a graph with overlapping clusters must be chosen carefully to avoid inconsistencies in the marginal distribution computations. In this paper, we propose a framework for finding clusters in a graph for generalized inference so that the clusters are emph{non-overlapping}. Given an undirected graph, we first derive a linear time algorithm for constructing a block-tree, a tree-structured graph over a set of non-overlapping clusters. We show how the belief propagation (BP) algorithm can be applied to block-trees to get exact inference algorithms. We then show how larger clusters in a block-tree can be efficiently split into smaller clusters so that the resulting graph over the smaller clusters, which we call a block-graph, has lower number of cycles than the original graph. We show how loopy BP (LBP) can be applied to block-graphs for approximate inference algorithms. Numerical simulations show the benefits of running LBP on block-graphs as opposed to running LBP on the original graph. Our proposed framework for generalizing BP and LBP can be applied to other inference algorithms.

Keywords of the presentation: M-ellipsoid, shortest vector, integer programming

The M-ellipsoid of a convex body K is a fundamental object introduced by V. Milman. It has the same volume as K and can cover K with the union a small number of translations. There are several proofs of the existence of an M-ellipsoid. We present algorithms for constructing M-ellipsoids based on two of these proofs. The first is randomized and is via a proof by Klartag giving optimal covering bounds. The second, based on a proof by Pisier, achieves slightly weaker bounds but leads to a deterministic algorithm.

We use M-ellipsoids in a new algorithm for enumerating lattice points in a convex body. This has several consequences, including more efficient algorithms for (a) the shortest vector problem in any semi-norm (distance induced by a convex body) (b) integer programming (c) deterministic approximate volume computation.

This is joint work with Daniel Dadush and Chris Peikert.

Computing the first few singular vectors of a large matrix is a problem that frequently comes up in statistics and numerical analysis. Given the presence of noise, exact calculation is hard to achieve, and the following problem is of importance:

How much does a small perturbation to the matrix change the singular vectors ?

Answering this question, classical theorems, such as those of Davis-Kahan and Wedin, give tight estimates for the worst-case scenario. In this paper, we show that if the perturbation (noise) is random and our matrix has low rank, then better estimates can be obtained. Our method relies on high dimensional geometry and is different from those used an earlier studies.

Keywords of the presentation: HIgh-dimensional statistics; optimization theory.

Many methods for high-dimensional statistical problems are based on solving convex optimization problems that combine a loss function, measuring fit to the data, with a regularizer that encourages some type of low-dimensional'' structure. Examples include sparsity in vectors (Lasso), block and patterned (inverse) covariance matrices (graphical Lasso and variants), low-rank matrices, and structured models in non-parametric regression.

The literature on high-dimensional estimation has been growing at a rapid rate, and there are now a large number of results about various estimators and models. In this talk, we describe a general set of principles that can be used to derive bounds on the statistical error for a broad class of optimization-based estimators. Fast rates for high-dimensional problems are possible under two conditions: a decomposability property of the regularizer, a notion of restricted strong convexity that involves an interaction between the loss function and regularizer. We present a general theorem, and then illustrate how it can be used to derive various results, some known and others new, in a relatively straightforward way. We also discuss the consequences of these properties for guaranteeing fast convergence of simple gradient methods for solving the convex programs.

Some reference papers:

Statistical bounds: http://arxiv.org/abs/1010.2731v1 Fast optimization rates: http://arxiv.org/abs/1104.4824

Joint Denoising of a signal ensemble: the power of veto
December 31, 1969

This work presents a technique to denoise a signal ensemble by exploiting sparsity both at the inter and intra-signal level. The problem of signal denoising using thresholding estimators has received a lot of attention in the literature, starting in the 1990s when Donoho and Johnstone introduced the concept of wavelet shrinkage. In this approach, the signal is represented in a basis where it is sparse, and each noisy coefficient is thresholded by a parameter that depends on the noise level. We are extending this concept to a set of signals, under the assumption that the signals have a common support. Our approach is based on a vetoing mechanism, where in addition to thresholding, the inter-signal information is used to "save" a coefficient that otherwise would be "killed". Our method achieve a smaller risk than the independent denoising, and we quantify the expected value of this improvement. The results show a consistent improvement over the independent denoising, achieving results close to the ones produced by an oracle. We validate the technique using both synthetic and real world signals

Keywords of the presentation: spectral clustering, stochastic block model, social network

In recent years network analysis have become the focus of much research in many fields including biology, communication studies, economics, information science, organizational studies, and social psychology. Communities or clusters of highly connected actors form an essential feature in the structure of several empirical networks. Spectral clustering is a popular and computationally feasible method to discover these communities.

The Stochastic Block Model is a social network model with well defined communities. This talk will give conditions for spectral clustering to correctly estimate the community membership of nearly all nodes. These asymptotic results are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional. Moreover, I will present on-going work on directed spectral clustering for networks whose edges are directed, including the enron data as an example.

Keywords of the presentation: Brown measure, single ring, perturbations

I will describe recent results (obtained jointly with A. Guionnet P. Wood) concerning perturbations of non-normal random matrices and their stabilization by additive noise. This builds on techniques introduced earlier in the context of the single ring theorem'', by Guionnet, Krishnapur, and the speaker.

Keywords of the presentation: Graphical model selection, Covariance estimation, thresholding

Undirected graphs are often used to describe high dimensional distributions. Under sparsity conditions, the graph can be estimated using ℓ1-penalization methods. This talk presents the following method. We combine a multiple regression approach with ideas of thresholding and refitting: first we infer a sparse undirected graphical model structure via thresholding of each among many ℓ1-norm penalized regression functions; we then estimate the covariance matrix and its inverse using the maximum likelihood estimator. Under suitable conditions, this approach yields consistent estimation in terms of graphical structure and fast convergence rates with respect to the operator and Frobenius norm for the covariance matrix and its inverse. We also derive an explicit bound for the Kullback Leibler divergence. Reference: http://arxiv.org/abs/1009.0530

 Connect With Us: Go
 © 2014 Regents of the University of Minnesota. All rights reserved. The University of Minnesota is an equal opportunity educator and employer Last modified on October 06, 2011 Twin Cities Campus:   Parking & Transportation   Maps & Directions Directories   Contact U of M   Privacy