September 26 - 30, 2011
We propose new convex
relaxations for the maximum clique, maximum edge biclique, and clique
partitioning problems. These relaxations are based on the observation
that there is a one-to-one correspondence between cliques or bicliques
in the input graph and rank-one blocks in its adjacency matrix. These
problems may be formulated as rank minimization or rank-constrained
optimization problems, which are subsequently relaxed to convex programs
using the nuclear norm. In the case that the input graph contains an
instance of the desired structure plus diversionary edges and nodes,our
relaxation is {bf exact}: solving the relaxation recovers the optimal
solution for the original hard combinatorial problem. For each problem,
we provide theoretical bounds on the size of the planted subgraph and
amount of noise that our algorithm can tolerate and still recover the
exact solution. Our heuristic is applicable to any input graph, and we
provide numerical examples that empirically demonstrate the
effectiveness of our algorithm for graphs drawn from clustering and
image segmentation applications.
Keywords of the presentation: Convex optimization, submodular functions
Sparse methods for supervised learning aim at ﬁnding good
linear predictors from as few variables as possible, i.e., with small
cardinality of their supports. This combinatorial selection problem is
often turned into a convex optimization problem by replacing the
cardinality function by its convex envelope (tightest convex lower bound),
in this case the L1-norm. In this work, we investigate more general set-functions than the cardinality, that may incorporate prior knowledge
or structural constraints which are common in many applications:
namely, we show that for nondecreasing submodular set-functions, the
corresponding convex envelope can be obtained from its Lovasz extension,
a common tool in submodular analysis. This deﬁnes a family of polyhedral
norms, for which we provide generic algorithmic tools (subgradients
and proximal operators) and theoretical results (conditions for support
recovery or high-dimensional inference). By selecting speciﬁc submodular
functions, we can give a new interpretation to known norms, such as those
based on rank-statistics or grouped norms with potentially overlapping
groups; we also deﬁne new norms, in particular ones that can be used
as non-factorial priors for supervised learning.
Low-dimensional linear subspace approximations to high-dimensional data are a common approach to handling problems of estimation, detection and prediction, with applications such as network monitoring, collaborative filtering, object tracking in computer vision, and environmental sensing. Corrupt and missing data are the norm in many high-dimensional situations, not only because of errors and failures, but because it may be impossible to collect all the interesting measurements or impractical to compute in real-time using all the data available. Recently developed algorithms for "Matrix Completion" and "Robust PCA" have offered tools to find low-dimensional approximations to data even when data are missing and corrupt. However, these algorithms operate on all the available data at once and assume that the underlying low-dimensional subspace remains constant. This poster describes two alternatives, a subspace tracking algorithm called GROUSE (Grassmannian Rank-one Update Subspace Estimation) and its robust counterpart GRASTA (Grassmannian Robust Adaptive Subspace Tracking Algorithm). Both are incremental algorithms that process one incomplete and/or corrupted measurement vector at a time. Thus GROUSE and GRASTA operate with considerably less computation than the other algorithms, while at the same time allowing flexibility for real-time applications such as tracking and anomaly detection. I will present these algorithms and show their application to subspace tracking, matrix completion, robust PCA, and background and foreground separation in video surveillance.
We develop an information-theoretic perspective on some questions in high-dimensional convex geometry. First, we describe a new equipartition property for log-concave probability measures, which is a precise formulation of the intuition that log-concave distributions are effectively uniformly distributed on compact sets. Second, we develop some Gaussian comparison results for log-concave measures, including some entropic formulations of the hyperplane conjecture. Third, we establish a new reverse entropy power inequality for convex measures analogous to V. Milman's reverse Brunn-Minkowski inequality.
Keywords of the presentation: spectral algorithms, CCA, graphical models, latent trees
This talk considers the problem of learning the structure of a broad
class of multivariate latent variable tree models, which include a
variety of continuous and discrete models (including the widely used
linear-Gaussian models, hidden Markov models, and Markov
evolutionary trees). The setting is one where we only have samples
from certain observed variables in the tree and our goal is to
estimate the tree structure (emph{i.e.}, the graph of how the underlying
hidden variables are connected to the observed variables). We
provide the Spectral Recursive Grouping algorithm, an efficient and
simple bottom-up procedure for recovering the tree structure from
independent samples of the observed variables. Our finite sample
size bounds for exact recovery of the tree structure elucidate
certain natural dependencies on underlying statistical and
structural properties of the underlying joint
distribution. Furthermore, our sample complexity guarantees have no
explicit dependence on the dimensionality of the observed variables,
making the algorithm applicable to many high-dimensional settings.
At the heart of our algorithm is a spectral quartet test for
determining the relative topology of a quartet of variables, which
only utilizes certain second order statistics and is based
on the determinants of certain cross-covariance matrices.
This talk is based on joint work with A. Anandkumar, D. Hsu, S. Kakade, L. Song and T. Zhang.
Modeling high dimensional data by multiple low-dimensional planes
is an important problem in many applications such
as computer vision and pattern recognition. In
the most general setting where only coordinates of
the data are given, the problem asks to determine
the optimal model parameters, estimate the model
planes, and cluster the data accordingly. Though
many algorithms have been proposed, most of them
need to assume prior knowledge of the model parameters
and thus address only the last two components of the problem.
In this paper we propose an accurate and efficient algorithm
based on multiscale SVD analysis and spectral methods to tackle
the problem.
Keywords of the presentation: Harmonic Analysis , dimensional reduction , machine learning
We describe a mathematical framework for dimensional reduction, learning and organization of databases . The database could be a matrix of a linear transformation for which the goal is to reorganize the matrix so as to achieve compression and fast algorithms. Or the database could be a collection of documents and their vocabulary, an array of sensor measurements such as EEG , or financial a time series or segments of recorded music. If we view the database as a questionnaire , we organize the population into a contextual demographic diffusion geometry and the questions into a conceptual geometry, this is an iterative process in which each organization informs the other, with the goal of entropy reduction of the whole data base.
This organization being totally data agnostic applies to the other examples thereby generating automatically a data driven conceptual /contextual pairing.We will describe the basic underlying tools from Harmonic Analysis for measuring success in extracting structure , tools which enable functional regression prediction and basically signal processing methodologies.
Keywords of the presentation: Semidefinite programming.
This tutorial will briefly cover recent developments in semidefinite programming and some of their geometrical applications.
The Delsarte-Goethals frame has been proposed for deterministic compressive sensing of sparse and compressible signals. Its performance in compressive imaging applications falls short of that obtained for arbitrary sparse vectors. Prior work has proposed specially tailored signal recovery algorithms that partition the recovery of the input vector into clustered and unclustered portions. We present a formal analysis of the Delsarte-Goethals frame that shows that its hampered performance in compressive imaging is due to the presence of clustered sparse vectors in its null space. Such a clustered structure is characteristic in commonly-used raster scanning of 2-D wavelet representations. We also show that a simple randomized raster scanning of the wavelet coefficient vector can remedy these shortcomings, allowing the use of standard recovery algorithms. Additionally, we propose an alternative deterministic raster scanning that achieves similar recovery performance. Experimental results confirm the improvements in recovery performance for both the noiseless and noisy measurement regimes.
This is joint work with Sina Jafarpour and Robert Calderbank.
We consider a deviation from the mean in the problem
begin{equation}
D(x)=sup_{fin F}mu{f-E_{mu} fgeq x},qquad text{for}
xinRlabel{abstr}
end{equation} for the class $F$ of all integrable, $fin L_1(V,d,mu)$,
Lipschitz functions on a probability metric space $(V,d,mu)$. Consider the
following closely related isoperimetric problem. Given $t>0$,
begin{equation}
text{minimize} mu(A^h) text{over all} Asubset V text{with} mu(A)geq
t, label{absext}
end{equation} where $A^h={uin V: d(u,A)leq h}$ is an $h$-neighborhood of
$A$. We solve the problem $eqref{abstr}$ for spaces $(V,d,mu)$ such that for
every $t geq 0$ there exists a solution of $eqref{absext}$ which
does not depend
on $h$.
As corollaries we get exact solutions of $eqref{abstr}$ for Euclidean unit
sphere $S^{n-1}$ with a geodesic distance and a normalized Haar measure, for
$R^n$ equipped with a Gaussian measure and for $n$-dimensional cube, rectangle,
powers of tori, Diamond graph equipped with uniform measure and Hamming
distance. We also obtain a general result for probability metric spaces. Let
$F^{ast}$ be a family of distance functions $f(u)=-d(A,u)$, with $Asubset V$
measurable. For all probability metric spaces we prove that
$$
sup_{finF}mu{f-E_{mu} fgeq x}=sup_{finF^{ast}}mu{f-E_{mu}
fgeq x},qquad xinR.
$$
In this paper, we study a simple correlation-based strategy for estimating the unknown delay and amplitude of a signal based on a small number of noisy, randomly chosen frequency-domain samples. We model the output of this "compressive matched filter" as a random process whose mean equals the scaled, shifted autocorrelation function of the template signal. Using tools from the theory of empirical processes, we prove that the expected maximum deviation of this process from its mean decreases sharply as the number of measurements increases, and we also derive a probabilistic tail bound on the maximum deviation. Putting all of this together, we bound the minimum number of measurements required to guarantee that the empirical maximum of this random process occurs sufficiently close to the true peak of its mean function. We conclude that for broad classes of signals, this compressive matched filter will successfully estimate the unknown delay (with high probability, and within a prescribed tolerance) using a number of random frequency-domain samples that scales inversely with the signal-to-noise ratio and only logarithmically in the in the observation bandwidth and the possible range of delays.
The theory of Distilled Sensing (DS) establishes that sequential adaptive sensing provides significant quantifiable improvements in effective signal-to-noise ratio (relative to non-adaptive sensing) in noisy high dimensional testing problems. The DS idea was recently extended to highly undersampled settings typical in compressed sensing (CS), and it was shown that a Compressive Distilled Sensing (CDS) method can produce similar performance gains in noisy CS problems. However, the results obtained in the initial work on CDS were restricted to settings where the unknown sparse signals of interest have limited dynamic range (on the order of a constant). In this work we describe a new adaptive compressive sensing approach which utilizes sequentially designed sparse measurement matrices. Our analysis of this approach shows that the quantifiable benefits of adaptivity in noisy CS problems can be achieved without restrictive conditions on the dynamic range of the signal being acquired.
This is joint work with Rich Baraniuk (Rice University), Rui Castro (Eindhoven University of Technology), and Rob Nowak (University of Wisconsin.
Entropy estimation is a common approach to empirically determining the spread of non-spherical distributions and has been applied to areas such as anomaly detection, intrinsic dimension estimation, information fusion, and structure estimation in factor graphs. k-NN graph estimators are a popular choice for estimating entropy with desirable properties including consistency and ease of implementation. However, in the high dimensional setting, the bias of k-NN graph estimators dominates the mean square error (MSE) and decays very slowly in the sample size T at a rate of order T^(-1/d) where d is the dimension. We present a weighted k-NN estimator that improves the MSE convergence rate to T^(-1) independent of dimension. We illustrate the weighted estimator for an anomaly detection problem in wireless sensor networks.
Keywords of the presentation: dependency graphs, sparse correlation models, hubs of high vertex degree, false discovery error control, local Bhattacharrya divergence
Random matrices are measured in many areas of engineering, social science, and natural science. When the rows of the matrix are random samples of a vector of dependent variables the sample correlations between the columns of the matrix specify a correlation graph that can be used to explore the dependency structure. However, as the number of variables increases screening for highly correlated variables becomes futile due to a high dimensional phase transition phenomenon: with high probability most variables will have large sample correlations even when there is no correlation present. We will present a general theory for predicting the phase transition and provide Poisson limit theorems that can be used to predict finite sample behavior of the correlation graph. We apply our framework to the problem of hub discovery in correlation and partial correlation (concentration) graphs. We illustrate our correlation screening framework in computational biology and, in particular, for discovery of influential variables in gene regulation networks. This is joint work with Bala Rajaratnam at Stanford University.
Keywords of the presentation: sparse recovery, compressive sensing, adaptivity
The goal of sparse recovery is to recover a sparse (with few non-zeros) approximation of data from the available information. We consider the problem of recovering the (approximately) best k-sparse approximation x* of an n-dimensional vector x from linear measurements of x. It is known that this task can be accomplished using m=O(k log (n/k)) *non-adaptive* measurements and that this bound is tight.
In this talk we show that if one is allowed to perform measurements that are adaptive, then the number of measurements can be considerably (sometimes exponentially) reduced, compared to the non-adaptive case. We also discuss implications of our results to data stream computing and compressive sensing.
Read More...
We present an efficient method for computing an
extendable re-parameterization of high dimensional stochastic data sets
generated by nonlinear mixing of independent physical parameters. Our method relies on the spectral decomposition of a particular diffusion kernel
constructed from observed data. According to Sturm-Liouville oscillation
theory, a subset of the kernel eigenvectors are monotonic functions of
the original parameters. Thus, a re-parameterization via these
eigenvectors provides an inverse map back to the space of physically
meaningful parameters. We further show how the inverse map can be
efficiently extended to newly observed data, and how the extension
enhances the performance of data classification.
We demonstrate the use of our method for solving empirically inverse problems such as the
reconstruction of geological formations from electro-magnetic measurements, and source localization
in Acoustics.
The least absolute shrinkage and selection operator (Lasso) for linear regression exploits the geometric interplay of the l2-data error objective and the l1-norm constraint to arbitrarily select sparse models. Guiding this uninformed selection process with sparsity models has been precisely the center of attention over the last decade in order to improve learning performance. To this end, we alter the selection process of Lasso in this paper to explicitly leverage combinatorial sparsity models (CSMs) via the combinatorial selection and least absolute shrinkage (CLASH) algorithm. A highlight is the introduction of a new algorithmic definition of CSMs, which we dub as the Polynomial time Modular ε-Approximation Property (PMAPε). PMAPε enables us to determine the impact of approximate combinatorial projections within CLASH. We then provide concrete guidelines how to leverage sets with PMAPε within CLASH, and characterize CLASH's estimation guarantees as a function of ε as well as the set restricted isometry constants of the regression matrix. Finally, we present experimental results using both simulated and real world data to demonstrate the effectiveness of CLASH.
Keywords of the presentation: Small Value Probability, Metric Entropy, Rare Events, Gaussian Measures
Small value probabilities or small deviations study the decay probability that positive random variables behave near zero.
In particular, small ball probabilities provide the asymptotic behavior of the probability measure inside a ball as the radius of the
ball tends to zero.
Metric entropy is defined as the logarithmic of the minimum covering number of compact set by balls of very small radius.
In this talk, we will provide an overview on precise connections between the small value probability
of Gaussian process/measure and the metric entropy of the associated compact operator.
Interplays and applications to many related problems/areas will be given, along with various fundamental tools and techniques from high dimensional probability theory.
Throughout, we use Brownian motion (integral operator) and Brownian
sheets (tensored Brownian motion and operator) as illustrating examples.
The theory of Compressive Sensing (CS) has enabled the efficient acquisition of high-bandwidth (but sparse) signals via nonuniform low-rate sampling protocols. While most work in CS
has focused on reconstructing the high-bandwidth signals from nonuniform low-rate samples, in this work, we consider the task of inferring the modulation of a communications signal directly in
the compressed domain, without requiring signal reconstruction. We show that the Nth power nonlinear features used for Automatic Modulation Recognition (AMR) are compressible in
the Fourier domain, and hence, that AMR of M-ary Phase-Shift-Keying (MPSK) modulated signals is possible by applying the same nonlinear transformation on nonuniform compressive
samples. We provide analytical support for the accurate approximation of AMR features from nonuniform samples, present practical rules for classification of modulation type using these
samples, and validate our proposed rules on simulated data.
We show how alternating direction methods are used recently in
sparse and low-rank optimization problems. We also show the iteration
complexity bounds for two specific types of alternating direction
methods, namely alternating linearizaton and multiple splitting methods.
We prove that the basic versions of both methods require $O(1/eps)$
iterations to obtain an eps-optimal solution, and the accelerated
versions of these methods require $O(1/sqrt{eps})$ iterations, while
the computational effort needed at each iteration is almost unchanged.
Numerical results on problems with millions of variables and constraints
arising from computer vision, machine learning and image processing are
reported to demonstrate the efficiency of the proposed methods.
We discuss several geometric approaches to the study of data sets in high-dimensional spaces that are assumed to have low-intrinsci dimension. On the one hand, we discuss diffusion geometry type of approaches, based on constructing proximity graphs between the data points in high dimensions, and using diffusion processes on such graphs to construct coordinates for the data and perform learning tasks. On the other hand, we discuss novel approaches based on multiscale geometric analysis, based on studying the behavior of local covariance matrices of the data at different scales. We apply this latter approach to intrinsic dimension estimation, multiscale dictionary learning, and density estimation.
We develop a novel geometric multiresolution analysis for analyzing intrinsically low dimensional point clouds in high-dimensional spaces, modeled as samples from a d-dimensional set M (in particular, a manifold) embedded in D-dimensional Euclidean space, in the regime d<<D. This type of situation has been recognized as important in various applications, such as the analysis of sounds, images, and gene arrays. In this paper we construct data-dependent multiscale dictionaries that aim at efficient encoding and manipulating of the data. Unlike existing constructions, our construction is fast, and so are the algorithms that map data points to dictionary coefficients and vice versa. In addition, data points have a guaranteed sparsity in terms of the dictionary.
We consider supervised learning problems where the features are embedded in a directed graph, and whose solutions are known to be a priori sparse. We address the task of selecting variables corresponding to connected components of the graph, and leverage to that effect recent work on structured sparsity. We introduce convex and non-convex regularization functions achieving this goal, which involve an exponential number of groups of variables, and show that when the graph is acyclic the combinatorial problems related to these formulations can be solved with network flow optimization techniques. We present experiments on image and genomic data, demonstrating the benefits of these penalties over existing ones as well as the scalability of our approach for prediction tasks.
This is a joint work with Bin Yu (UC Berkeley).
The information content of many phenomena of practical interest is often much less than what is suggested by their actual size. As an inspiring example, one active research area in biology is to understand the relations between the genes. While the number of genes in a so-called gene network can be large, the number of contributing genes to each given gene in the network is usually small compared to the size of the network. In other words, the behavior of each gene can be expressed as a sparse combination of other genes. The purpose of this work is to develop new theory and algorithms for exploiting this type of simplicity in the analysis of high-dimensional dynamical systems with a particular focus on system identification and estimation.
We look at high-dimensional but sparse systems, time varying systems with few piecewise-constant parameter changes, and large-scale interconnected dynamical systems when the graph has a sparse flow. Inspired by the field of Compressive Sensing (CS), we call these problems Compressive System Identification (CSI), Compressive Topology Identification (CTI), etc.
Keywords of the presentation: multiple testing, sparse recovery, sequential testing
Clustering and ranking is often based on pairwise similarities (metric data) or comparisons (ordinal data). Most methods assume that the entire collection of all possible pairwise similarities or comparisons are known, but in high-dimensional settings there may be missing data and/or the costs of collecting this information may be prohibitive. The existence of clusters or other intrinsic low-dimensional structure can induce redundancy in these data, and therefore it may be possible to robustly cluster or rank based on a small subset of the data. I will discuss two results that demonstrate this phenomenom. First, I will describe a clustering procedure that generates a hierarchical clustering of N objects using only N log N similarities, instead of all N(N-1)/2 similarities. The method can recover all clusters of size larger than log N, even in the presence of a limited fraction of arbitrarily corrupted or noisy similarities. Second, if N objects are embedded in d-dimensions (d
Keywords of the presentation: low-rank matrix approximation, subset selection
I will discuss recent algorithmic developments for the classical problem of approximating a given matrix by a low-rank matrix. This is motivated by the need of faster algorithms for very large data and certain applications that want the approximating matrix to have rows living in the span of only a few rows of the original matrix, which adds a combinatorial twist to the problem. The novel algorithms are based on sampling rows randomly (but non-uniformly) and random projection, from which a low rank approximation can be computed.
Keywords of the presentation: Eigenvalues, sample covariance matrix, population matrix
I will begin by reviewing limiting properties of the eigenvalues of a class of sample covariance matrices, where the vector dimension and the sample size
approach infinity, their ratio approaching a positive constant. These properties are relevant in situations in multivariate analysis where the vector dimension is large, but the number of samples needed to adequately approximate the population matrix (as prescribed in standard statistical procedures) cannot be attained. Work has been done in estimating the population eigenvalues from those of the sample covariance matrix. I will introduce a method devised by X. Mestre, and will present an extension of his method to another ensemble of random matrices important in wireless communications.
Despite the empirical success of spectral clustering, its performance under noise and in high dimensions is not well understood. We provide a precise characterization of the misclustering rate of spectral clustering under noisy similarities. Using minimax theory, we establish information theoretic lower bounds on the amount of noise any clustering algorithm can tolerate and demonstrate that spectral clustering is near-optimal. The gap relative to the optimal performance is the statistical price for computational efficiency gained by solving a relaxation of the minimum ratio-cut problem. Additionally, in high dimensional settings, the clusters can be very small and are often characterized by only a few relevant features. To enable simultaneous extraction of such a cluster and the relevant features, we consider a sparsity penalized spectral biclustering method and compare its performance to minimax limits.
Almost any kind of experimental sciences requires the analysis of the data. Depending on the specific application, the collection
of data may consist of measurements, signals, or images. In mathematical framework, all of those objects are functions. One
way to analyze them is by representing into a wavelet decomposition. Wavelets provide reconstruction (approximation) of the original
function (the collection of data). In order to characterize the approximation class, one has to establish Bernstein type
inequalitiy which helps to investigate the magnitude of the coefficients in a wavelet decomposition of the
function. The coefficients tell in what way the analyzing function needs to be modified in order to reconstruct the data.
We analyze wavelet coefficients for both the family of Daubechies orthonormal wavelets and the
family of semiorthogonal spline wavelets in L_p space. In particular, Bernstein type inequalities associated with wavelets
are presented. We obtain a sharp estimate of Bernstein type inequality for splines. We also study the asymptotic
behavior of wavelet coefficients for both families mentioned above. Comparison of these two families is done by using a quantity C_{k,p}
related to the wavelet generators. In addition we determine the exact rate of decay of the m-th order B-spline wavelet and thereby describe
the asymptotic behavior of the wavelet. Finally we will use splines in order to get the asymptotically best constant in Ball's integral inequality.
Keywords of the presentation: Random quantum states, entanglement, partial transpose, Wishart ensemble, K-convexity
We study generic properties of high-dimensional quantum states.
Specifically, for a random state on H=C^d otimes C^d obtained by partial tracing a random pure state on H otimes C^s, we consider the problem whether it is typically separable or typically entangled. We show that a threshold occurs when the ancilla dimension s is of order roughly d^3.
Our approach allows to similarly analyze other properties such as
for example positive partial transpose (PPT). Mathematically, each problem reduces to studying properties (albeit somewhat exotic) of high-dimensional complex Wishart matrices. The arguments rely on high-dimensional probability, classical convexity, random matrices, and geometry of Banach spaces.
Based on joint work with G. Aubrun and D. Ye.
Read More...
In this work, we develop verifiable and computable performance bounds on sparsity recovery. We define a family of goodness measures for arbitrary sensing matrices as a set of optimization problems, and design algorithms based on fixed point theory to compute these goodness measures. The proposed algorithms solve a series of second-order cone programs, or linear programs. As a by-product, we implement an efficient algorithm to verify a sufficient condition for exact sparsity recovery in the noise-free case. We derive performance bounds on the recovery errors in terms of these goodness measures. We also analytically demonstrate that the developed goodness measures are non-degenerate for a large class of random sensing matrices, as long as the number of measurements is relatively large. Numerical experiments show that, compared with the restricted isometry based performance bounds, our error bounds apply to a wider range of problems and are tighter, when the sparsity levels of the signals are relatively low.
Keywords of the presentation: Greedy approximation, Compressed sensing, relaxation, thresholding
While the ℓ1 minimization technique plays an important role in designing computationally tractable recovery methods in compressed sensing, its complexity is still impractical for many applications. An attractive alternative to the ℓ1 minimization is a family of greedy algorithms. We will discuss several greedy algorithms from the point of view of their practical applicability and theoretical performance.
Keywords of the presentation: random matrix, large deviations
We introduce a new methodology for studying the maximum eigenvalue of a sum of independent, symmetric random matrices. This approach results in a complete set of extensions to the classical tail bounds associated with the names Azuma, Bennett, Bernstein, Chernoff, Freedman, Hoeffding, and McDiarmid. Results for rectangular random matrices follow as a corollary. This research is inspired by the work of Ahlswede--Winter and Rudelson--Vershynin, but the new methods yield essential improvements over earlier results. We believe that these techniques have the potential to simplify the study a large class of random matrices.
Read More...
Graphical models compactly capture stochastic dependencies amongst a
collection of random variables using a graph. Inference over graphical
models corresponds to finding marginal probability distributions given
joint probability distributions. Several inference algorithms rely on
iterative message passing between nodes. Although these algorithms can
be generalized so that the message passing occurs between clusters of
nodes, there are limited frameworks for finding such clusters.
Moreover, current frameworks rely on finding clusters that are
overlapping. This increases the computational complexity of finding
clusters since the edges over a graph with overlapping clusters must
be chosen carefully to avoid inconsistencies in the marginal
distribution computations. In this paper, we propose a framework for
finding clusters in a graph for generalized inference so that the
clusters are emph{non-overlapping}. Given an undirected graph, we
first derive a linear time algorithm for constructing a block-tree, a
tree-structured graph over a set of non-overlapping clusters. We show
how the belief propagation (BP) algorithm can be applied to
block-trees to get exact inference algorithms. We then show how larger
clusters in a block-tree can be efficiently split into smaller
clusters so that the resulting graph over the smaller clusters, which
we call a block-graph, has lower number of cycles than the original
graph. We show how loopy BP (LBP) can be applied to block-graphs for
approximate inference algorithms. Numerical simulations show the
benefits of running LBP on block-graphs as opposed to running LBP on
the original graph. Our proposed framework for generalizing BP and LBP
can be applied to other inference algorithms.
Keywords of the presentation: M-ellipsoid, shortest vector, integer programming
The M-ellipsoid of a convex body K is a fundamental object introduced by V. Milman. It has the same volume as K and can cover K with the union a small number of translations. There are several proofs of the existence of an M-ellipsoid. We present algorithms for constructing M-ellipsoids based on two of these proofs. The first is randomized and is via a proof by Klartag giving optimal covering bounds. The second, based on a proof by Pisier, achieves slightly weaker bounds but leads to a deterministic algorithm.
We use M-ellipsoids in a new algorithm for enumerating lattice points in a convex body. This has several consequences, including more efficient algorithms for (a) the shortest vector problem in any semi-norm (distance induced by a convex body) (b) integer programming (c) deterministic approximate volume computation.
This is joint work with Daniel Dadush and Chris Peikert.
Computing the first few singular vectors of a large matrix is a problem
that frequently comes up in statistics and numerical analysis. Given the presence of noise, exact calculation is hard to achieve, and the following problem is of importance:
How much does a small perturbation to the matrix change the singular vectors ?
Answering this question, classical theorems, such as those of Davis-Kahan and Wedin, give tight estimates for the worst-case scenario.
In this paper, we show that if the perturbation (noise) is random and our matrix has low rank, then
better estimates can be obtained. Our method relies on high dimensional geometry and is different from those used an earlier studies.
Keywords of the presentation: HIgh-dimensional statistics; optimization theory.
Many methods for high-dimensional statistical problems are based on
solving convex optimization problems that combine a loss function,
measuring fit to the data, with a regularizer that encourages some
type of ``low-dimensional'' structure. Examples include sparsity in
vectors (Lasso), block and patterned (inverse) covariance matrices
(graphical Lasso and variants), low-rank matrices, and structured
models in non-parametric regression.
The literature on high-dimensional estimation has been growing at a
rapid rate, and there are now a large number of results about various
estimators and models. In this talk, we describe a general set of
principles that can be used to derive bounds on the statistical error
for a broad class of optimization-based estimators. Fast rates for
high-dimensional problems are possible under two conditions: a
decomposability property of the regularizer, a notion of restricted
strong convexity that involves an interaction between the loss
function and regularizer. We present a general theorem, and then
illustrate how it can be used to derive various results, some known
and others new, in a relatively straightforward way. We also discuss
the consequences of these properties for guaranteeing fast convergence
of simple gradient methods for solving the convex programs.
Some reference papers:
Statistical bounds: http://arxiv.org/abs/1010.2731v1
Fast optimization rates: http://arxiv.org/abs/1104.4824
This work presents a technique to denoise a signal ensemble by
exploiting sparsity both at the inter and intra-signal level. The
problem of signal denoising using thresholding estimators has received a
lot of attention in the literature, starting in the 1990s when Donoho
and Johnstone introduced the concept of wavelet shrinkage. In this
approach, the signal is represented in a basis where it is sparse, and
each noisy coefficient is thresholded by a parameter that depends on the
noise level. We are extending this concept to a set of signals, under
the assumption that the signals have a common support. Our approach is
based on a vetoing mechanism, where in addition to thresholding, the
inter-signal information is used to "save" a coefficient that otherwise
would be "killed". Our method achieve a smaller risk than the
independent denoising, and we quantify the expected value of this
improvement. The results show a consistent improvement over the
independent denoising, achieving results close to the ones produced by
an oracle. We validate the technique using both synthetic and real world
signals
Keywords of the presentation: spectral clustering, stochastic block model, social network
In recent years network analysis have become the focus of much
research in many fields including biology, communication studies,
economics, information science, organizational studies,
and social psychology. Communities or clusters of highly connected actors
form an essential feature in the structure of several empirical networks.
Spectral clustering is a popular and computationally feasible method to
discover these communities.
The Stochastic Block Model is a social network model with well defined
communities. This talk will give conditions for spectral clustering to
correctly estimate the community membership of nearly all nodes.
These asymptotic results are the first clustering results that
allow the number of clusters in the model to
grow with the number of nodes, hence the name high-dimensional.
Moreover, I will present on-going work on directed spectral clustering
for networks whose edges are directed, including the enron data as an example.
Read More...
Keywords of the presentation: Brown measure, single ring, perturbations
I will describe recent results (obtained jointly with A. Guionnet P. Wood) concerning perturbations of non-normal random matrices and their stabilization by additive noise. This builds on techniques introduced earlier in
the context of the ``single ring theorem'', by Guionnet, Krishnapur, and the speaker.
Keywords of the presentation: Graphical model selection, Covariance estimation, thresholding
Undirected graphs are often used to describe high dimensional distributions. Under sparsity conditions, the graph can be estimated using ℓ1-penalization methods. This talk presents the following method. We combine a multiple regression approach with ideas of thresholding and refitting: first we infer a sparse undirected graphical model structure via thresholding of each among many ℓ1-norm penalized regression functions; we then estimate the covariance matrix and its inverse using the maximum likelihood estimator. Under suitable conditions, this approach yields consistent estimation in terms of graphical structure and fast convergence rates with respect to the operator and Frobenius norm for the covariance matrix and its inverse. We also derive an explicit bound for the Kullback Leibler divergence. Reference: http://arxiv.org/abs/1009.0530