# IMA Tea and Poster Session

Monday, March 5, 2007 - 4:30pm - 6:30pm

Lind 400

**Geometry of Rank Tests**

Jason Morton (University of California, Berkeley)

We investigate the polyhedral geometry of conditional probability and undirected graphical models, developing new statistical procedures called convex rank tests. The polytope associated to an undirected graphical conditional independence model is the graph associahedron. The convex rank test defined by the dual semigraphoid to the n-cycle graphical model is applied to microarray data analysis to detect periodic gene expression.

**Maximum Likelihood Estimation in Latent Class**

Yi Zhou (Carnegie-Mellon University)

Latent class models have been used to explain the heterogeneity of the observed relationships among a set of categorical variables, and they have received increasing attention as a powerful methodology for analyzing discrete data. The central goal of our work is to study the existence and computation of maximum likelihood estimates (MLEs) for these models, which are essential for assessing goodness of fit and for model selection. Our study is at the interface between the fields of algebraic statistics and machine learning.
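
A standard way to compute these MLEs is the expectation maximization (EM) algorithm. As a minimal sketch, assuming binary observed variables and a random starting point, the Python below runs EM for a latent class model with `k` classes; the function `em_latent_class`, its initialization and the toy rows are illustrative assumptions, not the authors' code. EM never decreases the log-likelihood, but runs from different seeds can stop at different local maxima, which is why EM output alone cannot settle existence or uniqueness of the MLE.

```python
import math
import random


def em_latent_class(data, k, iters=200, seed=0):
    """EM for a latent class model with binary items (illustrative sketch).

    data: list of 0/1 rows; k: number of latent classes.
    Returns class weights, item probabilities P(item j = 1 | class c),
    and the log-likelihood of the current parameters at every iteration.
    """
    rng = random.Random(seed)
    n, m = len(data), len(data[0])
    weights = [1.0 / k] * k
    theta = [[rng.uniform(0.2, 0.8) for _ in range(m)] for _ in range(k)]
    history = []
    for _ in range(iters):
        # E-step: posterior probability of each latent class for each row.
        resp, loglik = [], 0.0
        for row in data:
            joint = []
            for c in range(k):
                p = weights[c]
                for j, x in enumerate(row):
                    p *= theta[c][j] if x else 1.0 - theta[c][j]
                joint.append(p)
            total = sum(joint)
            loglik += math.log(total)
            resp.append([p / total for p in joint])
        history.append(loglik)
        # M-step: closed-form updates from the expected class counts.
        for c in range(k):
            nc = sum(r[c] for r in resp)
            weights[c] = nc / n
            if nc == 0.0:
                continue  # empty class: its item probabilities no longer matter
            for j in range(m):
                theta[c][j] = sum(r[c] * row[j] for r, row in zip(resp, data)) / nc
    return weights, theta, history


if __name__ == "__main__":
    toy = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0], [1, 0, 1], [0, 1, 0]]  # made-up data
    for seed in range(3):
        w, th, hist = em_latent_class(toy, k=2, seed=seed)
        print(seed, round(hist[-1], 6))  # final log-likelihood for each random start
```

Comparing the final log-likelihoods across seeds is a cheap check for multiple local maxima, but agreement between runs is not a proof that the global maximum was found.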

Traditionally, the expectation maximization (EM) algorithm has been applied to compute the MLEs of a latent class model. However, the solutions provided by EM are only guaranteed to be local maxima, so although we can compute them effectively, we still lack methods for assessing the existence and uniqueness of the MLEs. Another interesting problem in statistics is the identifiability of the model. When a model is unidentifiable, the number of degrees of freedom must be adjusted in order to apply goodness-of-fit tests correctly. In our work, we show that both the existence and identifiability problems are closely related to the geometric properties of latent class models. Therefore, studying the algebraic varieties and ideals arising from these models is particularly relevant to our problem. We include a number of examples as a way of opening a discussion on a general method for addressing both MLE existence and identifiability in latent class models.

**Supervised Learning Artificial Neural Network Algorithms for Optimizing Mechanical Properties of Elastin-like Polypeptide Hydrogels for Cartilage Repair**

Joint work with Dana L. Nettles<sup>3</sup>, Kimberly Trabbic Carlson<sup>3</sup>, Ashutosh Chilkoti<sup>3</sup>, Lori A. Setton<sup>3,4</sup>, Mansoor A. Haider<sup>1,2</sup>

Elastin-like polypeptide (ELP) hydrogels are a class of biomaterials with potential utility as biocompatible scaffolds for filling defects due to osteoarthritis and for regenerating cartilage. Because elastin sequences can readily be genetically engineered, there are almost endless possible configurations of ELPs and conformations of the networks formed after crosslinking. ELP biomaterial function will exhibit a complex dependence on these polymer characteristics, and this dependence affects properties expected to matter for cartilage regeneration, such as mechanical load support. These complex structure-function relationships for crosslinked ELP hydrogels are not well described. A method for predicting the mechanical properties of ELP hydrogels was developed based on structural properties and supervised artificial neural network (ANN) modeling. The ANN model used concentration, molecular weight, crosslink density, and sample number to predict the dynamic shear modulus and loss angle of the hydrogels. The ANN was implemented in custom compiled code based on the scaled conjugate gradient minimization algorithm, and a Monte Carlo method was used to expand the dataset. The ANN was trained on varying subsets of the full dataset (22 formulations), with the complementary subset used for validation. Trained networks predicted the dynamic shear modulus of the hydrogels at physiological temperature from polymer design with excellent accuracy, and the predictions were robust with respect to statistical variations. The results demonstrate the validity of an intermediate screening process that uses ANNs to obtain optimal mechanical properties for the ELP.

<sup>1</sup>Biomathematics Graduate Program, North Carolina State University; <sup>2</sup>Department of Mathematics, North Carolina State University; <sup>3</sup>Department of Biomedical Engineering, Duke University; <sup>4</sup>Department of Surgery, Duke Medical Center

**Classifying Disease Models Using Regular Polyhedral Subdivisions**

Debbie Yuster (Columbia University)

Genes play a complicated role in how likely one is to get a certain disease. Biologists would like to model how a person's genotype affects their likelihood of illness. We propose a new classification of two-locus disease models, where each model corresponds to an induced subdivision of a point configuration (basically a picture of connected dots). Our models reflect epistasis, or gene interaction. This work is joint with Ingileif Hallgrimsdottir. For more information, see our preprint at arXiv:q-bio.QM/0612044.

**Multiple Solutions to the Likelihood Equations in the Behrens-Fisher Problem**

Mathias Drton (University of Chicago)

The Behrens-Fisher problem concerns testing the statistical hypothesis of equality of the means of two normal populations with possibly different variances. This problem furnishes one of the simplest statistical models for which the likelihood equations may have more than one real solution. In fact, with probability one, the equations have either one or three real solutions.
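
A minimal sketch of where the cubic comes from, assuming the usual setup: samples of sizes $n_1, n_2$ with means $\bar x_1, \bar x_2$ and variances $s_1^2, s_2^2$ computed with divisor $n_i$, and the likelihood maximized under the hypothesis $\mu_1 = \mu_2 = \mu$. For fixed $\mu$ the variance estimates are $\hat\sigma_i^2(\mu) = s_i^2 + (\bar x_i - \mu)^2$, and setting the derivative of the profile log-likelihood $-\tfrac{n_1}{2}\log\hat\sigma_1^2(\mu) - \tfrac{n_2}{2}\log\hat\sigma_2^2(\mu)$ to zero and clearing denominators gives

$$n_1(\bar x_1 - \mu)\bigl(s_2^2 + (\bar x_2 - \mu)^2\bigr) + n_2(\bar x_2 - \mu)\bigl(s_1^2 + (\bar x_1 - \mu)^2\bigr) = 0,$$

a cubic in $\mu$ with leading coefficient $-(n_1 + n_2)$. The Python below (function names are hypothetical) builds its coefficients and uses the sign of the cubic discriminant to decide between one and three real solutions.

```python
def polymul(p, q):
    """Multiply polynomials given as coefficient lists [c0, c1, ...]."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def likelihood_cubic(n1, xbar1, s1sq, n2, xbar2, s2sq):
    """Coefficients [c0, c1, c2, c3] of the cubic in mu displayed above."""
    # n1 * (xbar1 - mu) * (s2sq + (xbar2 - mu)^2)
    t1 = polymul([n1 * xbar1, -n1], [s2sq + xbar2 ** 2, -2 * xbar2, 1.0])
    # n2 * (xbar2 - mu) * (s1sq + (xbar1 - mu)^2)
    t2 = polymul([n2 * xbar2, -n2], [s1sq + xbar1 ** 2, -2 * xbar1, 1.0])
    return [a + b for a, b in zip(t1, t2)]


def cubic_discriminant(coeffs):
    """Discriminant of a*x^3 + b*x^2 + c*x + d, given [d, c, b, a]."""
    d, c, b, a = coeffs
    return 18 * a * b * c * d - 4 * b ** 3 * d + b ** 2 * c ** 2 - 4 * a * c ** 3 - 27 * a ** 2 * d ** 2


def count_real_solutions(n1, xbar1, s1sq, n2, xbar2, s2sq):
    """3 distinct real roots if the discriminant is positive, else 1 (zero has probability zero)."""
    disc = cubic_discriminant(likelihood_cubic(n1, xbar1, s1sq, n2, xbar2, s2sq))
    return 3 if disc > 0 else 1
```

For example, with $n_1 = n_2 = n$, $\bar x_1 = -1$, $\bar x_2 = 1$ and $s_1^2 = s_2^2 = s^2$, the cubic reduces to $2n\mu(1 - s^2 - \mu^2)$, which has three real roots exactly when $s^2 < 1$.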

Using the cubic discriminant, we study the large-sample probability of one versus three solutions.

**A Flow that Computes the Best Positive Semi-definite Approximation of a Symmetric Matrix**

Kenneth Driessel (Iowa State University)

We work in the space of n-by-n real symmetric matrices with the Frobenius inner product. Consider the following problem:

*Problem:* **Positive semi-definite approximation.** Given an n-by-n real symmetric matrix A, find the positive semi-definite matrix which is closest to A.

I discuss the following differential equation in the space of symmetric matrices:

$$X' = (A - X)X^2 + X^2(A - X).$$
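
The flow can be explored numerically. The sketch below is a minimal illustration, assuming forward Euler steps with a fixed step size and matrices stored as Python lists; the integrator, step size and helper names are choices made here, not part of the abstract. For comparison, the closest positive semi-definite matrix in the Frobenius norm is obtained by replacing the negative eigenvalues of A by zero; for A = [[1, 2], [2, 1]], with eigenvalues 3 and -1, it is the matrix with all entries 1.5.

```python
import math


def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def axpy(P, Q, s):
    """Entrywise P + s * Q."""
    return [[p + s * q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def frobenius(P):
    """Frobenius norm of P."""
    return math.sqrt(sum(v * v for row in P for v in row))


def flow_rhs(A, X):
    """Right-hand side (A - X) X^2 + X^2 (A - X) of the differential equation."""
    D = axpy(A, X, -1.0)
    X2 = matmul(X, X)
    return axpy(matmul(D, X2), matmul(X2, D), 1.0)


def integrate_flow(A, X0, h=0.01, steps=10000):
    """Forward Euler from X0; returns the final X and the distances ||A - X|| at each step."""
    X = [row[:] for row in X0]
    dists = [frobenius(axpy(A, X, -1.0))]
    for _ in range(steps):
        X = axpy(X, flow_rhs(A, X), h)
        dists.append(frobenius(axpy(A, X, -1.0)))
    return X, dists


if __name__ == "__main__":
    A = [[1.0, 2.0], [2.0, 1.0]]
    X, dists = integrate_flow(A, [[1.0, 0.0], [0.0, 1.0]])  # start at the identity
    print([[round(v, 3) for v in row] for row in X])
```

Starting from X(0) = I, the recorded distances should not increase and X should approach the all-1.5 matrix. Convergence along the eigenvector of the negative eigenvalue is slow, roughly like 1/t, because the right-hand side carries the factor X^2.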

The corresponding flow preserves inertia. In particular, if the initial value X(0) = M is a positive definite matrix, then X(t) is positive definite for all t > 0. I show that the distance between A and X(t) decreases as t increases. I also show that if A has distinct nonzero eigenvalues (which is a generic condition), then the solution X(t) converges to the positive semi-definite matrix which is closest to A.

**Conditional Independence for Gaussian Random Variables is not Finitely Axiomatizable**

Seth Sullivant (Harvard University)

It is known that for general distributions there is no finite list of conditional independence axioms that can be used to deduce all implications among a collection of conditional independence statements. We show that the same result holds for the class of Gaussian random variables by exhibiting, for each n > 3, a collection of n independence statements on n random variables which, in the Gaussian case, imply that X_1 is independent of X_2, but such that no proper subset implies that X_1 is independent of X_2. The proof depends on the fact that conditional independence models for Gaussian random variables are algebraic varieties in the cone of positive definite matrices, and it makes use of binomial primary decomposition.

**Toric Ideals of Phylogenetic Invariants for the General Group-based Model on Claw Trees**

We address the problem of studying the toric ideals of phylogenetic invariants for a general group-based model on an arbitrary claw tree. We focus on the group $\mathbb{Z}_2$ and choose a natural recursive approach that extends to other groups. The study of the lattice associated with each phylogenetic ideal produces a list of circuits that generate the corresponding lattice basis ideal. In addition, we describe explicitly a quadratic lexicographic Gröbner basis of the toric ideal of invariants for the claw tree on an arbitrary number of leaves. Combined with a result of Sturmfels and Sullivant, this implies that the phylogenetic ideal of every tree for the group $\mathbb{Z}_2$ has a quadratic Gröbner basis. Hence, the coordinate ring of the toric variety is a Koszul algebra.
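
To make these objects concrete, the sketch below enumerates quadratic binomials in the toric ideal for the claw tree $K_{1,n}$. It assumes the standard Fourier-coordinate parametrization of a $\mathbb{Z}_2$ group-based model on a claw tree: a coordinate $q_g$ exists for each $g \in \mathbb{Z}_2^n$ with $g_1 + \cdots + g_n = 0$, and it maps to the monomial $a^{(1)}_{g_1} \cdots a^{(n)}_{g_n}$. Two quadratic monomials $q_g q_h$ and $q_{g'} q_{h'}$ have the same image exactly when, leaf by leaf, the multisets $\{g_i, h_i\}$ and $\{g'_i, h'_i\}$ agree, so grouping pairs by that signature yields binomials spanning the degree-2 part of the ideal. The list is redundant; it is neither a minimal generating set nor the lexicographic Gröbner basis described above, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations, combinations_with_replacement, product


def fourier_coordinates(n):
    """Labels g of the coordinates q_g for the claw tree K_{1,n}: binary strings of even weight."""
    return [g for g in product((0, 1), repeat=n) if sum(g) % 2 == 0]


def quadratic_binomials(n):
    """Pairs ((g, h), (g2, h2)) encoding binomials q_g q_h - q_g2 q_h2 in the toric ideal."""
    fibers = defaultdict(list)
    for g, h in combinations_with_replacement(fourier_coordinates(n), 2):
        # The image of q_g q_h records, for each leaf i, the multiset {g_i, h_i}.
        signature = tuple(tuple(sorted((gi, hi))) for gi, hi in zip(g, h))
        fibers[signature].append((g, h))
    # Any two monomials in the same fiber differ by an element of the ideal.
    return [(m1, m2) for monomials in fibers.values() for m1, m2 in combinations(monomials, 2)]


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, len(fourier_coordinates(n)), len(quadratic_binomials(n)))
```

For $n = 4$, for instance, the list contains $q_{0000}q_{1111} - q_{0011}q_{1100}$, while for $n = 3$ it is empty.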

This is joint work with Julia Chifman, University of Kentucky.

**Metric Learning for Phylogenetic Invariants**

Nicholas Eriksson (Stanford University)

We introduce new methods for phylogenetic tree construction by using machine learning to optimize the power of phylogenetic invariants. Phylogenetic invariants are polynomials in the joint probabilities which vanish under a model of evolution on a phylogenetic tree. We give algorithms for selecting a good set of invariants and for learning a metric on this set of invariants which optimally distinguishes the different models. Our learning algorithms involve semidefinite programming on data simulated over a wide range of parameters. Simulations on trees with four leaves under the Jukes-Cantor and Kimura 3-parameter models show that our method improves on other uses of invariants and is competitive with neighbor-joining. Our main biological result is that the trained invariants can perform substantially better than neighbor-joining on quartet trees with short interior edges.
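
The basic mechanism can be illustrated with the two-state symmetric (Cavender-Farris-Neyman) model, chosen here for brevity in place of the Jukes-Cantor and Kimura models used on the poster. For a quartet with split 12|34, the 4 x 4 flattening of the joint distribution (rows indexed by the states at leaves 1 and 2, columns by leaves 3 and 4) has rank at most 2, so its 3 x 3 minors are phylogenetic invariants. The sketch scores each quartet topology by a weighted sum of squared minors; the uniform weights are a hypothetical stand-in for the metric that the poster learns by semidefinite programming, which is not reproduced here. Exact fractions make the invariants of the generating tree vanish exactly.

```python
from fractions import Fraction
from itertools import combinations, product


def quartet_distribution(flip, interior):
    """Exact joint distribution at leaves 1-4 of the tree 12|34 under the two-state
    symmetric model: uniform state at internal node u (joined to leaves 1, 2 and to v),
    v joined to leaves 3, 4; each edge flips the state with the given probability."""
    def t(a, b, p):
        return 1 - p if a == b else p
    return {x: sum(Fraction(1, 2) * t(u, v, interior)
                   * t(u, x[0], flip[0]) * t(u, x[1], flip[1])
                   * t(v, x[2], flip[2]) * t(v, x[3], flip[3])
                   for u in (0, 1) for v in (0, 1))
            for x in product((0, 1), repeat=4)}


def flattening(dist, split):
    """4 x 4 matrix: rows indexed by the states at split[0], columns by split[1]."""
    (i, j), (k, l) = split
    M = [[Fraction(0)] * 4 for _ in range(4)]
    for x, p in dist.items():
        M[2 * x[i] + x[j]][2 * x[k] + x[l]] += p
    return M


def det3(M):
    """Determinant of a 3 x 3 matrix."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def invariant_score(dist, split, weights=None):
    """Weighted sum of squared 3 x 3 minors; with positive weights it is 0 iff rank <= 2."""
    M = flattening(dist, split)
    minors = [det3([[M[r][c] for c in cols] for r in rows])
              for rows in combinations(range(4), 3) for cols in combinations(range(4), 3)]
    weights = weights or [1] * len(minors)  # hypothetical stand-in for a learned metric
    return sum(w * m * m for w, m in zip(weights, minors))


SPLITS = {"12|34": ((0, 1), (2, 3)), "13|24": ((0, 2), (1, 3)), "14|23": ((0, 3), (1, 2))}

if __name__ == "__main__":
    p = quartet_distribution([Fraction(1, 10)] * 4, Fraction(1, 5))
    scores = {name: invariant_score(p, s) for name, s in SPLITS.items()}
    print(min(scores, key=scores.get), {name: float(v) for name, v in scores.items()})
```

With exact probabilities the true split scores zero; with simulated site patterns the scores are only small, and choosing the weights well is where metric learning comes in.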

This is joint work with Yuan Yao (Stanford).

**Linkage Problems and Real Algebraic Geometry**

Thorsten Theobald (Johann Wolfgang Goethe-Universität Frankfurt)

Joint work with Reinhard Steffens.

Linkages are graphs whose edges are rigid bars, and they arise as a natural model in many applications in computational geometry, molecular biology and robotics. Studying linkages naturally leads to a variety of questions in real algebraic geometry, such as:

- Given a rigid graph with prescribed edge lengths, how many embeddings are there?
- Given a 1-degree-of-freedom linkage, how can one characterize and compute the trajectory of the vertices?
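
The first question already has content in the smallest case. If a new vertex is attached by bars of lengths r1 and r2 to two vertices whose positions are fixed (a Henneberg type-I step), its possible positions are the real intersection points of two circles, so there are at most two. The sketch below computes them with elementary plane geometry, not with the sparse elimination techniques of the poster, and the function name is hypothetical. With the first edge pinned, a graph built only from such steps therefore has at most 2^(n-2) placements of its n vertices.

```python
import math


def circle_intersections(c1, r1, c2, r2, tol=1e-12):
    """Possible positions of a vertex joined by bars of lengths r1, r2 to fixed points c1, c2."""
    (x1, y1), (x2, y2) = c1, c2
    dx, dy = x2 - x1, y2 - y1
    d = math.hypot(dx, dy)
    if d == 0 or d > r1 + r2 + tol or d < abs(r1 - r2) - tol:
        return []  # coincident centres (degenerate) or circles that do not meet
    a = (r1 ** 2 - r2 ** 2 + d ** 2) / (2 * d)  # distance from c1 to the common chord
    mx, my = x1 + a * dx / d, y1 + a * dy / d  # foot of the chord on the line c1 c2
    h2 = r1 ** 2 - a ** 2
    if h2 <= tol:
        return [(mx, my)]  # tangent circles: a single (double) solution
    h = math.sqrt(h2)
    # The two solutions are mirror images across the line through c1 and c2.
    return [(mx - h * dy / d, my + h * dx / d), (mx + h * dy / d, my - h * dx / d)]


if __name__ == "__main__":
    print(circle_intersections((0.0, 0.0), 5.0, (6.0, 0.0), 5.0))
```

The two solutions of a generic step are reflections of each other, which is one reason embedding counts are often reported modulo such symmetries.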

From the real algebraic point of view, these questions concern specially structured real algebraic varieties. On the poster we exhibit some techniques from sparse elimination theory to analyze these problems. In particular, we show that certain bounds (e.g. for Henneberg-type graphs) naturally arise from mixed volumes and Bernstein's theorem.