
IMA Tea and Poster Session

Monday, March 5, 2007 - 4:30pm - 6:30pm
Lind 400
  • Geometry of Rank Tests
    Jason Morton (University of California, Berkeley)
    We investigate the polyhedral geometry of conditional probability and
    undirected graphical models, developing new statistical procedures
    called convex rank tests. The polytope associated to an undirected
    graphical conditional independence model is the graph associahedron.
    The convex rank test defined by the dual semigraphoid to the n-cycle
    graphical model is applied to microarray data analysis to detect
    periodic gene expression.
  • Maximum Likelihood Estimation in Latent Class
    Yi Zhou (Carnegie Mellon University)
    Latent class models have been used to explain the heterogeneity of the observed relationships among a set of categorical variables and have received increasing attention as a powerful methodology for analyzing discrete data. The central goal of our work is to study the existence and computation of maximum likelihood estimates (MLEs) for these models, which are essential for assessing goodness of fit and for model selection. Our study is at the interface between the fields of algebraic statistics and machine learning.

    Traditionally, the expectation maximization (EM) algorithm has been applied to compute the MLEs of a latent class model. However, the solutions provided by EM correspond only to local maxima, so although we can compute them effectively, we still lack methods for assessing the uniqueness and existence of the MLEs. Another interesting problem in statistics is the identifiability of the model. When a model is unidentifiable, the number of degrees of freedom must be adjusted in order to apply goodness-of-fit tests correctly. In our work, we show that both the existence and identifiability problems are closely related to the geometric properties of the latent class models. Therefore, studying the algebraic varieties and ideals arising from these models is particularly relevant to our problem. We include a number of examples as a way of opening a discussion on a general method for addressing both MLE existence and identifiability in latent class models.
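
    As a rough illustration of the EM iteration mentioned above (not the authors' code), the Python sketch below fits a latent class model in which the observed variables are conditionally independent given the class. The function name `em_latent_class`, the data layout (an N-by-k integer array with levels 0..d-1), and the random Dirichlet initialization are all assumptions made for this example.

```python
import numpy as np

def em_latent_class(X, n_classes, n_levels, n_iter=200, seed=0):
    """EM for a latent class model (hypothetical helper, illustration only).

    X is an (N, k) integer array; every variable takes values 0..n_levels-1.
    Returns class weights pi (H,), conditional tables theta (H, k, d),
    and the log-likelihood recorded at the start of each iteration.
    """
    rng = np.random.default_rng(seed)
    N, k = X.shape
    pi = rng.dirichlet(np.ones(n_classes))
    theta = rng.dirichlet(np.ones(n_levels), size=(n_classes, k))
    onehot = np.eye(n_levels)[X]  # shape (N, k, d)
    trace = []
    for _ in range(n_iter):
        # E-step: log p(x_n, class h) = log pi_h + sum_j log theta[h, j, x_nj]
        log_theta = np.log(np.maximum(theta, 1e-300))  # guard against log(0)
        log_joint = np.einsum('njc,hjc->nh', onehot, log_theta) + np.log(pi)
        log_marginal = np.logaddexp.reduce(log_joint, axis=1)
        trace.append(log_marginal.sum())
        resp = np.exp(log_joint - log_marginal[:, None])  # posterior class weights
        # M-step: weighted relative frequencies
        pi = resp.mean(axis=0)
        theta = np.einsum('nh,njc->hjc', resp, onehot)
        theta /= theta.sum(axis=2, keepdims=True)
    return pi, theta, np.array(trace)
```

    Each EM step cannot decrease the log-likelihood, but the limit depends on the starting point. Running the function with several values of `seed` and comparing the last entries of the returned trace is one simple way to see different local maxima.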

  • Supervised Learning Artificial Neural Network Algorithms for Optimizing Mechanical Properties of Elastin-like Polypeptide Hydrogels for Cartilage Repair

    Joint work with Dana L. Nettles (3), Kimberly Trabbic Carlson (3),
    Ashutosh Chilkoti (3), Lori A. Setton (3,4), and Mansoor A. Haider (1,2)

    Elastin-like polypeptide (ELP) hydrogels are a class of biomaterials that
    have potential utility as a biocompatible scaffold for filling defects due
    to osteoarthritis and for regenerating cartilage. Because elastin sequences
    are readily engineered genetically, there are almost endless
    possible configurations of ELPs and conformations of the networks formed
    after crosslinking. ELP biomaterial function will exhibit a complex
    dependence on these polymer characteristics that impacts properties
    expected to affect cartilage regeneration, such as mechanical load
    support. These complex structure-function relationships for crosslinked
    ELP hydrogels are not well described. A method for predicting the
    mechanical properties of ELP hydrogels was developed based on structural
    properties and Supervised Artificial Neural Network (ANN) modeling. The
    ANN Model used concentration, molecular weight, crosslink density, and
    sample number to predict the dynamic shear modulus and loss angle of the
    hydrogels. The ANN was implemented in custom compiled code based on the
    Scaled Conjugate Gradient minimization algorithm, and a Monte Carlo method
    was used to expand the dataset. The ANN was trained using varying
    subsets of the full dataset (22 formulations), with the complementary
    subset used for validation. Trained networks demonstrated excellent
    accuracy in predicting hydrogel dynamic shear modulus at physiological
    temperature based on polymer design, and predictions were robust with
    respect to statistical variations. The results are used to show the
    validity of an intermediate screening process using ANNs to obtain the
    optimal mechanical properties for the ELP.

    (1) Biomathematics Graduate Program, North Carolina State University;
    (2) Department of Mathematics, North Carolina State University;
    (3) Department of Biomedical Engineering, Duke University;
    (4) Department of Surgery, Duke Medical Center


  • Classifying Disease Models Using Regular Polyhedral Subdivisions
    Debbie Yuster (Columbia University)
    Genes play a complicated role in how likely one is to get a certain disease. Biologists would like to model how one's genotype affects their likelihood of illness. We propose a new classification of two-locus disease models, where each model corresponds to an induced subdivision of a point configuration (basically a picture of connected dots). Our models reflect epistasis, or gene interaction. This work is joint with Ingileif Hallgrimsdottir. For more information, see our preprint at arXiv:q-bio.QM/0612044.
  • Multiple Solutions to the Likelihood Equations in the Behrens-Fisher Problem
    Mathias Drton (University of Chicago)
    The Behrens-Fisher problem concerns testing the statistical hypothesis
    of equality of the means of two normal populations with possibly
    different variances. This problem
    furnishes one of the simplest statistical models for which the likelihood
    equations may have more than one real solution. In fact, with
    probability one, the equations have either one or three real solutions.
    Using the cubic discriminant, we study the large-sample probability of
    one versus three solutions.
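
    To make the "one or three solutions" statement concrete, here is a small Python sketch under stated assumptions. With a common mean mu and the variances profiled out, the likelihood equation becomes n1 (m1 - mu)(v2 + (m2 - mu)^2) + n2 (m2 - mu)(v1 + (m1 - mu)^2) = 0, a cubic in mu, where m_i are the sample means and v_i the sample variances with divisor n_i. The function names and the summary-statistic interface are choices made for this example, not taken from the poster.

```python
import numpy as np
from numpy.polynomial import Polynomial

def likelihood_cubic(n1, mean1, var1, n2, mean2, var2):
    """Cubic in mu whose real roots solve the profiled likelihood equation.

    var1 and var2 are the variances with divisor n_i (not n_i - 1).
    """
    d1 = Polynomial([mean1, -1.0])  # mean1 - mu
    d2 = Polynomial([mean2, -1.0])  # mean2 - mu
    return n1 * d1 * (var2 + d2**2) + n2 * d2 * (var1 + d1**2)

def cubic_discriminant(cubic):
    """Discriminant of a*x^3 + b*x^2 + c*x + d."""
    d, c, b, a = cubic.coef  # numpy stores coefficients in ascending order
    return (18*a*b*c*d - 4*b**3*d + b**2*c**2
            - 4*a*c**3 - 27*a**2*d**2)

def count_real_solutions(n1, mean1, var1, n2, mean2, var2):
    """3 if the discriminant is positive, otherwise 1.

    A zero discriminant (a repeated root) has probability zero and is
    lumped with the one-solution case here.
    """
    cubic = likelihood_cubic(n1, mean1, var1, n2, mean2, var2)
    return 3 if cubic_discriminant(cubic) > 0 else 1
```

    For instance, with equal sample sizes and unit variances, sample means 0 and 10 give three real solutions, while sample means 0 and 1 give a single one: widely separated means relative to the spread are what produce the extra stationary points.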
  • A Flow that Computes the Best Positive Semi-definite Approximation of a Symmetric Matrix
    Kenneth Driessel (Iowa State University)
    We work in the space of n-by-n real symmetric
    matrices with the Frobenius inner product.
    Consider the following problem:

    Problem: Positive semi-definite
    approximation.
    Given an n-by-n real symmetric matrix
    A, find the positive semi-definite matrix
    which is closest to A.

    I discuss the following differential equation
    in the space of symmetric matrices:


    \dot{X} = (A - X) X^2 + X^2 (A - X).



    The corresponding flow preserves inertia.
    In particular, if the initial value X(0)=M
    is a positive definite matrix then X(t)
    is positive definite for all t>0. I
    show that the distance between A
    and X(t) decreases as t increases.
    I also show that if A has distinct
    nonzero eigenvalues (which is a generic
    condition) then the solution X(t)
    converges to the positive semi-definite
    matrix which is closest to A.
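
    A numerical sketch of this flow, offered only as an illustration: the code below integrates dX/dt = (A - X) X^2 + X^2 (A - X) with a fixed-step classical Runge-Kutta scheme and compares the result with the closest positive semi-definite matrix obtained by clipping negative eigenvalues. The integrator, step size, time horizon, and function names are assumptions of this example and are not taken from the poster.

```python
import numpy as np

def flow_rhs(X, A):
    """Right-hand side (A - X) X^2 + X^2 (A - X)."""
    X2 = X @ X
    return (A - X) @ X2 + X2 @ (A - X)

def integrate_flow(A, X0, t_end=50.0, h=0.01):
    """Fixed-step RK4 integration; also records the Frobenius distance to A."""
    X = np.array(X0, dtype=float)
    distances = [np.linalg.norm(A - X)]
    for _ in range(int(round(t_end / h))):
        k1 = flow_rhs(X, A)
        k2 = flow_rhs(X + 0.5 * h * k1, A)
        k3 = flow_rhs(X + 0.5 * h * k2, A)
        k4 = flow_rhs(X + h * k3, A)
        X = X + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
        distances.append(np.linalg.norm(A - X))
    return X, np.array(distances)

def nearest_psd(A):
    """Closest PSD matrix in Frobenius norm: set negative eigenvalues to zero."""
    w, V = np.linalg.eigh(A)
    return (V * np.clip(w, 0.0, None)) @ V.T

# Example: a symmetric matrix with one negative and two positive eigenvalues.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, -1.0, 1.0],
              [0.0, 1.0, 1.0]])
X_final, dist = integrate_flow(A, np.eye(3))
gap = np.linalg.norm(X_final - nearest_psd(A))
```

    In this example the eigenvalue of X(t) along the negative eigendirection of A decays only like 1/t, so `gap` shrinks slowly and the time horizon has to be fairly long before X(t) is close to the limit.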
  • Conditional Independence for Gaussian Random Variables is not Finitely Axiomatizable
    Seth Sullivant (Harvard University)
    It is known that for general distributions, there is no finite list of conditional independence axioms that can be used to deduce all implications among a collection of conditional independence statements. We show the same result holds among the class of Gaussian random variables by exhibiting, for each n>3, a collection of n independence statements on n random variables, which, in the Gaussian case imply that X_1 is independent of X_2, but such that no subset implies that X_1 is independent of X_2. The proof depends on the fact that conditional independence models for Gaussian random variables are algebraic varieties in the cone of positive definite matrices and makes use of binomial primary decomposition.
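
    The algebraic description mentioned above can be made concrete with a standard fact: for a Gaussian vector with positive definite covariance matrix Sigma, X_i is independent of X_j given X_K exactly when the determinant of the submatrix of Sigma with rows {i} and K and columns {j} and K vanishes. The sketch below checks such a statement numerically. The function name, the 0-based indexing, and the tolerance are assumptions of this example.

```python
import numpy as np

def gaussian_ci_holds(sigma, i, j, cond=(), tol=1e-10):
    """Check X_i independent of X_j given X_cond for covariance matrix sigma.

    Uses the vanishing of the minor with rows [i, *cond] and
    columns [j, *cond]; indices are 0-based.
    """
    rows = [i, *cond]
    cols = [j, *cond]
    minor = np.linalg.det(sigma[np.ix_(rows, cols)])
    return abs(minor) < tol

# Example: covariance rho^|i-j| of a stationary Gaussian Markov chain X_0, X_1, X_2.
rho = 0.5
idx = np.arange(3)
sigma = rho ** np.abs(idx[:, None] - idx[None, :])

chain_ci = gaussian_ci_holds(sigma, 0, 2, cond=(1,))  # X_0 and X_2 given X_1
marginal_ci = gaussian_ci_holds(sigma, 0, 2)          # X_0 and X_2 unconditionally
```

    Here the conditional statement holds while the marginal one fails, since the covariance between X_0 and X_2 is rho^2, which is nonzero.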
  • Toric Ideals of Phylogenetic Invariants for the General Group-based Model on Claw Trees

    We address the problem of studying the toric ideals of
    phylogenetic invariants for a general group-based model on an
    arbitrary claw tree. We focus on the group Z_2 and
    choose a natural recursive approach that extends to other
    groups. The study of the lattice associated with each
    phylogenetic ideal produces a list of circuits that generate
    the corresponding lattice basis ideal. In addition, we
    describe explicitly a quadratic lexicographic Gröbner basis
    of the toric ideal of invariants for the claw tree on an
    arbitrary number of leaves. Combined with a result of Sturmfels
    and Sullivant, this implies that the phylogenetic ideal of
    every tree for the group Z_2 has a quadratic
    Gröbner basis. Hence, the coordinate ring of the toric
    variety is a Koszul algebra.

    This is joint work with Julia Chifman, University of Kentucky.
  • Metric Learning for Phylogenetic Invariants
    Nicholas Eriksson (Stanford University)
    We introduce new methods for phylogenetic tree construction by using
    machine learning to optimize the power of phylogenetic invariants.
    Phylogenetic invariants are polynomials in the joint probabilities
    which vanish under a model of evolution on a phylogenetic tree. We
    give algorithms for selecting a good set of invariants and for
    learning a metric on this set of invariants which optimally
    distinguishes the different models. Our learning algorithms involve
    semidefinite programming on data simulated over a wide range of
    parameters. Simulations on trees with four leaves under the
    Jukes-Cantor and Kimura 3-parameter models show that our method
    improves on other uses of invariants and is competitive with
    neighbor-joining. Our main biological result is that the trained
    invariants can perform substantially better than neighbor-joining on
    quartet trees with short interior edges.

    This is joint work with Yuan Yao (Stanford).
  • Linkage Problems and Real Algebraic Geometry
    Thorsten Theobald (Johann Wolfgang Goethe-Universität Frankfurt)
    Joint work with Reinhard Steffens.

    Linkages are graphs whose edges are rigid bars, and they arise
    as a natural model in many applications in computational geometry,
    molecular biology and robotics. Studying linkages naturally leads
    to a variety of questions in real algebraic geometry, such as:

    1. Given a rigid graph with prescribed edge lengths, how many
      embeddings are there?

    2. Given a 1-degree-of-freedom linkage, how can one characterize
      and compute the trajectory of the vertices?


    From the real algebraic point of view, these are questions about
    specially structured real algebraic varieties. On the poster we
    exhibit some techniques from sparse elimination theory to
    analyze these problems. In particular, we show that certain
    bounds (e.g. for Henneberg-type graphs) naturally arise from
    mixed volumes and Bernstein's theorem.
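
    As a toy illustration of how a mixed volume yields a root bound (this is not the authors' computation), consider one unknown vertex (x, y) joined by two bars to two pinned points. Bernstein's theorem bounds the number of isolated solutions with nonzero complex coordinates by the mixed volume of the Newton polytopes, which in the plane equals area(P + Q) - area(P) - area(Q). The sketch below computes this with a hand-written convex hull. All function names are hypothetical.

```python
from itertools import product

def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]

def hull_area(points):
    """Area of the convex hull of a planar point set (shoelace formula)."""
    hull = convex_hull(points)
    n = len(hull)
    if n < 3:
        return 0.0
    twice = sum(hull[i][0] * hull[(i + 1) % n][1]
                - hull[(i + 1) % n][0] * hull[i][1] for i in range(n))
    return abs(twice) / 2.0

def mixed_volume_2d(P, Q):
    """MV(P, Q) = area(P + Q) - area(P) - area(Q) for planar exponent sets."""
    minkowski_sum = [(p[0] + q[0], p[1] + q[1]) for p, q in product(P, Q)]
    return hull_area(minkowski_sum) - hull_area(P) - hull_area(Q)

# Exponent vectors of a circle equation (x - a)^2 + (y - b)^2 = r^2 ...
circle = [(0, 0), (1, 0), (0, 1), (2, 0), (0, 2)]
# ... and of the linear equation left after subtracting two circle equations.
line = [(0, 0), (1, 0), (0, 1)]

bound_two_circles = mixed_volume_2d(circle, circle)   # 4.0
bound_line_circle = mixed_volume_2d(line, circle)     # 2.0
```

    The raw pair of circle equations gives the bound 4, while the equivalent system of one line and one circle gives 2, which matches the at most two intersection points of two circles. This shows how much the bound depends on the sparsity structure of the formulation.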