IMA Thematic Year on Probability and Statistics in Complex Systems: Genomics, Networks, and Financial Engineering
This graphic was made by the Walrus Graph Visualization tool. It is a visualization of round-trip time measurements made by CAIDA's macroscopic Internet topology monitor located in Herndon, Virginia, USA. Visualization by Young Hyun. Data analysis by Bradley Huffaker. Copyright © 2002 The Regents of the University of California.
The proposed program is devoted to the application of probability and statistics to problems in three areas: the genome sciences, networks and financial engineering. These application areas are all associated with complex systems, and strategies for system analysis will serve as an organizing principle for the program. (By complex systems we mean systems with a very large number of interacting parts such that the interactions are nonlinear in the sense that we cannot predict the behavior of the system simply by understanding the behavior of the component parts.) Furthermore, these areas share the common feature that they are systems for which a huge amount of data is available. Mathematical models developed for these systems must be informed by this data, if they are to provide a basis for scientific understanding of the systems and for critical decision-making about them. The mathematical and statistical foundations of this program will include stochastic modeling and simulation, statistics, and massive data set analysis, as well as dynamical systems, network and graph theory, optimization, control, design of computer and physical experiments, and statistical visualization. The program will be particularly appropriate for probability/statistics postdocs and long-term participants with some background in at least one of the three major areas of application and an interest in developing the integration tools that will provide them with an entrée into modeling/data integration issues in the other areas. There will be extensive tutorials in the application areas.
The health of human populations and biosystems, information networks, and financial systems is fundamental to the success of modern civilization. A map of the human genome is nearly in hand, but we are just beginning to understand how to harness it. (For example, how does the one-dimensional information encoded in DNA lead to the immensely complicated three-dimensional structure of proteins, the protein folding problem?) Trying to understand the function of a single gene leads to a complicated set of analyses that requires the integration of stochastic and biological models with noisy, high-dimensional data coming from multiple sources. Integration of this information for all of the genes, through comparative and evolutionary genomics, is critical in determining the role of gene-related diseases in the human population, and in combating these diseases. A federation of 6000 autonomous networks, called the Internet, and wireless communications are on their way to providing anywhere, anytime multimedia communication. Interconnected power networks linking independent power producers with consumers in an environment with diminishing regulation raise many new questions regarding both the physical and economic operation of the electric power system. In few of these systems is there centralized control. How can we ensure that they work properly? In finance, the dynamics of the sequence of events triggered by the default of Russian government bonds in August 1998 demonstrates that the global financial system is an extraordinarily complex network of relations involving broker/dealers, banks, institutional investors, and other counterparties. The global volatility triggered by the default is a wake-up call to society on the importance of a deeper understanding and control of financial systems.
In all these areas, issues such as network topology, the "degree of connectedness", computational complexity, and the probability of systemic failure are relevant, as is the capacity to sample a system and store large amounts of data. Furthermore, system constraints create complex dependencies amongst elements of the sampled data. For example, coordinated gene expression causes DNA chip measurements to exhibit strong positive dependence amongst genes in common biochemical pathways, communication traffic with many sources and destinations shares common bottleneck links, and serial dependence is clearly present in financial time-series data.
The mathematical sciences, and particularly probabilistic and statistical methods, are key to understanding the dependencies of these systems. Interacting stochastic systems and cellular automata, as well as dynamical systems and partial differential equations, are examples of mathematical structures directed at understanding how one part of a system influences other parts and how those influences propagate. Historically, limited computational power restricted the size and complexity of the systems that could be usefully modeled and, in many settings, limited data made it difficult or impossible to evaluate the appropriateness and accuracy of proposed models. An explosion in computational power and other technical advances that support collection of large amounts of data have radically altered this situation. Simultaneous with and in part because of these technological advances, new areas of application have emerged that require the understanding of systems whose size and complexity test the limits of even the most recent computational, mathematical, and statistical methodologies.
Understanding in the diverse areas of genomics, communication networks, and financial engineering will benefit from the broad view which we propose to adopt in the one-year IMA program: a view based on the development and analysis of stochastic models and the implementation of the essential statistical analysis using advanced computational methods. To avoid stochastic modeling is to proceed at some peril. Pieces of a system might be considered in isolation using rudimentary data analysis, but this isolated approach may not provide the most efficient analysis and, more importantly, may not allow certain critical questions to be addressed. It is a basic premise of stochastic modeling that data are viewed as the realization of a stochastic process. An appropriate modeling framework allows inferences about unknown system elements or decisions about how to manage the system to be expressed in terms of the stochastic process. Indeed, if we can work out properties of the underlying stochastic process, we may come to a better understanding of the entire system. To do so requires sophisticated mathematical techniques. Computational algorithms are crucial for implementing the calculations suggested by the stochastic models. Advanced computer systems not only allow us to collect more data, but they also allow us to run much more sophisticated analyses than have been possible previously. Furthermore, statistical methods provide us not only with estimates or predictions of unknown quantities, but also with precise statements about our corresponding uncertainty.
Participating postdocs will require and will acquire skills in probability and the mathematical analysis of stochastic models, in the development of appropriate stochastic models by careful study of subject matter, in statistical inference, and in computational methods such as optimization and Monte Carlo.
A working draft of the human genome was made publicly available in summer 2000, with a final sequence, erring less often than once per 10,000 bases, to follow within two years. More than twenty microbial genomes are already complete, and the sequencing of both plant and animal model organisms is well underway. Coupled with the availability of sequence data are technologies that enable us to measure the simultaneous gene expression pattern in a cell. Obtaining such a mass of data will mark the beginning of a period of exceptional knowledge discovery in biology. Eric Lander of the Whitehead Institute has likened the effect on biology of these new resources to the effect on chemistry of the periodic table. Having a global view, knowing all the genes, their function, their common alleles, and the biochemical pathways in which they participate will have a profound effect on science and medicine. Mathematics and statistics have the potential to have a larger impact in the processing and analysis of genome data than the clearly substantial effect they have had in the fields of molecular biology and genetics to date. (See Calculating the Secrets of Life: Contributions of the Mathematical Sciences to Molecular Biology, Eric S. Lander and Michael S. Waterman, Editors; Committee on the Mathematical Sciences in Genome and Protein Structure Research, National Research Council, 1995.) The function of most genes is unknown, and stochastic modeling may improve the way inference from expression profiles works in the area of functional genomics. Stochastic models have long been used in evolutionary modeling, and new models and computational methods are needed to cope with whole genome comparisons in comparative genomics. Statistical methods are being refined in genetic mapping studies that now can in principle consider hundreds of thousands of markers in an attempt to find genes affecting complex diseases. 
The problem of inferring protein structure is longstanding and continues to demand the most sophisticated mathematical, statistical, and computational approaches. There are undoubtedly many more mathematical and statistical problems that will arise from the genome sciences.
The purpose of the term is to uncover emerging problems in computational molecular biology (CMB). CMB has a long history that includes techniques such as sequence alignment, sequencing, physical mapping, and so on. Each of these has a well-developed set of methods for its analysis (such as BLAST). Here we address the next generation of problems. Two recent examples should serve to illustrate the possibilities: SNP (single nucleotide polymorphism) detection, and DNA microarrays. SNPs are locations in DNA at which individuals vary greatly. There are now several molecular technologies for high-throughput SNP detection. SNPs are used as markers for disease gene mapping, and currently play a central role in drug design in pharmacogenomics. Analyzing these data has provided a number of challenging statistical problems, in part because SNPs are not usually a random survey of molecular variation in the genome. DNA microarrays provide a way to study the relative expression levels of proteins in different biological backgrounds (e.g. cell cycle data, tumor presence/absence). Parallel assays of expression levels for many thousands of genes simultaneously result in high-dimensional, noisy data. Problems involving image analysis, clustering and modeling expression profiles are central to many varied and important uses of arrays in human genetics and molecular biology.
We have outlined four possible workshops below. By the nature of the technologies involved, there are a number of overlapping topics. As in any other emerging science, the problems that are now of interest will probably be replaced by others by 2003, so these topics should be treated as illustrative.
Opening Tutorials/Kickoff
The program will open with a one-week tutorial on "Tools for Model and Data Integration in the Genome Sciences" (including a Statistics Tutorial "Refresher in S-Plus" that will provide useful background for the entire year program), followed by a brief mini-symposium on "Information integration technologies for complex systems." The tutorial will be aimed at the postdocs, and others with a probability/statistics background. The purpose of the tutorial is 1) to prepare the IMA postdocs and other IMA participants for the Genomics program, 2) to provide graduate students and faculty from universities everywhere (particularly the IMA participating Institutions) with an entrée into modeling/data integration problems in the Genome Sciences and 3) to publicize to the wider community the importance and intellectual excitement involved in the understanding of these complex systems.
The Internet and other communication networks are growing and changing in such a way that they present a rapidly moving target for modeling and data collection and analysis. The problems associated with designing, engineering, and managing such rapidly moving and constantly evolving systems have shaped much of networking research in the past and are likely to play an even more important role in the future as the problems acquire a central element of scale, extending well beyond what has previously been considered. For example, there has been a great deal of interest and progress in the past decade in measuring, modeling and understanding the properties and performance implications of actual traffic flows as they traverse individual links or routers within the network. However, with the imminent deployment of novel scalable network measurement infrastructures and newly designed large-scale network simulators, we will have access to a new generation of data sets of highest-quality network measurements that are of unprecedented volume, are simultaneously collected from a very large number of points within the network, and have an extraordinarily high semantic context. This transition from the traditional single link/router-centered view to a more global or network-wide perspective will have profound implications for trying to describe and understand the dynamic nature of large-scale, complex internetworks such as the global Internet, where the interesting problems are those of interactions, correlations, and heterogeneities in time, space, and across the different networking layers. While these next-generation data sets can be fully expected to continue to reveal tantalizing variability, intriguing fluctuations, and unexpected behaviors, they will also raise many new data analysis and modeling issues and challenge the use of established and well-understood techniques. In particular, the problems of explaining why and how some of the observed phenomena occur, of predicting the stability and performance of truly large-scale networks under alternative future scenarios, and of recommending long-term control strategies are certain to generate new research activities in the mathematical and physical sciences and will remain with us for the foreseeable future. Of course, by 2003, the important questions may look very different from the important questions today, but the characteristics of complex models and massive amounts of data will almost certainly remain for the foreseeable future.
The winter program will open with a short course and tutorial on "The Internet for Mathematicians" and "Measurement, Modeling and Analysis of the Internet." The purpose of the short course and tutorial is 1) to prepare the IMA postdocs and other IMA participants for the Communication Networks program, 2) to provide graduate students and faculty from universities everywhere (particularly the IMA participating Institutions) with an entrée into modeling/data integration problems in Communications Networks and 3) to publicize to the wider community the importance and intellectual excitement involved in the understanding of these complex systems.
This component of the program is concerned with advanced mathematical, statistical and computational methods in finance and econometrics. Finance has been profoundly influenced by relatively new ideas on how to measure the risk and return of investments. Real progress has come by combining new concepts in finance and risk management with advanced mathematical modeling and an exponential increase in computing power. A more recent development has been the lowering of the cost of acquiring data and information via the Internet. Improved data access allows modelers to implement sophisticated systems, which can be used to make real-time decisions in terms of investing, managing risk, or allocating capital.
Mathematical Challenges
Inverse Problems in Asset Pricing Theory. Financial Economics has a very elegant way of characterizing a system of prices which is consistent with no-arbitrage (no free lunch): namely, the existence of a probability measure on future market scenarios such that any contingent claim can be priced as the expected value of its future cash flows. This result is due to K. Arrow and G. Debreu. In modern finance, the Arrow-Debreu paradigm is used for pricing and hedging instruments that share the same underlying risks. These include, most prominently, derivative securities. Derivatives always exist in a universe in which the underlying asset or assets are present. The power of the Arrow-Debreu measure is that it allows us (i) to price derivatives in relation to the underlying security and (ii) to make sure that these prices are not subject to arbitrages, i.e. that we are not systematically losing money by trading at certain levels. The other remarkable feature of the Arrow-Debreu measures is that they form an "interpolation" between the prices of liquidly traded assets and less liquid assets for which price discovery is more difficult. In particular, the perturbation of the probability that characterizes the equilibrium gives useful information about the market risk of trading positions. The problem of interest is:
Construct Arrow-Debreu probabilities that are consistent with concrete market situations involving several traded assets and multiple trading dates.
So far, only small-dimensional systems have been implemented. It is only in the last few years that we have enough computing power and theoretical understanding to begin to implement large-scale systems. The development of mathematical and computational tools to solve this problem is very important since it is at the crossroads between Asset-Pricing Theory and financial applications of the theory. Its solution should drive the development of new modeling, statistical and computational methodology, and since similar inverse problems arise in other areas, methods developed here should find broader application. Development of new methods in the context of finance has the added benefit of ensuring that their validity will be thoroughly tested through implementation in the markets.
In the simplest models, prices are given as expectations of functions of a diffusion process. The problem then becomes to find a diffusion process satisfying several moment-type constraints. For example, one may be given m option prices and the characteristics of these contracts. The goal is to find a diffusion measure that is consistent with the observed prices (in the Arrow-Debreu sense). This is a mathematically ill-posed problem that is isomorphic to finding a probability measure from a few of its moments. Either no solution exists or there are many possible solutions. Continuous dependence on the data can be problematic.
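The non-uniqueness can be seen even in a toy discrete setting. The following sketch is only an illustration, not a method proposed by the program: the five-state price grid, the zero interest rate, and both probability vectors are made-up assumptions. The two candidate measures reproduce the same forward price and the same call price at strike 100, yet they assign different prices to a call at strike 110 that was not used as a constraint.

```python
from fractions import Fraction as F

# Hypothetical terminal asset prices (illustrative grid, not market data).
STATES = [80, 90, 100, 110, 120]


def expectation(probs, payoff):
    """Price a claim as its expected payoff under `probs` (zero rates assumed)."""
    return sum(p * payoff(s) for p, s in zip(probs, STATES))


def forward(s):
    """Payoff of a claim that simply delivers the asset."""
    return s


def call(strike):
    """Return the payoff function of a European call with the given strike."""
    return lambda s: max(s - strike, 0)


# Two made-up candidate pricing measures on the same grid.
measure_a = [F(1, 10), F(2, 10), F(4, 10), F(2, 10), F(1, 10)]
measure_b = [F(2, 10), F(0), F(6, 10), F(0), F(2, 10)]

for name, probs in [("A", measure_a), ("B", measure_b)]:
    assert sum(probs) == 1  # each vector is a probability measure
    print(
        name,
        "forward:", expectation(probs, forward),
        "call(100):", expectation(probs, call(100)),
        "call(110):", expectation(probs, call(110)),
    )
```

Exact fractions are used so that the agreement on the two "observed" prices is not blurred by rounding. Any convex combination of the two vectors matches the same constraints, which is the sense in which many solutions exist once one does.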
Since the early 1990s several solutions have been suggested. Some are parametric in nature and exploit the structure of the equation in clever ways. Unfortunately, these approaches are restricted for the most part to model problems in one dimension and to parametric families of distributions that are not suitable for realistic problems. In reality, the models used by large broker-dealers in financial derivatives make use of multiple risk factors, so we are dealing with multidimensional diffusions and with complex constraints. The question then becomes:
Design stable numerical algorithms for selecting and calibrating financial models to market data that can be applied in the presence of multiple risk factors and many market constraints.
Scientific Interest. The scientific issues that arise in inverse problems in finance are not merely algorithmic. They touch upon the foundations of the field of financial economics and serve to validate or to invalidate ideas that remain untested in the markets. Here there is a big difference from physics and engineering. Whereas in the latter it is possible to repeat experiments under similar conditions and the models are basically mathematizations of physical laws, we know that no experiment in finance or economics can be reproduced exactly as in the past. We do not even know the relevant state variables, and consequently, the modeling of pricing probabilities and the selection problem become much more challenging and important than in most inverse problems in physics.
Mathematical Challenges
Monte Carlo Simulation: Asset Pricing, Risk Management and Asset Allocation in High-dimensional Systems. This second area of problems has been exploding for several years, and recently, there have been several very important developments.
Develop a coherent analysis of Least Squares Monte Carlo algorithms for American options in high-dimensional economics. Develop a theory for understanding numerical errors and statistical biases that arise from dynamic estimation of conditional expectations, early-exercise dates, etc.
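For readers unfamiliar with the technique, the following is a minimal one-dimensional sketch of a least-squares Monte Carlo estimator in the style of Longstaff and Schwartz, written under several assumptions that are not in the text above: a single asset following geometric Brownian motion, an American put, a quadratic polynomial basis, and illustrative parameter values. The function names are hypothetical. The regression step is where the conditional expectation is estimated from the same paths that are then used for pricing, which is one source of the statistical bias mentioned above.

```python
import math
import random


def solve_3x3(a, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    n = 3
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit y ~ c0 + c1*x + c2*x^2 via the normal equations."""
    basis = [[1.0, x, x * x] for x in xs]
    ata = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * y for r, y in zip(basis, ys)) for i in range(3)]
    return solve_3x3(ata, aty)


def lsm_american_put(s0, strike, rate, sigma, maturity, steps, paths, seed=0):
    """Estimate an American put price by regression-based backward induction."""
    rng = random.Random(seed)
    dt = maturity / steps
    disc = math.exp(-rate * dt)
    drift = (rate - 0.5 * sigma ** 2) * dt
    vol = sigma * math.sqrt(dt)
    # Simulate price paths; path[k] is the price at time (k + 1) * dt.
    sims = []
    for _ in range(paths):
        s, path = s0, []
        for _ in range(steps):
            s *= math.exp(drift + vol * rng.gauss(0.0, 1.0))
            path.append(s)
        sims.append(path)
    # cash[i] is path i's future cashflow, discounted to the current step.
    cash = [max(strike - path[-1], 0.0) for path in sims]
    for t in range(steps - 2, -1, -1):
        cash = [c * disc for c in cash]
        itm = [i for i in range(paths) if sims[i][t] < strike]
        if len(itm) < 3:
            continue  # too few in-the-money paths to fit three coefficients
        xs = [sims[i][t] / strike for i in itm]  # rescaled for conditioning
        coef = fit_quadratic(xs, [cash[i] for i in itm])
        for i, x in zip(itm, xs):
            continuation = coef[0] + coef[1] * x + coef[2] * x * x
            exercise = strike - sims[i][t]
            if exercise > continuation:
                cash[i] = exercise  # exercise now replaces later cashflows
    return disc * sum(cash) / paths


if __name__ == "__main__":
    # Illustrative parameters only.
    print(lsm_american_put(36.0, 40.0, 0.06, 0.2, 1.0, steps=50, paths=2000))
```

The regression uses only in-the-money paths and the estimated continuation value is used only to decide when to exercise, not as the cashflow itself. In higher dimensions the choice and size of the basis becomes the central difficulty, which this one-asset sketch does not address.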
Large-Scale Dynamic Asset Allocation Models. Since the intertemporal CAPM of Merton and Sharpe, people have been trying to apply dynamic programming ideas to solve allocation problems under different investment horizons and budget constraints. The theory appears sound, but several factors suggest that there will be much more activity here. First, the academic papers assume that there are only one or two assets, that strategies are self-financing and that utilities are homogeneous. All these assumptions are highly unrealistic. Despite the fact that the papers have been written and the (Nobel) prizes handed out, we expect that computers will finally allow us to actually run investment strategies which are diversified among dozens of assets with reasonable, complex scenarios and intertemporal reallocation according to real-life events. The goal is:
Develop platforms for large-scale asset allocation models (~20 to 50 variables) that produce verifiable results. Include in these models the possibility of decision making by investors and state-contingent optimization. Include non-self-financing portfolios.
The spring program will open with a one-week tutorial. The purpose of the tutorial is 1) to prepare the IMA postdocs and other IMA participants for the Financial Engineering program, 2) to provide graduate students and faculty from universities everywhere (particularly the IMA participating Institutions) with an entrée into modeling and data integration problems in Financial Engineering and 3) to publicize to the wider community the importance and intellectual excitement involved in the understanding of these complex systems.

University of Wisconsin, Madison 

Courant Institute, NYU 

University of Illinois, Urbana 
 UC Berkeley 

Los Alamos National Labs 

MIT 

University of Wisconsin, Madison 

University of Southern California 

AT&T Labs-Research
Scientists interested in long or short term visits can apply through the IMA general membership program or the IMA short term visit program (for visits connected to a workshop).
Name | Department | Affiliation | Period of Visit
Scot Adams | School of Mathematics | University of Minnesota, Twin Cities | 9/1/02 - 7/1/04
Soohan Ahn | Department of Statistics | Seoul National University | 9/8/03 - 2/19/04
Montaz Ali | School of Computational and Applied Mathematics | University of the Witwatersrand | 11/1/02 - 10/24/03
Greg W. Anderson | School of Mathematics | University of Minnesota, Twin Cities | 9/1/03 - 6/30/04
Hee-Jeong Baek | Department of Mathematics | Seoul National University | 3/13/04 - 6/30/04
Peter Bank |  | Humboldt-Universität | 3/28/04 - 5/7/04
Maury Bramson | School of Mathematics | University of Minnesota, Twin Cities | 9/1/03 - 6/30/04
Rene Carmona | Operations Research and Financial Engineering | Princeton University | 4/13/04 - 6/30/04
Rohit Kallol Chatterjee | Department of Mathematics | University of Wisconsin, Madison | 6/6/04 - 6/27/04
Laura Chihara | Department of Mathematics | Carleton College | 9/1/03 - 12/31/03
Hi Jun Choe | Department of Mathematics | Yonsei University | 12/29/03 - 2/28/04
Wanyang Dai | Department of Mathematics | Nanjing University | 1/4/04 - 3/8/04
Hans Foellmer | Institut für Mathematik | Humboldt-Universität | 4/10/04 - 7/10/04
Shmuel Friedland | Department of Mathematics, Statistics, and Computer Science | University of Illinois, Chicago | 9/3/03 - 6/30/04
Anne Gundel | Institute for Mathematics | Humboldt-Universität | 4/11/04 - 5/13/04
Mark Handcock |  | University of Washington | 11/3/03 - 11/24/03
David C. Heath | Department of Mathematical Sciences | Carnegie Mellon University | 4/10/04 - 5/29/04
Ulrich Horst | Institut für Mathematik | Humboldt-Universität | 4/14/04 - 6/27/04
David R. Hunter | Department of Statistics | The Pennsylvania State University | 11/1/03 - 11/22/03
Naresh Jain | School of Mathematics | University of Minnesota, Twin Cities | 9/1/03 - 6/30/04
Christina Kendziorski | Department of Biostatistics and Medical Informatics | University of Wisconsin, Madison | 9/15/03 - 12/31/03
Mohammad Kazim Khan | Department of Mathematics | Kent State University | 9/2/03 - 6/30/04
Dohyun Kim |  | Seoul National University | 9/8/03 - 2/25/04
Hye-Ryoung Kim | Department of Mathematics | Seoul National University | 1/6/04 - 3/13/04
Hye-Ryoung Kim | Department of Mathematics | Seoul National University | 4/26/04 - 8/8/04
Hyejin Ku | Department of Mathematics & Statistics | York University | 4/10/04 - 5/8/04
Thomas G. Kurtz | Department of Mathematics | University of Wisconsin, Madison | 9/8/03 - 5/31/04
Jeong Hyun Lee | Department of Mathematics | Seoul National University | 3/22/04 - 6/24/04
Taerim Lee | Department of Information Statistics | Seoul National University | 9/12/03 - 10/12/03
Richard P. McGehee | School of Mathematics | University of Minnesota, Twin Cities | 9/1/03 - 6/30/04
Martina Morris | Department of Sociology and Statistics | University of Washington | 11/3/03 - 11/24/03
Michael Newton | Department of Statistics | University of Wisconsin, Madison | 9/15/03 - 12/31/03
Amir Niknejad |  | University of Illinois, Chicago | 9/1/03 - 6/30/04
Grzegorz A. Rempala |  | University of Louisville | 9/2/03 - 6/30/04
Luis Jose Roman | Department of Mathematical Sciences | Worcester Polytechnic Institute | 5/2/04 - 5/31/04
Fadil Santosa | Institute for Mathematics and its Applications | University of Minnesota, Twin Cities | 9/1/97 - 8/31/04
Arnd Scheel | School of Mathematics | University of Minnesota, Twin Cities | 9/1/03 - 6/30/04
Mihai Sirbu |  | Carnegie Mellon University | 3/29/04 - 5/28/04
Jacob Sterbenz |  | Princeton University | 6/1/04 - 6/24/04
Srdjan Stojanovic | Department of Mathematical Sciences | University of Cincinnati | 3/28/04 - 5/29/04
Michael Stutzer | Department of Finance | University of Colorado | 5/17/04 - 6/11/04
Peter Tankov | Centre de Mathématiques Appliquées | École Polytechnique | 3/25/04 - 4/24/04
Hui Wang | Division of Applied Mathematics | Brown University | 1/1/04 - 5/31/04
Stephen J. Willson | Department of Mathematics | Iowa State University | 9/10/03 - 12/10/03
Yuhong Yang |  | Iowa State University | 9/1/03 - 6/30/04
Ofer Zeitouni | School of Mathematics | University of Minnesota, Twin Cities | 9/1/03 - 6/30/04
Ilze Ziedins | Department of Statistics | University of Auckland | 3/1/04 - 3/25/04
Name | Affiliation (Current or last known)
2003-2004
Gerard Awanou | Mathematics, Statistics, and Computer Science Department, University of Illinois, Chicago
Karen Ball | Center for Communications Research, Institute for Defense Analyses
Antar Bandyopadhyay | Theoretical Statistics and Mathematics Unit, Indian Statistical Institute
Tim Garoni | School of Mathematical Sciences, Monash University
Chuan-Hsiang Han | Department of Quantitative Finance, National Tsing Hua University
Lea Popovic | Department of Mathematics & Statistics, Concordia University
Seminar: IMA Complex Systems Seminar
Thursdays, 1:30-2:30 pm, 409 Lind Hall
© 2014 Regents of the University of Minnesota. All rights reserved.
The University of Minnesota is an equal opportunity educator and employer. Last modified on October 06, 2011