October 20-24, 2003


Probability and Statistics in Complex Systems: Genomics, Networks, and Financial Engineering, September 1, 2003 - June 30, 2004

Max Alekseyev (Department of Computer Science, University of California, San Diego) maxal@cs.ucsd.edu

Genome Halving Problem (poster session)

Joint work with Pavel Pevzner.

The Genome Halving Problem is motivated by an evolutionary mechanism that duplicates the entire genome. The result of such a duplication, a so-called perfectly duplicated genome, contains two identical copies of each chromosome. The genome is then subject to reversal and/or translocation rearrangement operations. Given a rearranged duplicated genome, the Genome Halving Problem asks for its closest perfectly duplicated ancestor. A solution to this problem serves as a building block for more sophisticated genome rearrangement algorithms.

The Genome Halving Problem was first introduced and solved in a series of papers by Nadia El-Mabrouk and David Sankoff. Their algorithm is rather complex and, to the best of our knowledge, has never been implemented as a computer program. In our work we present a new, simpler, and more general algorithm for the Genome Halving Problem, as well as its implementation in C++.

Lars Arvestad (Stockholm Bioinformatics Center and Department of Numerical Analysis and Computer Science, Royal Institute of Technology (KTH)) lars.arvestad@sbc.su.se  http://www.nada.kth.se/~arve

New Methods for Estimating Amino Acid Replacement Rates (poster session)    Long Version with Figure:   pdf    ps

Two new methods for estimating replacement rate matrices from protein sequence alignments are presented and shown to perform better than another recent method, Müller-Vingron's resolvent method, in a variety of settings. Furthermore, the best method is demonstrated to be robust on small datasets and practical also on very large datasets of real data. Neither short nor divergent sequence pairs have to be discarded, making the method economical with data.

Anne Bergeron (Département d'informatique de l'UQAM, Universite du Quebec a Montreal)  bergeron.anne@uqam.ca

Easy Ways to Clear Hurdles (poster session)    pdf    ps

Guillaume Bourque (Centre de Recherche Mathematiques, Universite de Montreal)  bourque@crm.umontreal.ca

A Comparative Approach for Multiple Gene Network Inference Using Time-Series Gene Expression Data
Long Version with Figure   pdf    ps
Slides:   html    pdf    ps    ppt

We present a method for gene network inference and revision based on time-series data. Gene networks are modeled using linear differential equations, and a generalized stepwise multiple linear regression algorithm is used to recover the interaction coefficients. Our system was designed for the concurrent recovery of gene interactions in many gene regulatory networks related by a graph or a tree. Suppose we are studying a certain regulatory network in different species of known phylogeny. We can think of the different networks as being related to each other through that phylogeny and use this information. Alternatively, we might be interested in the developmental stages of this network, or we could be studying the same system in different tissues related at a different level. The idea is that, given gene expression data for each species, each stage of development, or each tissue, we seek to recover each individual network while minimizing a cost based on the differences along the edges of the graph or the tree. We show how this comparative framework allows new insights and facilitates the gene network inference process.
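The core modeling step above, fitting interaction coefficients of a linear differential equation model to time-series data, can be illustrated with a minimal sketch. It is not the authors' method: it uses plain least squares in place of the generalized stepwise regression, handles a single network with no comparative cost term, and runs on toy data. The names `simulate`, `fit_interactions`, and `A_true` are hypothetical.

```python
import numpy as np

def simulate(A, x0, dt, steps):
    """Forward-Euler simulation of dx/dt = A @ x (toy data generator)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        xs.append(xs[-1] + dt * (A @ xs[-1]))
    return np.array(xs)

def fit_interactions(X, dt):
    """Estimate A from a time series X (rows are time points) by least squares.

    Assumption: ordinary least squares stands in for the stepwise
    regression mentioned in the abstract.
    """
    dX = (X[1:] - X[:-1]) / dt                      # finite-difference derivative
    # Solve X[:-1] @ A.T ~= dX for A.T, one column per gene.
    A_T, *_ = np.linalg.lstsq(X[:-1], dX, rcond=None)
    return A_T.T

# Hypothetical 3-gene network: entry (i, j) is the effect of gene j on gene i.
A_true = np.array([[-1.0, 0.5, 0.0],
                   [0.0, -0.8, 0.3],
                   [0.4, 0.0, -0.5]])
X = simulate(A_true, [1.0, 0.5, 0.2], dt=0.01, steps=200)
A_hat = fit_interactions(X, dt=0.01)
```

On this noise-free toy series the estimate matches the generating matrix closely. With real expression data, noise and sparse sampling would make some form of variable selection or regularization necessary, which is presumably where the stepwise procedure and the comparative cost come in.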

Fiona Brinkman (Simon Fraser University, Burnaby, BC, Canada)  brinkman@sfu.ca

Analysis of Horizontal Gene Transfers of Potential Relevance to Microbial Virulence
Slides:   html    pdf    ps    ppt

We have been using genome-wide bioinformatic approaches to identify horizontal gene transfers that are of interest for their potential role in bacterial virulence and the evolution of pathogenic microbes. Analyses of both bacteria-eukaryotic and bacteria-bacteria gene transfers are summarized, revealing possible patterns in the types of genes most often transferred between species. The implications are discussed, both in the context of the evolution of virulence and what is likely the most effective approach to control infectious disease agents.

Steven B. Cannon (Plant Biology Department, University of Minnesota, St. Paul, MN 55108, USA) cann0010@tc.umn.edu

Distinguishing Orthologs from Paralogs by Integrating Comparative Genome Data and Gene Phylogenies (poster session)

Background: In eukaryotic genomes, most genes are members of gene families. When comparing genes from two species, therefore, most genes in one species will be homologous to multiple genes in the second. This often makes it difficult to distinguish orthologs (separated through speciation) from paralogs (separated by other types of gene duplication). Combining phylogenetic relationships and genomic position in both genomes helps to distinguish between these scenarios. This kind of comparison can also help to describe how gene families have evolved within a single genome that has undergone polyploidy or other large-scale duplications, as in the case of Arabidopsis thaliana and probably most plant genomes.

Results: We describe a suite of programs called OrthoParaMap that makes genomic comparisons, identifies syntenic regions, determines whether sets of genes in a gene family are related through speciation or internal chromosomal duplications, maps this information onto phylogenetic trees, and infers internal nodes within the phylogenetic tree that may represent local as opposed to speciation or segmental duplication. We describe the application of the software using three examples: the melanoma-associated antigen (MAGE) gene family on the X chromosomes of mouse and human; the 20S proteasome subunit gene family in Arabidopsis, and the major latex protein gene family in Arabidopsis.

Conclusion: OrthoParaMap combines comparative genomic positional information and phylogenetic reconstructions to identify which gene duplications are likely to have arisen through internal genomic duplications (such as polyploidy), through speciation, or through local duplications (such as unequal crossing-over). The software is freely available at http://www.tc.umn.edu/~cann0010/Software.html

Joint work with Georgiana May (1,2) and Nevin D. Young (1,3).

1 Plant Biology Department, University of Minnesota, St. Paul, MN 55108, USA
2 Ecology, Evolution, and Behavior Department, University of Minnesota, St. Paul, MN 55108, USA
3 Plant Pathology Department, University of Minnesota, St. Paul, MN 55108, USA

Dimitra Chalkia (Department of Biology, The Pennsylvania State University, University Park, USA)  duc136@psu.edu

Phylogenetic Analysis of Formin Homology Proteins in Arabidopsis Thaliana and Oryza Sativa (poster session)

Joint work with Tatiana Bibikova, Simon Gilroy, Wojciech Makalowski.

The plant cell cytoskeleton plays an important role in many cellular processes, including cell polarity establishment and cytokinesis. Proteins that regulate cytoskeletal assembly are likely to be a part of the signaling cascade that governs plant cell morphogenesis. Formins are members of a large protein family that is defined by the presence of the highly conserved Formin Homology II (FH2) domain. In a wide range of organisms, including vertebrates, arthropods, nematodes and fungi, formins have been implicated in the regulation of cytoskeletal assembly and in the control of cytokinesis and cell polarity establishment and maintenance. The genomes of Arabidopsis thaliana and Oryza sativa contain putative formin-like proteins, identified by the presence of an FH2 domain. Arabidopsis thaliana formins have been tentatively subdivided into two clades, Type I and Type II, based on the FH2 domain alignment. We have extended this analysis to cover both Arabidopsis and rice and have provided an evolutionary context for these plant formin families. Our phylogenetic analysis shows that formins are divided into two distinct clades in plants. This phylogenetic clustering is also supported by the structural features of these proteins. The division of plant formins into two distinct groups seems to predate the monocot/eudicot split. The detailed evolutionary relationships of plant formins remain unclear. The placement of fungal formins at the basal position of the tree is in accordance with the most recently proposed phylogenetic scheme for eukaryotes. Animal and plant formins cluster together and split into two major groups. This clustering may suggest that their last common ancestor already had at least two different types of formins.

Avril Coghlan (Department of Genetics, Trinity College Dublin, Ireland)  avril.coghlan@ucd.ie

Origins of Recently Gained Introns in Caenorhabditis    Slides:   html    pdf    ps    ppt

Joint work with Kenneth H. Wolfe.

The genomes of the nematodes Caenorhabditis elegans and C. briggsae both contain about 100,000 introns, of which about 6000 are unique to one species or the other. To study the origins of new introns, we used a rigorous method involving phylogenetic comparisons to animal orthologs and other nematode paralogs to identify cases where an intron content difference between C. elegans and C. briggsae was almost certainly caused by intron insertion rather than deletion. We identified 57 putative recently gained introns in C. briggsae and 112 in C. elegans. Novel introns have a stronger exon splice site consensus sequence than the general population of introns, and they show the same preference for phase 0 sites in codons over phases 1 and 2 as seen in the general population. More of the novel introns are inserted in genes that are expressed in the germline than expected by chance. Compared with matched control sets of C. briggsae introns, the novel introns in C. briggsae are more likely to contain an annotated repeat element (1.7-fold; P = 0.011), and the ends of the intron are more likely to be close to the ends of the repeat element (1.5-fold; P = 0.029). Similar but weaker trends are also seen in C. elegans novel introns. One family of C. briggsae repeat elements, which is related to the Helitron class of putative nonautonomous transposons, is found in significantly more novel introns than reference introns (P < 1e-05). These results support the hypothesis that novel introns originate as a result of transposable element insertions into proto-splice site consensus sites in germline-expressed genes.

Ramana V. Davuluri (Human Cancer Genetics Program, Comprehensive Cancer Center, Department of Molecular Virology, Immunology & Medical Genetics, The Ohio State University, 420 W 12th Avenue, TMRF 524, Columbus, OH 43210, USA)  davuluri-1@medctr.osu.edu

Mammalian Promoter Database: A Computational Platform for Comparative Genomics of Mammalian Transcriptional Regulation (poster session)

Joint work with Hao Sun, Saranyan K. Palaniswamy, Twyla T. Pohar, and Victor Jin.

Transcription in mammalian cells is a highly complex process that involves multiple layers of general and gene-specific transcription factors. Although extensive molecular research has been providing important details about several transcription factors and their binding sites in the target gene promoters, the information generated over the years is highly fragmented. In order to better integrate this vast amount of information with the genome sequences, we have developed a new database called MPromDb (Mammalian Promoter Database), an information resource of mammalian gene regulatory regions. MPromDb (Version 1.0) contains 28,306 experimentally supported and 32,121 computationally annotated promoters, and mapping of 4,231 experimentally known binding sites, with links to published literature. Each promoter sequence in MPromDb is presented in the form of an image map with annotations of first exon, cis-regulatory elements and plots of CpG scores, with interactive contextual menus for easy navigation. MPromDb provides a platform for comparative genomics of transcriptional regulation, since promoters of orthologous genes are linked with each other and displayed in the same record. The current version contains 9,331 human-mouse orthologous pairs. The database can be searched for promoter sequences, transcription factors, and their direct target genes, through a user-friendly web interface at http://bioinformatics.med.ohio-state.edu/MPromDb.

Dannie Durand (Departments of Biological Sciences and Computer Science, Carnegie Mellon University)  durand@cmu.edu

Gene Clusters in Comparative Genomics: Accident or Design?    Slides:   pdf

Large scale gene duplication, the duplication of whole genomes and subchromosomal regions, is a major force driving the evolution of genetic functional innovation. Whole genome duplications are widely believed to have played an important role in the evolution of the maize, yeast and vertebrate genomes. Two or more linked clusters of similar genes found in distinct regions on the same genome are often presented as evidence of large scale duplication. However, as the gene order and the gene complement of duplicated regions diverge progressively due to insertions, deletions and rearrangements, it becomes increasingly difficult to distinguish remnants of common ancestral gene order from coincidental similarities in genomic organization. In this talk, I present computational approaches to validating gene clusters in comparative genomics.

Evan Eichler (Department of Genetics, Case Western Reserve University)  eee@po.cwru.edu

Recent Segmental Duplications and the Fragile Breakage Model of Human Genome Evolution

It has been estimated that 5% of the human genome consists of interspersed duplicated material that has arisen over the last 30-40 million years of evolution. A large proportion of these duplications exhibit an extraordinarily high degree of sequence identity at the nucleotide level (>95%) and are interspersed over large genomic distances (>1 Mb). The distribution of these duplications is non-random in the human genome. Through processes of non-allelic homologous recombination, these same regions are targets for rapid evolutionary turnover, creating hotspots of mammalian chromosomal evolution and sites of genomic instability associated with disease within the human population. Preliminary analyses have suggested that the amount of segmental duplication may be a relatively unique property of our genome. We have developed systematic experimental and computational tools to examine duplication content from human and other sequenced vertebrate species. An analysis of the breakpoints of these duplications shows a significant enrichment of Alu-repeat elements, providing new insight into their mechanism of origin and preeminence within the primate genome. In addition, based on our analysis of syntenic breakpoints between the mouse and human genomes, we find that 25% (122/461) of mouse-human synteny breakpoints contain 10 kb of duplicated sequence. This association is highly significant (P<0.0001) when compared to a simulated random breakage model. These data support a non-random model of chromosomal evolution that implicates a predominance of both small-scale duplication and large-scale evolutionary rearrangements within specific regions of the human genome. Such properties should be considered when trying to reconstruct the evolutionary history of mammalian genomes.

Nadia El-Mabrouk (Department of Computer Science, University of Montreal)  mabrouk@IRO.UMontreal.CA

Reconstructing the Ancestor of a Modern Genome with Multigene Families
Slides:   html    pdf    ps    ppt

Given a particular model of evolution and an optimization criterion, the problem is to recover an ancestor of a modern genome modeled as an ordered sequence of signed genes. One direct application is to infer gene orders at the ancestral nodes of a phylogenetic tree. Implicit in the rearrangement literature is the assumption that each gene is present exactly once in each genome. This assumption clearly does not hold for divergent species containing several copies of highly paralogous and orthologous genes. In this presentation, we consider models of genome evolution that take multigene families into account.

We first present a genome-wide doubling event. Genome duplication is an important source of new gene functions and novel physiological pathways. Originally (ancestrally), a duplicated genome contains two identical copies of each chromosome, but through genomic rearrangements, this simple doubled structure is disrupted. At the time of observation, each of the chromosomes resulting from the accumulation of rearrangements can be decomposed into a succession of conserved segments, such that each segment appears exactly twice in the genome. We present exact algorithms for reconstructing the ancestral doubled genome in linear time, minimizing the number of inversions and/or translocations required to derive the observed order of genes along the present-day chromosomes.

The second part of the presentation will concern a model of duplications at a regional level. In this model, chromosomal regions (one or more genes) are duplicated from one location of the genome to another. Studies from human genomic sequence indicate that many of these segments have been duplicatively transposed in very recent evolutionary time. The implicit hypothesis is that a genome with multigene families has an ancestor containing exactly one copy of each gene that has evolved through a series of duplication transpositions and substring inversions. We present an algorithm for reconstructing an ancestral genome giving rise to the minimal number of duplication transpositions and reversals. We then show how to use this algorithm to recover gene orders at the ancestral nodes of a phylogenetic tree.

Allan G. Force (Benaroya Research Institute at Virginia Mason)  force@benaroyaresearch.org

Origin of Subfunctions and Modular Genes
Slides:   html    pdf    ps    ppt

Evolutionary explanations for the origin of modular genetic and developmental pathways almost always invoke some sort of long-term selective advantage, e.g., as a functional prerequisite to the evolution of phenotypic complexity or as an enhancer of evolvability. However, simple theoretical results demonstrate that even in the absence of any direct selective advantage, genetic modularity can spontaneously emerge through the acquisition of new gene subfunctions. Provided that population size is sufficiently small, random genetic drift and mutation can conspire to produce changes in the underlying genetic architecture of a species without necessarily altering the phenotype. Extensive genetic modularity may then accrue in a near-neutral fashion in permissive population-genetic environments, potentially opening novel pathways to morphological evolution. These results provide additional support for the proposition that many aspects of gene and genome complexity in multicellular eukaryotes may have arisen passively as population size reductions accompanied an increase in organism size, with the adaptive exploitation of such complexity occurring secondarily.

Anant Godbole (Mathematics Department, East Tennessee State University)  godbolea@mail.etsu.edu

Distributional Approximations in Genome Reconstruction (poster session)    pdf    ps

Steve Goldstein (Laboratory for Molecular and Computational Genomics, University of Wisconsin-Madison)  steveg@lmcg.wisc.edu

Graph Compression Algorithms for Efficiently Comparing Genomes (poster session)

Joint work with Adam Briska, Shiguo Zhou, and David C. Schwartz.

Optical Mapping is a system capable of producing genome-wide ordered restriction maps. Such a restriction map provides a description of an organism's genome, a description not unlike the sequence of the genome, albeit at a coarser resolution. Just as comparisons of whole genome sequences are leading to an exciting array of biological advances, comparisons of optical maps will provide a wealth of valuable information.

Now that optical mapping has entered the high-throughput era, there is a need for software to compare restriction maps of closely related organisms. We present an algorithmic framework for this task, closely modeled after DNA sequence comparison algorithms. The major challenge lies in adapting the exact matching phase of the sequence algorithms to handle the imprecision inherent in determining restriction fragment lengths. Our graph-based approach not only overcomes this challenge, but also can be applied to sequence algorithms, providing advantages over suffix-tree approaches.

Josefa González (Departament de Genètica i Microbiologia, Universitat Autònoma de Barcelona, 08193, Bellaterra (Barcelona), Spain)  icgm2@blues.uab.es

Duplicative and Conservative Transpositions of the Larval Serum Protein 1 Genes in the Genus Drosophila (poster session)

Joint work with Ferran Casals and Alfredo Ruiz.

In the genus Drosophila, homologous chromosomal elements show a remarkable conservation of gene content but not of gene order, indicating that paracentric inversions are the most common kind of genomic change. Detailed physical maps of chromosomes X, 2 and 4 of Drosophila repleta and D. buzzatii, both belonging to the Drosophila subgenus, were constructed and their gene arrangements compared with the homologous chromosomes in D. melanogaster. We estimated that 393 paracentric inversions have been fixed in the whole genome since the divergence between D. repleta and D. melanogaster, which amounts to an average rate of 0.053 disruptions/Mb/myr. Only two exceptions to the chromosomal homologies were found, and we have further analyzed one of them: the transposition of the Larval serum protein 1 (Lsp1) genes. Comparative molecular analysis of the transposed genes and their flanking regions can help to elucidate the time, direction and mechanism of gene transposition. In the D. melanogaster genome, three Lsp1 genes (alpha, beta and gamma) are present, and each is located on a different chromosome. We have characterized the molecular organization of Lsp1 genes in D. buzzatii and in D. pseudoobscura, a species of the Sophophora subgenus. Our results show that only two Lsp1 genes (beta and gamma) exist in these two species, suggesting that the duplicative transposition generating Lsp1alpha took place <30 myr ago in the D. melanogaster lineage. D. buzzatii and D. pseudoobscura show the same chromosomal localization and genomic organization for the Lsp1beta and Lsp1gamma genes, different from that of D. melanogaster. Thus we conclude that this is likely to be the ancestral organization and that both genes must have conservatively transposed in the D. melanogaster lineage <30 myr ago. Finally, the duplicative transposition which gave rise to Lsp1beta and Lsp1gamma must have occurred before the divergence of the three Drosophila species (40-62 myr ago).
Overall, at least two duplicative and two conservative transpositions are necessary to explain the present chromosomal distribution of Lsp1 genes in the three Drosophila species. In D. buzzatii and D. pseudoobscura, Lsp1beta and Lsp1gamma are localized close to snRNA or tRNA genes. RNA genes have been implicated in the origin of chromosomal rearrangements in prokaryotes and yeasts, and we find clear evidence for a role of snRNA genes in the transposition of Lsp1beta genes in Drosophila. Analysis of the 5' non-coding regions of the Lsp1beta and Lsp1gamma genes has led to the identification of the putative cis-acting regulatory regions of these genes, which seemingly transposed along with the coding sequences.

Roderic Guigó (Institut Municipal d'Investigacio Medica (IMIM/UPF/CRG))  rguigo@imim.es

Comparative Gene Prediction
Slides:   html    pdf    ps    ppt

Comparative genomics is emerging as a powerful tool to characterize complex genomes. Gene prediction, in particular, has benefited from the availability of genome sequences from organisms across the whole eukaryotic spectrum. The comparison of the human and mouse genome sequences, for instance, has contributed substantially to refining the gene content of the human (and mouse) genomes. In my talk, I will stress how comparative genomics may be particularly useful for identifying genes that deviate from the standard characteristics and that, for this reason, may escape identification by other means.

Tzvika Hartman (Department of Computer Science and Applied Mathematics, Weizmann Institute of Science, Rehovot, Israel)  tzvi@wisdom.weizmann.ac.il

A Simpler 1.5-Approximation Algorithm for Sorting by Transpositions  
Extended Version:   pdf    ps

Joint work with Ron Shamir (School of Computer Science, Tel-Aviv University, Tel-Aviv, Israel).

In this work we study the problem of sorting by transpositions. First, we prove that the problem of sorting circular permutations by transpositions is equivalent to the problem of sorting linear ones. Hence, all algorithms for sorting linear permutations by transpositions can be used to sort circular permutations. Then, we derive our main result: A new quadratic 1.5-approximation algorithm, which is considerably simpler than the extant algorithms of Bafna and Pevzner (1998) and Christie (1999). Thus, the algorithm achieves running time which is equal to the best known, with the advantage of being much simpler. Moreover, the analysis of the algorithm is significantly less involved, and provides a good starting point for studying related open problems.
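To make the operation concrete, here is a minimal sketch of a transposition on a linear permutation, together with the standard breakpoint lower bound on the transposition distance. This is not the 1.5-approximation algorithm from the abstract. It only illustrates the move being counted and why a single transposition can remove at most three breakpoints. The function names are hypothetical.

```python
def transpose(perm, i, j, k):
    """Exchange the adjacent blocks perm[i:j] and perm[j:k]."""
    if not 0 <= i < j < k <= len(perm):
        raise ValueError("need 0 <= i < j < k <= len(perm)")
    return perm[:i] + perm[j:k] + perm[i:j] + perm[k:]

def breakpoints(perm):
    """Count adjacent pairs (a, b) in 0, perm, n+1 with b != a + 1."""
    ext = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(ext, ext[1:]) if b != a + 1)

def lower_bound(perm):
    """A transposition alters three adjacencies, so it removes at most
    three breakpoints; hence distance >= ceil(breakpoints / 3)."""
    return -(-breakpoints(perm) // 3)   # ceiling division

perm = [3, 4, 1, 2]
sorted_perm = transpose(perm, 0, 2, 4)  # move block [3, 4] behind [1, 2]
```

Here `perm` has three breakpoints, so at least one transposition is needed, and the single move shown sorts it.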

Elizabeth Ann Housworth (Departments of Mathematics and Biology, Indiana University)  ehouswor@indiana.edu

Measures of Conserved Synteny
Slides:   html    pdf    ps    ppt

Measures of conserved synteny are important for estimating the relative rates of chromosomal evolution in various lineages. We present a natural way to view the synteny conservation between two species from an Oxford grid--an r x c table summarizing the number of orthologous genes on each of the chromosomes 1 through r of the first species that are on each of the chromosomes 1 through c of the second species. This viewpoint suggests a natural statistic, which we call syntenic correlation, designed to measure the amount of synteny conservation between two species. This measure allows syntenic conservation to be compared across many pairs of species. We also discuss incorporating the dependency of the numbers of orthologues observed in the chromosome pairings between the two species into the estimates of the true number of conserved syntenies given the observed number of conserved syntenies.
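The Oxford grid described above is simply a contingency table, which the following sketch builds from a list of ortholog chromosome assignments. The abstract does not give the formula for syntenic correlation, so the association measure below is an assumed stand-in: a Cramér's V-style normalization of Pearson's chi-square statistic. It should not be read as the statistic presented in the talk. All function names are hypothetical.

```python
from collections import Counter
from math import sqrt

def oxford_grid(pairs, r, c):
    """Build the r x c table: grid[i][j] counts orthologs located on
    chromosome i+1 of species 1 and chromosome j+1 of species 2."""
    counts = Counter(pairs)
    return [[counts[(i + 1, j + 1)] for j in range(c)] for i in range(r)]

def association(grid):
    """Cramér's V-style association between row and column chromosomes.

    Assumption: used here only as a plausible illustration of a
    grid-based conservation measure.
    """
    n = sum(map(sum, grid))
    row = [sum(cells) for cells in grid]
    col = [sum(cells) for cells in zip(*grid)]
    chi2 = 0.0
    for i, ri in enumerate(row):
        for j, cj in enumerate(col):
            expected = ri * cj / n
            if expected > 0:            # skip chromosomes with no orthologs
                chi2 += (grid[i][j] - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k))

# Toy data: each pair is (chromosome in species 1, chromosome in species 2).
conserved = oxford_grid([(1, 1)] * 5 + [(2, 2)] * 5, r=2, c=2)
scrambled = oxford_grid([(1, 1), (1, 2), (2, 1), (2, 2)] * 5, r=2, c=2)
```

With these toy inputs, the fully conserved grid scores 1 and the evenly scrambled grid scores 0, which is the qualitative behavior one would want from a measure that is comparable across species pairs.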

Jens Lagergren (SBC (Stockholm Bioinformatics Center) & KTH (Kungliga Tekniska Högskolan))  jensl@nada.kth.se  http://www.nada.kth.se/~jensl/

Bayesian Gene/Species Tree Reconciliation and Orthology Analysis Using MCMC

Comparative genomics in general and orthology analysis in particular are becoming increasingly important parts of gene function prediction. Previously, orthology analysis and reconciliation have been performed only with respect to the parsimony model. This discards many plausible solutions and sometimes precludes finding the correct one. In many other areas of bioinformatics, probabilistic models have proven to be both more realistic and more powerful than parsimony models.

We introduce a probabilistic gene evolution model based on a birth-death process in which a gene tree evolves "inside" a species tree. Based on this model, we develop a tool with the capacity to perform practical orthology analysis, based on Fitch's original definition, and more generally to reconcile pairs of gene and species trees. Our gene evolution model is biologically sound and intuitively attractive. We develop a Bayesian analysis based on MCMC which facilitates approximation of an a posteriori distribution for reconciliations. That is, we can find the most probable reconciliations and estimate the probability of any reconciliation, given the observed gene tree. This also gives a way to estimate the probability that a pair of genes are orthologs. To the best of our knowledge, this is the first successful introduction of probabilistic methods of this type, which flourish in phylogeny analysis, into reconciliation and orthology analysis.

The MCMC algorithm has been implemented and performs very well on synthetic as well as biological data. Using standard correspondences, our results carry over to allele trees as well as biogeography.

Bret Larget (Departments of Statistics and of Botany, University of Wisconsin - Madison)  larget@stat.wisc.edu

A Statistical Approach to the Estimation of Phylogeny from Genome Arrangements
Slides:   pdf

The determination of evolutionary relationships is a fundamental problem in evolutionary biology. Genome arrangement data offers a source of information for estimating phylogenetic trees that may be especially useful for distantly related species. A statistical approach to phylogenetic information is concerned with assessment of uncertainty in estimated phylogenetic trees. We describe a Bayesian framework for phylogenetic inference from genome arrangement data using Markov chain Monte Carlo and discuss our results on several data sets.

Emmanuelle Lerat (Department of Ecology and Evolutionary Biology, University of Arizona)  lerat@email.arizona.edu

Lateral Gene Transfers and Organismal Phylogeny in Bacteria: Implications for Ancestral Genome Reconstruction

Genome reconstruction is of particular interest from a biological perspective. This knowledge can illuminate the history of events that led to the present contents and organization of genomes. The principle of reconstruction methods is the inference of rearrangements that occurred during the history of the genome. This makes the strong assumption that genes are faithfully transmitted with their genome through generations. However in bacteria, lateral (or horizontal) gene transfers (LGT) are known to be very numerous. LGT might be an obstacle in the attempt to establish genome history, because some homologous genes may be transmitted between different species. It has even been argued that LGT may prevent the establishment of organismal relationships based on individual gene phylogenies. Thus to reconstruct ancestral genomes in bacteria seems to be particularly hazardous unless LGT is taken into account. It is therefore very important to test the hypothesis of vertical transmission of the genes used in genome reconstruction. In order to determine the impact of LGT on the potential organismal phylogeny, an approach to multigene phylogeny using complete genomes is necessary to identify the genes that have been, without ambiguity, vertically transmitted and that are thus good candidates to be used in genome reconstruction. This will allow a real biological interpretation of the genome reconstruction but also facilitate the reconstruction itself.

Michael Lynch (Department of Biology, Indiana University, Bloomington, IN) mlynch@bio.indiana.edu  http://www.bio.indiana.edu/facultyresearch/faculty/Lynch.html

The Origins of Genome Complexity
Slides:   html    pdf    ps    ppt

Complete genomic sequences from diverse phylogenetic lineages reveal striking increases in genome complexity across the prokaryote to unicellular eukaryote to multicellular eukaryote boundaries. The changes include gradual growth in gene number resulting from the retention of duplicate genes, more abrupt increases in the abundance of spliceosomal introns and mobile genetic elements, and enhanced modularity of gene regulation. A case can be made that many of these changes emerged passively in response to substantial long-term population-size reductions that accompanied increases in organism size and magnified the power of random genetic drift. Under this model, much of the restructuring of eukaryotic genome organization and the roots of many aspects of organismal complexity were initiated by nonadaptive processes. Although the mutational changes necessary for genomic modification are initiated by molecular processes, the population-genetic environment ultimately defines the permissible paths of evolution. The simple genomes of most microbial species can be understood in this context, without invoking direct selection for streamlined genomes, and direct selection for complexity need not be invoked to explain genomic expansion in multicellular species.

Robert (Bob) Mau (Departments of Animal Health and Biomedical Sciences/Oncology University of Wisconsin-Madison)  robertm@genome.wisc.edu

Inferring Orthologous Regions via a Pseudo-Gibbs Sampler: Finding the Pieces of the Rearrangement Puzzle (poster session)   pdf    doc

Joint work with Aaron Darling, Frederick R. Blattner, and Nicole T. Perna.

Aoife McLysaght (Department of Genetics, Trinity College Dublin, Ireland)  amclysag@uci.edu

Poxviruses and Adaptive Genome Evolution
Slides:   html    pdf    ps    ppt

We used complete sequence from twenty poxviruses to investigate the evolution of these virus genomes. We examined the pattern of genome content and genome arrangement evolution in the context of the virus phylogeny. We also examined the patterns of positive selection acting on genes in these genomes. We show that the rate of genome evolution is not constant over time, and that it may be possible to relate patterns of genome evolution and adaptive evolution acting on genes.

Daniel P. Miranker (Department of Computer Sciences, University of Texas - Austin)  miranker@cs.utexas.edu

Application of MoBIoS for Conserved Primer Pair Discovery (poster session)

Joint work with Weijia Xu, Wenguo Liu, and C. Randal Linder.

MoBIoS, a Molecular Biological Information System, is a next-generation database management system focused on scalable retrieval and mining of unorthodox biological data types that are poorly supported by relational database systems. MoBIoS comprises built-in data types for biological sequences and mass spectra. The MoBIoS storage manager extends traditional database systems by including built-in support for hierarchical clustering and for nearest-neighbor and range search in metric spaces. In addition to built-in metrics to support sequence homology and protein identification, users may add their own metrics.
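To make the idea of range search in a metric space concrete, here is a minimal Python sketch. It is not MoBIoS code and does not reflect its API: the class name `PivotIndex`, the single-pivot layout, and the use of Hamming distance as the metric are all assumptions chosen for illustration. The only property it relies on is the triangle inequality, which lets the search skip items without computing their distance to the query.

```python
def hamming(a, b):
    """Hamming distance between two equal-length strings (a true metric)."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


class PivotIndex:
    """Hypothetical one-pivot index for range queries under any metric."""

    def __init__(self, items, pivot, metric):
        self.pivot = pivot
        self.metric = metric
        # Precompute the distance from every stored item to the pivot.
        self.entries = [(item, metric(item, pivot)) for item in items]

    def range_search(self, query, radius):
        """Return all stored items within `radius` of `query`."""
        dq = self.metric(query, self.pivot)
        hits = []
        for item, dp in self.entries:
            # Triangle inequality: d(query, item) >= |d(query, pivot) - d(item, pivot)|,
            # so the item cannot be within the radius if this lower bound exceeds it.
            if abs(dq - dp) > radius:
                continue
            if self.metric(query, item) <= radius:
                hits.append(item)
        return hits


index = PivotIndex(["ACGT", "ACGA", "TTTT", "ACCA"], pivot="ACGT", metric=hamming)
hits = index.range_search("ACGG", radius=1)
```

Because the index takes the metric as a parameter, a user-supplied distance function can be plugged in as long as it satisfies the metric axioms.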

We report on the first biological application of MoBIoS: a comparative study of the entire genomes of the plants rice and Arabidopsis to determine conserved pairs of strings of DNA that could be used to prime polymerase chain reactions (PCRs). Identification of such a set of paired conserved primers would allow amplification of evolutionarily homologous DNA regions in a taxonomically broad set of seed plants. The ability to amplify homologous regions in a widely divergent set of species has a number of applications, e.g., phylogenetic reconstruction and comparison of protein evolution in a broad set of organisms. Ultimately, this approach to identifying conserved primer pairs could provide the community of systematists with a universal set of DNA sequences that can be used for assembling the tree of life.

William J. Murphy (SAIC-Frederick, Inc., Laboratory of Genomic Diversity, National Cancer Institute Frederick, Maryland 21702)  murphywi@ncifcrf.gov

Reconstructing the Genomic Architecture of Mammalian Ancestors Using Multispecies Comparative Maps

Rapidly developing comparative maps in selected mammal species are providing an opportunity to reconstruct the genomic architecture of mammalian ancestors and study rearrangements that transformed this ancestral genome into existing mammalian genomes. Here we apply the recently developed Multiple Genome Rearrangement algorithm (MGR) to human, mouse, cat and cattle comparative maps (with 311-470 shared markers) to impute the ancestral mammalian genome. Reconstructed ancestors consist of 70-100 conserved segments shared across the genomes that have been exchanged by rearrangement events along the ordinal lineages leading to modern species genomes. Genomic distances between species, dominated by inversions (reversals) and translocations, are presented in a first multispecies attempt using ordered mapping data to reconstruct the evolutionary exchanges that preceded modern placental mammal genomes.

Joint work with Guillaume Bourque (Centre de Recherches Mathématiques, Université de Montréal, Montréal, Canada H3C 3J7), Glenn Tesler, Pavel Pevzner (Department of Computer Science and Engineering, University of California, San Diego La Jolla, California 92093-0114), and Stephen J. O'Brien (Laboratory of Genomic Diversity, National Cancer Institute Frederick, Maryland 21702).

Luay Nakhleh (Department of Computer Sciences, The University of Texas at Austin)  nakhleh@cs.utexas.edu

Reconstructing Reticulate Evolution in Species (poster session)

In 1997, Wayne Maddison made an important observation that led to a separate analysis approach for phylogeny reconstruction. In his seminal paper, Maddison observed that gene trees that are related by reticulation can be combined into a network via the computation of the minimum number of certain branch moves; this number is called the SPR (for Subtree Prune and Regraft) distance. The two main challenges for Maddison's approach are

(1) computational: computing the SPR distance between two trees is hard.

(2) systematic: in practice, it is very hard to obtain the correct gene trees.

In this poster we present our solutions to these two challenges. We address phylogenetic networks with constrained reticulation. For such networks, and trees induced by them, we present an efficient algorithm for measuring the SPR distance, as well as reconstructing the network from the given trees. We address the systematic challenge by considering a set of "good" gene trees instead of a single gene tree. We present results from extensive simulation studies that we conducted. Those results show a significant improvement of our method over Maddison's, and show that it clearly outperforms methods based on combined analysis of datasets.

This is joint work with Tandy Warnow and Randy Linder.

Nikolas Nikolaidis (Institute of Molecular Evolutionary Genetics and Department of Biology, Pennsylvania State University, University Park, PA 16802, USA)  nxn7@psu.edu

Evolution of the Hsp70 Gene Superfamily in Two Sibling Species of Nematodes Caenorhabditis elegans and C. briggsae (poster session)

Joint work with Masatoshi Nei.

The Hsp70 gene superfamily of C. briggsae was characterized in an attempt to investigate the evolutionary relationships with the respective one of its sibling species C. elegans. The phylogenetic analyses included also genes from Drosophila melanogaster and Saccharomyces cerevisiae to clarify the long-term evolution of hsp70s. The Hsp70s are classified into three monophyletic groups according to their sub-cellular localization, namely, cytoplasm (CYT), endoplasmic reticulum (ER) and mitochondria (MT). The Hsp110 genes can be classified into the polyphyletic CYT group and the monophyletic ER group. The two nematode species encode two Hsp70 and two Hsp110 proteins localized in the ER and their highly heat-inducible genes contain introns. The different Hsp70 and Hsp110 groups appear to evolve following the model of independent or divergent evolution. These models can also explain the evolution of the ER and MT genes. On the other hand, the CYT genes are divided into heat-inducible and constitutively expressed genes. The constitutively expressed genes probably have evolved by the birth-and-death process and the rates of gene birth-and-death are different among all organisms studied. The heat-inducible genes show an intra-species phylogenetic clustering, suggesting sequence homogenization, probably by gene conversion-like events. In addition, these genes show high levels of sequence conservation in both intra- and inter-species comparisons, and in most comparisons the amino acid sequence similarity was higher than the nucleotide. These results suggest that purifying selection also played a crucial role in sequence conservation of the Hsp70s. Therefore, we suggest that the CYT heat-inducible genes have apparently followed a mixed evolutionary pattern with a combination of purifying selection, birth and death, and gene conversion-like events.

Stephen J. O'Brien (Chief, Laboratory of Genomic Diversity, National Cancer Institute-Frederick)  obrien@ncifcrf.gov

The Landscape of Comparative Genomics in Mammals

Dense genetic maps of human, mouse and rat genomes that are based on coding genes, microsatellite and single nucleotide polymorphism (SNP) markers have been complemented by precise gene homologue alignment with moderate resolution maps of livestock, companion animals and additional mammal species. Comparative genetic assessment expands the utility of these maps in gene discovery, in functional genomics, and in tracking the evolutionary forces that sculptured the genome organization of modern mammalian species.

Ross Overbeek (Fellowship for Interpretation of Genomes-FIG)  Ross@theFIG.info

Exploiting Gene Clusters to Curate Annotations
Slides:   html    pdf     ps    ppt

Previously, we argued that gene clustering on prokaryotic genomes was the key to locating "missing genes," and we demonstrated that the technique worked remarkably well. The use of clusters is also the key to straightening out many of the assignments that could not be made precisely based only on similarities and motifs. We will consider the case of gene clusters related to leucine degradation as an example; they occur in phylogenetically diverse organisms, and many of the genes involved currently have inaccurate or imprecise annotations. Comparative analysis of clusters, as well as occurrence profiles, can be used to methodically construct chains of assignments that follow from a few basic observations. This sets the stage where a single carefully chosen wet lab experiment can confirm or reject a large number of assignments, often removing ambiguities from tens if not hundreds of genes.

Pavel A. Pevzner (Department of Computer Science and Engineering, University of California at San Diego)  ppevzner@cs.ucsd.edu  http://www-cse.ucsd.edu/users/ppevzner/

Transforming Men into Mice: Lessons from Human and Mouse Genomic Sequences

Despite some differences in appearance and habits, men and mice are genetically very similar. In a pioneering paper, Nadeau and Taylor (1984) estimated that surprisingly few genomic rearrangements (about 200) have happened since the divergence of human and mouse 75 million years ago.

The genomic sequences of human and mouse provide evidence for a larger number of rearrangements than previously thought and shed some light on previously unknown features of mammalian evolution. In particular, they provide evidence for extensive re-use of breakpoints from the same relatively short regions and reveal great variability in the rate of micro-rearrangements along the genome. Our analysis also implies the existence of a large number of very short "hidden" synteny blocks that were invisible in comparative mapping data and were ignored in previous studies of chromosome evolution. These results suggest a new model of chromosome evolution that postulates that breakpoints are chosen from relatively short fragile regions that have a much higher propensity for rearrangements than the rest of the genome.

This is joint work with Glenn Tesler.

Ron Y. Pinter (Department of Computer Science, Technion, Israel Institute of Technology)  pinter@csa.cs.technion.ac.il

Evaluating a Class of Length-Sensitive Algorithms for Sorting by Reversal
Slides:    html    pdf    ps    ppt

Sorting by reversal (SBR) has been used extensively in comparative genomic studies [3]. Traditionally, bioinformaticians have tried to minimize the number of reversals, and they evaluate results by looking at the trace generated by the algorithm and asking whether it makes biological sense. We have introduced a length-sensitive cost measure in an attempt to model the likelihood of reversals based on their length. In this model the cost f(x) of each reversal depends on the length, x, of the reversed sequence; the overall cost of the SBR process is the total of the individual reversal costs.

Initially [4] we looked at f(x)=x, offering a QuickSort-like algorithm which guarantees a provably good approximation of the minimal SBR cost (finding the minimal cost is NP-hard). In response, several biologists suggested we look at the family of functions f(x)=x**alpha. We have developed a class of algorithms [1] that find an approximate cost for any positive value of the exponent alpha, but the question of which value of alpha is best is of great interest.
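The cost model described above can be illustrated with a short Python sketch. It only evaluates the total cost of a given sequence of reversals under f(x)=x**alpha; it does not implement the approximation algorithms of [1] or [4]. The function names, the unsigned permutation, and the half-open index convention are assumptions made for the example.

```python
def reverse_segment(perm, i, j):
    """Return a copy of perm with the slice perm[i:j] reversed."""
    return perm[:i] + perm[i:j][::-1] + perm[j:]


def reversal_cost(length, alpha):
    """Length-sensitive cost f(x) = x**alpha of reversing `length` elements."""
    return float(length) ** alpha


def apply_and_cost(perm, reversals, alpha):
    """Apply each (i, j) reversal in order and sum the individual costs."""
    total = 0.0
    for i, j in reversals:
        perm = reverse_segment(perm, i, j)
        total += reversal_cost(j - i, alpha)
    return perm, total


start = [3, 2, 1, 5, 4]
steps = [(0, 3), (3, 5)]  # reverse the first three elements, then the last two

final, cost_linear = apply_and_cost(start, steps, alpha=1.0)
_, cost_quadratic = apply_and_cost(start, steps, alpha=2.0)
```

With alpha=1 the two reversals cost 3 + 2; with alpha=2 they cost 9 + 4, so larger exponents penalize long reversals more heavily.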

We decided to make this evaluation by using the cost of sorting one genome into another as a distance between the genomes, feeding it to a tool that builds phylogenetic trees, and then comparing the results to evolutionary trees found using other methods. This gives rise to numerous methodological and algorithmic issues, such as:
- How many common genes are necessary to draw meaningful conclusions?
- How do we deal with duplicate genes?
- If the number of common genes for the whole dataset under study is too low, how do we put together partial results (i.e. combining trees that were built on subsets of the sample) and how small can the subsets be?
- Do we really need to rebuild the whole tree or can we accumulate the scores of matches of the partial trees with the reference tree?
- What similarity score between trees is appropriate for this study?
- How do we cope with the fact that our algorithms produce only approximate costs?
But the ultimate question is: how do we scan for the best value of alpha?

The poster will describe the method and the results on two datasets, including the one from [2] which includes 15 genomes, and discuss their merits.

References

[1] Michael A. Bender, Dongdong Ge, Simai He, Haodong Hu, Ron Y. Pinter, Steven Skiena, and Firas Swidan. Improved Bounds on Sorting with Length-Weighted Reversals. To appear in the Proceedings of the ACM-SIAM Symposium on Discrete Algorithms (SODA'04), January 2004.

[2] William Martin, Tamas Rujan, Erik Richly, Andrea Hansen, Sabine Cornelsen, Thomas Lins, Dario Leister, Bettina Stoebe, Masami Hasegawa, and David Penny. Evolutionary analysis of Arabidopsis, cyanobacterial, and chloroplast genomes reveals plastid phylogeny and thousands of cyanobacterial genes in the nucleus. Proc. Natl. Acad. Sci. USA. September 17, 2002; 99 (19): 12246-12251.

[3] Pavel A. Pevzner: "Computational Molecular Biology - an Algorithmic Approach", MIT Press, 2000.

[4] Ron Y. Pinter and Steven Skiena. Genomic Sorting with Length-Weighted Reversals. Genome Informatics 13: 103-111 (2002).

Joint work with Michael A. Bender*, Yaniv Berliner**, Dongdong Ge*, Simai He*, Haodong Hu*, Michael Shmoish**, Meir Shoham**, Steven Skiena*, and Firas Swidan**.

* Dept. of Computer Science, SUNY Stony Brook, NY 11794-4400.
** Dept. of Computer Science, Technion, Israel Institute of Technology, Haifa 32000, Israel

Igor V. Sharakhov (Center for Tropical Disease Research and Training, University of Notre Dame, Notre Dame, IN 46556-0369, USA)  isharahk@nd.edu

High Rates of Genome Rearrangements in Malaria Mosquitoes, Anopheles gambiae and A. funestus
Slides:   html    pdf    ps    ppt

The rates of chromosomal evolution vary among different genomic segments and eukaryotic lineages [1]. A comparative genomic study between Drosophila melanogaster and Anopheles gambiae shows extensive reshuffling of gene order within chromosomes [2]. Genus Drosophila has a very high rate of paracentric inversions [3]. Our study determines rates of chromosomal rearrangement in genus Anopheles. Anopheles gambiae and A. funestus, important vectors of malaria in tropical Africa, are in the same subgenus and diverged about as recently as humans and chimpanzees (~5 million years ago) [4]. Using fluorescence in situ hybridization (FISH), we mapped A. funestus cDNA clones on the five arms of the polytene chromosome complement. Of 157 cDNAs used as probes, 116 mapped to single chromosomal locations on the A. funestus cytogenetic map, and the remainder hybridized in multiple locations. Those 116 cDNAs were mapped in silico to the completely sequenced A. gambiae genome. The relative positions of sequences with unique map locations in both species support the hypothesized chromosome arm homologies and the reciprocal whole arm translocation between 2L and 3R, postulated previously on the basis of relative length and banding pattern [5]. Correspondence between chromosome arms was contradicted by only two of the cDNAs examined in this study. Within corresponding arms, paracentric inversions have had a major impact on genome architecture since the divergence of these species. Gene order has not been preserved along the length of any chromosome arm, although there are conserved segments in some regions near centromeres where the rate of meiotic recombination may be reduced. Inversions have involved large as well as relatively small chromosomal segments. One of three small inversions at the distal end of 2R includes a rearrangement involving the 8C region in A. gambiae that contains the major Plasmodium-refractoriness locus Pen1 [6]. 
What has been the extent of rearrangement of gene order between these species? The number of inversion events can be estimated by considering the mean length of conserved segments, because this length decreases with each inversion fixed since the divergence of A. gambiae and A. funestus from a common ancestor. The method of Nadeau and Taylor [7] was applied to estimate mean lengths of all conserved segments in the genome, based on the nucleotide distance in A. gambiae between the outermost markers that defined the segments observed in our sample. An assumption of the method, that rearrangements fixed during evolution are randomly distributed in the genome, seems unlikely given the extraordinary concentration of polymorphic inversions on 2R in both lineages. Of eight polymorphic inversions described in A. gambiae, seven occur on chromosome 2R [8]. Similarly, 11 of 15 polymorphic inversions found in A. funestus involve 2R [9]. Accordingly, we assessed each arm independently. The estimated mean lengths of all conserved segments on each arm, defined with respect to A. gambiae, were X, 2.0 ± 0.2 megabases (Mb); 2R, 0.9 ± 0.2 Mb; 2L, 2.2 ± 0.4 Mb; 3R, 2.2 ± 1.0 Mb; and 3L, 1.1 ± 0.4 Mb. In a slight departure from Nadeau and Taylor [7], each rearrangement was assumed to be an inversion requiring two disruption events. Therefore, n inversions result in 2n + 1 conserved segments. The number of inversions on each arm was 5 ± 1, 36 ± 9, 11 ± 3, 11 ± 3, and 19 ± 5, respectively. Assuming a divergence time of 5 million years [4], the rate of fixation per My for each chromosome arm can be estimated as 0.5, 3.6, 1.1, 1.1, and 1.9, respectively (or 7 when estimated across the genome). When normalized to account for differences in chromosome length, the number of inversions per Mb per My for X, 2R, 2L, 3R, and 3L was estimated as 0.023, 0.057, 0.022, 0.021, and 0.044, respectively (0.031 genome-wide). This rate is even more extreme than the genome-wide estimate for Drosophila [3]. 
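The arithmetic in the preceding paragraph can be checked with a small Python sketch. The inversion counts and the 5 My divergence time are taken from the text. Dividing by twice the divergence time (the two lineages together) is our assumption; it is not stated in the abstract, but it reproduces the reported per-arm rates. Names such as `fixation_rate_per_my` are hypothetical.

```python
# Central estimates of the number of inversions per arm, as reported above.
INVERSIONS = {"X": 5, "2R": 36, "2L": 11, "3R": 11, "3L": 19}
DIVERGENCE_MY = 5.0  # divergence time in millions of years, from the text


def conserved_segments(n_inversions):
    """Each inversion needs two breaks, so n inversions leave 2n + 1 segments."""
    return 2 * n_inversions + 1


def fixation_rate_per_my(n_inversions, divergence_my=DIVERGENCE_MY):
    """Inversions fixed per My, ASSUMING both lineages contribute equally,
    i.e. a total elapsed time of 2 * divergence_my."""
    return n_inversions / (2.0 * divergence_my)


rates = {arm: round(fixation_rate_per_my(n), 1) for arm, n in INVERSIONS.items()}
segments_2r = conserved_segments(INVERSIONS["2R"])
```

Under this assumption the rates come out as 0.5, 3.6, 1.1, 1.1, and 1.9 for X, 2R, 2L, 3R, and 3L, matching the values quoted in the abstract. Normalizing per Mb would additionally require the arm lengths, which the abstract does not give.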
Moreover, our results indicate that 2R has a higher rate of rearrangement than other arms. It is clear that tightly linked genes in A. gambiae are unlikely to be similarly linked in A. funestus, particularly on 2R. The estimate of mean conserved segment length derived for each arm can be used to predict the probability of linkage in A. funestus, given the known distance between genes in A. gambiae and the assumption of random distribution of breakpoints [7]. As an example, the probability that genes 1 Mb apart on 2R in A. gambiae are linked on 2R in A. funestus is only 0.31. Polymorphic inversions on chromosome 2R are widespread within A. gambiae and A. funestus and are believed to indicate adaptations to different environmental niches [8, 9]. Identification of genes encoded within these inversions could provide clues to factors determining mosquito behavior and vectorial capacity. Thus, the main features of genome rearrangements in the malaria mosquitoes A. gambiae and A. funestus can be summarized as follows: (1) the reciprocal whole-arm translocation has preserved synteny (the occurrence of genes on the same chromosome arm) at the whole-arm level; (2) a high rate of paracentric inversions, especially on 2R, has had a major impact on gene order, causing extensive reshuffling. Our results suggest that the success of positional cloning or interspecific microarray experiments may be limited to either very closely related anopheline species or small genomic fragments. Further comparative studies of these two genomes will provide valuable insights into the mechanism and effects of chromosomal rearrangements. This study was supported by grants from NIH (AI48842) to N.J.B. and from the Indiana 21st Century Research & Technology Fund to F.H.C.

References:

1. E. Eichler, D. Sankoff, Science 301, 5634 (2003).
2. E. M. Zdobnov et al., Science 298, 149 (2002).
3. J. González, J. M. Ranz, A. Ruiz, Genetics 161, 1137 (2002).
4. I. V. Sharakhov et al., Science 298, 182 (2002).
5. I. V. Sharakhov, M. V. Sharakhova, C. M. Mbogo, L. L. Koekemoer, G. Yan, Genetics 159, 211 (2001).
6. L. Zheng et al., Science 276, 425 (1997).
7. J. H. Nadeau and B. A. Taylor, Proc. Natl. Acad. Sci. U.S.A. 81, 814 (1984).
8. M. Coluzzi, A. Sabatini, V. Petrarca, M. A. Di Deco, Trans. R. Soc. Trop. Med. Hyg. 73, 483 (1979).
9. I. Dia, D. Boccolini, C. Antonio-Nkondjio, C. Costantini, D. Fontenille, Parassitologia 42, 227 (2000).

Joint work with Andrew C. Serazin (1), Olga G. Grushko (1), Ali Dana (1), Neil Lobo (1), Maureen E. Hillenmeyer (1), Richard Westerman (2), Jeanne Romero-Severson (3), Carlo Costantini (4), N'Fale Sagnon (4), Frank H. Collins (1), and Nora J. Besansky (1).

(1) Center for Tropical Disease Research and Training, University of Notre Dame, Notre Dame, IN 46556-0369, USA.
(2) Horticulture Department, Purdue University, West Lafayette, IN 47907-1159, USA.
(3) Department of Forestry and Natural Resources, Purdue University, West Lafayette, IN 47907-1165, USA.
(4) Centre National de Recherche et de Formation sur le Paludisme, Ouagadougou, Burkina Faso.

Amal A. Shervington (Biological Sciences Department, Forensic Sciences Department, University of Central Lancashire, Preston, PR1 2HE. UK)  aashervington@uclan.ac.uk

Induced CYP1A1 Gene Expression in Lung Cancer Cell Lines (poster session)

Joint work with Kulthum Mohammed.

The gene CYP1A1 (cytochrome P450, family 1, subfamily A, polypeptide 1) encodes a member of the cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are monooxygenases that catalyze numerous reactions involved in drug metabolism and synthesis of cholesterol, steroids and lipids. The enzyme is reported to be present predominantly in extrahepatic tissues in humans and in experimental animals (1). CYP1A1 is of toxicological importance because it catalyses the bioactivation of polycyclic aromatic hydrocarbon (PAH) constituents, e.g. benzo[a]pyrene and other combustion products abundant in tobacco smoke, to mutagens and carcinogens (2).

Several studies of the oncogenic significance of CYP1A1 have found correlation between inducibility of the enzyme and lung cancer susceptibility in smokers (3). The expression and activity of CYP1A1 were examined using either peripheral blood lymphocytes as surrogate for lung cancer tissue (3) or lung biopsy specimens from human subjects. CYP1A1 transcripts were detected in lung cancer tissue either by reverse transcription polymerase chain reaction (RT-PCR) or northern blot hybridization (4).

In our laboratory we used four different lung cell lines: A549 adenocarcinoma; H460 large cell carcinoma; COR-L23/5010 drug-resistant large cell carcinoma; and CCD-32Lu normal lung cells as a control. We measured the level of CYP1A1 transcript using the LightCycler (quantitative PCR). mRNA extracted from 10^6 cells using an mRNA capture kit (Roche) was used to generate cDNA by the Reverse Transcription System (Roche) with CYP1A1 primers (designed using the primer3 web site) and amplified by the LightCycler using CYP1A1

The expected size of the CYP1A1 amplicon was 166 bp; it was expressed at a highly induced level in the A549 adenocarcinoma and to a lesser extent in the H460 large cell carcinoma. Very faint bands can be seen in the L23/5010 drug-resistant large cell carcinoma. No CYP1A1 can be detected in the normal lung cells. An amplicon of 300 bp was amplified only in the control and not in the cancerous cell lines. Further work is required to characterise the 300 bp band and to identify its significance.

Our results have shown an induced level of CYP1A1 in the adenocarcinoma cell line that is absent from the control, indicating that CYP1A1 is expressed at an elevated level in some cancer cell lines but not in the control.

Numerous reports have emphasised the induction level of CYP1A1 in peripheral blood lymphocytes and lung cancer tissue, but there have been few or no reports on the level of CYP1A1 in established cancers such as cancerous cell lines. Our study has shown elevated levels of CYP1A1 in some of the cancerous cell lines, which may suggest an active role for CYP1A1 in the maintenance of cancer.

Jijun Tang and Bernard M.E. Moret (Department of Computer Science, University of New Mexico, Albuquerque, NM 8713)  jtang@trucha01.hpc.unm.edu

Large-scale Phylogeny Reconstruction from Arbitrary Gene-order Data   pdf    ps

Slides:   pdf

Elisabeth R.M. Tillier (Ontario Cancer Institute / University of Toronto)  e.tillier@utoronto.ca  http://www.uhnres.utoronto.ca/tillier/

Models and Methods for Phylogenomics

I will present a number of new approaches to some fundamental problems in comparative genomics and sequence analysis:

1. Using gene order information to confirm orthologous identifications.

2. Using phylogenetic profiles for the phylogenetic analysis of whole genomes.

3. Development of substitution models for the analysis of protein and RNA sequences.

Li-San Wang (University of Pennsylvania)  lisan@cs.utexas.edu

Distance-Based Genome Rearrangement Phylogeny
Slides:   html    pdf    ps    ppt

Evolution operates on whole genomes through mutations that change the order and strandedness of genes within the genomes. These events are examples of "rare genomic changes," which have low frequency and high signal-to-noise ratio. Thus analyses of gene-order data present new opportunities for discoveries about deep evolutionary events, provided that sufficiently accurate methods can be developed to reconstruct evolutionary trees.

In this talk I will present our results in distance-based genome rearrangement phylogeny reconstruction. We approach the problem by developing new, statistically based estimators of the true evolutionary distance. These estimators are based on the distributions of genomic distances, including breakpoint and inversion distances, under Markov models. In our simulation study, we obtain highly accurate trees by using these new distance estimators, even when the amount of evolution in the dataset is high.
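For readers unfamiliar with the raw quantities these estimators start from, the following Python sketch computes the breakpoint distance between two signed linear gene orders. It shows only the observed distance, not the statistical correction described in the talk. The framing of each genome with caps 0 and n+1, and the function names, are assumptions of this example.

```python
def adjacencies(genome):
    """Set of adjacencies of a signed linear gene order, in both reading
    directions: the pair (a, b) is equivalent to (-b, -a)."""
    framed = [0] + list(genome) + [len(genome) + 1]
    adj = set()
    for a, b in zip(framed, framed[1:]):
        adj.add((a, b))
        adj.add((-b, -a))
    return adj


def breakpoint_distance(g1, g2):
    """Number of adjacencies of g1 that do not occur in g2.
    Both genomes must contain the same genes 1..n, each exactly once."""
    if sorted(abs(g) for g in g1) != sorted(abs(g) for g in g2):
        raise ValueError("genomes must share the same gene set")
    adj2 = adjacencies(g2)
    framed = [0] + list(g1) + [len(g1) + 1]
    return sum((a, b) not in adj2 for a, b in zip(framed, framed[1:]))


identity = [1, 2, 3, 4, 5]
one_inversion = [1, -3, -2, 4, 5]  # genes 2..3 inverted

d_same = breakpoint_distance(identity, identity)
d_inv = breakpoint_distance(identity, one_inversion)
```

A single inversion creates at most two breakpoints, which is why the observed breakpoint distance underestimates the true number of events once inversions begin to overlap.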

This is joint work with Robert K. Jansen and Tandy Warnow at the University of Texas, and Bernard M.E. Moret at the University of New Mexico.

Derek E. Wildman (Center for Molecular Medicine & Genetics, Wayne State University School of Medicine, Detroit, MI 48214)  dwildman@genetics.wayne.edu

An Objective View of Humankind's Place in Primate Evolution

Joint work with Monica Uddin, Guozhen Liu, Lawrence I. Grossman, and Morris Goodman.

In order to accurately place humankind in a phylogenetic classification of Primates it is necessary to know the phylogenetic relationships among all members of the order. We present the phylogenetic relationships and times of divergence for extant members of the order as determined by DNA nucleotide sequence data, and we focus particularly on the relationships within the family Hominidae. Local molecular clock analyses using fossil calibrations calculate that the time of origin for the order Primates as a crown group is 63 million years ago. Anthropoid primates (New World monkeys, Old World monkeys, and apes including humans) originated approximately 40 million years ago.

Phylogenetic and local molecular clock analyses from a sample of 97 genes show that humans and chimpanzees form a clade that most recently shared a common ancestor between 5 and 6 million years ago. These coding DNA data separate the human-chimpanzee clade from the gorilla clade between 6 and 7 million years ago. This African ape clade separated from the orangutan clade between 13 and 15 million years ago. We calculated the percent nonsynonymous DNA identity between humans and chimpanzees to be 99.4%, synonymous identity to be 98.4%, and total DNA sequence identity to be 99.1%. Interestingly, phylogenetic analysis grouped humans and chimpanzees together when only nonsynonymous sites were analyzed. This result suggests that at the protein level humans and chimpanzees are functionally more similar to each other than either taxon is to any other ape. Additionally, of these 97 genes, 30 show evidence of positive selection during the descent of catarrhine primates. An equal number (n=14) of these genes show elevated nonsynonymous rates of substitution on the human and chimpanzee lineages.

Divergences between humans and chimpanzees are placed in perspective by comparing their date of divergence with those found across the class Mammalia. The age of genus level crown groups for mammals ranged from 2 to 21 million years old. The mean crown group time of origin is approximately 8 million years ago, and the 95% confidence interval falls between 6.61 and 9.71 million years ago. Thus, humans and chimpanzees more recently share a common ancestor than do many congeneric groups of mammals.

Tiffani L. Williams (Department of Computer Science, University of New Mexico, Albuquerque, NM 87131)  tlw@cs.unm.edu

Searching for Optimal Trees Under Maximum Parsimony (poster session)    pdf    ps

Kenneth H. Wolfe (Department of Genetics, University of Dublin, Trinity College)  khwolfe@tcd.ie  http://www.gen.tcd.ie/khwolfe/

Genome Evolution and Sorting Out Ancient Polyploidy in Yeasts

Yeasts are a good model system for investigating gene order and chromosomal evolution because their genomes are compact and relatively eas ing in a metabolic pathway was put together during the evolution of species that can grow vigorously without oxygen.

Stacia K. Wyman (Department of Computer Sciences, University of Texas at Austin)  stacia@cs.utexas.edu  http://www.cs.utexas.edu/users/stacia

Comparative Chloroplast Genomics of Seed Plants: Annotation and Analysis of Genomic Sequences (poster session)

Joint work with Romey Haberle, Tim Chumley, Jeff Boore, and Robert Jansen.

Our research group is performing a comparative study of seed plant chloroplast genomes, which involves sequencing plastid genomes of 55 taxa representing all of the major lineages of seed plants, with more intensive sampling in groups with highly rearranged genomes. During the first two years we have completed sequencing or have nearly complete drafts for 10 plastid genomes and an additional 12 genomes are in various stages of progress. Most of the focus on the project so far has been on the highly rearranged chloroplast genomes of the angiosperm families Campanulaceae and Geraniaceae.

We have also developed an annotation program and we have designed and tested several new computational methods for using whole genomes for phylogeny reconstruction. DOGMA (Dual Organellar GenoMe Annotator) is a web-based program for annotating organellar (currently chloroplast and animal mitochondrial) genomes. Given a whole genome sequence (or fragment) in FASTA format, DOGMA semi-automates the annotation process. DOGMA uses a custom database of the complete set of genes for 16 green plants. Biological expertise is still needed in order to identify start and stop codons as well as intron boundaries. The result is an annotated genome which can be saved in Sequin format for direct submission to GenBank.

DOGMA, which is in the beta-testing phase, has already been used in the preliminary analysis of several sequenced chloroplast genomes. The complete sequences of the Trachelium (Campanulaceae) and Pelargonium (Geraniaceae) chloroplast genomes have identified numerous repeated sequences that are associated with extensive changes in gene order and they suggest that transposition may also be responsible for several genomic rearrangements in Trachelium.
