Manager, Bioinformatics & Pattern Discovery
Computational Biology Center
IBM T. J. Watson Research Center
One of the interesting problems in biological data analysis is the discovery of sub-sequences ("patterns") that are common to a given collection of related "streams." In the early days, the problem took the form of motif discovery in amino acid sequences. More recently, the concept has expanded to the discovery of patterns in structural data, gene expression data, scientific text, nucleotide sequences, gene marker expression data, etc., and new applications continue to be devised. In all such contexts, the discovered patterns reveal correlated elements that have an associated functional, structural, or other significance.
In this talk, I will present and discuss applications that my group has developed for carrying out pattern discovery in the bioinformatics context. These applications span a large spectrum: unsupervised motif discovery, multiple sequence alignment, tandem-repeat discovery, gene expression analysis, functional annotation, local 3-dimensional structure characterization, and others. I will also show specific examples taken from actual biological problems.
This is joint work with Aris Floratos, Laxmi Parida, Yuan Gao, Gustavo Stolovitzky, and Dan Platt.