IMA Annual Program Year Workshop
Large Data Sets in Medical Informatics
November 14-18, 2011

Nevenka Dimitrova, Philips Research Laboratory
W. Clem Karl, Boston University
Jean-Christophe Olivo-Marin, Institut Pasteur
Ahmed Tewfik, University of Texas, Austin

Biomedical informatics is currently limited by two critical challenges: the need to process large data sets to make inferences, and the small number of replicates, which limits confidence in those inferences. These factors affect both the development and the applications of biomedical informatics, ranging from modern CT, MRI, and microscopy imaging to neurological investigations and genomics.

For example, it is well known that modern CT and MRI imaging involves acquiring and processing large amounts of data. Less appreciated is the fact that these data require hours of post-processing to guide surgical interventions, and that these time requirements limit the methodologies' practical applications. Providing surgeons with real-time three-dimensional tracking of organ deformation, as required by the newest proposed minimally invasive surgical procedures such as NOTES, will demand speed-ups of several orders of magnitude in data processing and in fusion across imaging modalities.

Similarly, neurological investigations are based on large data sets collected from arrays of electrodes implanted on or inside the brain, spinal cord, or severed limbs. Inferences result from the careful analysis of quantitative and qualitative patterns in these signals and their coherent or incoherent evolutions. The problem is further compounded by the nonstationary nature of the behavior-specific signals of interest and the presence of other interfering signals. In addition, learning is severely limited by the small size and lack of richness of the training data; researchers can typically collect data only from a small number of animals or human subjects. Thus, training data may not cover the spectrum of signal characteristics in a general population, which limits the ability to construct accurate statistical models. Variations in surgical implantations further limit the quality of the data.

Finally, genomics is based on the analysis of large data sets of DNA copy number variations or gene expression levels. Experimental imperfections and the limited number of replicates again undermine confidence in the results of analyses; indeed, most studies in the field cannot be reproduced. Recent studies also suggest that diseases are likely linked to large numbers of rare genetic variants that are seldom captured by the databases collected to date. Furthermore, the need for in vivo nanoscale imaging of specific cells has led to a slew of complex challenges that include the traditional challenges of both genomics and medical imaging.

This workshop will bring together mathematicians, statisticians, engineers, and scientists working on particular aspects of biomedical informatics or related areas. A careful look at the literature in any of the subfields of biomedical informatics reveals specialized approaches and philosophies, combined with limited awareness of potentially useful methodologies developed in other subfields. Results and methodologies discovered in pertinent areas of mathematics and statistics remain largely unknown in biomedical informatics. Conversely, researchers in real analysis, differential equations, algebraic geometry, and statistics are often unaware of the characteristics of biomedical informatics challenges that limit the applicability of generic approaches.

