Foundations of Data Science: Mind the Gaps

Wednesday, September 14, 2016 - 11:10am - 12:00pm
Keller 3-180
Vasant Honavar (The Pennsylvania State University)
Scientific progress in many disciplines is increasingly enabled by our ability to examine natural phenomena through the computational lens and by our ability to acquire, share, integrate and analyze disparate types of data. Some have even suggested that the emergence of big data has made the scientific method obsolete: we can analyze data without hypotheses about the underlying phenomena and let statistical machine learning algorithms find correlations and patterns. This talk argues that the emergence of big data, instead of making the scientific method obsolete, underscores the need to develop a strong foundation for data science, including: (1) principled approaches to developing computational abstractions of not only the objects of scientific study but also the scientific methods and processes; (2) cognitive tools that complement and extend human intellect, in the form of computational artifacts (representations, processes, protocols, workflows, software) that partner with humans on all aspects of science (mapping the state of knowledge in a discipline and identifying gaps; formulating and prioritizing questions; designing, prioritizing and executing experiments; drawing inferences and constructing explanations and hypotheses from the literature, databases and knowledge bases; expressing and reasoning with scientific arguments of variable certainty and provenance; synthesizing findings from disparate observational and experimental studies; and formulating new questions, in a closed-loop fashion); and (3) the integration of the resulting cognitive tools into collaborative human-machine systems and infrastructure to advance science, including tools for documentation, replication and communication of scientific studies, collaboration, team formation (incentivizing participants, decomposing tasks, combining results, engaging participants with different levels of expertise and abilities), communication (across disciplinary boundaries and across levels of abstraction), and tracking scientific progress and impact.