Using Simulations for Weak Supervised Learning: A ChemNet for Transferable Chemical Property Prediction

Thursday, March 8, 2018 - 10:30am - 11:30am
Lind 305
Garrett Goh (Pacific Northwest National Laboratory)
With access to large datasets, deep neural networks (DNNs) have achieved human-level accuracy in image and speech recognition tasks. In chemistry, however, datasets are inherently small and fragmented. In this work, we develop several approaches that use rule-based models and physics-based simulations to train ChemNet. ChemNet is a transferable, generalizable pre-trained network for small-molecule property prediction that learns in a weakly supervised manner from large unlabeled chemical databases. We show that the ChemNet pre-training approach is equally effective for both CNN (Chemception) and RNN (SMILES2vec) models. This indicates that it is agnostic to network architecture and works across multiple data modalities. We also demonstrate a proof of concept for model interpretability. When tested on the solubility dataset, the model identified specific parts of a chemical that are consistent with established first-principles knowledge, with an accuracy of 88%. Our work demonstrates that neural networks can learn technically accurate chemical concepts while providing state-of-the-art accuracy. This makes interpretable deep neural networks a useful tool for AI-driven chemical research.