
This AI Paper from MIT Explores the Scaling of Deep Learning Models for Chemistry Research


Researchers from MIT investigated the scaling behavior of large chemical language models, focusing on both generative pre-trained transformers (GPT) for chemistry (ChemGPT) and graph neural network force fields (GNNs). They introduce the concept of neural scaling, where model performance is characterized by empirical scaling laws, particularly loss scaling as a power law with respect to the number of model parameters, dataset size, or compute resources. The study delves into the challenges and opportunities associated with scaling large chemical models, aiming to provide insights into the optimal allocation of resources for improving pre-training loss.
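To make the idea of a neural scaling law concrete, the sketch below fits a simple power-law relationship between model size and loss in log-log space. The data points, the fitted exponent, and the extrapolation are purely illustrative assumptions, not numbers from the paper.

```python
# Minimal sketch of fitting an empirical neural scaling law of the form
# L(N) ~ a * N^(-b), where N is the number of model parameters and L is the
# pre-training loss. All data points are illustrative, not from the paper.
import numpy as np

# Hypothetical (model size, pre-training loss) pairs for illustration only.
n_params = np.array([1e6, 1e7, 1e8, 1e9])
losses = np.array([1.20, 0.95, 0.78, 0.66])

# A power law is linear in log-log space: log L = log a - b * log N.
slope, intercept = np.polyfit(np.log(n_params), np.log(losses), deg=1)
a, b = np.exp(intercept), -slope
print(f"Fitted scaling law: L(N) ~ {a:.2f} * N^(-{b:.3f})")

# Extrapolate to a larger model size under this (illustrative) fit.
n_new = 1e10
print(f"Predicted loss at N = 1e10: {a * n_new ** (-b):.3f}")
```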

For chemical language modeling, the researchers design ChemGPT, a GPT-3-style model based on GPT-Neo, with a tokenizer for self-referencing embedded strings (SELFIES) representations of molecules. The model is pre-trained on molecules from PubChem, and the study explores the impact of dataset and model size on pre-training loss.
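As an illustration of what SELFIES tokenization looks like in practice, here is a minimal sketch using the open-source selfies package; the example molecules and the toy vocabulary construction are assumptions for demonstration and do not reproduce ChemGPT's actual tokenizer.

```python
# Minimal sketch of SELFIES tokenization for a ChemGPT-style language model,
# using the open-source `selfies` package. The example molecules are arbitrary.
import selfies as sf

smiles_examples = ["C1=CC=CC=C1", "CCO", "CC(=O)OC1=CC=CC=C1C(=O)O"]  # benzene, ethanol, aspirin

# Convert SMILES to SELFIES, then split each SELFIES string into tokens.
selfies_strings = [sf.encoder(s) for s in smiles_examples]
tokenized = [list(sf.split_selfies(s)) for s in selfies_strings]

# Build a toy vocabulary from the observed tokens (a real tokenizer would also
# add special tokens such as padding and end-of-sequence markers).
vocab = sorted(sf.get_alphabet_from_selfies(selfies_strings))
token_to_id = {tok: i for i, tok in enumerate(vocab)}

for smi, toks in zip(smiles_examples, tokenized):
    ids = [token_to_id[t] for t in toks]
    print(f"{smi} -> {toks} -> {ids}")
```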

In addition to language models, the paper addresses graph neural network force fields (GNNs) for tasks requiring molecular geometry and three-dimensional structure. Four types of GNNs are considered, ranging from models whose internal layers manipulate only E(3)-invariant quantities to those that use E(3)-equivariant quantities with increasingly physics-informed architectures. The authors evaluate the capacity of these GNNs, defined in terms of depth and width, in neural-scaling experiments.
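The general recipe behind neural force fields can be sketched with a toy model: predict a scalar energy from atomic positions and obtain forces as the negative gradient of that energy via automatic differentiation. The tiny MLP below is a placeholder, not one of the paper's E(3)-invariant or equivariant architectures, and the geometries and targets are random stand-ins.

```python
# Minimal sketch of how a neural force field is typically trained: the model
# predicts a scalar energy from atomic positions, and forces are obtained as
# the negative gradient of that energy with respect to the positions.
import torch
import torch.nn as nn

class ToyEnergyModel(nn.Module):
    def __init__(self, n_atoms: int):
        super().__init__()
        # Placeholder MLP standing in for a real (in)variant/equivariant GNN.
        self.net = nn.Sequential(nn.Linear(n_atoms * 3, 64), nn.SiLU(), nn.Linear(64, 1))

    def forward(self, positions: torch.Tensor) -> torch.Tensor:
        # positions: (batch, n_atoms, 3) -> one scalar energy per configuration
        return self.net(positions.flatten(start_dim=1)).squeeze(-1)

n_atoms = 5
model = ToyEnergyModel(n_atoms)
positions = torch.randn(8, n_atoms, 3, requires_grad=True)  # random geometries (illustrative)
target_energy = torch.randn(8)
target_forces = torch.randn(8, n_atoms, 3)

energy = model(positions)
# Forces are the negative gradient of the predicted energy w.r.t. atomic positions.
forces = -torch.autograd.grad(energy.sum(), positions, create_graph=True)[0]

# A combined energy + force loss, as is standard for neural force fields.
loss = nn.functional.mse_loss(energy, target_energy) + nn.functional.mse_loss(forces, target_forces)
loss.backward()
print(f"energy/force training loss: {loss.item():.3f}")
```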

To handle hyperparameter optimization (HPO) for deep chemical models efficiently, the paper introduces a technique called Training Performance Estimation (TPE), adapting it from a method used for computer vision architectures. TPE uses training speed to enable performance estimation across different domains and model/dataset sizes. The paper details the experimental setup, including the use of NVIDIA Volta V100 GPUs, PyTorch, and distributed data-parallel acceleration for model implementation and training.
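The snippet below is a minimal sketch of the kind of PyTorch distributed data-parallel training loop such multi-GPU experiments rely on; the model, data, and hyperparameters are placeholders rather than the paper's actual configuration.

```python
# Minimal sketch of PyTorch DistributedDataParallel (DDP) training, the kind of
# data-parallel acceleration mentioned in the experimental setup. Intended to be
# launched with `torchrun`; the model and data are placeholders.
import os
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK, and WORLD_SIZE in the environment.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = nn.Linear(128, 1).cuda(local_rank)   # placeholder for a chemical model
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

    for step in range(100):                      # placeholder training loop
        x = torch.randn(32, 128, device=local_rank)
        y = torch.randn(32, 1, device=local_rank)
        loss = nn.functional.mse_loss(model(x), y)
        optimizer.zero_grad()
        loss.backward()                          # gradients are all-reduced across GPUs
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```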

Overall, the study provides a comprehensive exploration of neural scaling in the context of large chemical language models, considering both generative pre-trained transformers and graph neural network force fields, and introduces an efficient method for hyperparameter optimization. The experimental results and insights contribute to understanding the resource efficiency of different model architectures in scientific deep learning applications.


Check out the Paper. All credit for this research goes to the researchers of this project. Also, don't forget to join our 33k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.

We’re also on Telegram and WhatsApp.


Pragati Jhunjhunwala is a consulting intern at MarktechPost. She is currently pursuing her B.Tech from the Indian Institute of Technology (IIT), Kharagpur. She is a tech enthusiast and has a keen interest in software and data science applications. She is always reading about the latest developments in various fields of AI and ML.


