MIT scientists build a system that can generate AI models for biology research

Is it possible to build machine-learning models without machine-learning expertise?

Jim Collins, the Termeer Professor of Medical Engineering and Science in the Department of Biological Engineering at MIT and the life sciences faculty lead at the Abdul Latif Jameel Clinic for Machine Learning in Health (Jameel Clinic), along with a number of colleagues, decided to tackle this problem when facing a similar conundrum. An open-access paper on their proposed solution, called BioAutoMATED, was published on June 21.

Recruiting machine-learning researchers can be a time-consuming and financially costly process for science and engineering labs. Even with a machine-learning expert on hand, selecting the appropriate model, formatting the dataset for it, and then fine-tuning it can dramatically change how the model performs, and all of this takes a lot of work.

“In your machine-learning project, how much time will you typically spend on data preparation and transformation?” asks a 2022 Google course on the Foundations of Machine Learning (ML). The two choices offered are either “Less than half the project time” or “More than half the project time.” If you guessed the latter, you would be correct; Google states that it takes over 80 percent of project time to format the data, and that’s not even taking into account the time needed to frame the problem in machine-learning terms.

“It would take many weeks of effort to figure out the appropriate model for our dataset, and this is a really prohibitive step for a lot of folks that want to use machine learning or biology,” says Jacqueline Valeri, a fifth-year PhD student of biological engineering in Collins’s lab who is first co-author of the paper.

BioAutoMATED is an automated machine-learning system that can select and build an appropriate model for a given dataset and even take care of the laborious task of data preprocessing, whittling down a months-long process to just a few hours. Automated machine-learning (AutoML) systems are still at a relatively nascent stage of development, with current usage focused primarily on image and text recognition and largely absent from subfields of biology, points out first co-author and Jameel Clinic postdoc Luis Soenksen PhD ’20.

“The fundamental language of biology is based on sequences,” explains Soenksen, who earned his doctorate in the MIT Department of Mechanical Engineering. “Biological sequences such as DNA, RNA, proteins, and glycans have the amazing informational property of being intrinsically standardized, like an alphabet. A lot of AutoML tools are developed for text, so it made sense to extend it to [biological] sequences.”
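To make the "alphabet" point concrete, here is a minimal sketch of one common way to turn a sequence into numbers a model can consume: one-hot encoding. It is an illustration only and is not taken from BioAutoMATED's code; the function name `one_hot_encode`, the `DNA_ALPHABET` constant, and the choice to map unknown letters to an all-zero row are assumptions made for this example.

```python
# Illustrative sketch only; not BioAutoMATED's actual preprocessing code.
# DNA_ALPHABET and one_hot_encode are hypothetical names chosen for this example.
DNA_ALPHABET = "ACGT"  # RNA, protein, or glycan data would need a different alphabet


def one_hot_encode(sequence, alphabet=DNA_ALPHABET):
    """Return one row per residue, with a 1 in the column of the matching letter."""
    index = {letter: i for i, letter in enumerate(alphabet)}
    encoded = []
    for residue in sequence.upper():
        row = [0] * len(alphabet)
        if residue in index:  # assumption: unknown letters (e.g. N) stay all-zero
            row[index[residue]] = 1
        encoded.append(row)
    return encoded


matrix = one_hot_encode("ACGTN")
for row in matrix:
    print(row)
```

Because every sequence is drawn from the same small alphabet, the same encoding applies to any dataset of that sequence type, which is part of what makes this kind of preprocessing a candidate for automation.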

Moreover, most AutoML tools can only explore and build a limited range of model types. “But you can’t really know from the start of a project which model will be best for your dataset,” Valeri says. “By incorporating multiple tools under one umbrella tool, we really allow a much larger search space than any individual AutoML tool could achieve on its own.”
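The "umbrella" idea can be sketched as a loop that fits several candidate model families on the same training data and keeps whichever scores best on held-out data. The sketch below is a toy under stated assumptions: the two candidates (`fit_mean` and `fit_line`) are stand-ins for whole AutoML searches, the data are invented, and nothing here reflects how BioAutoMATED actually wires its tools together.

```python
# Toy sketch of selecting among several model families on held-out data.
# All names and numbers are hypothetical; real AutoML tools search far larger spaces.

def fit_mean(xs, ys):
    """Baseline candidate: always predict the training mean."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Second candidate: one-variable least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x if var_x else 0.0
    return lambda x: mean_y + slope * (x - mean_x)


def mse(model, xs, ys):
    """Mean squared error of a fitted model on (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def select_best(candidates, train, valid):
    """Fit every candidate on train, score on valid, return the best name and all scores."""
    scores = {}
    for name, fit in candidates.items():
        model = fit(*train)
        scores[name] = mse(model, *valid)
    best = min(scores, key=scores.get)  # lower validation error is better
    return best, scores


candidates = {"mean_baseline": fit_mean, "linear": fit_line}
train = ([0.0, 1.0, 2.0, 3.0], [0.1, 2.0, 3.9, 6.1])  # made-up training data
valid = ([4.0, 5.0], [8.0, 10.1])                      # made-up held-out data
best_name, scores = select_best(candidates, train, valid)
print(best_name, scores)
```

Adding another family to the search is a matter of adding another entry to `candidates`, which is the sense in which combining tools widens the search space beyond what any single one covers.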

BioAutoMATED’s repertoire of supervised ML models includes three types: binary classification models (dividing data into two classes), multi-class classification models (dividing data into multiple classes), and regression models (fitting continuous numerical values or measuring the strength of key relationships between variables). BioAutoMATED is even able to help determine how much data is required to appropriately train the chosen model.
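As a rough illustration of how the three task types differ, the sketch below guesses the task from a dataset's target column. The heuristic (non-integer numbers mean regression, two distinct labels mean binary, anything else means multi-class) is an assumption made for this example, and `infer_task_type` is a hypothetical helper, not part of BioAutoMATED.

```python
# Hypothetical helper; the heuristic is an assumption for illustration, not BioAutoMATED's logic.

def infer_task_type(labels):
    """Guess the supervised task type from a list of target values."""
    distinct = set(labels)
    if len(distinct) < 2:
        raise ValueError("need at least two distinct target values")
    all_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in labels
    )
    # Any non-integer number suggests a continuous target.
    if all_numeric and any(
        isinstance(v, float) and not v.is_integer() for v in labels
    ):
        return "regression"
    if len(distinct) == 2:
        return "binary_classification"
    return "multiclass_classification"


print(infer_task_type([0, 1, 1, 0]))            # two classes
print(infer_task_type(["low", "mid", "high"]))  # more than two classes
print(infer_task_type([0.12, 3.4, 2.2]))        # continuous values
```

A heuristic like this can misfire, for example on integer-valued measurements that are really continuous, so in practice the task type is better stated explicitly by the researcher.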

“Our tool explores models that are better-suited for smaller, sparser biological datasets as well as more complex neural networks,” Valeri says. This is an advantage for research groups with new data that may or may not be suited for a machine-learning problem.

“Conducting novel and successful experiments at the intersection of biology and machine learning can cost a lot of money,” Soenksen explains. “Currently, biology-centric labs need to invest in significant digital infrastructure and AI-ML trained human resources before they can even see if their ideas are poised to pan out. We want to lower these barriers for domain experts in biology.” With BioAutoMATED, researchers have the freedom to run initial experiments to assess whether it’s worthwhile to hire a machine-learning expert to build a different model for further experimentation.

The open-source code is publicly available and, researchers emphasize, it is easy to run. “What we would love to see is for people to take our code, improve it, and collaborate with larger communities to make it a tool for all,” Soenksen says. “We want to prime the biological research community and generate awareness related to AutoML techniques, as a seriously useful pathway that could merge rigorous biological practice with fast-paced AI-ML practice better than it is achieved today.”

Collins, the senior author on the paper, is also affiliated with the MIT Institute for Medical Engineering and Science, the Harvard-MIT Program in Health Sciences and Technology, the Broad Institute of MIT and Harvard, and the Wyss Institute. Additional MIT contributors to the paper include Katherine M. Collins ’21; Nicolaas M. Angenent-Mari PhD ’21; Felix Wong, a former postdoc in the Department of Biological Engineering, IMES, and the Broad Institute; and Timothy K. Lu, a professor of biological engineering and of electrical engineering and computer science.

This work was supported, in part, by a Defense Threat Reduction Agency grant, the Defense Advanced Research Projects Agency SD2 program, the Paul G. Allen Frontiers Group, the Wyss Institute for Biologically Inspired Engineering of Harvard University, an MIT-Takeda Fellowship, a Siebel Foundation Scholarship, a CONACyT grant, an MIT-TATA Center fellowship, a Johnson & Johnson Undergraduate Research Scholarship, a Barry Goldwater Scholarship, a Marshall Scholarship, Cambridge Trust, and the National Institute of Allergy and Infectious Diseases of the National Institutes of Health. This work is part of the Antibiotics-AI Project, which is supported by the Audacious Project, Flu Lab, LLC, the Sea Grape Foundation, Rosamund Zander and Hansjorg Wyss for the Wyss Foundation, and an anonymous donor.
