Exploring the Frontiers of AI in Single-Cell Biology: A Critical Evaluation of Zero-Shot Foundation Models like Geneformer and scGPT

The application of foundation models in single-cell biology has been a recent topic of debate among researchers. Models like scGPT, GeneCompass, and Geneformer are some of the most promising tools for this field. However, their efficacy has been a point of concern, particularly in zero-shot settings, since this field involves exploratory experiments and a scarcity of clean labels for fine-tuning. This paper takes up this issue and rigorously assesses the zero-shot performance of these models.

Previous studies have relied on fine-tuning these models for specific tasks, but the limitations of fine-tuning become quite evident in single-cell biology, both because of the exploratory nature of the field and because of the high computational requirements. To address this challenge, Microsoft researchers evaluated the zero-shot performance of the Geneformer and scGPT foundation models across diverse datasets and an array of tasks, including the utility of cell embeddings for cell type clustering, batch effect correction, and the effectiveness of the models' input reconstruction based on their pretraining objectives.
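At a high level, the zero-shot protocol amounts to running each pretrained model with frozen weights to obtain one embedding per cell and then analyzing those embeddings with a standard single-cell toolkit. The sketch below is a hypothetical illustration of that workflow, not code from either model's library: `extract_cell_embeddings` is a placeholder (here returning random vectors so the pipeline runs end to end), the file path is made up, and the downstream clustering uses scanpy.

```python
# Hypothetical zero-shot workflow: frozen model -> per-cell embeddings -> clustering.
# `extract_cell_embeddings` is a placeholder, not an API of scGPT or Geneformer.
import numpy as np
import scanpy as sc

def extract_cell_embeddings(adata, dim=512):
    """Stand-in for a frozen pretrained model (e.g. scGPT or Geneformer).
    Returns random vectors so the rest of the pipeline runs end to end."""
    rng = np.random.default_rng(0)
    return rng.normal(size=(adata.n_obs, dim))

adata = sc.read_h5ad("tissue_dataset.h5ad")  # one of the evaluation datasets (hypothetical path)
adata.obsm["X_model"] = extract_cell_embeddings(adata)

# Cluster the zero-shot embeddings with the usual scanpy neighbors + Leiden steps.
sc.pp.neighbors(adata, use_rep="X_model")
sc.tl.leiden(adata, key_added="zero_shot_clusters")
```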

The rationale for selecting these two models is the availability of their pretrained weights (at the time of the assessment). For their evaluation, the researchers used five distinct human tissue datasets, each posing unique challenges relevant to single-cell analysis. For comparison, the researchers used a generative model called scVI, which was trained on each dataset. They used the following metrics for each task:

  • For evaluating cell embeddings, they used the Average Silhouette Width (ASW) and Average Bio (AvgBIO) scores to measure how distinctly the cell types separate within the embedding space.
  • For batch integration, they employed a variation of the ASW score on a scale between 0 and 1, with 0 signifying complete separation of the batches and 1 signifying perfect batch mixing.
  • For evaluating the performance of scGPT and Geneformer on their pretraining objectives, they used mean squared error (MSE) and Pearson correlation, respectively (a short sketch of these metrics follows this list).
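These metrics can be approximated with standard libraries. The sketch below is only an illustration using scikit-learn and scipy; the paper's exact implementations (for instance, scIB-style AvgBIO) may differ.

```python
# Approximate versions of the evaluation metrics, for illustration only.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error, silhouette_score

def cell_type_asw(embeddings, cell_types):
    """ASW over cell-type labels, rescaled from [-1, 1] to [0, 1];
    higher means cell types are better separated in the embedding space."""
    return (silhouette_score(embeddings, cell_types) + 1) / 2

def batch_asw(embeddings, batches):
    """Batch-integration score on [0, 1]: 1 = perfect batch mixing,
    0 = complete separation of the batches."""
    return 1 - (silhouette_score(embeddings, batches) + 1) / 2

def reconstruction_metrics(true_expr, pred_expr):
    """MSE and Pearson correlation between observed and reconstructed expression."""
    mse = mean_squared_error(true_expr.ravel(), pred_expr.ravel())
    r, _ = pearsonr(true_expr.ravel(), pred_expr.ravel())
    return mse, r
```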

scGPT and Geneformer performed worse than the baseline methods on both metrics. Geneformer showed high variance across datasets, and although scGPT performed better than the baseline model scVI on one of the datasets, it fell behind on two of them. Subsequently, the researchers evaluated the impact of the pretraining dataset on model performance, focusing mainly on scGPT (four variants of scGPT), and found an improvement in the median scores for all model variants.
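For context, the scVI baseline referenced above is trained separately on each dataset with the scvi-tools package, and its latent space is scored with the same metrics. A minimal sketch, assuming an AnnData object with raw counts and a "batch" column (both assumptions about the data layout, not details from the paper):

```python
# Minimal scVI baseline with scvi-tools: train on one dataset, extract latents.
# The file path and the "batch" key are assumptions about the data layout.
import scanpy as sc
import scvi

adata = sc.read_h5ad("tissue_dataset.h5ad")  # scVI expects raw counts in adata.X
scvi.model.SCVI.setup_anndata(adata, batch_key="batch")
model = scvi.model.SCVI(adata)
model.train()  # trained per dataset, unlike the zero-shot foundation models
adata.obsm["X_scvi"] = model.get_latent_representation()
```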

When evaluated on batch effects, both models showed poor results, often lagging behind models like scVI, which suggests that they are not entirely robust to batch effects in zero-shot settings. For the last set of evaluations, the researchers found that scGPT fails to reconstruct gene expression, while Geneformer performs better. When compared against a baseline, they observed that the baseline prediction outperformed all scGPT variants, and Geneformer performed better than the average rankings on one of the datasets.
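The article does not describe the reconstruction baseline in detail; one simple candidate, sketched below purely as an illustration, is a per-gene mean-expression predictor scored with the same MSE and Pearson metrics. The data here are random stand-ins, not results from the paper.

```python
# Illustrative per-gene mean-expression baseline for the reconstruction task.
# This is an assumed baseline, not necessarily the one used in the paper.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import mean_squared_error

def mean_expression_baseline(expr):
    """Predict every cell's expression profile as the per-gene mean over all cells."""
    return np.tile(expr.mean(axis=0), (expr.shape[0], 1))

expr = np.random.default_rng(0).poisson(2.0, size=(1000, 200)).astype(float)  # stand-in data
pred = mean_expression_baseline(expr)
mse = mean_squared_error(expr.ravel(), pred.ravel())
r, _ = pearsonr(expr.ravel(), pred.ravel())
print(f"baseline MSE = {mse:.3f}, Pearson r = {r:.3f}")
```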

In conclusion, the researchers extensively analyzed the zero-shot capabilities of scGPT and Geneformer when applied to single-cell biology, and their evaluation highlights the sub-par performance of these models. Their findings show that scGPT outperforms the Geneformer model in all evaluations. Lastly, the researchers also provided some insights into where future work should be focused, namely characterizing the relationship between the pretraining task, the pretraining dataset, and performance on downstream evaluation tasks.


Check out the Paper. All credit for this research goes to the researchers of this project. Also, don't forget to join our 33k+ ML SubReddit, 41k+ Facebook Community, Discord Channel, and Email Newsletter, where we share the latest AI research news, cool AI projects, and more.

If you like our work, you will love our newsletter.


Asif Razzaq is the CEO of Marktechpost Media Inc. As a visionary entrepreneur and engineer, Asif is committed to harnessing the potential of Artificial Intelligence for social good. His most recent endeavor is the launch of an Artificial Intelligence media platform, Marktechpost, which stands out for its in-depth coverage of machine learning and deep learning news that is both technically sound and easily understandable by a wide audience. The platform boasts over 2 million monthly views, illustrating its popularity among readers.


