Foundation Models (for biology)

Technology

Large AI models trained on vast quantities of biological data (such as DNA) that can be adapted to many tasks, from predicting disease risk to modeling protein structure. Evo2 is a key example.


Created at

7/26/2025, 2:51:49 AM

Last updated

7/26/2025, 2:55:10 AM

Research retrieved

7/26/2025, 2:55:10 AM

Summary

Foundation models for biology are large-scale AI systems trained on vast, often unlabeled, biological datasets. Unlike traditional AI models designed for a single task, these models, frequently built on transformer architectures and large language models, are broadly applicable and can be fine-tuned for a wide array of biological challenges, including medical image analysis, biomolecule design, and the interpretation of complex multi-omics data. They integrate diverse data types (genomics, proteomics, imaging, clinical records) to provide holistic insights, accelerating drug discovery and optimizing research processes. A prominent example is Evo2, one of the largest foundation models for biology, which learns from DNA drawn from many species. The field is actively advanced by companies such as GenBio AI, co-founded by Emma Lundberg, which develops multiscale AI foundation models for biology, and by research organizations such as the Arc Institute.
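
The pretrain-then-fine-tune pattern described above can be made concrete with a small sketch. The PyTorch code below is illustrative only: the tiny DNAEncoder stands in for a real pretrained backbone such as Evo2 (whose actual interface it does not reproduce), and the frozen-backbone-plus-task-head setup shows how a single pretrained model can be adapted to a downstream task, here a hypothetical variant classifier, with minimal additional training.

```python
# Minimal sketch of the "pretrain once, fine-tune per task" pattern.
# The tiny encoder is a stand-in for a large pretrained backbone;
# names and sizes are illustrative, not any model's real API.
import torch
import torch.nn as nn

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}

class DNAEncoder(nn.Module):
    """Tiny stand-in for a large pretrained sequence backbone."""
    def __init__(self, d_model=64, n_layers=2):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, tokens):                    # tokens: (batch, seq_len)
        return self.encoder(self.embed(tokens))   # (batch, seq_len, d_model)

class VariantClassifier(nn.Module):
    """Task head trained on a small labeled set; backbone stays frozen."""
    def __init__(self, backbone, d_model=64, n_classes=2):
        super().__init__()
        self.backbone = backbone
        for p in self.backbone.parameters():      # reuse pretrained weights as-is
            p.requires_grad = False
        self.head = nn.Linear(d_model, n_classes)

    def forward(self, tokens):
        h = self.backbone(tokens).mean(dim=1)     # average-pool over the sequence
        return self.head(h)                       # (batch, n_classes)

seq = torch.tensor([[VOCAB[b] for b in "ACGTACGTAC"]])
model = VariantClassifier(DNAEncoder())
print(model(seq).shape)                           # torch.Size([1, 2])
```

Only the small head is updated during fine-tuning, which is why adapting a foundation model to a new task typically needs far less labeled data than training from scratch.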

Referenced in 1 Document
Research Data
Extracted Attributes
  • Type

    AI Model

  • Field

    Biology, Biomedical Research, Cell Biology

  • Architecture

    Transformer architectures, Large Language Models (LLMs), Vision Transformers, Neural Networks (typically trained via self-supervised learning)

  • Core Function

    Learn the fundamental language of biology from vast, unlabeled datasets

  • Example Models

    Evo2, Cancer Foundation Model

  • Application Areas

    Medical image analysis, biomolecule design, analyzing single-cell sequencing data, identifying patterns and relationships in cells, identifying primary cancer sites, accelerating drug discovery, automating experiments, optimizing research processes

  • Key Characteristic

    Self-supervised learning to infer patterns and relationships (a minimal masked-token sketch follows this list)

  • Training Data Sources

    Massive unlabeled datasets, diverse DNA, multi-omics data (genomics, transcriptomics, proteomics), imaging, clinical trials, biological pathways, single-cell sequencing data, pathology, radiology, text reports, electronic health records
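
The self-supervised objective noted under Key Characteristic can be sketched as masked-token reconstruction: hide some positions in a sequence and train the model to recover them from surrounding context, so no human-provided labels are needed. The PyTorch snippet below is a minimal illustration; the vocabulary, masking scheme, and model sizes are assumed placeholders, not details of any specific published model.

```python
# Masked-token self-supervision on DNA: the "labels" are the hidden
# positions of the input itself, so unlabeled sequence data suffices.
import torch
import torch.nn as nn
import torch.nn.functional as F

VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "[MASK]": 4}

embed = nn.Embedding(len(VOCAB), 32)
mixer = nn.TransformerEncoderLayer(32, nhead=4, batch_first=True)  # lets context inform each position
predict = nn.Linear(32, len(VOCAB))       # per-position reconstruction head

tokens = torch.tensor([[VOCAB[b] for b in "ACGTACGGTTACGATC"]])

# Hide some positions; a real run would mask ~15% at random each step.
is_masked = torch.zeros_like(tokens, dtype=torch.bool)
is_masked[:, ::5] = True
masked = tokens.clone()
masked[is_masked] = VOCAB["[MASK]"]

logits = predict(mixer(embed(masked)))    # (batch, seq_len, vocab)
loss = F.cross_entropy(logits[is_masked], tokens[is_masked])
loss.backward()                           # gradients come from context alone
print(float(loss))
```
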

Timeline
  • Nature Biotechnology publishes an article titled 'Foundation models build on ChatGPT tech to learn the fundamental language of biology', discussing how these models interpret biological data to guide biomolecule design. (Source: nature.com)

    2024-09-13

Emma Lundberg (scientist)

Emma Lundberg is a Swedish cell biologist whose research focuses on spatial proteomics and cell biology, using antibody-based approaches to study fundamental aspects of human biology and the impact of protein variation on disease. She has served as a professor at the KTH Royal Institute of Technology and as Director of Cell Profiling at the Science for Life Laboratory, and is currently an Associate Professor of Bioengineering and Pathology at Stanford University. She also serves as Co-Founder and Chief Scientific Advisor at GenBio AI, an AI company developing multiscale AI foundation models for biology.

Web Search Results
  • What are Foundation Models in Biology and Healthcare?

    Foundation models are a category of AI models trained on massive amounts of unlabeled data, enabling them to handle a wide variety of tasks, such as text translation or medical image analysis. This is in contrast to earlier AI models, which were specifically trained for one task. Foundation models, often built on transformer architectures and large language models (LLMs), can be fine-tuned for numerous applications with minimal additional data or effort. [...] The introduction of foundation models and advanced AI in biomedical research fundamentally transforms this process by adopting a holistic, systems-based approach. Foundation models—trained on vast amounts of multi-omics data (genomics, transcriptomics, proteomics, etc.), imaging, clinical trials, and biological pathways—can analyze and generate insights from this data in an interconnected manner. Instead of focusing on one biological target, these models integrate various layers of biological [...] The technological backbone of foundation models is a carefully layered architecture that enables them to handle vast datasets, generalize across multiple domains, and solve tasks ranging from natural language processing (NLP) to computer vision. Central to the efficiency and versatility of these models is the integration of several key technologies, including neural networks, transformers, self-supervised learning, and computational infrastructure capable of managing the immense demands of

  • Foundation Models for Biomedical Research - Watershed Bio

    Biology comprises many different components – like genes, amino acids, proteins, chromatin, etc. – working closely together to orchestrate different processes. Foundation models allow researchers to integrate diverse data types from all of these different sources into a unified framework. By highlighting novel connections, these models can provide direction for unresolved research questions and form the basis for new hypotheses. [...] Foundation models are a subset of machine learning models trained on huge datasets in order to be broadly applicable in a variety of contexts. Most foundation model training data are unlabelled, making these models self-supervised – they do not need to be told what to look for, but rather infer patterns and relationships in the data by themselves. Because of their ability to learn contextually from these massive datasets, foundation models can make predictions about a variety of different [...] While many models have been introduced too recently to have extensive citations, several have already been quite impactful across application areas. From cancer to cardiovascular health, researchers are combining foundation models with novel analytical approaches to explore the pathology, progression, and treatment of various diseases in unprecedented detail. The following studies, most of which have been published in the last year, highlight some promising applications of biological foundation

  • Foundation models build on ChatGPT tech to learn the fundamental ...

    Eisenstein, M. Foundation models build on ChatGPT tech to learn the fundamental language of biology. Nat Biotechnol 42, 1323–1325 (2024). Published: 13 September 2024. [...] Scientists are using ever more sophisticated AI algorithms trained on vast, unlabeled datasets to develop models that can 'interpret' biological data to help guide biomolecule design.

  • How Foundation Models Are Shaping Biomedical Research

    At the heart of this research is single-cell sequencing, a technology that allows scientists to study individual cells rather than averaging data across entire tissues. While this method generates vast and complex datasets, foundation models have proven essential to unlocking their full potential. By leveraging AI, Theis’s team can process millions of individual cell profiles, identifying patterns, relationships, and transitions that would be impossible to detect manually. [...] Foundation models, such as GPT – the AI system behind ChatGPT – are large-scale artificial intelligence systems trained on diverse and extensive datasets, enabling them to perform a wide range of tasks with minimal task-specific tuning. Their ability to generalize across different domains makes them highly adaptable for applications spanning from language processing to problem-solving in complex fields. These models power various applications, including content generation, coding assistance, [...] Identifying the primary cancer site when metastases appear in the body is a significant challenge. To address this, Schnabel’s team is building a Cancer Foundation Model – an AI system that integrates diverse medical data sources, including pathology, radiology, text reports, and electronic health records. Combining Vision Transformers (AI models specialized in image analysis) with LLMs, they are creating a system capable of seamlessly analyzing both imaging and textual data to trace the source [...] (A minimal sketch of this imaging-plus-text fusion design appears after these search results.)

  • The future of biological foundation models and value creation in AI ...

    Ultimately, there are massive opportunities for investors and builders across this entire field. The spaces outlined above only begin to scratch the surface of what is happening, and there are even greater moonshot ideas coming out of companies like Bioptimus and Somite, research labs like the Arc Institute, and various players targeting issues further downstream within the clinical trial process. These are all worth entire blog posts of their own to unpack. It is a field that is incredibly [...] However, the capabilities of today’s state of the art biology models enable far more than just exploring unproven discoveries or drug classes. Rather they are also about (a) accelerating bottlenecks that slow down every drug pipeline across the industry, (b) automating prohibitively expensive experiments and simulations that would have previously required outsourcing to third-party vendors or investing in wet lab infrastructure, and (c) optimizing existing processes and preventing costly [...] For the billion and near billion dollar fundraisers like Xaira and Isomorphic, all signs point towards full-on biotechs with internal pipelines. Other players like Cradle are gunning for a pure software play that integrates into the existing workflows and bets on a newly matured buyer base with the talent and infrastructure ready to adopt out-of-the box tooling. And somewhat down the middle, players like Latent Labs and Profluent are taking a more partnership-based approach.
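
The Cancer Foundation Model result above describes combining a vision encoder for imaging with a language-model encoder for reports, fused into one prediction. The sketch below illustrates that pattern only: the toy encoders are stand-ins, not components of the actual Cancer Foundation Model, and the point is the late-fusion step of concatenating per-modality embeddings before a shared classification head, here for a hypothetical primary-site prediction.

```python
# Illustrative late-fusion sketch: one encoder per modality, embeddings
# concatenated before a shared head. All components are toy stand-ins.
import torch
import torch.nn as nn

class ToyImageEncoder(nn.Module):
    """Stands in for a Vision Transformer over radiology/pathology images."""
    def __init__(self, d=64):
        super().__init__()
        self.net = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, d), nn.ReLU())

    def forward(self, x):                       # x: (batch, 32, 32) image patch
        return self.net(x)                      # (batch, d)

class ToyTextEncoder(nn.Module):
    """Stands in for an LLM embedding of clinical text reports."""
    def __init__(self, vocab=1000, d=64):
        super().__init__()
        self.embed = nn.Embedding(vocab, d)

    def forward(self, tokens):                  # tokens: (batch, seq_len)
        return self.embed(tokens).mean(dim=1)   # (batch, d)

class FusionClassifier(nn.Module):
    """Concatenates per-modality embeddings, then predicts a primary site."""
    def __init__(self, n_sites=12, d=64):
        super().__init__()
        self.image_enc = ToyImageEncoder(d)
        self.text_enc = ToyTextEncoder(d=d)
        self.head = nn.Linear(2 * d, n_sites)   # late fusion by concatenation

    def forward(self, image, report_tokens):
        z = torch.cat([self.image_enc(image), self.text_enc(report_tokens)], dim=-1)
        return self.head(z)                     # (batch, n_sites) site logits

scan = torch.randn(1, 32, 32)                   # placeholder imaging input
report = torch.randint(0, 1000, (1, 20))        # placeholder tokenized report
print(FusionClassifier()(scan, report).shape)   # torch.Size([1, 12])
```

Late fusion keeps each modality's encoder independent, so a pretrained vision backbone and a pretrained text backbone can be swapped in without changing the overall architecture.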