Evo2

Technology

A large, open-source foundation model for biology developed by the Arc Institute. Trained on trillions of nucleotides of DNA sequence, it can predict whether human genetic mutations are likely to be harmful, even though most of its training data is non-human DNA.


Created at

7/26/2025, 2:51:49 AM

Last updated

8/2/2025, 5:39:09 PM

Research retrieved

8/2/2025, 5:39:09 PM

Summary

Evo2 is a foundation model for biology, designed to learn from diverse DNA sequences across all domains of life. Developed by the Arc Institute together with NVIDIA, Stanford University, UC Berkeley, and UC San Francisco, it was described at release as the largest AI model in biology to date, trained on over 9.3 trillion nucleotides from more than 128,000 whole genomes. The open-source model has 40 billion parameters, can identify disease-causing mutations in human genes, and can design new genomes, with the aim of accelerating biological discovery. Its rapid development was highlighted in a special episode of the All-In Podcast as an example of innovation driven by strict capital constraints, with parallels drawn to Elon Musk's work on Grok 3.
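
One plausible way a sequence model can flag a potentially harmful mutation is to compare how likely it finds the mutated sequence versus the reference sequence. The sketch below illustrates that idea only. It uses a toy order-k Markov model as a stand-in, and the names `train_markov`, `log_likelihood`, and `variant_score` are hypothetical; none of this is Evo2's actual architecture, API, or published scoring procedure.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"

def train_markov(sequences, k=2, pseudocount=1.0):
    """Fit a toy order-k Markov model (a stand-in, NOT Evo2).

    Returns a function prob(context, base) = P(base | previous k bases).
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1

    def prob(context, base):
        seen = counts.get(context, Counter())
        total = sum(seen.values()) + pseudocount * len(ALPHABET)
        # Additive smoothing keeps unseen bases at a small nonzero probability.
        return (seen[base] + pseudocount) / total

    return prob

def log_likelihood(prob, seq, k=2):
    """Sum of log P(base | previous k bases) over the sequence."""
    return sum(math.log(prob(seq[i - k:i], seq[i])) for i in range(k, len(seq)))

def variant_score(prob, reference, position, alt_base, k=2):
    """Log-likelihood of the mutant minus that of the reference.

    ASSUMPTION: a more negative score is read as 'more disruptive'.
    """
    mutant = reference[:position] + alt_base + reference[position + 1:]
    return log_likelihood(prob, mutant, k) - log_likelihood(prob, reference, k)

if __name__ == "__main__":
    # Toy training data: a perfectly periodic sequence.
    model = train_markov(["ACGT" * 20], k=2)
    reference = "ACGTACGTACGT"
    # Substituting T for the C at index 5 breaks the pattern the model learned.
    print(variant_score(model, reference, position=5, alt_base="T"))
```

A real workflow would replace the toy model with a trained genomic model and calibrate the scores against known variants; the comparison of mutant and reference likelihoods is the only part this sketch is meant to convey.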

Referenced in 1 Document
Research Data
Extracted Attributes
  • Type

    Foundation Model for Biology

  • Function

    Models and designs genetic code, identifies patterns in gene sequences, identifies disease-causing mutations in human genes, designs new genomes

  • Parameters

    40 billion

  • Resolution

    Single-nucleotide

  • Integration

    NVIDIA BioNeMo framework

  • Predecessor

    Evo 1

  • Availability

    Open source (training data, code, model weights)

  • Training Data

    Over 9.3 trillion nucleotides from over 128,000 whole genomes (bacterial, archaeal, and phage genomes, plus humans, plants, and other single-celled and multicellular species)

  • Context Length

    1 megabase

  • Associated Tool

    Evo Designer
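
To give a feel for the scale implied by the attributes above, the following back-of-the-envelope sketch works through the arithmetic. The figures for context length and parameter count come from the list; everything else is an assumption made for illustration: one token per nucleotide, 16-bit weights, and `n * log2(n)` as a stand-in for "near-linear" cost. These are not measurements of Evo2.

```python
import math

CONTEXT_BASES = 1_000_000      # 1 megabase context (from the attributes above)
PARAMETERS = 40_000_000_000    # 40 billion parameters (from the attributes above)
BYTES_PER_PARAM = 2            # ASSUMPTION: 16-bit weights; not stated in the source

def tokens_in_context(bases=CONTEXT_BASES):
    """ASSUMPTION: single-nucleotide resolution means one token per base."""
    return bases

def quadratic_cost(n):
    """Illustrative cost model where every position interacts with every other."""
    return n * n

def near_linear_cost(n):
    """Illustrative near-linear cost model (n log2 n); an assumption, not Evo2's."""
    return n * math.log2(n)

def weight_memory_gb(params=PARAMETERS, bytes_per_param=BYTES_PER_PARAM):
    """Memory for the weights alone, in gigabytes (10^9 bytes)."""
    return params * bytes_per_param / 1e9

if __name__ == "__main__":
    n = tokens_in_context()
    print("tokens per context:", n)
    print("quadratic cost units:", quadratic_cost(n))
    print("near-linear cost units:", round(near_linear_cost(n)))
    print("weight memory (GB, assuming 16-bit):", weight_memory_gb())
```

Under these assumptions, a quadratic cost model grows to 10^12 units at a 1-megabase context, which is why scaling behavior matters at this context length.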

Timeline
  • Evo2 was released alongside a user-friendly interface, Evo Designer. (Source: Web Search Results)

    2025-02-19

Web Search Results
  • AI can now model and design the genetic code for all domains of life ...

    Building on its predecessor Evo 1, which was trained entirely on single-cell genomes, Evo 2 is the largest artificial intelligence model in biology to date, trained on over 9.3 trillion nucleotides (the building blocks that make up DNA or RNA) from over 128,000 whole genomes as well as metagenomic data. In addition to an expanded collection of bacterial, archaeal, and phage genomes, Evo 2 includes information from humans, plants, and other single-celled and multi-cellular species in the [...] Arc Institute researchers have developed a machine learning model called Evo 2 that is trained on the DNA of over 100,000 species across the entire tree of life. Its deep understanding of biological code means that Evo 2 can identify patterns in gene sequences across disparate organisms that experimental researchers would need years to uncover. The model can accurately identify disease-causing mutations in human genes and is capable of designing new genomes that are as long as the genomes of [...] “Evo 2 has fundamentally advanced our understanding of biological systems,” says Anthony Costa, director of digital biology at NVIDIA. “By overcoming previous limitations in the scale of biological foundation models with a unique architecture and the largest integrated dataset of its kind, Evo 2 generalizes across more known biology than any other model to date, and by releasing these capabilities broadly, the Arc Institute has given scientists around the world a new partner in

  • Evo2 Demystified ~ The Ultimate Technical Guide to Genomic ...

    Evo2 is an autoregressive DNA language model that operates at single-nucleotide resolution. Its primary goal is to learn the statistical patterns of genomic sequences by predicting the next nucleotide in a sequence given all preceding nucleotides. In this section, I introduce the core ideas behind Evo2, its training procedure, and its objectives in mathematical detail. ## Autoregressive Modeling in Genomics Consider a DNA sequence represented as [...] Developed in collaboration with Arc Institute, NVIDIA Healthcare, Stanford University, University of California, Berkeley, and University of California, San Francisco, Evo2 is entirely open source, with its training data, code, and model weights freely available to the scientific community. [...] Trained on 9.3 trillion nucleotides drawn from over 128,000 archaeal, prokaryotic, and eukaryotic genomes, Evo2 has achieved an unprecedented understanding of biological code. Its capacity to discern intricate patterns in gene sequences across diverse organisms can uncover insights that might take experimental researchers years to reveal. Evo2 not only accurately identifies disease-causing mutations in human genes but also designs new genomes that rival the size of those in simple bacteria.

  • Evo 2: DNA Foundation Model - Arc Institute

    ## Evo 2: DNA Foundation Model Evo 2 is a genomic foundation model capable of generalist prediction and design tasks across DNA, RNA, and proteins. It uses a frontier deep learning architecture to enable modeling of biological sequences at single-nucleotide resolution with near-linear scaling of compute and memory relative to context length. Evo 2 is trained with 40 billion parameters and 1 megabase context length on over 9 trillion nucleotides of diverse eukaryotic and prokaryotic genomes.

  • Evo2: One Bio-AI Model to Rule Them All - SynBioBeta

    Developed by a team from Arc Institute and NVIDIA, with participation from Stanford University, UC Berkeley, and UC San Francisco, Evo 2 was released on February 19, 2025. Alongside it comes a user-friendly interface, Evo Designer. The underlying code rests on Arc Institute’s GitHub page and is integrated into the NVIDIA BioNeMo framework, a collaboration that aims to speed scientific discovery. Additionally, Arc Institute partnered with AI research lab Goodfire to create a mechanistic [...] “Evo 2 has fundamentally advanced our understanding of biological systems,” says Anthony Costa, director of digital biology at NVIDIA. “By overcoming previous limitations in the scale of biological foundation models with a unique architecture and the largest integrated dataset of its kind, Evo 2 generalizes across more known biology than any other model to date, and by releasing these capabilities broadly, the Arc Institute has given scientists around the world a new partner in solving

  • Generative AI tool marks a milestone in biology and accelerates the ...

    The open-source, all-access tool, known as Evo 2, was developed by a multi-institutional team co-led by Stanford’s Brian Hie, an assistant professor of chemical engineering and a faculty fellow in Stanford Data Science. Evo 2 was trained on a dataset that includes all known living species, including humans, plants, bacteria, amoebas, and even a few extinct species. Stanford Report talked to Hie about Evo 2’s advanced capabilities, why the scientific world is so eager to get its hands on this [...] in real life. Then we go into the lab and synthesize the DNA and insert it into a living cell to test it using a gene editing technology like CRISPR. Essentially, Evo 2 is speeding up evolution, providing promising new genetic paths for us to explore. [...] ## How is Evo 2 like ChatGPT? In a natural language processor, like ChatGPT, you can prompt it with some text, and it will autocomplete the sentence based on patterns from previously written words. Evo 2 does this with DNA. If you want to design a new gene, you prompt the model with the beginning of a gene sequence of base pairs, and Evo 2 will autocomplete the gene.
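
The search results above describe Evo 2 as predicting the next nucleotide given all preceding ones, and as "autocompleting" a gene from a prompt. The sketch below shows that loop in miniature. It is a toy: the model is a simple k-mer count table rather than a neural network, and the function names (`fit_next_base_model`, `next_base_distribution`, `autocomplete`) and the example sequence are hypothetical, not part of any Evo 2 release.

```python
import random
from collections import Counter, defaultdict

ALPHABET = "ACGT"

def fit_next_base_model(sequences, k=3):
    """Count which base follows each k-base context (toy stand-in, NOT Evo 2)."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for i in range(k, len(seq)):
            counts[seq[i - k:i]][seq[i]] += 1
    return dict(counts)

def next_base_distribution(counts, context, pseudocount=0.01):
    """Return P(next base | context) as a list aligned with ALPHABET."""
    seen = counts.get(context, Counter())
    weights = [seen[base] + pseudocount for base in ALPHABET]
    total = sum(weights)
    return [w / total for w in weights]

def autocomplete(counts, prompt, n_new, k=3, seed=0):
    """Extend the prompt one nucleotide at a time by sampling the next base."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same result
    seq = prompt
    for _ in range(n_new):
        probs = next_base_distribution(counts, seq[-k:])
        seq += rng.choices(list(ALPHABET), weights=probs, k=1)[0]
    return seq

if __name__ == "__main__":
    # Made-up training sequence, used purely for illustration.
    training = ["ATGGCCATTGTAATGGGCCGCTGAAAGGGTGCCCGATAG" * 5]
    model = fit_next_base_model(training)
    print(autocomplete(model, "ATGGCC", n_new=30))
```

Each generated base is appended to the sequence and becomes part of the context for the next prediction, which is what "autoregressive" means here. A model with a 1-megabase context conditions on far more preceding sequence than the three bases used in this toy.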