PROTEIN LANGUAGE MODELS

OPEN CRISPR
https://profluent.bio/applications
https://github.com/Profluent-AI/OpenCRISPR
https://biorxiv.org/content/10.1101/2024.04.25.591003v1
https://nytimes.com/crispr-gene-editing-10-years-on
https://nytimes.com/generative-ai-gene-editing-crispr
Generative A.I. Arrives in the Gene Editing World of CRISPR
by Cade Metz  /  April 22, 2024

“Generative A.I. technologies can write poetry and computer programs or create images of teddy bears and videos of cartoon characters that look like something from a Hollywood movie. Now, new A.I. technology is generating blueprints for microscopic biological mechanisms that can edit your DNA, pointing to a future when scientists can battle illness and diseases with even greater precision and speed than they can today. Described in a research paper published on Monday by a Berkeley, Calif., startup called Profluent, the technology is based on the same methods that drive ChatGPT, the online chatbot that launched the A.I. boom after its release in 2022.


“CRISPR-GPT: LLM Agent for Automated Design of Gene-Editing Experiments”

The company is expected to present the paper next month at the annual meeting of the American Society of Gene and Cell Therapy. Much as ChatGPT learns to generate language by analyzing Wikipedia articles, books and chat logs, Profluent’s technology creates new gene editors after analyzing enormous amounts of biological data, including microscopic mechanisms that scientists already use to edit human DNA. These gene editors are based on Nobel Prize-winning methods involving biological mechanisms called CRISPR. Technology based on CRISPR is already changing how scientists study and fight illness and disease, providing a way of altering genes that cause hereditary conditions, such as sickle cell anemia and blindness.

“The physical structure of OpenCRISPR-1, a gene editor created
by A.I. technology from Profluent.” Video by Profluent Bio

Previously, CRISPR methods used mechanisms found in nature — biological material gleaned from bacteria that allows these microscopic organisms to fight off germs. “They have never existed on Earth,” said James Fraser, a professor and chair of the department of bioengineering and therapeutic sciences at the University of California, San Francisco, who has read Profluent’s research paper. “The system has learned from nature to create them, but they are new.” The hope is that the technology will eventually produce gene editors that are more nimble and more powerful than those that have been honed over billions of years of evolution. On Monday, Profluent also said that it had used one of these A.I.-generated gene editors to edit human DNA and that it was “open sourcing” this editor, called OpenCRISPR-1. That means it is allowing individuals, academic labs and companies to experiment with the technology for free. A.I. researchers often open source the underlying software that drives their A.I. systems, because it allows others to build on their work and accelerate the development of new technologies. But it is less common for biological labs and pharmaceutical companies to open source inventions like OpenCRISPR-1. Though Profluent is open sourcing the gene editors generated by its A.I. technology, it is not open sourcing the A.I. technology itself.

“A time-lapse of human cells edited by OpenCRISPR-1.”
Video by Joseph Gallagher / Profluent Bio

The project is part of a wider effort to build A.I. technologies that can improve medical care. Scientists at the University of Washington, for instance, are using the methods behind chatbots like OpenAI’s ChatGPT and image generators like Midjourney to create entirely new proteins — the microscopic molecules that drive all human life — as they work to accelerate the development of new vaccines and medicines. (The New York Times has sued OpenAI and its partner, Microsoft, on claims of copyright infringement involving artificial intelligence systems that generate text.) Generative A.I. technologies are driven by what scientists call a neural network, a mathematical system that learns skills by analyzing vast amounts of data.

The image creator Midjourney, for example, is underpinned by a neural network that has analyzed millions of digital images and the captions that describe each of those images. The system learned to recognize the links between the images and the words. So when you ask it for an image of a rhinoceros leaping off the Golden Gate Bridge, it knows what to do. Profluent’s technology is driven by a similar A.I. model that learns from sequences of amino acids and nucleic acids — the chemical compounds that define the microscopic biological mechanisms that scientists use to edit genes. Essentially, it analyzes the behavior of CRISPR gene editors pulled from nature and learns how to generate entirely new gene editors. “These A.I. models learn from sequences — whether those are sequences of characters or words or computer code or amino acids,” said Profluent’s chief executive, Ali Madani, a researcher who previously worked in the A.I. lab at the software giant Salesforce.

Profluent has not yet put these synthetic gene editors through clinical trials, so it is not clear if they can match or exceed the performance of CRISPR. But this proof of concept shows that A.I. models can produce something capable of editing the human genome. Still, it is unlikely to affect health care in the short term. Fyodor Urnov, a gene editing pioneer and scientific director at the Innovative Genomics Institute at the University of California, Berkeley, said scientists had no shortage of naturally occurring gene editors that they could use to fight illness and disease. The bottleneck, he said, is the cost of pushing these editors through preclinical studies, such as safety, manufacturing and regulatory reviews, before they can be used on patients.

But generative A.I. systems often hold enormous potential because they tend to improve quickly as they learn from increasingly large amounts of data. If technology like Profluent’s continues to improve, it could eventually allow scientists to edit genes in far more precise ways. The hope, Dr. Urnov said, is that this could, in the long term, lead to a world where medicines and treatments are quickly tailored to individual people even faster than we can do today. “I dream of a world where we have CRISPR on demand within weeks,” he said. Scientists have long cautioned against using CRISPR for human enhancement because it is a relatively new technology that could potentially have undesired side effects, such as triggering cancer, and have warned against unethical uses, such as genetically modifying human embryos. This is also a concern with synthetic gene editors. But scientists already have access to everything they need to edit embryos. “A bad actor, someone who is unethical, is not worried about whether they use an A.I.-created editor or not,” Dr. Fraser said. “They are just going to go ahead and use what’s available.””


“model of CRISPR-Cas9 gene editing complex from Streptococcus pyogenes”

PROTEIN LANGUAGE MODELS
https://arcinstitute.org/tools/evo
https://biorxiv.org/content/10.1101/2024.02.27.582234v2
https://biorxiv.org/content/10.1101/2024.04.22.590591v1
https://sciencedirect.com/science/article/abs/pii/S2405471223002727
https://nature.com/articles/d41586-024-01243-w
‘ChatGPT for CRISPR’ creates new gene-editing tools
by Ewen Callaway  /  29 April 2024

“In the never-ending quest to discover previously unknown CRISPR gene-editing systems, researchers have scoured microbes in everything from hot springs and peat bogs, to poo and even yogurt. Now, thanks to advances in generative artificial intelligence (AI), they might be able to design these systems with the push of a button. This week, researchers published details of how they used a generative AI tool called a protein language model — a neural network trained on millions of protein sequences — to design CRISPR gene-editing proteins, and were then able to show that some of these systems work as expected in the laboratory [1].


“Generated nucleases function as gene editors in human cells”

And in February, another team announced that it had developed a model trained on microbial genomes, and used it to design fresh CRISPR systems, which are comprised of a DNA or RNA-cutting enzyme and RNA molecules that direct the molecular scissors as to where to cut [2]. “It’s really just scratching the surface. It’s showing that it’s possible to design these complex systems with machine-learning models,” says Ali Madani, a machine-learning scientist and chief executive of the biotechnology firm Profluent, based in Berkeley, California. Madani’s team reported what it says is “the first successful editing of the human genome by proteins designed entirely with machine learning” in a 22 April preprint [1] on bioRxiv.org (which hasn’t been peer-reviewed).

Alan Wong, a synthetic biologist at the University of Hong Kong, whose team has used machine learning to optimize CRISPR [3], says that naturally occurring gene-editing systems have limitations in terms of the sequences that they can target and the sort of changes that they can make. For some applications, therefore, it can be a challenge to find the right CRISPR. “Expanding the repertoire of editors, using AI, could help,” he says. Whereas chatbots such as ChatGPT are designed to handle language after being trained on existing text, the CRISPR-designing AIs were instead trained on vast troves of biological data in the form of protein or genome sequences. The goal of this ‘pre-training’ step is to imbue the models with insight into naturally occurring genetic sequences, such as which amino acids tend to go together.
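
[Editor's illustration, not part of the Nature article.] The idea that pre-training teaches a model "which amino acids tend to go together" can be sketched with a much simpler stand-in: counting which residue follows which. This is a toy bigram count model, not a neural network and not how ProGen or EVO work internally. The sequences and function names below are hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy "training set" of amino-acid strings (one letter per residue).
# Real protein language models are trained on millions of sequences.
TOY_SEQUENCES = ["MKKLA", "MKRLA", "MKKIA"]

def count_bigrams(sequences):
    """Count how often each amino acid is immediately followed by another."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts

def next_residue_probs(counts, prev):
    """Convert raw follow-counts into conditional probabilities P(next | prev)."""
    total = sum(counts[prev].values())
    return {aa: n / total for aa, n in counts[prev].items()}

counts = count_bigrams(TOY_SEQUENCES)
probs = next_residue_probs(counts, "K")
print(probs)  # estimated distribution over the residue that follows "K"
```

A neural language model replaces the count table with learned parameters and conditions on far more context than a single preceding residue, but the training signal is the same kind of co-occurrence statistic.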

This information can then be applied to tasks such as the creation of totally new sequences. Madani’s team previously used a protein language model they developed, called ProGen, to come up with new antibacterial proteins [4]. To devise new CRISPRs, his team retrained an updated version of ProGen with examples of millions of diverse CRISPR systems, which bacteria and other single-celled microbes called archaea use to fend off viruses. Because CRISPR gene-editing systems comprise not only proteins, but also RNA molecules that specify their target, Madani’s team developed another AI model to design these ‘guide RNAs’. The team then used the neural network to design millions of new CRISPR protein sequences that belong to dozens of different families of such proteins found in nature.
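
[Editor's illustration, not part of the Nature article.] The step from a trained sequence model to "totally new sequences" can be sketched as sampling one residue at a time and keeping only outputs that do not already appear in the training data. The sketch below assumes a simple Markov chain in place of the retrained ProGen model, and the toy family, sentinel tokens, and function names are all hypothetical.

```python
import random
from collections import Counter, defaultdict

START, END = "^", "$"  # sentinel tokens marking the start and end of a sequence

def fit(sequences):
    """Count residue-to-residue transitions, including start and end sentinels."""
    counts = defaultdict(Counter)
    for seq in sequences:
        tokens = [START, *seq, END]
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def sample(counts, rng, max_len=50):
    """Generate one sequence by repeatedly sampling the next residue."""
    out, prev = [], START
    while len(out) < max_len:
        options = counts[prev]
        nxt = rng.choices(list(options), weights=list(options.values()))[0]
        if nxt == END:
            break
        out.append(nxt)
        prev = nxt
    return "".join(out)

def generate_novel(sequences, n_samples=200, seed=0):
    """Sample sequences and keep only those absent from the training set."""
    rng = random.Random(seed)
    counts = fit(sequences)
    known = set(sequences)
    samples = {sample(counts, rng) for _ in range(n_samples)}
    return sorted(s for s in samples if s and s not in known)

# Hypothetical stand-in for one protein family; not real Cas9 sequences.
TOY_FAMILY = ["MKKLA", "MKRLA", "MKKIA", "MRKLV"]
novel = generate_novel(TOY_FAMILY)
print(novel[:5])
```

Sampled sequences reuse patterns seen in the family while differing from every training example, which is the sense in which generated designs are both "learned from nature" and new. Whether any such design actually functions can only be established experimentally.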


“Formation of the CRISPR-Cas Atlas”

To see whether AI-designed CRISPRs were bona fide gene editors, Madani’s team synthesized DNA sequences corresponding to more than 200 protein designs belonging to the CRISPR–Cas9 system that is now widely used in the laboratory. When they inserted these sequences — instructions for a Cas9 protein and a ‘guide RNA’ — into human cells, many of the gene editors were able to precisely cut their intended targets in the genome. The most promising Cas9 protein — a molecule they’ve named OpenCRISPR-1 — was just as efficient at cutting targeted DNA sequences as a widely used bacterial CRISPR–Cas9 enzyme, and it made far fewer cuts in the wrong place. The researchers also used the OpenCRISPR-1 design to create a base editor — a precision gene-editing tool that changes individual DNA ‘letters’ — and found that it, too, was as efficient as other base-editing systems, as well as less prone to errors.

Another team, led by Brian Hie, a computational biologist at Stanford University in California, and by bioengineer Patrick Hsu at the Arc Institute in Palo Alto, California, used an AI model capable of generating both protein and RNA sequences. Their model, called EVO, was trained on 80,000 genomes from bacteria and archaea, as well as other microbial sequences, amounting to 300 billion DNA letters. Hie and Hsu’s team has not yet tested its designs in the lab. But predicted structures of some of the CRISPR–Cas9 systems they designed resemble those of natural proteins. Their work was described in a preprint [2] posted on bioRxiv.org, and has not been peer-reviewed.

“This is amazing,” says Noelia Ferruz Capapey, a computational biologist at the Molecular Biology Institute of Barcelona in Spain. She’s impressed by the fact that researchers can use the OpenCRISPR-1 molecule without restriction, unlike with some patented gene-editing tools. The ProGen2 model and ‘atlas’ of CRISPR sequences used to fine-tune it are also freely available. The hope is that AI-designed gene-editing tools could be better suited to medical applications than are existing CRISPRs, says Madani. Profluent, he adds, is hoping to partner with companies that are developing gene-editing therapies to test AI-generated CRISPRs. “It really necessitates precision and a bespoke design. And I think that just can’t be done by copying and pasting” from naturally-occurring CRISPR systems, he says.”

doi: https://doi.org/10.1038/d41586-024-01243-w

References
1. Ruffolo, J. A. et al. Preprint at bioRxiv https://doi.org/10.1101/2024.04.22.590591 (2024).
2. Nguyen, E. et al. Preprint at bioRxiv https://doi.org/10.1101/2024.02.27.582234 (2024).
3. Thean, D. G. L. et al. Nature Commun. 13, 2219 (2022).
4. Madani, A. et al. Nature Biotechnol. 41, 1099–1106 (2023).

PREVIOUSLY

RECREATIONAL GENETICS
http://spectrevision.net/2015/05/08/recreational-genetics/
SOFT EUGENICS
https://spectrevision.net/2016/01/22/designer-babies/
GENETICALLY ENHANCED ASTRONAUTS
https://spectrevision.net/2017/04/20/genetically-enhanced-astronauts/
