Machine Learning & Systems Biology. ML Group Leader @arcinstitute. PhD @StanfordAILab

Palo Alto, CA
Joined May 2009
Cells are dynamic, messy and context-dependent. Scaling models across diverse states needs flexibility to capture heterogeneity. Introducing State, a transformer that predicts perturbation effects by training over sets of cells. Team effort led by the unstoppable @abhinadduri
Yusuf Roohani retweeted
Incredibly proud of my student @_rishabhranjan_ and our collaboration with @SAP on this exciting work! 🚀 We’re bringing the power of Transformers beyond sequences—into the world of relational data that underpins enterprise applications. A great example of how foundational research meets real-world impact. 👇
Transformers are great for sequences, but most business-critical predictions (e.g. product sales, customer churn, ad CTR, in-hospital mortality) rely on highly-structured relational data where signal is scattered across rows, columns, linked tables and time. Excited to finally share what I have been working on over the last year: a Foundation Model architecture which brings the power of Transformers to relational domains, enabling large-scale pretraining and zero-shot generalization in enterprise settings. 🧵1/n
Yusuf Roohani retweeted
Published today in @NatureBiotech, @LaineGoudy, @LukeGilbertSF, Alex Marson, and colleagues report a new epigenetic editing platform that safely reprograms multiple genes in human T cells without many of the challenges & risks associated with traditional gene editing approaches.
Yusuf Roohani retweeted
Excited our Heimdall project will be presented at the upcoming #scverse2025 workshop hosted by @arcinstitute @scverse_team! Truly a team effort from many in the group: @ellie_haber @alam_shahul @SpencerKrieger @eigenNick. Look forward to community feedback - more details soon!
Register today for the Arc-scverse workshop on modeling cellular perturbation data! Talks ranging from data generation and screen design all the way to data loaders and ML models + demos of new packages from Arc, broader community + panel discussion on planning ahead for VCC2!
great to have you on board, excited to build together!
🧵Recently started at @ArcInstitute to do a 6-month research fellowship. My last job was building React frontends and coding agents at Retool, so friends are understandably confused by my pivot to bio. I believe, though, that the notion that tech and bio are disparate is misconceived, so I decided to write about it (link in thread).
Yusuf Roohani retweeted
Today in @ScienceMagazine, we report a new DNA editing technology to seamlessly write massive changes into the right place in the human genome. The reason gene editing hasn't transformed human health is that current gene editing technologies like CRISPR are very limited. The problem with CRISPR is that it cuts up your DNA and then hopes that unreliable cellular DNA repair will make the wanted edit. @geochurch famously called it genome vandalism. More precise versions of CRISPR edit fewer than 100 bases, often only a single base. Therefore, they're not suited to making large changes safely. However, most diseases are not the result of mutations in one location. Instead, their causes are spread all across the 3 billion base pairs in the genome. We found bridge RNAs in bacterial “jumping genes” that allow us to make safe and arbitrary changes (insert, cut out, or flip) to every nucleotide within (up to) a 1 million bp sequence in your DNA. In the paper, we show that we can correct the disease-causing DNA repeats that cause Friedreich's ataxia (a rare neurological disease). The same approach could be applied to Huntington’s and other repeat expansion disorders. At @arcinstitute, we're working towards a full Turing machine for biology. Evo, our DNA foundation model, helps us design the optimal healthy DNA sequences. And Bridge recombination gives us the ability to seamlessly write these changes into the right place in the genome. This work was a wonderful collaboration with my @arcinstitute cofounder @SKonermann and led by the indefatigable @ntperry13, alongside our amazing bridge editing team: @BartieLiam @dhruvakatrekar @Gabogonzalez515 @mgdurrant @james_jw_pai @AlisonFanton Juliana Martins Masa Hiraizumi @chiaroscurale @hnisimasu
What if we could universally recombine, insert, delete, or invert any two pieces of DNA? In back-to-back @Nature papers, we report the discovery of bridge RNAs and 3 atomic structures of the first natural RNA-guided recombinase - a new mechanism for programmable genome design
Yusuf Roohani retweeted
#CELLFIE for CAR T screening, out in @Nature today—a new mRNA-based platform for screening primary cells. CAR + gRNA library are delivered by lentivirus, CRISPR modifiers as electroporated mRNA. That’s more flexible and effective than existing T cell screening methods. (1/7)
Yusuf Roohani retweeted
There's a bunch more cool stuff coming over the next few months.
largest biology models in the world, AI-generated genomes, and now de novo antibody design. arc is the place to do research today
Yusuf Roohani retweeted
Today, we report Germinal, a method for efficient de novo antibody design, with @santimillef and @SynBioGaoLab. Germinal achieves success rates of 4-22% across diverse epitopes. We make the work fully open, without doing lame things like posting a preprint without methods. 🧵
Yusuf Roohani retweeted
Grateful to be at Arc, it was the best place for this work!
In a new preprint from @brianhie’s lab, the team reports the first generative design of viable bacteriophage genomes. Leveraging Evo 1 & Evo 2, they generated whole genome sequences, resulting in 16 viable phages with distinct genomic architectures.
Yusuf Roohani retweeted
Welcome to the age of generative genome design! In 1977, Sanger et al. sequenced the first genome—of phage ΦX174. Today, led by @samuelhking, we report the first AI-generated genomes. Using ΦX174 as a template, we made novel, high-fitness phages with genome language models. 🧵
Yusuf Roohani retweeted
What if we could manipulate biology as quickly and easily as we do software? That is the bet @pdhsu and @arcinstitute are making. Science is slow because biology is the most complex system we study and because we do not natively speak its language (DNA). Those constraints are not yet within our control. But science is also slow because research is fragmented, incentives are misaligned, and collaboration is rare - all of which we can control. Arc is rethinking this model by creating an environment that increases the “collision frequency” of scientists across disciplines so they can collaborate more naturally. They are also building computational tools like virtual cell models that could accelerate discovery the way AlphaFold transformed protein science. On the @a16z podcast, @eriktorenberg and I sat down with Patrick for what I believe is a definitive conversation on the future of science and biotech.
Yusuf Roohani retweeted
new colab notebooks! also all 200M+ single cells used in the STATE study are available on @huggingface. Embed your data with State Embedding: colab.research.google.com/dr… Run inference with pre-trained models on Tahoe-100M: colab.research.google.com/dr… Train a State Transition model for genetic perturbation prediction: colab.research.google.com/dr… Train STATE for VCC: colab.research.google.com/dr…
Yusuf Roohani retweeted
We put a blog post together to tell you about all that went behind the scenes for @arcinstitute's Virtual Cell Challenge: (1) the modality used for genetic perturbations, (2) choice of single cell RNA seq chemistry, (3) which cell line to use for the challenge, (4) which genes to perturb, (5) the quality considerations for the resulting dataset, and (6) how we defined performance metrics for perturbation predictions.
Yusuf Roohani retweeted
Palestinian man and his daughter amongst wild mustard flowers (Gaza, 2014) ©️ Mohammed Abed
Yusuf Roohani retweeted
Computing x engineering biology is a playground for systems. Come join us for round three of the SF systems reading group with a focus on biotech.
- Noam Teyssier (Arc Institute): BINSEQ: A Family of High Performance Binary Formats for Nucleotide Sequences
- Aidan Abdullali (LatchBio): A Distributed Filesystem Built on Postgres and S3
- Abhinav Adduri (Arc Institute): Scaling Deep Learning to 1B+ Single Cells
Presentations on design decisions and paper highlights. We'll read snippets of source code, learn from each other and vibe. 5:30ish on 8/20
Yusuf Roohani retweeted
Check out our notebook to train a STATE model for perturbation prediction: Also taking feature requests for more notebooks. We plan to release notebooks (soon): for using the SE-600M model, running inference on Tahoe, and data pre-processing
@abhinadduri walks through a @GoogleColab notebook tutorial showing how to train Arc's STATE model for context generalization in the Virtual Cell Challenge.
Yusuf Roohani retweeted
the most impactful things you can do with your life right now 1 accelerate ai 2 accelerate bio 3 end genocide