Jian Ma · Sep 2, 2025 · 8:28 PM UTC

Ellie Haber retweeted

Jian Ma @jmuiuc

Sep 2

Final version of our L2G paper is now published at @TmlrOrg! Kudos to @WenduoC for leading this work. Paper: openreview.net/forum?id=5NM4…

L2G: Repurposing Language Models for Genomics Tasks

Pre-trained language models have transformed the field of natural language processing (NLP), and their success has inspired efforts in genomics to develop domain-specific foundation models (FMs)....

openreview.net

Jian Ma @jmuiuc

11 Dec 2024

Can we skip genomic Foundation Model pretraining? Our work L2G repurposes language LLMs for genomics via cross-modal transfer, matching fine-tuned genomic FMs. Kudos to @WenduoC & amazing collab w/ @atalwalkar. L2G, language to genome; L2G, life’s too good biorxiv.org/content/10.1101/…

Jian Ma · Jul 26, 2025 · 6:27 PM UTC

Ellie Haber retweeted

Jian Ma @jmuiuc

Jul 26

My amazing PhD student Wendy Yang @muyu_wendy_yang is graduating this summer & seeking industry R&D roles! She's published in ISMB, Nature Methods, and interned at Genentech. Strong in #AI/#ML for gene regulation. Looking for top AI+bio talent? Contact Wendy: muyuy@andrew.cmu.edu

Jian Ma @jmuiuc

Jun 27

Congrats to @muyu_wendy_yang on a successful PhD thesis defense today! Wendy developed a series of ML methods to study genome organization & function, and genome editing - expanding our toolkit for uncovering genome principles. Here is a photo with the happy thesis committee 🎉

Fahim Tajwar · May 28, 2025 · 7:52 PM UTC

Ellie Haber retweeted

Fahim Tajwar @FahimTajwar10

May 28

RL with verifiable reward has shown impressive results in improving LLM reasoning, but what can we do when we do not have ground truth answers? Introducing Self-Rewarding Training (SRT): where language models provide their own reward for RL training! 🧵 1/n

Jian Ma · May 24, 2025 · 5:00 AM UTC

Ellie Haber retweeted

Jian Ma @jmuiuc

May 24

We introduce EYKTHYR, a computational method that integrates gene expression and chromatin accessibility in a spatially aware model to identify transcription factors shaping spatial gene programs. Led by Lane Fellow @SpencerKrieger #spatialtranscriptomics biorxiv.org/content/10.1101/…

EYKTHYR reveals transcriptional regulators of spatial gene programs

Understanding how transcription factors (TFs) orchestrate gene regulatory networks that define complex tissue structures is central to uncovering tissue organization and disease mechanisms. Although...

biorxiv.org

Ellie Haber · May 14, 2025 · 1:02 PM UTC

Ellie Haber @ellie_haber

May 14

Check out this incredible work led by @alam_shahul 🥼

Jian Ma @jmuiuc

May 14

We introduce #POPARI, an interpretable, spatially-aware factor-based model for multi-sample #spatialtranscriptomics. Huge kudos to @alam_shahul for his incredible effort (yes, 80+ equations!). In collaboration with @immunoliugy and @insitubiology. biorxiv.org/content/10.1101/…

Jian Ma · May 14, 2025 · 5:45 AM UTC

Ellie Haber retweeted

Jian Ma @jmuiuc

May 14

In our lab, we take developing computational methods seriously - and we name them well! Here are some of our #spatialtranscriptomics ML methods so far 👇 More to come...

Stephen Turner 🦋 @stephenturner.us · May 12, 2025 · 8:00 PM UTC

Ellie Haber retweeted

Stephen Turner 🦋 @stephenturner.us @strnr

May 12

Unified integration of spatial transcriptomics across platforms biorxiv.org/content/10.1101/… 🧬🖥️🧪 github.com/elliehaber07/LLOK…

Jian Ma · May 14, 2025 · 5:41 AM UTC

Ellie Haber retweeted

Jian Ma @jmuiuc

May 14

Popari: Modeling multisample variation in spatial transcriptomics

Integrating spatially-resolved transcriptomics (SRT) across biological samples is essential for understanding dynamic changes in tissue architecture and cell-cell interactions in situ. While tools...

biorxiv.org

Winter is Coming · Apr 8, 2025 · 3:56 PM UTC

Ellie Haber retweeted

Winter is Coming @WiCnet

Apr 8

George R.R. Martin holds the first new dire wolf born in 10,000 years

Lukas Valihrach · Apr 7, 2025 · 7:41 AM UTC

Ellie Haber retweeted

Lukas Valihrach @LukasValihrach

Apr 7

Unified integration of spatial transcriptomics across platforms biorxiv.org/content/10.1101/…

Jacob Springer · Mar 26, 2025 · 6:16 PM UTC

Ellie Haber retweeted

Jacob Springer @jacspringer

Mar 26

Training with more data = better LLMs, right? 🚨 False! Scaling language models by adding more pre-training data can decrease your performance after post-training! Introducing "catastrophic overtraining." 🥁🧵+arXiv 👇 1/9

Fahim Tajwar · Mar 7, 2025 · 1:41 PM UTC

Ellie Haber retweeted

Fahim Tajwar @FahimTajwar10

Mar 7

Interacting with the external world and reacting based on outcomes are crucial capabilities of agentic systems, but existing LLMs’ ability to do so is limited. Introducing Paprika 🌶️, our work on making LLMs general decision makers that can solve new tasks zero-shot. 🧵 1/n

Jian Ma · Jan 8, 2025 · 6:54 PM UTC

Ellie Haber retweeted

Jian Ma @jmuiuc

Jan 8

Proper and meaningful benchmark datasets are crucial for advancing genomic LLMs/FMs, and ML methods for genomics in general. Fantastic collab w/ @lileics's group. Amazing work led by @WenduoC @ZhenqiaoSong @zocean636

bioRxiv Bioinfo @biorxiv_bioinfo

Jan 8

DNALONGBENCH: A Benchmark Suite for Long-Range DNA Prediction Tasks biorxiv.org/cgi/content/shor… #biorxiv_bioinfo

Jian Ma · Dec 11, 2024 · 1:36 PM UTC

Ellie Haber retweeted

Jian Ma @jmuiuc

11 Dec 2024

L2G: Repurposing Language Models for Genomics Tasks

Pre-trained language models have transformed the field of natural language processing (NLP), and their success has inspired efforts in genomics to develop domain-specific foundation models (FMs)....

biorxiv.org

Yiding Jiang · Oct 21, 2024 · 11:25 PM UTC

Ellie Haber retweeted

Yiding Jiang

@yidingjiang

21 Oct 2024

Selecting good pretraining data is crucial, but rarely economical. Introducing ADO, an online solution to data selection with minimal overhead. 🧵 1/n

Jian Ma · Oct 2, 2024 · 6:15 PM UTC

Ellie Haber retweeted

Jian Ma @jmuiuc

2 Oct 2024

Thrilled to launch the AI4BIO Center @CarnegieMellon! Our goal is to tackle grand challenges in understanding how cells work using AI/ML. Excited to help recruit faculty and foster collaboration across @SCSatCMU and campus. There is truly no place like CMU cs.cmu.edu/news/2024/ai4bio

CMU Launches Center for AI-Driven Biomedical Research

The new Center for AI-Driven Biomedical Research (AI4BIO) will use novel artificial intelligence and machine learning methods to illuminate fundamental aspects of gene regulation, cellular function,...

cs.cmu.edu

Kevin Li · Aug 20, 2024 · 6:01 PM UTC

Ellie Haber retweeted

Kevin Li @kevinyli_

20 Aug 2024

Attention is all you need; at least the matrices are, if you want to distill Transformers into alternative architectures, like Mamba, with our new distillation method: MOHAWK! We also release a fully subquadratic, performant 1.5B model distilled from Phi-1.5 with only 3B tokens!

Jian Ma · Aug 9, 2024 · 3:37 PM UTC

Ellie Haber retweeted

Jian Ma @jmuiuc

9 Aug 2024

Excited to share our @NatureMethods paper on pitfalls & opportunities of using #interpretable #AI in comp bio in the #LLM era. Great collab w/ @atalwalkar's lab. Huge kudos to @valeriechen_ & @muyu_wendy_yang! @CMUCompBio @mldcmu @SCSatCMU @CarnegieMellon nature.com/articles/s41592-0…

Applying interpretable machine learning in computational biology—pitfalls, recommendations and...

Nature Methods - This Perspective discusses the methodologies, application and evaluation of interpretable machine learning (IML) approaches in computational biology, with particular focus on...

nature.com

Yiding Jiang · Apr 8, 2024 · 8:31 PM UTC

Ellie Haber retweeted

Yiding Jiang

@yidingjiang

8 Apr 2024

Models with different randomness make different predictions at test time even if they are trained on the same data. In our latest ICLR paper (oral), we investigate how models learn different features, and the effect this has on agreement and (potentially) calibration. 1/

Jian Ma · Apr 8, 2024 · 11:50 AM UTC

Ellie Haber retweeted

Jian Ma @jmuiuc

8 Apr 2024

Excited to share scGHOST, now published @NatureMethods, graph-based #ML identifying #3Dgenome subcompartments in single cells. Kudos to @KyleXiongCMU & @RuochiZhang, who worked so closely on this. Exciting time for single-cell epigenomics & multiomics! nature.com/articles/s41592-0…
