Researcher in bioinformatics @institutpasteur and @CNRS. Tweets about methods for DNA sequencing data analysis and genome assembly.

Joined September 2011
🌎👩‍🔬 For 15+ years biology has accumulated petabytes (million gigabytes) of 🧬 DNA sequencing data 🧬 from the far reaches of our planet. 🦠🍄🌵 Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open. doi.org/10.1101/2024.07.30.6…
Rayan Chikhi retweeted
New preprint alert! 📣 We use Logan, a new database of SRA-wide genome assemblies, to look for an emerging virus in prawns. If you've ever spent too long trying to track down a known sequence in SRA data, this database is a real game changer. lnkd.in/e9S5GhPc
After years of research and continuous refinement, we’re thrilled to share that our paper on the MetaGraph framework, which enables petabase-scale search across sequencing data, has been published today in Nature (nature.com/articles/s41586-0…).
Rayan Chikhi retweeted
Our trimodal pLM, ProTrek, is now in @NatureBiotech! Search protein by function, not just seq/struc:
BLAST ➡️ 🧬 Seq
Foldseek ➡️ 🏗️ Struc
ProTrek ➡️ 💡 Func (Text)
🔗 Try: search-protrek.com/
🐙 GitHub: github.com/westlake-repl/Pro…
📊 5B embeddings: protrek.westlake.edu.cn/
🚀 We just released all embeddings for 6 BILLION proteins! ProTrek's tri-modal AI (seq + structure + function) is now fully open source. Deploy your own protein search server!
🔬 Try it: search-protrek.com/
📊 Data: protrek.westlake.edu.cn/
💻 Code: github.com/westlake-repl/Pro…
Rayan Chikhi retweeted
#RECOMB2026 is now accepting submissions and we'd love to see your best work!
📌 Abstract registration: Nov 7, 2025
📌 Full paper submission: Nov 14, 2025
📜 More info: recomb.org/recomb2026/call_f…
#RECOMB2026 will be in Thessaloniki, Greece on May 26-29, 2026. Satellites on May 24-25. Save the date!
Rayan Chikhi retweeted
Did you know that ~60% of human SVs fall in ~1% of GRCh38? See our new preprint: arxiv.org/abs/2509.23057 and the companion blog post on how we started this project and longdust: lh3.github.io/2025/09/29/on-…. Work with @QianAlvinQin1
Rayan Chikhi retweeted
Efficient sequence alignment against millions of prokaryotic genomes with LexicMap go.nature.com/3K09TgJ
Earth’s genetic diversity is a heritage of humanity. It has been an honour to explore this data with a team of dedicated scientists who shared our vision of making this data free and accessible to all 🌍🧬❤️ Thank you! Updated preprint: doi.org/10.1101/2024.07.30.6…
This is a new frontier for biological discovery and AI training data. Logan expands the universe of known proteins, plasmids, AMR, P4 satellites, and the newly discovered Obelisk RNA elements.
All Logan data is freely available (CC0) right now. We show how Logan-Search (logan-search.org) can be used to uncover viral reactivation (HHV-6) in cell therapy products (TIL and CAR-T).
Logan rapidly accesses Life’s genetic diversity and can help solve global issues. To tackle the microplastic crisis, we searched Logan for new versions of the 213 known plastic-degrading enzymes. We identified 200+ million homologs🤯, including new high-efficiency enzymes🥤🔥
Logan enables minute-scale k-mer search and hour-scale deep homology protein alignment search across 100+ billion proteins. logan-search.org
One year after our initial preprint, we're excited to post a major update to Logan. At its heart, Logan is the assembly of 27 million samples (50 Pbp) in a 6-day cloud compute run peaking at 2.2M vCPUs. This compresses the SRA 140x compared to raw FASTQs. github.com/IndexThePlanet/Lo…
Rayan Chikhi retweeted
We often talk about pangenome reference but what does "reference" mean? If it means the whole graph, what is a variant? My collaborators Pouria and Luke give a clean answer: the reference is a spanning tree including GRCh38 and variants are leftover edges. See preprint for more.
New preprint on a surprising question - with a pangenome reference, *what is a genetic variant?* biorxiv.org/content/10.1101/… With Pouria Salehi Nowbandani, Shenghan Zhang, Haoyang Hu, and Heng Li @lh3lh3
In 1965, Margaret Dayhoff published the Atlas of Protein Sequence and Structure, which collated the 65 proteins whose amino acid sequences were then known. Inspired by that Atlas, today we are releasing the Dayhoff Atlas of protein sequence data and protein language models.