Rayan Chikhi · Sep 3, 2025 · 8:28 AM UTC

Rayan Chikhi

Pinned Tweet

Rayan Chikhi @RayanChikhi

Sep 3

🌎👩‍🔬 For 15+ years biology has accumulated petabytes (millions of gigabytes) of 🧬 DNA sequencing data 🧬 from the far reaches of our planet. 🦠🍄🌵 Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open. doi.org/10.1101/2024.07.30.6…

Chantelle Hooper · Aug 16, 2025 · 4:46 PM UTC

Rayan Chikhi retweeted

Chantelle Hooper @ChantelleHooper

Aug 16

New preprint alert! 📣 We use Logan, a new database of SRA-wide genome assemblies, to look for an emerging virus in prawns. If you've ever spent too long trying to track down a known sequence in SRA data, this database is a real game changer. lnkd.in/e9S5GhPc

Andre Kahles (@akkah21@genomic.social) · Oct 8, 2025 · 8:49 PM UTC

Rayan Chikhi retweeted

Andre Kahles (@akkah21@genomic.social) @akkah21

Oct 8

After years of research and continuous refinement, we’re thrilled to share that our paper on the MetaGraph framework — enabling Petabase-scale search across sequencing data — has been published today in Nature (nature.com/articles/s41586-0…).

Efficient and accurate search in petabase-scale sequence repositories

Nature - MetaGraph enables scalable indexing of large sets of DNA, RNA or protein sequences using annotated de Bruijn graphs.

nature.com

fajie yuan · Oct 2, 2025 · 2:15 PM UTC

Rayan Chikhi retweeted

fajie yuan @duguyuan

Oct 2

Our trimodal pLM, ProTrek, is now in @NatureBiotech! Search protein by function, not just seq/struc:
BLAST ➡️ 🧬 Seq
Foldseek ➡️ 🏗️ Struc
ProTrek ➡️ 💡 Func (Text)
🔗 Try: search-protrek.com/
🐙 GitHub: github.com/westlake-repl/Pro…
📊 5B embeddings: protrek.westlake.edu.cn/

fajie yuan @duguyuan

Aug 22

🚀 We just released all embeddings for 6 BILLION proteins! ProTrek's tri-modal AI (seq + structure + function) is now fully open source. Deploy your own protein search server!
🔬 Try it: search-protrek.com/
📊 Data: protrek.westlake.edu.cn/
💻 Code: github.com/westlake-repl/Pro…

RECOMB Conference Series · Oct 2, 2025 · 11:59 AM UTC

Rayan Chikhi retweeted

RECOMB Conference Series @RECOMBconf

Oct 2

#RECOMB2026 is now accepting submissions and we'd love to see your best work!
📌 Abstract registration: Nov 7, 2025
📌 Full paper submission: Nov 14, 2025
📜 More info: recomb.org/recomb2026/call_f…

RECOMB Conference Series · Sep 26, 2025 · 2:35 PM UTC

Rayan Chikhi retweeted

RECOMB Conference Series @RECOMBconf

Sep 26

#RECOMB2026 will be in Thessaloniki, Greece on May 26-29, 2026. Satellites on May 24-25. Save the date!

Heng Li · Sep 30, 2025 · 2:19 AM UTC

Rayan Chikhi retweeted

Heng Li @lh3lh3

Sep 30

Did you know that ~60% of human SVs fall in ~1% of GRCh38? See our new preprint: arxiv.org/abs/2509.23057 and the companion blog post on how we started this project and longdust: lh3.github.io/2025/09/29/on-…. Work with @QianAlvinQin1

Nature Biotechnology · Sep 10, 2025 · 4:06 PM UTC

Rayan Chikhi retweeted

Nature Biotechnology @NatureBiotech

Sep 10

Efficient sequence alignment against millions of prokaryotic genomes with LexicMap go.nature.com/3K09TgJ

Hormozdiari Lab · Sep 3, 2025 · 6:17 PM UTC

Rayan Chikhi retweeted

Hormozdiari Lab @DavisCompGen

Sep 3

Our preprint on the challenges of benchmarking structural variants (SVs) is out on bioRxiv: biorxiv.org/content/10.1101/…

Anyone can be the best: Impact of diverse methodologies on the evaluation of structural variant...

Structural variants (SVs) are medium and large-scale genomic alterations that shape phenotypic diversity and disease risk. Numerous methods have been proposed for discovering SVs, however their...

biorxiv.org

Rayan Chikhi · Sep 3, 2025 · 8:28 AM UTC

Rayan Chikhi @RayanChikhi

Sep 3

Earth’s genetic diversity is a heritage of humanity. It has been an honour to explore this data with a team of dedicated scientists who shared our vision of making this data free and accessible to all 🌍🧬❤️ Thank you! Updated preprint: doi.org/10.1101/2024.07.30.6…

Rayan Chikhi · Sep 3, 2025 · 8:28 AM UTC

Rayan Chikhi @RayanChikhi

Sep 3

This is a new frontier for biological discovery and AI training data. Logan expands the universe of known proteins, plasmids, AMR, P4 satellites, and the newly discovered Obelisk RNA elements.

Rayan Chikhi · Sep 3, 2025 · 8:28 AM UTC

Rayan Chikhi @RayanChikhi

Sep 3

All Logan data is freely available (CC0) right now. We show how Logan-Search (logan-search.org) can be used to uncover viral reactivation (HHV-6) in cell therapy products (TIL and CAR-T).

Rayan Chikhi · Sep 3, 2025 · 8:28 AM UTC

Rayan Chikhi @RayanChikhi

Sep 3

Logan rapidly accesses Life’s genetic diversity and can help solve global issues. To tackle the microplastic crisis, we searched Logan for new versions of the 213 known plastic-degrading enzymes. We identified 200+ million homologs🤯, including new high-efficiency enzymes🥤🔥

Rayan Chikhi · Sep 3, 2025 · 8:28 AM UTC

Rayan Chikhi @RayanChikhi

Sep 3

Logan enables minute-scale k-mer search and hour-scale deep-homology protein alignment search across 100+ billion proteins. logan-search.org

Rayan Chikhi · Sep 3, 2025 · 8:28 AM UTC

Rayan Chikhi @RayanChikhi

Sep 3

One year after our initial preprint, we're excited to post a major update to Logan. At its heart, Logan is the assembly of 27 million samples (50 Pbp) using a 6-day cloud computation peaking at 2.2M vCPUs. This compresses the SRA 140x compared to raw FASTQs. github.com/IndexThePlanet/Lo…

Heng Li · Aug 7, 2025 · 4:55 PM UTC

Rayan Chikhi retweeted

Heng Li @lh3lh3

Aug 7

We often talk about pangenome reference but what does "reference" mean? If it means the whole graph, what is a variant? My collaborators Pouria and Luke give a clean answer: the reference is a spanning tree including GRCh38 and variants are leftover edges. See preprint for more.

Luke O'Connor @Luke0connor

Aug 7

New preprint on a surprising question - with a pangenome reference, *what is a genetic variant?* biorxiv.org/content/10.1101/… With Pouria Salehi Nowbandani, Shenghan Zhang, Haoyang Hu, and Heng Li @lh3lh3

Kevin K. Yang 楊凱筌 · Jul 25, 2025 · 8:14 PM UTC

Rayan Chikhi retweeted

Kevin K. Yang 楊凱筌 @KevinKaichuang

Jul 25

In 1965, Margaret Dayhoff published the Atlas of Protein Sequence and Structure, which collated the 65 proteins whose amino acid sequences were then known. Inspired by that Atlas, today we are releasing the Dayhoff Atlas of protein sequence data and protein language models.

Bernardo Rodriguez Martin · Jul 23, 2025 · 4:48 PM UTC

Rayan Chikhi retweeted

Bernardo Rodriguez Martin @BerniRdgz

Jul 23

*New Open-Access Long Read Resource*. We sequenced 1,019 genomes from the 1000 Genomes Project sample cohort using @nanopore. Sequencing data is available at bit.ly/4m8dlE2. @embl @HHU_de @IMPvienna @CRGenomica nature.com/articles/s41586-0… [1/8]

Structural variation in 1,019 diverse humans based on long-read sequencing

Nature - Intermediate-coverage long-read sequencing in 1,019 diverse humans from the 1000 Genomes Project, representing 26 populations, enables the generation of comprehensive population-scale...

nature.com

Sebastian Deorowicz · Jul 19, 2025 · 9:26 PM UTC

Rayan Chikhi retweeted

Sebastian Deorowicz @sdeorowicz

Jul 19

Interested in a tool that aligns millions of proteins in minutes with quality similar to or better than the state-of-the-art utilities? Please take a look at our FAMSA2 paper: biorxiv.org/content/10.1101/… and GH repo: github.com/refresh-bio/FAMSA

GitHub - refresh-bio/FAMSA: Algorithm for ultra-scale multiple sequence alignments (3M protein...

Algorithm for ultra-scale multiple sequence alignments (3M protein sequences in 5 minutes and 24 GB of RAM) - refresh-bio/FAMSA

github.com