🌎👩🔬 For 15+ years, biology has accumulated petabytes (millions of gigabytes) of 🧬 DNA sequencing data 🧬 from the far reaches of our planet. 🦠🍄🌵
Logan now democratizes efficient access to the world’s most comprehensive genetics dataset. Free and open.
doi.org/10.1101/2024.07.30.6…
New preprint alert! 📣
We use Logan, a new database of SRA-wide genome assemblies, to look for an emerging virus in prawns. If you've ever spent too long trying to track down a known sequence in SRA data, this database is a real game changer.
lnkd.in/e9S5GhPc
After years of research and continuous refinement, we’re thrilled to share that our paper on the MetaGraph framework, which enables petabase-scale search across sequencing data, has been published today in Nature (nature.com/articles/s41586-0…).
#RECOMB2026 is now accepting submissions and we'd love to see your best work!
📌 Abstract registration: Nov 7, 2025
📌 Full paper submission: Nov 14, 2025
📜More info: recomb.org/recomb2026/call_f…
#RECOMB2026 will be in Thessaloniki, Greece on May 26-29, 2026. Satellites on May 24-25. Save the date!
The #RECOMB2026 conference will take place in Thessaloniki on May 26-29, 2026. The satellite events will be held on May 24-25, 2026. Save the date!
Earth’s genetic diversity is a heritage of humanity. It has been an honour to explore this data with a team of dedicated scientists who shared our vision of making this data free and accessible to all 🌍🧬❤️ Thank you!
Updated preprint: doi.org/10.1101/2024.07.30.6…
This is a new frontier for biological discovery and AI training data. Logan expands the universe of known proteins, plasmids, AMR, P4 satellites, and the newly discovered Obelisk RNA elements.
All Logan data is freely available (CC0) right now. We show how Logan-Search (logan-search.org) can be used to uncover viral reactivation (HHV-6) in cell therapy products (TIL and CAR-T).
Logan rapidly accesses Life’s genetic diversity and can help solve global issues.
To tackle the microplastic crisis, we searched Logan for new versions of the 213 known plastic-degrading enzymes. We identified 200+ million homologs🤯, including new high-efficiency enzymes🥤🔥
One year after our initial preprint, we're excited to post a major update to Logan.
At its heart, Logan is the assembly of 27 million samples (50 Pbp), produced in a 6-day cloud compute run that peaked at 2.2M vCPUs. This compresses the SRA 140x compared to raw FASTQs.
github.com/IndexThePlanet/Lo…
We often talk about a pangenome reference, but what does "reference" mean? If it means the whole graph, what is a variant? My collaborators Pouria and Luke give a clean answer: the reference is a spanning tree that includes GRCh38, and variants are the leftover edges. See the preprint for more.
New preprint on a surprising question - with a pangenome reference, *what is a genetic variant?*
biorxiv.org/content/10.1101/…
With Pouria Salehi Nowbandani, Shenghan Zhang, Haoyang Hu, and Heng Li @lh3lh3
In 1965, Margaret Dayhoff published the Atlas of Protein Sequence and Structure, which collated the 65 proteins whose amino acid sequences were then known.
Inspired by that Atlas, today we are releasing the Dayhoff Atlas of protein sequence data and protein language models.
Interested in a tool that aligns millions of proteins in minutes with quality similar to or better than the state-of-the-art utilities? Please take a look at our FAMSA2 paper: biorxiv.org/content/10.1101/…
and GH repo: github.com/refresh-bio/FAMSA