Scientist and Startup Founder. Co-founder and CEO @EarthmoverHQ. @pangeo_data steering council member. Ex-Professor @columbia @lamontEarth.

New York, NY
Joined February 2011
It was super fun talking to Max about my journey through geospatial / climate and where I think this field is heading.
Today's conversation is a deep dive on how we do scientific computing today, and how it could be better. Talking to @rabernat about @pangeo_data, @zarr_dev and @EarthmoverHQ. Watch: piped.video/3IWp-MuSm6w
Ryan Abernathey retweeted
🌎 Why do we need a cloud-native data lake for geospatial data? In the latest episode of The Infra Pod, @tnachen & @ianlivingstone chat with the cofounders of @EarthmoverHQ, @rabernat & @_jhamman, about the future of data in climate and earth sciences. 🎧 Link in 🧵
Ryan Abernathey retweeted
🌤️ #AMS2025 is just around the corner! We are taking AMS by storm with an exhibitor booth (booth 353), two talks from @_jhamman and @rabernat, and hosting a @pangeo_data Community Happy Hour (register here: lu.ma/ddtba5f5)!
Ryan Abernathey retweeted
@rabernat is rounding out the week of #AGU24 with his talk, “How can we make cloud computing actually accessible to all scientists?” on Friday at 5:10 PM agu.confex.com/agu/agu24/mee…
Ryan Abernathey retweeted
Monday through Thursday, I'll be hanging out with @rabernat at the @EarthmoverHQ booth in the exhibit hall. Swing by to say hello or to snag some swag/stickers/etc. We'll also be demoing #icechunk all week.
Ryan Abernathey retweeted
Will you be at @theAGU next week? Earthmover is exhibiting! @rabernat and @_jhamman are participating in panel discussions and giving talks👇.
Checked out the other site. 🟦 Seems much better. Gonna be over there more from now on. 👋
Ryan Abernathey retweeted
I take it all back. At #SatSummit in Lisbon we finally discovered the holy grail of cloud optimized file formats
That said, it isn't 100% clear that NASA's best move is to immediately convert 10,000+ data sets into cutting-edge ARCO formats. Kerchunk and Virtual Zarr offer the benefits of ARCO while keeping data in the native formats.
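To make the "keep data in native formats" idea concrete, here is a minimal Python sketch of a Kerchunk-style reference set, assuming a simplified version-1 layout in which each Zarr chunk key maps to `[url, offset, length]` inside an existing file. The file, the `temp` variable name, and the `read_chunk` helper are hypothetical; real reference sets also carry Zarr metadata (`.zarray`, `.zattrs`) and usually point at remote URLs read through fsspec.

```python
import os
import struct
import tempfile

# Stand-in for a "native" file: two chunks of four little-endian float32 values,
# written back to back. A real HDF5/NetCDF file also has headers; a tool like
# Kerchunk scans those to find chunk offsets, which are simply known here.
chunk_a = struct.pack("<4f", 0.0, 1.0, 2.0, 3.0)
chunk_b = struct.pack("<4f", 4.0, 5.0, 6.0, 7.0)

path = os.path.join(tempfile.mkdtemp(), "native.bin")
with open(path, "wb") as f:
    f.write(chunk_a)
    f.write(chunk_b)

# Reference set modeled on Kerchunk's version-1 layout: each Zarr chunk key
# maps to [url, byte_offset, byte_length] inside the original, unconverted file.
refs = {
    "version": 1,
    "refs": {
        "temp/0": [path, 0, len(chunk_a)],
        "temp/1": [path, len(chunk_a), len(chunk_b)],
    },
}


def read_chunk(refs, key):
    """Resolve a chunk key to a byte-range read against the native file."""
    url, offset, length = refs["refs"][key]
    with open(url, "rb") as f:
        f.seek(offset)
        raw = f.read(length)
    return struct.unpack(f"<{length // 4}f", raw)


print(read_chunk(refs, "temp/1"))  # (4.0, 5.0, 6.0, 7.0)
```

The native file is never rewritten; only the small reference set is new, which is why this approach can sit alongside an archive of existing files.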
Ryan Abernathey retweeted
🚀 New blog post: Nomadic Compute - The Future of Distributed AI Workloads 🌐 In a fast-paced AI world, flexibility & resilience are key. "Nomadic Compute" pattern lets workloads move dynamically across clouds for peak performance, cost, & availability: tigrisdata.com/blog/nomadic-…
Ryan Abernathey retweeted
Come learn about recent @xarray_dev GroupBy improvements at tomorrow's (Wed, Nov 13) Pangeo Showcase! discourse.pangeo.io/t/pangeo…
Most developers today wouldn't dream of not using version control for their code... However, the same principles can be applied to data! @EarthmoverHQ's new open source project--Icechunk--includes version control features built specifically for the @zarr_dev data model, bringing powerful data version control to the world of massive multidimensional arrays. Features include:
* All updates occur atomically in isolated snapshots
* Tags - immutable pointers to snapshots
* Branches - mutable pointers to snapshots
With Icechunk, you can safely experiment with changes to your data on a "dev" branch before propagating those changes to "main." You can publish an immutable version of your dataset (tag) while continuing to evolve towards the next version. Or you can simply revert incorrect changes back to an earlier version of your data. These capabilities make life so much easier for data scientists and teams using array data in production. I've been using data version control with Zarr for the past year via our Arraylake platform, and I'm thrilled that these capabilities are now fully open source. I can't imagine going back to the old way of working. Learn more at icechunk.io/
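The snapshot / tag / branch model described above can be sketched as a toy in-memory repository. This is a hypothetical illustration of the concepts only, not Icechunk's API: the `ToyRepo` class and all of its method names are made up for this example, and real Icechunk persists snapshots and chunks in cloud object storage.

```python
import uuid
from types import MappingProxyType


class ToyRepo:
    """Hypothetical in-memory model of snapshots, branches, and tags.

    Not Icechunk's API; it only illustrates the semantics described in the post.
    """

    def __init__(self):
        self.snapshots = {}  # snapshot id -> (read-only data, parent snapshot id)
        self.branches = {}   # branch name -> snapshot id (mutable pointer)
        self.tags = {}       # tag name -> snapshot id (immutable pointer)
        self.branches["main"] = self._store({}, parent=None)

    def _store(self, data, parent):
        snap_id = uuid.uuid4().hex[:12]
        # Wrap in a read-only view so a committed snapshot cannot be edited later.
        self.snapshots[snap_id] = (MappingProxyType(dict(data)), parent)
        return snap_id

    def _resolve(self, ref):
        """Turn a branch name, tag name, or raw snapshot id into a snapshot id."""
        if ref in self.branches:
            return self.branches[ref]
        if ref in self.tags:
            return self.tags[ref]
        return ref

    def read(self, ref):
        return self.snapshots[self._resolve(ref)][0]

    def commit(self, branch, changes):
        """Write all changes into one new snapshot, then move the branch pointer.

        Until the pointer moves, readers of the branch keep seeing the old
        snapshot; afterwards they see the complete new one.
        """
        parent = self.branches[branch]
        new_data = {**self.snapshots[parent][0], **changes}
        self.branches[branch] = self._store(new_data, parent)
        return self.branches[branch]

    def create_branch(self, name, from_ref):
        self.branches[name] = self._resolve(from_ref)

    def tag(self, name, ref):
        if name in self.tags:
            raise ValueError(f"tag {name!r} already exists and cannot be moved")
        self.tags[name] = self._resolve(ref)

    def reset_branch(self, branch, ref):
        """Revert a branch by pointing it back at an earlier snapshot."""
        self.branches[branch] = self._resolve(ref)


repo = ToyRepo()
v1 = repo.commit("main", {"temperature/0": "chunk-v1"})
repo.tag("v1.0", v1)               # publish an immutable version
repo.create_branch("dev", "main")  # experiment on a separate branch
repo.commit("dev", {"temperature/0": "chunk-experimental"})
print(repo.read("main")["temperature/0"])  # chunk-v1
print(repo.read("dev")["temperature/0"])   # chunk-experimental

repo.commit("main", {"temperature/0": "chunk-wrong"})  # an incorrect update
repo.reset_branch("main", "v1.0")                      # roll it back
print(repo.read("main")["temperature/0"])  # chunk-v1
```

Because snapshots are never modified, a tag and a branch differ only in whether their pointer is allowed to move; reverting is just moving a branch pointer back.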
It’s a real honor to be part of this amazing collection of experts. Looking forward to helping spread the word about Cloud Native Geospatial!
Introducing the Founding CNG Editorial Board for the Cloud-Native Geospatial Forum (CNG)! This group of leaders from our community have graciously volunteered to guide our work. cloudnativegeo.org/blog/2024…
Ryan Abernathey retweeted
"Large GeoSpatial Benchmarks: First Pass" Last month we asked for TiB-scale geo workloads to form a benchmark suite. We got a strong response. Since then we've built these out into a public suite. This post goes over what's implemented and early results docs.coiled.io/blog/geospati…
Ryan Abernathey retweeted
It's been a blast learning Rust and working with the @EarthmoverHQ team on Icechunk! Come see what it's all about. Absolutely worth it, I promise.
⚡️ Icechunk is fast! What does this mean for users? Reduced cost for all data-intensive compute jobs and enhanced productivity for the data scientists who work with data all day long. Icechunk, @EarthmoverHQ's new transactional cloud-native storage engine for array / tensor data, works together with @zarr_dev, augmenting the Zarr core data model with features that enhance performance, collaboration, and safety in a multi-user cloud-computing context. Reading data through Icechunk is 36x faster than trying to read HDF5 files from cloud object storage, 6x faster than regular Zarr alone, and 2.5x faster than regular Zarr + Dask. Most importantly, Icechunk can achieve throughput on par with the compute instance network bandwidth, the "hardware limit" for I/O-bound workloads. Want to learn more about this benchmark? Come to our Icechunk informational webinar tomorrow, Tuesday, October 22nd from 12 - 1 PM EST. Registration link: share.hsforms.com/1SCOFqe2kT…
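The "hardware limit" for an I/O-bound read is dataset size divided by network bandwidth. A minimal Python sketch of that arithmetic, with a hypothetical dataset size and link speed (these are not figures from the benchmark above), and hypothetical helper names:

```python
def min_read_seconds(dataset_bytes: float, link_gbit_per_s: float) -> float:
    """Lower bound on read time when the network link is the bottleneck."""
    bytes_per_second = link_gbit_per_s * 1e9 / 8  # gigabits/s -> bytes/s
    return dataset_bytes / bytes_per_second


def link_efficiency(observed_seconds: float, dataset_bytes: float,
                    link_gbit_per_s: float) -> float:
    """Fraction of the hardware limit achieved (1.0 means the link is saturated)."""
    return min_read_seconds(dataset_bytes, link_gbit_per_s) / observed_seconds


# Hypothetical example: 100 GB read on an instance with a 25 Gbit/s network link.
print(min_read_seconds(100e9, 25))       # 32.0
print(link_efficiency(64.0, 100e9, 25))  # 0.5
```

An efficiency close to 1.0 means the reader is already saturating the link, so further speedups for that workload would have to come from more bandwidth rather than from the storage library.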
Ryan Abernathey retweeted
We’re hosting a webinar on Tuesday, October 22 from 12 - 1 PM EST to discuss what Icechunk means for the scientific data community and answer questions from attendees. Register here: share.hsforms.com/1SCOFqe2kT…
🚀 We are thrilled to announce the release of the Icechunk storage engine, a new open-source library and specification for the storage of multidimensional array (a.k.a. tensor) data in cloud object storage. Read our blog post about Icechunk here: earthmover.io/blog/icechunk
Ryan Abernathey retweeted
There has been so much interest and investment in launching new EO satellites in the past 3-5 years. But, there has been relatively limited interest and investment into solving the boring problems of standardization, interoperability and analysis-ready data* in EO.