Ship data pipelines with extraordinary velocity. Dagster+: dagster.io/plus GitHub: github.com/dagster-io Slack: dagster.io/slack

☁️
Joined May 2019
According to the latest Forrester TEI report, Dagster+ delivers a 432% ROI, with data engineers shifting from maintenance to high-value work. As one executive noted: “Because we now have a full test suite that ensures everything is actually running as expected, we trust our code more. And now it’s trivial to go through the process of managing deployments and pull requests and managing releases and deployments of pipelines and code.” Get the full report today! Link in thread
4
3
13
Dagster retweeted
I learned this when I was 20. It's not hard to start. There are 2 types of queries: reads and writes. Most of your queries are reads, so handle those first. Do this, in order:
1/ Reduce reads - Cache, cache, cache. Use read-thru and write-thru caching. Use Redis (sketch below).
2/ Optimize slow reads - Add indexes. Fix your N+1 queries. Add limits. Sort responsibly. Alert on slow queries.
3/ Scale hardware - Add read replicas. Upgrade memory & I/O.
4/ Split up data - Shard when you have to. Partition writes when needed (you won’t need it).
Understand “good enough”. A single write instance is fine for most people. And if it's not slow, leave it alone!
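A minimal read-through cache sketch for step 1 above, assuming redis-py and a hypothetical get_user_from_db() helper standing in for your real query:

```python
import json

import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)
CACHE_TTL_SECONDS = 300  # tune to how much staleness your reads can tolerate


def get_user_from_db(user_id: int) -> dict:
    # Stand-in for the real query, e.g. SELECT * FROM users WHERE id = %s
    return {"id": user_id, "name": "example"}


def get_user(user_id: int) -> dict:
    """Read-through: serve from Redis if cached, otherwise hit the DB and backfill."""
    key = f"user:{user_id}"
    cached = r.get(key)
    if cached is not None:
        return json.loads(cached)
    user = get_user_from_db(user_id)  # the slow read you are protecting
    r.set(key, json.dumps(user), ex=CACHE_TTL_SECONDS)
    return user
```

The write-through half of the pattern updates the cache on the same code path as the database write, so reads never serve data staler than the TTL.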
I have no understanding of database scaling. How many queries is too many, 1k/second? 10k/s?
39
191
7
2,819
Dagster retweeted
I recently conducted a case study with @RaysBaseball on how they utilize @Dagster to significantly enhance the velocity and quality of their data pipelines. Key results:
- 50-70% faster pipeline execution: processing reduced from hours to minutes
- 15-minute data availability: game data ready within 10-15 minutes vs. 9 a.m. batch jobs
- One-week onboarding: new data sources integrated 2-3x faster than the previous system
- Zero-touch reliability: critical pipelines run unattended nightly without intervention
1
2
8
Dagster retweeted
Our girl has me thinking about what a tasteful use of AI looks like. For me it comes down to 3 things:
- No one should feel like they are getting displaced; this kills momentum and adoption.
- The implementation is well engineered. Vibe-coded slop falls apart in production. Don't be this guy.
- Meet people where they are. Don't ask your marketing girlie to use a CLI. If you can embed in your existing tools, even better.
3
1
16
Dagster retweeted
🚨 New @dagster eBook 🚨 Data teams face different challenges and growing pains as they scale. Data platforms are like fashion in the sense that they are never finished: you are constantly adding new pipelines, migrating databases, and incorporating new tools and stakeholders. Dennis Hume and @coltonpadden wrote a fantastic eBook that walks through the challenges you'll face as you grow and how to evolve your platform in place so that scaling stays easy. Check it out today! Link in thread.
2
1
19
Dagster retweeted
Today is day 1 of Small Data SF!!! We'll kick things off with a day of hands-on workshops.
- From Zero to Query: Building Your First Serverless Lakehouse with DuckLake - Jacob Matson from MotherDuck walks through creating a serverless lakehouse with DuckLake, covering ACID transactions, time travel, and schema evolution.
- Stop Measuring LLM Accuracy, Start Building Context - Tahlia DeMaio from Hex argues that context, not accuracy, is the real challenge in LLM systems and shows how to build context-aware analytical workflows.
- Keep it Simple and Scalable: Pythonic ELT using dltHub - Thierry Jean from dltHub, with Brian Douglas from Continue and elvis kahoro from Chalk, teaches Python-based data ingestion and transformation pipelines.
- Composable Data Workflows: Building Pipelines That Just Work - Dennis Hume from Dagster Labs covers practical patterns for building reliable, modular pipelines that scale from laptop to production.
- Open Data Science Agent - Zain Hasan from Together AI shows how to build an autonomous data science agent using open-source models and the ReAct framework for end-to-end analysis tasks.
- Duck, duck, deploy: Building an AI-ready app in 2 hours - Russell Garner and Rebecca Bruggman from Omni start with a MotherDuck dataset and build a production-ready analytics app using Omni's semantic model and APIs.
- From Parsing Nightmares to Production - Upal Saha from bem demonstrates how to transform any unstructured input (PDFs, images, audio, etc.) into clean JSON and load it directly into MotherDuck.
- Just-in-Time Insights with Estuary - Zulfikar Qureshi from Estuary provides hands-on experience with real-time data streaming, including a lab exercise streaming live data into MotherDuck.
4
4
2
17
Dagster retweeted
So true for data engineering and why the orchestrator is the unsung hero of the data stack
Most of my job is actually just improving observability. About 80% of perf problems have drop-dead obvious fixes, but engineers have no idea that they’re even happening.
2
1
14
Dagster retweeted
Dagster is the official Data Ops Platform for the Real American Virtuous Yeoman Farmer
2
1
1
11
Dagster retweeted
Daylight Saving Time ended last weekend for most of the US. Did your pipelines behave as planned? Dealing with unusual edge cases such as DST is the bane of the data engineer's existence: each transition happens only once a year, making it easy to forget, and it's time-consuming to react to afterward. Orchestrators like Dagster can handle it for you. For any of your schedules, if you set an execution_timezone (see the sketch below), Dagster automatically adjusts runs across the transition. Spring forward? Your 2:30 a.m. job runs at 3:30 a.m. Fall back? It waits for the second occurrence. No manual fixes.
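A minimal sketch of such a schedule; the op and job names are illustrative, the point is the execution_timezone argument:

```python
import dagster as dg


@dg.op
def refresh_nightly_tables():
    ...  # placeholder for the actual pipeline logic


@dg.job
def nightly_job():
    refresh_nightly_tables()


# The cron expression is evaluated in the given timezone, so Dagster
# handles DST shifts (spring forward / fall back) for you.
nightly_schedule = dg.ScheduleDefinition(
    job=nightly_job,
    cron_schedule="30 2 * * *",  # 2:30 a.m. local time
    execution_timezone="America/New_York",  # omit this and the schedule runs in UTC
)
```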
1
4
Dagster retweeted
Happy Friday everyone! DSPy Weekly Issue No. 9 is out: dspyweekly.com/newsletter/10…
Highlights:
🔹 Articles: DSPy vs. LlamaBot, REFRAG implementation, BAML & Butter integrations, and the dissertation that started it all.
🔹 Videos: Using DSPy with Dagster & a playlist from DSPy Boston.
🔹 Projects: DSPy in Go (dsgo), a lightweight version (udspy), and dspy-bench.
🔹 Jobs: Fellowship at Harvard's Berkman Klein Center.
#BAML #dspy @DSPyOSS @dagster
Dagster retweeted
In case you missed it, our deep dive on using both @dagster and @DSPyOSS is now on YouTube! We discuss DSPy and how its composability, evaluation, and optimization abstractions make it the best LLM development framework currently available. We also cover how the framework integrates with Dagster and how you can achieve improved observability and recoverability for your LLM workflows. Check out the full recording today!
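A hedged sketch of the pattern discussed, not the talk's actual code: wrapping a DSPy program in a Dagster asset so each run gets Dagster's logging and run history. The signature, model string, and asset name below are illustrative assumptions.

```python
import dagster as dg
import dspy

# Any model string DSPy supports will do; this one is just an example.
dspy.configure(lm=dspy.LM("openai/gpt-4o-mini"))

# A declarative DSPy program: chain-of-thought over a "document -> summary" signature.
summarize = dspy.ChainOfThought("document -> summary")


@dg.asset
def document_summaries() -> list[str]:
    # Placeholder inputs; in practice these would come from an upstream asset.
    documents = [
        "Dagster orchestrates data pipelines as software-defined assets.",
        "DSPy composes and optimizes LLM programs declaratively.",
    ]
    return [summarize(document=doc).summary for doc in documents]
```

Materializing the asset from Dagster then gives you per-run logs and retries around the LLM calls, which is the observability and recoverability angle from the recording.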
Dagster retweeted
Honored to be included on the @OpenAI developer blog where I talk about how we use Codex for educational content at @dagster! developers.openai.com/blog/c…
3
9
106
Dagster retweeted
@dagster Pipes for Go is now working! github.com/wingyplus/dagster…
1
2
4
This morning, "Dagster Compass," an AI data analyst that lives in the Dagster community Slack, was released! You describe what you want in plain language and it attempts an analysis using data available online; as a test, I asked it to "show the top 10 data analytics companies in the world." When I then asked it to "plot the top 10 by company value," it corrected my vague wording, asking "'company value' can mean several things; what exactly do you mean by value?", and then produced the chart.
Your analysts are stuck waiting on data engineers. Your data engineers are drowning in ad-hoc requests. There's a better way. We built Compass, a self-service analytics platform that lets our team scale analysis without scaling headcount and ship 2x faster. Check out the full blog!
2
6
Dagster retweeted
Does your data analyst support you speaking a little Chinese anon?
1
1
6
Dagster retweeted
We ❤️ Dagster + Colton! One of our earliest customers and incredible partners
1
4
7
Dagster retweeted
Thanks for the opportunity to talk at #allthingsopen today—everyone has been so welcoming and supportive. You can find the slides here: cmpadden.github.io/slides/at…
Consolidations in the #dataengineering market are happening fast. Tools from the MDS are being folded into unified data platforms. The latest:
- Fivetran → dbt
- Fivetran → SQLMesh
- Soda → nannyML
- Snowflake → Crunchy Data
- Databricks → Neon
- Fivetran → Census
- dbt → SDF
6
6
1
30
Dagster retweeted
I got kids and shit man