🧮 Google just published DS-STAR
A data science agent that can automate a range of tasks — from statistical analysis to visualization and data wrangling.
It reads messy files, plans steps, writes and runs code, and verifies its own output, reaching state of the art on tough multi-file tasks.
It lifts accuracy to 45.2% on DABStep, 44.7% on KramaBench, and 38.5% on DA-Code, and holds first place on DABStep as of September-25.
Earlier agents lean on clean CSVs and struggle when answers are split across JSON, markdown, and free text.
DS-STAR begins by scanning a directory and producing a plain-language summary of each file’s structure and contents, which becomes shared context for every later step.
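The scanning step can be illustrated with a minimal sketch. The post does not say how DS-STAR builds its summaries, so the fixed rules below are only a stand-in; `summarize_file` and `summarize_directory` are hypothetical names, and only CSV, JSON, and plain text are handled.

```python
import csv
import json
from pathlib import Path


def summarize_file(path: Path, max_chars: int = 200) -> str:
    """Describe one file in plain language (hypothetical helper, not DS-STAR's code)."""
    suffix = path.suffix.lower()
    if suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as fh:
            reader = csv.reader(fh)
            header = next(reader, [])          # first row is assumed to be the header
            n_rows = sum(1 for _ in reader)    # remaining rows are data
        return f"CSV with {n_rows} data rows and columns: {', '.join(header)}"
    if suffix == ".json":
        with path.open(encoding="utf-8") as fh:
            data = json.load(fh)
        if isinstance(data, dict):
            return f"JSON object with top-level keys: {', '.join(sorted(data))}"
        if isinstance(data, list):
            return f"JSON array with {len(data)} items"
        return f"JSON scalar of type {type(data).__name__}"
    # Fallback: treat anything else (markdown, notes, logs) as free text.
    text = path.read_text(encoding="utf-8", errors="replace")
    preview = " ".join(text[:max_chars].split())
    return f"Text file, {len(text)} characters, starting with: {preview!r}"


def summarize_directory(root: str) -> dict[str, str]:
    """Map each file's relative path to its summary, in sorted order."""
    summaries = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            summaries[path.relative_to(root).as_posix()] = summarize_file(path)
    return summaries
```

Calling `summarize_directory("data/")` on a folder of mixed files returns one description per file, which could then be pasted into every later prompt as shared context.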
A Planner proposes steps, a Coder writes Python, a Verifier checks sufficiency, a Router fixes mistakes or adds steps, and the loop stops when it passes or reaches 10 rounds.
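A rough sketch of that control loop, with each agent passed in as a plain function. Everything here is an assumption for illustration: the function signatures, the `("add", None)` / `("fix", index)` return shape of the router, and the choice to drop a faulty step and everything after it before re-planning. Only the roles and the 10-round cap come from the description above.

```python
from typing import Any, Callable, List, Optional, Tuple

MAX_ROUNDS = 10  # round cap stated in the post


def refine(
    question: str,
    summaries: str,
    planner: Callable[[str, str, List[str], Any], str],
    coder: Callable[[List[str], str], str],
    runner: Callable[[str], Any],
    verifier: Callable[[str, List[str], Any], bool],
    router: Callable[[str, List[str], Any], Tuple[str, Optional[int]]],
    max_rounds: int = MAX_ROUNDS,
) -> Tuple[Any, int]:
    """Run plan -> code -> execute -> verify until the verifier passes or rounds run out."""
    plan: List[str] = [planner(question, summaries, [], None)]
    result: Any = None
    for round_no in range(1, max_rounds + 1):
        code = coder(plan, summaries)        # Coder turns the whole plan into code
        result = runner(code)                # execute and capture the output
        if verifier(question, plan, result): # Verifier: is this enough to answer?
            return result, round_no
        action, index = router(question, plan, result)
        if action == "fix" and index is not None:
            plan = plan[:index]              # assumed: discard the bad step and later ones
        plan.append(planner(question, summaries, plan, result))
    return result, max_rounds


# Usage with trivial stand-in agents (no model calls involved).
answer, rounds = refine(
    "total fees?",
    "a.csv: columns id, amount",
    planner=lambda q, s, plan, res: f"step {len(plan) + 1}",
    coder=lambda plan, s: "; ".join(plan),
    runner=lambda code: code,
    verifier=lambda q, plan, res: len(plan) >= 3,
    router=lambda q, plan, res: ("add", None),
)
```

Keeping the agents as injected callables makes the loop testable without a model and leaves the stopping rule (verifier pass or round cap) in one place.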
This setup handles heterogeneous data because the summaries surface schema, types, keys, and hints, so plans refer to real fields instead of guessing.
On benchmarks the gains are steady, moving from 41.0% to 45.2% on DABStep, 39.8% to 44.7% on KramaBench, and 37.0% to 38.5% on DA-Code.
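For reference, the size of each gain in percentage points can be worked out directly from the figures quoted above. The snippet below is just that arithmetic; the variable names are illustrative.

```python
# (before, after) accuracy in percent, as quoted in the post
scores = {
    "DABStep": (41.0, 45.2),
    "KramaBench": (39.8, 44.7),
    "DA-Code": (37.0, 38.5),
}

# Absolute gain in percentage points, rounded to hide floating-point noise
gains = {name: round(after - before, 1) for name, (before, after) in scores.items()}

for name, gain in gains.items():
    print(f"{name}: +{gain} points")
# DABStep: +4.2 points
# KramaBench: +4.9 points
# DA-Code: +1.5 points
```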
Ablations explain the lift: removing the Data File Analyzer drops hard-task accuracy on DABStep to 26.98%, and removing the Router also hurts across easy and hard tasks.
Refinement depth matches difficulty: hard tasks average 5.6 rounds, easy tasks average 3.0 rounds, and over 50% of easy tasks finish in 1 round.
The framework generalizes across base models, with a GPT-5 version doing better on easy items and a Gemini-2.5-Pro version doing better on hard items.
Net effect: DS-STAR narrows the gap between messy data and reliable answers across CSV, XLSX, JSON, markdown, and plain text.