Introducing RSA 🌀 (Recursive Self-Aggregation): unlocking deep thinking with test-time scaling 🔥
Qwen3-4B + RSA > DeepSeek-R1 📈
Gains across Qwen, Nemo, GPT-OSS 🏆
Benchmarks: Math • Reasoning Gym • Code ⚡
Aggregation-aware RL lets Qwen3-4B surpass o3-mini 🚀

Sep 27, 2025 Β· 3:21 AM UTC

RSA (Recursive Self-Aggregation) = sequential refinement 🔄 + parallel exploration ⚡, unified into a hybrid evolutionary loop for deeper reasoning.
📄 Paper + website: rsa-llm.github.io/
🧵 Details in the thread
RSA is simple 🚀
1️⃣ Generate a population of N reasoning chains in parallel
2️⃣ Subsample into N subsets of K chains each
3️⃣ Prompt the model to aggregate each subset → a new, improved population of CoTs
4️⃣ Repeat for T loops
That’s the whole algorithm: Recursive Self-Aggregation.
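The whole loop fits in a few lines. A minimal Python sketch, assuming a generic `generate(prompt) -> str` LLM call; the prompt wording and the default values of N, K, T are illustrative assumptions, not the authors' exact implementation:

```python
import random

def rsa(question, generate, N=16, K=4, T=5):
    """Recursive Self-Aggregation: T rounds of subsample-and-aggregate."""
    # 1) Generate an initial population of N reasoning chains (in parallel in practice).
    population = [generate(question) for _ in range(N)]

    # 4) Repeat the aggregation loop for T rounds.
    for _ in range(T):
        new_population = []
        for _ in range(N):
            # 2) Subsample a subset of K chains from the current population.
            subset = random.sample(population, K)
            # 3) Prompt the model to aggregate the subset into one improved chain.
            prompt = (
                f"{question}\n\nCandidate solutions:\n\n"
                + "\n\n".join(f"Solution {i + 1}:\n{c}" for i, c in enumerate(subset))
                + "\n\nAggregate these candidates into a single improved solution."
            )
            new_population.append(generate(prompt))
        population = new_population

    return population
```

K trades off context length against diversity per aggregation step; per the scaling results below, growing T and K jointly with N keeps improving performance.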
RSA scales both sequentially & in parallel: more steps T, or larger aggregation size K jointly with population N → better performance! 🔥
Gains:
AIME-25: 47% → 73%
HMMT-25: 29% → 50%
Reasoning Gym Games: 55% → 70%
LiveCodeBench: 49.5% → 56.3%
No verifiers ❌ No prompt optimization ❌ No RL yet!
RSA boosts performance across all models and all reasoning tasks.
Tested on Qwen, Nemo, GPT-OSS — thinking & non-thinking, MoE & dense, full-attention or hybrid SSM.
Benchmarks: AIME, HMMT, LiveCodeBench, Reasoning Gym 📈
Gains are consistent and significant throughout.
RL makes RSA even stronger 📈
Naive RL can hurt aggregation, but aggregation-aware RL fixes this ✅
👉 Generate K responses
👉 Create aggregation prompts
👉 Train the model to aggregate
Boosts performance & generalizes to new tasks — RSA is worth it!
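A hedged sketch of the data-construction side of this, reusing the aggregation prompt format from the sketch above; the helper name, `generate`, and the prompt wording are assumptions, and the RL objective itself (how the aggregated output is rewarded) follows whatever RL recipe the model is trained with:

```python
def make_aggregation_prompt(question, generate, K=4):
    """Build one aggregation-aware RL training prompt (illustrative sketch)."""
    # 👉 Generate K candidate responses for the question.
    candidates = [generate(question) for _ in range(K)]
    # 👉 Create an aggregation prompt wrapping the K candidates; the policy is
    #    then trained on its aggregated answer, so it learns to aggregate
    #    well rather than only to answer from scratch.
    return (
        f"{question}\n\nCandidate solutions:\n\n"
        + "\n\n".join(f"Solution {i + 1}:\n{c}" for i, c in enumerate(candidates))
        + "\n\nAggregate these candidates into a single improved solution."
    )
```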
RSA beats all simple budget-matched test-time scaling baselines!
Shoutout to concurrent work by @wzhao_nlp & team on AggLM (which RL-trains single-step aggregators). We independently discovered this idea, but also find that sequential aggregation with a larger population is key for scaling.