Kunhao Zheng · Apr 27, 2025 · 4:30 PM UTC

Kunhao Zheng @KunhaoZ

🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨 That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing. You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time. 🧵 How?
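For readers who want the distinction made concrete, here is a minimal sketch of why pass@1 and pass@k are different targets. It uses the standard unbiased pass@k estimator (1 - C(n-c, k) / C(n, k) for n samples with c correct), not the training method this thread goes on to describe. The function names and the two toy "policies" are illustrative assumptions.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples of which c are correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k draw has a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over problems, given per-problem correct counts."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Two hypothetical policies, 10 samples per problem, two problems each.
# "peaked" always solves problem 1 and never solves problem 2.
# "spread" solves each problem half of the time.
peaked = [10, 0]
spread = [5, 5]

for name, counts in [("peaked", peaked), ("spread", spread)]:
    p1 = mean_pass_at_k(counts, n=10, k=1)
    p5 = mean_pass_at_k(counts, n=10, k=5)
    print(f"{name}: pass@1={p1:.3f} pass@5={p5:.3f}")
```

Both toy policies have the same pass@1 (0.5), but the spread policy has a much higher pass@5, so a reward that only tracks pass@1 cannot tell them apart.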

Naresh R Shah · Apr 28, 2025 · 5:43 AM UTC

@nareshshah139

Replying to @KunhaoZ

Since optimizing for pass@k makes sense (much as in human learning, where a class with more strong students performs more robustly overall), I wonder whether ordering training data by difficulty tier has been tried yet? That is, each epoch would start with batches of easy tasks and progress toward harder ones.
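The ordering proposed in this reply could be sketched roughly as follows. This is only an illustration of the idea: the task names, the difficulty scores (for example, 1 minus an empirical pass rate), and the `curriculum_batches` helper are all hypothetical, and nothing here says whether the approach actually helps.

```python
def curriculum_batches(
    tasks: list[tuple[str, float]], batch_size: int
) -> list[list[str]]:
    """Group task names into batches ordered from easiest to hardest.

    Each task is a (name, difficulty) pair; lower difficulty means easier.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # sorted() is stable, so tasks with equal difficulty keep their order.
    ordered = sorted(tasks, key=lambda task: task[1])
    return [
        [name for name, _ in ordered[i : i + batch_size]]
        for i in range(0, len(ordered), batch_size)
    ]


# Hypothetical tasks with made-up difficulty scores in [0, 1].
tasks = [("c", 0.9), ("a", 0.1), ("b", 0.5), ("d", 0.7)]
batches = curriculum_batches(tasks, batch_size=2)
print(batches)  # [['a', 'b'], ['d', 'c']]
```

Re-running the helper at the start of each epoch would reproduce the easy-to-hard schedule described above.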