Kunhao Zheng · Apr 27, 2025 · 4:30 PM UTC

Kunhao Zheng @KunhaoZ

🚨 Your RL only improves 𝗽𝗮𝘀𝘀@𝟭, not 𝗽𝗮𝘀𝘀@𝗸? 🚨 That’s not a bug — it’s a 𝗳𝗲𝗮𝘁𝘂𝗿𝗲 𝗼𝗳 𝘁𝗵𝗲 𝗼𝗯𝗷𝗲𝗰𝘁𝗶𝘃𝗲 you’re optimizing. You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time. 🧵 How?
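For readers less familiar with the metric: pass@k is commonly estimated from n samples per problem, c of which are correct, as 1 - C(n-c, k) / C(n, k). The thread itself gives no code, so the following is only a minimal sketch of that estimator; the function name `pass_at_k` and its argument names are assumptions made for illustration.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples of which c are correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    a random size-k subset of the n samples contains no correct sample.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # Fewer than k incorrect samples: every size-k subset has a correct one.
    if n - c < k:
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)
```

With n = 10 and c = 3, this gives pass@1 = 0.3 but pass@5 ≈ 0.917, which shows how far apart the two quantities can sit for the same set of samples.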

panyinxu · Apr 28, 2025 · 6:13 AM UTC

panyinxu @pnynx3

What if you set the loss to the pass@1 objective plus the pass@k objective? Would pass@1 and pass@k then increase together?

Kunhao Zheng · Apr 28, 2025 · 9:21 AM UTC

Kunhao Zheng @KunhaoZ

Replying to @pnynx3

Of course we can do this; it looks like the most obvious thing to try. But it is only one of many combinations we can play with. For example, logsumexp could act as a “softmax” that bridges the two objectives.
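The two combinations mentioned in this exchange could look roughly like the sketch below. The thread does not specify any formula, so everything here is an assumption: `j1` and `jk` stand for scalar values of the pass@1 and pass@k objectives, and `weight` and `beta` are hypothetical tuning knobs.

```python
import math


def sum_objective(j1: float, jk: float, weight: float = 1.0) -> float:
    """Plain sum of the two objectives (the 'obvious' combination)."""
    return j1 + weight * jk


def logsumexp_objective(j1: float, jk: float, beta: float = 1.0) -> float:
    """Smooth maximum of the two objectives: (1/beta) * log(e^(beta*j1) + e^(beta*jk)).

    Assumed form, not taken from the thread. beta must be positive.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    a, b = beta * j1, beta * jk
    m = max(a, b)  # subtract the max before exponentiating, for stability
    return (m + math.log(math.exp(a - m) + math.exp(b - m))) / beta
```

Under these assumptions, a large `beta` makes the combined value track whichever objective is larger, while a small `beta` makes it behave like the average of the two plus a constant offset of log(2)/beta.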
