🚨 Your RL only improves pass@1, not pass@k? 🚨 That's not a bug; it's a feature of the objective you're optimizing. You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time. 🧵 How?
Awesome thread!!! Have you tried training with the pass@k objective and then training with the pass@1 objective on top of that? I'm curious whether that gets better pass@1 performance than training with pass@1 alone.
Replying to @saurabh_shah2
Nice suggestion! We haven't tried this ordering of switching the objective but definitely a good exp to run!

Apr 30, 2025 · 3:31 PM UTC
