🚨 Your RL only improves pass@1, not pass@k? 🚨
That's not a bug. It's a feature of the objective you're optimizing.
You get what you optimize for. If you want better pass@k, you need to optimize for pass@k at training time.
🧵 How?
Pretty interesting. Sorry, this isn't my domain; just wondering, what are pass@1 and pass@k?
Apr 28, 2025 · 7:48 AM UTC
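For anyone with the same question: pass@k is usually read as the probability that at least one of k sampled answers to a problem is correct, and pass@1 is the k = 1 case, i.e. the chance a single sample is right. The thread doesn't say which estimator it uses, so the sketch below is an assumption: it shows the unbiased estimator commonly used in code-generation evals, 1 - C(n-c, k) / C(n, k), computed from n samples of which c are correct. The function name pass_at_k and its arguments are illustrative, not taken from the thread.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k from n samples per problem, c of which are correct.

    Computes 1 - C(n - c, k) / C(n, k): one minus the probability that
    k samples drawn without replacement from the n are all incorrect.
    """
    if not 0 <= c <= n:
        raise ValueError("need 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("need 1 <= k <= n")
    # math.comb returns 0 when k > n - c, so the estimate is then exactly 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 10 samples for one problem, 3 of them correct.
    for k in (1, 5):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.3f}")
```

With the same 3-in-10 success rate, the estimate rises as k grows, which is why a policy tuned only for single-sample accuracy can leave pass@k gains on the table.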



