Had a fun chat about evals with Ben on The Chief AI Officer podcast. We discussed:
• Why many companies don't have a trustworthy eval stack today
• How to create great LLM-as-a-judge evals with an "unfair advantage"
• Why "100% accuracy" is often a red flag
Links below: