Had a fun chat about evals with Ben on The Chief AI Officer podcast. We discussed:
• Why many companies don't have a trustworthy eval stack today
• How to create great LLM-as-a-judge evals with an "unfair advantage"
• Why "100% accuracy" is often a red flag
Links below: