I'm excited to introduce Hieroglyph, a new benchmark for lateral reasoning. Hieroglyph measures a model's ability to identify the link between seemingly unrelated and often niche subjects.
On its 20-question set of the hardest questions from Only Connect, the British TV quiz show, no model scores above 50%.