Hacker News
- FrontierMath: A benchmark for evaluating advanced mathematical reasoning in AI https://epochai.org/frontiermath/the-benchmark 99 comments
Linking pages
- New secret math benchmark stumps AI models and PhDs alike - Ars Technica https://arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ 15 comments
- Why are we using LLMs as calculators? • Buttondown https://newsletter.vickiboykis.com/archive/why-are-we-using-llms-as-calculators/ 2 comments