[2503.21934] Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad - discu.eu

Hacker News

Proof or Bluff? Evaluating LLMs on 2025 USA Math Olympiad https://arxiv.org/abs/2503.21934 2 comments 31/3/2025

Linking pages

New study shows why simulated reasoning AI models don’t yet live up to their billing - Ars Technica https://arstechnica.com/ai/2025/04/new-study-shows-why-simulated-reasoning-ai-models-dont-yet-live-up-to-their-billing/ 4 comments
Reports of LLMs mastering math have been greatly exaggerated https://garymarcus.substack.com/p/reports-of-llms-mastering-math-have 0 comments

