New secret math benchmark stumps AI models and PhDs alike - Ars Technica - discu.eu

Hacker News

New secret math benchmark stumps AI models and PhDs alike https://arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ 2 comments 13/11/2024

Reddit

Researchers introduce FrontierMath, a benchmark of hundreds of original, unpublished mathematics problems crafted and vetted by expert mathematicians. Current state-of-the-art AI models solve under 2% of the problems. This offers a rigorous test bed that can quantify the progress of AI systems. https://arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ 10 comments 16/11/2024 futurology
Research organization Epoch AI released FrontierMath, a new mathematics benchmark that has been turning heads in the AI world because it contains hundreds of expert-level problems that leading AI models solve less than 2 percent of the time https://arstechnica.com/ai/2024/11/new-secret-math-benchmark-stumps-ai-models-and-phds-alike/ 2 comments 13/11/2024 technology
