[2103.03874] Measuring Mathematical Problem Solving With the MATH Dataset

Linking pages

AI now surpasses humans in almost all performance benchmarks https://newatlas.com/technology/ai-index-report-global-impact/ 743 comments
Minerva: Solving Quantitative Reasoning Problems with Language Models – Google AI Blog http://ai.googleblog.com/2022/06/minerva-solving-quantitative-reasoning.html 103 comments
Google AI Developed a Language Model to Solve Quantitative Reasoning Problems https://www.infoq.com/news/2022/07/google-ai-minerva/ 2 comments
Researchers find that large language models struggle with math | VentureBeat https://venturebeat.com/2021/03/09/researchers-find-that-large-language-models-struggle-with-math/ 0 comments
Huawei trained the Chinese-language equivalent of GPT-3 | VentureBeat https://venturebeat.com/2021/04/29/huawei-trained-the-chinese-language-equivalent-of-gpt-3/ 0 comments
Updates and Lessons from AI Forecasting https://bounded-regret.ghost.io/ai-forecasting/ 0 comments
Updates and Lessons from AI Forecasting – The Berkeley Artificial Intelligence Research Blog https://bair.berkeley.edu/blog/2021/10/14/forecasting/ 0 comments
Math | Everything I know https://wiki.nikiv.dev/math/ 0 comments
GitHub - openai/miniF2F: Formal to Formal Mathematics Benchmark https://github.com/openai/miniF2F 0 comments
GitHub - MLGroupJLU/LLM-eval-survey: The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". https://github.com/MLGroupJLU/LLM-eval-survey 0 comments
GitHub - lmmlzn/Awesome-LLMs-Datasets: Summarize existing representative LLMs text datasets. https://github.com/lmmlzn/Awesome-LLMs-Datasets 0 comments
LLM Performance Benchmarks - Claude 3 Opus, GPT-4 and Gemini Ultra https://aisupremacy.substack.com/p/llm-performance-benchmarks-claude 0 comments
GitHub - elicit/machine-learning-list https://github.com/elicit/machine-learning-list 0 comments
GitHub - openai/simple-evals https://github.com/openai/simple-evals 0 comments