AI Models Are Getting Smarter. New Tests Are Racing to Catch Up

Linking pages

When AI Thinks It Will Lose, It Sometimes Cheats, Study Finds | TIME https://time.com/7259395/ai-chess-cheating-palisade-research/ 236 comments
Robot Dexterity Still Seems Hard - by Brian Potter https://www.construction-physics.com/p/robot-dexterity-still-seems-hard 42 comments
Inside the U.K.’s Bold Experiment in AI Safety | TIME https://time.com/7204670/uk-ai-safety-institute/ 3 comments
A Potential Path to Safer AI Development | TIME https://time.com/7283507/safer-ai-development/ 1 comment
How China Is Advancing in AI Despite U.S. Chip Restrictions | TIME https://time.com/7204164/china-ai-advances-chips/ 0 comments
Most-Cited Computer Expert Wants to Make AI More Trustworthy | TIME https://time.com/7290554/yoshua-bengio-launches-lawzero-for-safer-ai/ 0 comments

Linked pages

OpenAI o3 Breakthrough High Score on ARC-AGI-Pub https://arcprize.org/blog/oai-o3-pub-breakthrough 1773 comments
Learning to Reason with LLMs | OpenAI https://openai.com/index/learning-to-reason-with-llms/ 1525 comments
International Mathematical Olympiad https://www.imo-official.org 117 comments
FACT SHEET: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI | The White House https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/ 95 comments
AI Could One Day Engineer a Pandemic, Experts Warn | TIME https://time.com/7014800/ai-pandemic-bioterrorism/ 82 comments
Nobody Knows How to Safety-Test AI | TIME https://time.com/6958868/artificial-intelligence-safety-evaluations-risks/ 56 comments
Memorandum on Advancing the United States’ Leadership in Artificial Intelligence; Harnessing Artificial Intelligence to Fulfill National Security Objectives; and Fostering the Safety, Security, and Trustworthiness of Artificial Intelligence | The White House https://www.whitehouse.gov/briefing-room/presidential-actions/2024/10/24/memorandum-on-advancing-the-united-states-leadership-in-artificial-intelligence-harnessing-artificial-intelligence-to-fulfill-national-security-objectives-and-fostering-the-safety-security/ 17 comments
ARC Prize https://arcprize.org/ 5 comments
https://www.youtube.com/watch?v=SKBG1sqdyIU 5 comments
Terence Tao - Wikipedia http://en.wikipedia.org/wiki/Terence_Tao 4 comments
Announcing our updated Responsible Scaling Policy \ Anthropic https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy 1 comment
AI Benchmarking Dashboard | Epoch AI https://epoch.ai/data/ai-benchmarking-dashboard 1 comment
New Tests Reveal AI's Capacity for Deception | TIME https://time.com/7202312/new-tests-reveal-ai-capacity-for-deception/ 1 comment
MMLU Benchmark (Multi-task Language Understanding) | Papers With Code https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu 0 comments
Introducing the Frontier Safety Framework - Google DeepMind https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework/ 0 comments
Humanity's Last Exam https://agi.safe.ai/submit 0 comments
SimpleBench https://simple-bench.com/ 0 comments
Advancing Red Teaming with People and AI | OpenAI https://openai.com/index/advancing-red-teaming-with-people-and-ai/ 0 comments
Evaluating frontier AI R&D capabilities of language model agents against human experts - METR https://metr.org/blog/2024-11-22-evaluating-r-d-capabilities-of-llms/ 0 comments
FrontierMath | Epoch AI https://epoch.ai/frontiermath 0 comments