Hacker News
- AI Models Are Getting Smarter. New Tests Are Racing to Catch Up https://time.com/7203729/ai-evaluations-safety/ 0 comments
Linked pages
- OpenAI o3 Breakthrough High Score on ARC-AGI-Pub https://arcprize.org/blog/oai-o3-pub-breakthrough 1772 comments
- https://openai.com/index/learning-to-reason-with-llms/ 1525 comments
- International Mathematical Olympiad https://www.imo-official.org 117 comments
- FACT SHEET: Biden-Harris Administration Secures Voluntary Commitments from Leading Artificial Intelligence Companies to Manage the Risks Posed by AI | The White House https://www.whitehouse.gov/briefing-room/statements-releases/2023/07/21/fact-sheet-biden-harris-administration-secures-voluntary-commitments-from-leading-artificial-intelligence-companies-to-manage-the-risks-posed-by-ai/ 95 comments
- AI Could One Day Engineer a Pandemic, Experts Warn | TIME https://time.com/7014800/ai-pandemic-bioterrorism/ 82 comments
- Nobody Knows How to Safety-Test AI | TIME https://time.com/6958868/artificial-intelligence-safety-evaluations-risks/ 56 comments
- Memorandum on Advancing the United States’ Leadership in Artificial Intelligence; Harnessing Artificial Intelligence to Fulfill National Security Objectives; and Fostering the Safety, Security, and Trustworthiness of Artificial Intelligence | The White House https://www.whitehouse.gov/briefing-room/presidential-actions/2024/10/24/memorandum-on-advancing-the-united-states-leadership-in-artificial-intelligence-harnessing-artificial-intelligence-to-fulfill-national-security-objectives-and-fostering-the-safety-security/ 17 comments
- https://www.youtube.com/watch?v=SKBG1sqdyIU 5 comments
- Terence Tao - Wikipedia http://en.wikipedia.org/wiki/Terence_Tao 4 comments
- ARC Prize https://arcprize.org/ 3 comments
- Announcing our updated Responsible Scaling Policy \ Anthropic https://www.anthropic.com/news/announcing-our-updated-responsible-scaling-policy 1 comment
- New Tests Reveal AI's Capacity for Deception | TIME https://time.com/7202312/new-tests-reveal-ai-capacity-for-deception/ 1 comment
- MMLU Benchmark (Multi-task Language Understanding) | Papers With Code https://paperswithcode.com/sota/multi-task-language-understanding-on-mmlu 0 comments
- Introducing the Frontier Safety Framework - Google DeepMind https://deepmind.google/discover/blog/introducing-the-frontier-safety-framework/ 0 comments
- Humanity's Last Exam https://agi.safe.ai/submit 0 comments
- SimpleBench https://simple-bench.com/ 0 comments
- https://openai.com/index/advancing-red-teaming-with-people-and-ai/ 0 comments
- Evaluating frontier AI R&D capabilities of language model agents against human experts - METR https://metr.org/blog/2024-11-22-evaluating-r-d-capabilities-of-llms/ 0 comments
- AI Benchmarking Dashboard | Epoch AI https://epoch.ai/data/ai-benchmarking-dashboard 0 comments
- FrontierMath | Epoch AI https://epoch.ai/frontiermath 0 comments