- Chatbots Are Cheating on Their Benchmark Tests https://www.theatlantic.com/technology/archive/2025/03/chatbots-benchmark-tests/681929/ 5 comments technology
Linked pages
- https://openai.com/index/learning-to-reason-with-llms/ 1525 comments
- [2501.12948] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning https://arxiv.org/abs/2501.12948 1061 comments
- https://openai.com/index/introducing-gpt-4-5/ 990 comments
- OpenAI, Google and Anthropic Are Struggling to Build More Advanced AI - Bloomberg https://www.bloomberg.com/news/articles/2024-11-13/openai-google-and-anthropic-are-struggling-to-build-more-advanced-ai 810 comments
- Codeforces http://codeforces.com/ 104 comments
- The GPT Era Is Already Ending - The Atlantic https://www.theatlantic.com/technology/archive/2024/12/openai-o1-reasoning-models/680906/ 37 comments
- https://www.reuters.com/technology/artificial-intelligence/openai-rivals-seek-new-path-smarter-ai-current-methods-hit-limitations-2024-11-11/ 31 comments
- The Words That Stop ChatGPT in Its Tracks - The Atlantic https://www.theatlantic.com/technology/archive/2024/12/chatgpt-wont-say-my-name/681028/ 23 comments
- https://lmarena.ai/ 18 comments
- Google Gemini update: Sundar Pichai introduces Ultra 1.0 in Gemini Advanced https://blog.google/technology/ai/google-gemini-update-sundar-pichai-2024/ 11 comments
- A.I. Has a Measurement Problem - The New York Times https://www.nytimes.com/2024/04/15/technology/ai-models-measurement.html 7 comments
- https://www.theinformation.com/articles/openai-shifts-strategy-as-rate-of-gpt-ai-improvements-slows 1 comment
- Technology - The Atlantic https://www.theatlantic.com/technology/ 0 comments
- [2306.05685] Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena https://arxiv.org/abs/2306.05685 0 comments
- [2412.08905] Phi-4 Technical Report https://arxiv.org/abs/2412.08905 0 comments