Hacker News
- Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings https://lmsys.org/blog/2023-05-03-arena/ 7 comments
Linking pages
- AI Canon | Andreessen Horowitz https://a16z.com/2023/05/25/ai-canon/ 219 comments
- “The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time | Ars Technica https://arstechnica.com/information-technology/2024/03/the-king-is-dead-claude-3-surpasses-gpt-4-on-chatbot-arena-for-the-first-time/ 63 comments
- GitHub - lm-sys/FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. https://github.com/lm-sys/FastChat 4 comments
- Why Are Elo Ratings Everywhere Now? - The Atlantic https://www.theatlantic.com/technology/archive/2024/04/elo-ratings-are-everywhere/678129/ 1 comment
- Fine-tuning a Large Language Model using Metaflow, featuring LLaMA and LoRA | Outerbounds https://outerbounds.com/blog/llm-tuning-metaflow/ 0 comments
- Truth https://compphil.github.io/truth/ 0 comments
- GitHub - MLGroupJLU/LLM-eval-survey: The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". https://github.com/MLGroupJLU/LLM-eval-survey 0 comments
- Speculations on Building Superintelligence https://blog.sshh.io/p/speculations-on-building-superintelligence 0 comments
- AI Pseudo Intelligence, brilliance without a brain? https://www.mindprison.cc/p/ai-pseudo-intelligence-brilliance 0 comments