Hacker News
- Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings https://lmsys.org/blog/2023-05-03-arena/ 7 comments
Linking pages
- AI Canon | Andreessen Horowitz https://a16z.com/2023/05/25/ai-canon/ 219 comments
- “The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time | Ars Technica https://arstechnica.com/information-technology/2024/03/the-king-is-dead-claude-3-surpasses-gpt-4-on-chatbot-arena-for-the-first-time/ 63 comments
- GitHub - lm-sys/FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. https://github.com/lm-sys/FastChat 4 comments
- Why Are Elo Ratings Everywhere Now? - The Atlantic https://www.theatlantic.com/technology/archive/2024/04/elo-ratings-are-everywhere/678129/ 1 comment
- Fine-tuning a Large Language Model using Metaflow, featuring LLaMA and LoRA | Outerbounds https://outerbounds.com/blog/llm-tuning-metaflow/ 0 comments
- Truth https://compphil.github.io/truth/ 0 comments
- GitHub - MLGroupJLU/LLM-eval-survey: The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". https://github.com/MLGroupJLU/LLM-eval-survey 0 comments
- Speculations on Building Superintelligence https://blog.sshh.io/p/speculations-on-building-superintelligence 0 comments
- AI Pseudo Intelligence, brilliance without a brain? https://www.mindprison.cc/p/ai-pseudo-intelligence-brilliance 0 comments
- GitHub - alopatenko/LLMEvaluation: A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods. https://github.com/alopatenko/LLMEvaluation 0 comments