Hacker News
- Chatbot Arena: Benchmarking LLMs in the Wild with Elo Ratings https://lmsys.org/blog/2023-05-03-arena/ 7 comments
Linking pages
- AI Canon | Andreessen Horowitz https://a16z.com/2023/05/25/ai-canon/ 219 comments
- “The king is dead”—Claude 3 surpasses GPT-4 on Chatbot Arena for the first time | Ars Technica https://arstechnica.com/information-technology/2024/03/the-king-is-dead-claude-3-surpasses-gpt-4-on-chatbot-arena-for-the-first-time/ 63 comments
- GitHub - lm-sys/FastChat: An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena. https://github.com/lm-sys/FastChat 4 comments
- Why Are Elo Ratings Everywhere Now? - The Atlantic https://www.theatlantic.com/technology/archive/2024/04/elo-ratings-are-everywhere/678129/ 1 comment
- Fine-tuning a Large Language Model using Metaflow, featuring LLaMA and LoRA | Outerbounds https://outerbounds.com/blog/llm-tuning-metaflow/ 0 comments
- Truth https://compphil.github.io/truth/ 0 comments
- GitHub - MLGroupJLU/LLM-eval-survey: The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". https://github.com/MLGroupJLU/LLM-eval-survey 0 comments
- Speculations on Building Superintelligence https://blog.sshh.io/p/speculations-on-building-superintelligence 0 comments
- AI Pseudo Intelligence, brilliance without a brain? https://www.mindprison.cc/p/ai-pseudo-intelligence-brilliance 0 comments
- GitHub - alopatenko/LLMEvaluation: A comprehensive guide to LLM evaluation methods designed to assist in identifying the most suitable evaluation techniques for various use cases, promote the adoption of best practices in LLM assessment, and critically assess the effectiveness of these evaluation methods. https://github.com/alopatenko/LLMEvaluation 0 comments