[2306.05685] Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena

Linking pages

10 Noteworthy AI Research Papers of 2023 https://magazine.sebastianraschka.com/p/10-ai-research-papers-2023 24 comments
GitHub - TonicAI/tvalmetrics: Metrics to evaluate the quality of responses of your Retrieval Augmented Generation (RAG) systems. https://github.com/TonicAI/tvalmetrics 17 comments
GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. https://github.com/mlabonne/llm-course 10 comments
LLAMA 2: an incredible open-source LLM - by Nathan Lambert https://www.interconnects.ai/p/llama-2-from-meta 5 comments
Aman's AI Journal • Primers • Overview of Large Language Models https://aman.ai/primers/ai/LLM/ 1 comment
Everything about Distributed Training and Efficient Finetuning | Sumanth's Personal Website https://sumanthrh.com/post/distributed-and-efficient-finetuning/ 1 comment
The Shift from Models to Compound AI Systems – The Berkeley Artificial Intelligence Research Blog https://bair.berkeley.edu/blog/2024/02/18/compound-ai-systems/ 1 comment
Introducing world's largest synthetic open-source Text-to-SQL dataset https://gretel.ai/blog/synthetic-text-to-sql-dataset 1 comment
The problem with how we evaluate LLMs - Conrado Miranda https://conradomiranda.substack.com/p/the-problem-with-how-we-evaluate 1 comment
LLM Collection | Prompt Engineering Guide https://www.promptingguide.ai/models/collection 0 comments
GitHub - MLGroupJLU/LLM-eval-survey: The official GitHub page for the survey paper "A Survey on Evaluation of Large Language Models". https://github.com/MLGroupJLU/LLM-eval-survey 0 comments
Fine-tuning ChatGPT: Surpassing GPT-4 Summarization Performance–A 63% Cost Reduction and 11x Speed Enhancement using Synthetic Data and LangSmith https://blog.langchain.dev/fine-tuning-chatgpt-surpassing-gpt-4-summarization/ 0 comments
Research Papers (October 2023) - by Sebastian Raschka, PhD https://magazine.sebastianraschka.com/p/research-papers-october-2023 0 comments
RLHF progress: Scaling DPO to 70B, DPO vs PPO update, Tülu 2, Zephyr-β, meaningful evaluation, data contamination https://www.interconnects.ai/p/rlhf-progress-scaling-dpo-to-70b 0 comments
GitHub - mddunlap924/LangChain-SynData-RAG-Eval: LangChain, Llama2-Chat, and zero- and few-shot prompting are used to generate synthetic datasets for IR and RAG system evaluation https://github.com/mddunlap924/LangChain-SynData-RAG-Eval 0 comments
Variational autoencoder for design of synthetic viral vector serotypes | Nature Machine Intelligence https://www.nature.com/articles/s42256-023-00787-2 0 comments
An introduction to evaluating LLMs https://generatingconversation.substack.com/p/an-introduction-to-evaluating-llms 0 comments
LLM Data Sales: A Market for Lemons? - by Alex Izydorczyk https://magis.substack.com/p/llm-data-sales-a-market-for-lemons 0 comments
GitHub - lmmlzn/Awesome-LLMs-Datasets: Summarize existing representative LLMs text datasets. https://github.com/lmmlzn/Awesome-LLMs-Datasets 0 comments
Introducing world's largest synthetic open-source Text-to-SQL dataset https://gretel-ai.webflow.io/blog/synthetic-text-to-sql-dataset 0 comments