- How Good is Hugging Face's BLOOM? Human Evaluation of Large Language Models [D] https://www.surgehq.ai/blog/how-good-is-hugging-faces-bloom-a-real-world-human-evaluation-of-language-models 28 comments machinelearning
Linking pages
- Evaluating ChatGPT vs. Google on 500 Search Queries https://www.surgehq.ai/blog/googles-existential-threat-chatgpt-matches-googles-performance-on-informational-search-queries-and-smashes-it-on-coding 21 comments
- HellaSwag or HellaBad? 36% of this popular LLM benchmark contains errors https://www.surgehq.ai/blog/hellaswag-or-hellabad-36-of-this-popular-llm-benchmark-contains-errors 14 comments
- Evaluating Generative AI: Did Astral Codex Ten Win His Bet on AI Progress? https://www.surgehq.ai/blog/dall-e-vs-imagen-and-evaluating-astral-codex-tens-3000-ai-bet 6 comments
- AI Red Teams for Adversarial Training: Making ChatGPT and LLMs Adversarially Robust https://www.surgehq.ai/blog/ai-red-teams-for-adversarial-training-making-chatgpt-and-large-language-models-adversarially-robust 0 comments
Linked pages
- 30% of Google's Emotions Dataset is Mislabeled https://www.surgehq.ai/blog/30-percent-of-googles-reddit-emotions-dataset-is-mislabeled 280 comments
- BLOOM https://bigscience.huggingface.co/blog/bloom 46 comments
- [2204.02311] PaLM: Scaling Language Modeling with Pathways https://arxiv.org/abs/2204.02311 0 comments
- How Surge AI Built OpenAI's GSM8K Dataset of 8,500 Math Problems https://www.surgehq.ai/blog/how-we-built-it-openais-gsm8k-dataset-of-8500-math-problems 0 comments