Hacker News
Linked pages
- [2402.17764] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits https://arxiv.org/abs/2402.17764 575 comments
- [2401.11817] Hallucination is Inevitable: An Innate Limitation of Large Language Models https://arxiv.org/abs/2401.11817 493 comments
- [2401.04088] Mixtral of Experts https://arxiv.org/abs/2401.04088 151 comments
- [2404.14219] Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone https://arxiv.org/abs/2404.14219 130 comments
- [2403.04652] Yi: Open Foundation Models by 01.AI https://arxiv.org/abs/2403.04652 81 comments
- [2401.10020] Self-Rewarding Language Models https://arxiv.org/abs/2401.10020 68 comments
- [2403.09611] MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training https://arxiv.org/abs/2403.09611 63 comments
- [2402.12354] LoRA+: Efficient Low Rank Adaptation of Large Models https://arxiv.org/abs/2402.12354 47 comments
- [2402.13753] LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens https://arxiv.org/abs/2402.13753 46 comments
- [2401.02385] TinyLlama: An Open-Source Small Language Model https://arxiv.org/abs/2401.02385 44 comments
- [2404.07143] Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention https://arxiv.org/abs/2404.07143 40 comments
- [2402.19427] Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models https://arxiv.org/abs/2402.19427 32 comments
- [2403.17297] InternLM2 Technical Report https://arxiv.org/abs/2403.17297 24 comments
- [2401.05566] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training https://arxiv.org/abs/2401.05566 18 comments
- [2401.01335] Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models https://arxiv.org/abs/2401.01335 12 comments
- [2404.03715] Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences https://arxiv.org/abs/2404.03715 11 comments
- [2403.18814] Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models https://arxiv.org/abs/2403.18814 7 comments
- [2403.19887] Jamba: A Hybrid Transformer-Mamba Language Model https://arxiv.org/abs/2403.19887 7 comments
- [2401.14196] DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence https://arxiv.org/abs/2401.14196 4 comments
- [2402.04792] Direct Language Model Alignment from Online AI Feedback https://arxiv.org/abs/2402.04792 4 comments