Linking pages
- [P] Curated list of LLM papers 2024 https://magazine.sebastianraschka.com/p/llm-research-papers-the-2024-list 11 comments machinelearning
Linked pages
- [2402.17764] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits https://arxiv.org/abs/2402.17764 575 comments
- [2410.05229] GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models https://arxiv.org/abs/2410.05229 267 comments
- [2410.01201] Were RNNs All We Needed? https://arxiv.org/abs/2410.01201 260 comments
- [2406.05587] Creativity Has Left the Chat: The Price of Debiasing Language Models https://arxiv.org/abs/2406.05587 251 comments
- [2410.21333] Mind Your Step (by Step): Chain-of-Thought can Reduce Performance on Tasks where Thinking Makes Humans Worse https://arxiv.org/abs/2410.21333 250 comments
- [2410.05258] Differential Transformer https://arxiv.org/abs/2410.05258 218 comments
- [2402.05120] More Agents Is All You Need https://arxiv.org/abs/2402.05120 206 comments
- [2407.02678] Reasoning in Large Language Models: A Geometric Perspective https://arxiv.org/abs/2407.02678 170 comments
- [2402.04494] Grandmaster-Level Chess Without Search https://arxiv.org/abs/2402.04494 168 comments
- [2401.04088] Mixtral of Experts https://arxiv.org/abs/2401.04088 150 comments
- [2410.02707] LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations https://arxiv.org/abs/2410.02707 140 comments
- [2404.19737] Better & Faster Large Language Models via Multi-token Prediction https://arxiv.org/abs/2404.19737 132 comments
- [2404.14219] Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone https://arxiv.org/abs/2404.14219 130 comments
- [2410.00907] Addition is All You Need for Energy-efficient Language Models https://arxiv.org/abs/2410.00907 127 comments
- [2403.04732] How Far Are We from Intelligent Visual Deductive Reasoning? https://arxiv.org/abs/2403.04732 118 comments
- [2403.05440] Is Cosine-Similarity of Embeddings Really About Similarity? https://arxiv.org/abs/2403.05440 115 comments
- [2405.04517] xLSTM: Extended Long Short-Term Memory https://arxiv.org/abs/2405.04517 114 comments
- [2404.02258] Mixture-of-Depths: Dynamically allocating compute in transformer-based language models https://arxiv.org/abs/2404.02258 103 comments
- [2401.12070] Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text https://arxiv.org/abs/2401.12070 99 comments
- GitHub - rasbt/LLMs-from-scratch: Implementing a ChatGPT-like LLM from scratch, step by step https://github.com/rasbt/LLMs-from-scratch 98 comments