Linking pages
- Research Papers in February 2024 https://sebastianraschka.com/blog/2024/research-papers-in-february-2024.html 7 comments
- New LLM Pre-training and Post-training Paradigms https://magazine.sebastianraschka.com/p/new-llm-pre-training-and-post-training 2 comments
- How Good Are the Latest Open LLMs? And Is DPO Better Than PPO? https://magazine.sebastianraschka.com/p/how-good-are-the-latest-open-llms 1 comment
- New LLM Pre-training and Post-training Paradigms https://sebastianraschka.com/blog/2024/new-llm-pre-training-and-post-training.html 0 comments
Linked pages
- Brave Leo, the AI browser assistant, now features Mixtral for improved performance | Brave https://brave.com/leo-mixtral/ 178 comments
- [2401.04088] Mixtral of Experts https://arxiv.org/abs/2401.04088 151 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- Phi-2: The surprising power of small language models - Microsoft Research https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/ 121 comments
- GitHub - rasbt/LLMs-from-scratch: Implementing a ChatGPT-like LLM from scratch, step by step https://github.com/rasbt/LLMs-from-scratch 98 comments
- [2401.10020] Self-Rewarding Language Models https://arxiv.org/abs/2401.10020 68 comments
- GitHub - jzhang38/TinyLlama https://github.com/jzhang38/TinyLlama 60 comments
- [2401.02994] Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM https://arxiv.org/abs/2401.02994 46 comments
- [2401.02385] TinyLlama: An Open-Source Small Language Model https://arxiv.org/abs/2401.02385 44 comments
- [2401.04081] MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts https://arxiv.org/abs/2401.04081 39 comments
- [2401.05566] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training https://arxiv.org/abs/2401.05566 18 comments
- LLM Training: RLHF and Its Alternatives https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives 14 comments
- [2401.01335] Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models https://arxiv.org/abs/2401.01335 12 comments
- [2209.14981] Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging https://arxiv.org/abs/2209.14981 10 comments
- [2401.06104] Transformers are Multi-State RNNs https://arxiv.org/abs/2401.06104 9 comments
- [2106.09685] LoRA: Low-Rank Adaptation of Large Language Models https://arxiv.org/abs/2106.09685 8 comments
- [2401.02412] LLM Augmented LLMs: Expanding Capabilities through Composition https://arxiv.org/abs/2401.02412 3 comments
- [2401.02415] LLaMA Pro: Progressive LLaMA with Block Expansion https://arxiv.org/abs/2401.02415 1 comment
- [2401.08406] RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture https://arxiv.org/abs/2401.08406 1 comment
- [2401.16380] Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling https://arxiv.org/abs/2401.16380 1 comment