Linking pages
- Research Papers in February 2024 https://sebastianraschka.com/blog/2024/research-papers-in-february-2024.html 7 comments
- New LLM Pre-training and Post-training Paradigms https://magazine.sebastianraschka.com/p/new-llm-pre-training-and-post-training 2 comments
- How Good Are the Latest Open LLMs? And Is DPO Better Than PPO? https://magazine.sebastianraschka.com/p/how-good-are-the-latest-open-llms 1 comment
- New LLM Pre-training and Post-training Paradigms https://sebastianraschka.com/blog/2024/new-llm-pre-training-and-post-training.html 0 comments
Linked pages
- Brave Leo, the AI browser assistant, now features Mixtral for improved performance | Brave https://brave.com/leo-mixtral/ 178 comments
- [2401.04088] Mixtral of Experts https://arxiv.org/abs/2401.04088 151 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- Phi-2: The surprising power of small language models - Microsoft Research https://www.microsoft.com/en-us/research/blog/phi-2-the-surprising-power-of-small-language-models/ 121 comments
- GitHub - rasbt/LLMs-from-scratch: Implementing a ChatGPT-like LLM from scratch, step by step https://github.com/rasbt/LLMs-from-scratch 98 comments
- [2401.10020] Self-Rewarding Language Models https://arxiv.org/abs/2401.10020 68 comments
- GitHub - jzhang38/TinyLlama https://github.com/jzhang38/TinyLlama 60 comments
- [2401.02994] Blending Is All You Need: Cheaper, Better Alternative to Trillion-Parameters LLM https://arxiv.org/abs/2401.02994 46 comments
- [2401.02385] TinyLlama: An Open-Source Small Language Model https://arxiv.org/abs/2401.02385 44 comments
- [2401.04081] MoE-Mamba: Efficient Selective State Space Models with Mixture of Experts https://arxiv.org/abs/2401.04081 39 comments
- [2401.05566] Sleeper Agents: Training Deceptive LLMs that Persist Through Safety Training https://arxiv.org/abs/2401.05566 18 comments
- LLM Training: RLHF and Its Alternatives https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives 14 comments
- [2401.01335] Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models https://arxiv.org/abs/2401.01335 12 comments
- [2209.14981] Stop Wasting My Time! Saving Days of ImageNet and BERT Training with Latest Weight Averaging https://arxiv.org/abs/2209.14981 10 comments
- [2401.06104] Transformers are Multi-State RNNs https://arxiv.org/abs/2401.06104 9 comments
- [2106.09685] LoRA: Low-Rank Adaptation of Large Language Models https://arxiv.org/abs/2106.09685 8 comments
- [2401.02412] LLM Augmented LLMs: Expanding Capabilities through Composition https://arxiv.org/abs/2401.02412 3 comments
- [2401.02415] LLaMA Pro: Progressive LLaMA with Block Expansion https://arxiv.org/abs/2401.02415 1 comment
- [2401.08406] RAG vs Fine-tuning: Pipelines, Tradeoffs, and a Case Study on Agriculture https://arxiv.org/abs/2401.08406 1 comment
- [2401.16380] Rephrasing the Web: A Recipe for Compute and Data-Efficient Language Modeling https://arxiv.org/abs/2401.16380 1 comment