- [P] Research Papers in February 2024 — A Potential LoRA Successor, Small Finetuned LLMs Vs Generalist LLMs, and Transparent LLM Research https://sebastianraschka.com/blog/2024/research-papers-in-february-2024.html 7 comments machinelearning
Linked pages
- [2402.17764] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits https://arxiv.org/abs/2402.17764 575 comments
- [2402.05120] More Agents Is All You Need https://arxiv.org/abs/2402.05120 206 comments
- [2402.04494] Grandmaster-Level Chess Without Search https://arxiv.org/abs/2402.04494 168 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- GitHub - rasbt/LLMs-from-scratch: Implementing a ChatGPT-like LLM from scratch, step by step https://github.com/rasbt/LLMs-from-scratch 98 comments
- [2402.13144] Neural Network Diffusion https://arxiv.org/abs/2402.13144 86 comments
- [2402.06184] The boundary of neural network trainability is fractal https://arxiv.org/abs/2402.06184 65 comments
- Understanding, Using, and Finetuning Gemma - a Lightning Studio by sebastian https://lightning.ai/lightning-ai/studios/understanding-using-and-finetuning-gemma 48 comments
- [2402.12354] LoRA+: Efficient Low Rank Adaptation of Large Models https://arxiv.org/abs/2402.12354 47 comments
- [2402.13753] LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens https://arxiv.org/abs/2402.13753 46 comments
- [2402.19427] Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models https://arxiv.org/abs/2402.19427 32 comments
- [2402.03885] MOMENT: A Family of Open Time-series Foundation Models https://arxiv.org/abs/2402.03885 5 comments
- [2402.04792] Direct Language Model Alignment from Online AI Feedback https://arxiv.org/abs/2402.04792 4 comments
- [2402.13446] Large Language Models for Data Annotation: A Survey https://arxiv.org/abs/2402.13446 4 comments
- [2402.15391] Genie: Generative Interactive Environments https://arxiv.org/abs/2402.15391 2 comments
- [2402.03300] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models https://arxiv.org/abs/2402.03300 1 comment
- [2402.03902] A phase transition between positional and semantic learning in a solvable model of dot-product attention https://arxiv.org/abs/2402.03902 1 comment
- [2402.07896] Suppressing Pink Elephants with Direct Principle Feedback https://arxiv.org/abs/2402.07896 1 comment
- LLMs-from-scratch/ch02/03_bonus_embedding-vs-matmul/embeddings-and-linear-layers.ipynb at main · rasbt/LLMs-from-scratch · GitHub https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/03_bonus_embedding-vs-matmul/embeddings-and-linear-layers.ipynb 1 comment (a short sketch of the equivalence this notebook covers follows the list)
- [2402.10986] FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models https://arxiv.org/abs/2402.10986 1 comment
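One item above lends itself to a brief illustration: the bonus notebook on embeddings vs. matmuls demonstrates that an embedding lookup is mathematically equivalent to multiplying a one-hot vector by a weight matrix. Below is a minimal PyTorch sketch of that equivalence, written for this page rather than copied from the notebook; `vocab_size`, `embed_dim`, and the token IDs are arbitrary illustrative values.

```python
import torch

torch.manual_seed(123)

vocab_size, embed_dim = 5, 3
idx = torch.tensor([2, 0, 4])  # three token IDs

# Embedding lookup: selects rows of the (vocab_size, embed_dim) weight matrix
embedding = torch.nn.Embedding(vocab_size, embed_dim)
out_embedding = embedding(idx)

# Equivalent linear layer: one-hot encode the IDs, then matmul.
# nn.Linear stores its weight as (out_features, in_features), hence the transpose.
linear = torch.nn.Linear(vocab_size, embed_dim, bias=False)
linear.weight = torch.nn.Parameter(embedding.weight.T.detach().clone())

one_hot = torch.nn.functional.one_hot(idx, num_classes=vocab_size).float()
out_linear = linear(one_hot)

print(torch.allclose(out_embedding, out_linear))  # True: identical outputs
```

The embedding layer is simply the more efficient implementation: direct row indexing avoids materializing the one-hot matrix and performing the full matrix multiplication.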