- [P] Research Papers in February 2024 — A Potential LoRA Successor, Small Finetuned LLMs Vs Generalist LLMs, and Transparent LLM Research https://sebastianraschka.com/blog/2024/research-papers-in-february-2024.html 7 comments machinelearning
Linked pages
- [2402.17764] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits https://arxiv.org/abs/2402.17764 575 comments
- [2402.05120] More Agents Is All You Need https://arxiv.org/abs/2402.05120 206 comments
- [2402.04494] Grandmaster-Level Chess Without Search https://arxiv.org/abs/2402.04494 168 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- GitHub - rasbt/LLMs-from-scratch: Implementing a ChatGPT-like LLM from scratch, step by step https://github.com/rasbt/LLMs-from-scratch 98 comments
- [2402.13144] Neural Network Diffusion https://arxiv.org/abs/2402.13144 86 comments
- [2402.06184] The boundary of neural network trainability is fractal https://arxiv.org/abs/2402.06184 65 comments
- Understanding, Using, and Finetuning Gemma - a Lightning Studio by sebastian https://lightning.ai/lightning-ai/studios/understanding-using-and-finetuning-gemma 48 comments
- [2402.12354] LoRA+: Efficient Low Rank Adaptation of Large Models https://arxiv.org/abs/2402.12354 47 comments
- [2402.13753] LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens https://arxiv.org/abs/2402.13753 46 comments
- [2402.19427] Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models https://arxiv.org/abs/2402.19427 32 comments
- [2402.03885] MOMENT: A Family of Open Time-series Foundation Models https://arxiv.org/abs/2402.03885 5 comments
- [2402.04792] Direct Language Model Alignment from Online AI Feedback https://arxiv.org/abs/2402.04792 4 comments
- [2402.13446] Large Language Models for Data Annotation: A Survey https://arxiv.org/abs/2402.13446 4 comments
- [2402.15391] Genie: Generative Interactive Environments https://arxiv.org/abs/2402.15391 2 comments
- [2402.03300] DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models https://arxiv.org/abs/2402.03300 1 comment
- [2402.03902] A phase transition between positional and semantic learning in a solvable model of dot-product attention https://arxiv.org/abs/2402.03902 1 comment
- [2402.07896] Suppressing Pink Elephants with Direct Principle Feedback https://arxiv.org/abs/2402.07896 1 comment
- LLMs-from-scratch/ch02/03_bonus_embedding-vs-matmul/embeddings-and-linear-layers.ipynb at main · rasbt/LLMs-from-scratch · GitHub https://github.com/rasbt/LLMs-from-scratch/blob/main/ch02/03_bonus_embedding-vs-matmul/embeddings-and-linear-layers.ipynb 1 comment (a short sketch of the equivalence this notebook covers follows the list)
- [2402.10986] FinTral: A Family of GPT-4 Level Multimodal Financial Large Language Models https://arxiv.org/abs/2402.10986 1 comment
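One item above lends itself to a brief illustration: the bonus notebook on embeddings vs. matmuls demonstrates that an embedding lookup is mathematically equivalent to multiplying a one-hot vector by a weight matrix. Below is a minimal PyTorch sketch of that equivalence, written for this page rather than copied from the notebook; `vocab_size`, `embed_dim`, and the token IDs are arbitrary illustrative values.

```python
import torch

torch.manual_seed(123)

vocab_size, embed_dim = 5, 3
idx = torch.tensor([2, 0, 4])  # three token IDs

# Embedding lookup: selects rows of the (vocab_size, embed_dim) weight matrix
embedding = torch.nn.Embedding(vocab_size, embed_dim)
out_embedding = embedding(idx)

# Equivalent linear layer: one-hot encode the IDs, then matmul.
# nn.Linear stores its weight as (out_features, in_features), hence the transpose.
linear = torch.nn.Linear(vocab_size, embed_dim, bias=False)
linear.weight = torch.nn.Parameter(embedding.weight.T.detach().clone())

one_hot = torch.nn.functional.one_hot(idx, num_classes=vocab_size).float()
out_linear = linear(one_hot)

print(torch.allclose(out_embedding, out_linear))  # True: identical outputs
```

The embedding layer is simply the more efficient implementation: direct row indexing avoids materializing the one-hot matrix and performing the full matrix multiplication.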