Hacker News
- Noteworthy AI Research Papers of 2024 (Part One) https://magazine.sebastianraschka.com/p/ai-research-papers-2024-part-1 0 comments
Reddit
- [P] Noteworthy AI Research Papers of 2024 (Part One) https://magazine.sebastianraschka.com/p/ai-research-papers-2024-part-1 8 comments r/machinelearning
Linked pages
- [2401.04088] Mixtral of Experts https://arxiv.org/abs/2401.04088 150 comments
- Common Crawl https://commoncrawl.org/ 85 comments
- [2405.09673] LoRA Learns Less and Forgets Less https://arxiv.org/abs/2405.09673 60 comments
- [2407.21075] Apple Intelligence Foundation Language Models https://arxiv.org/abs/2407.21075 42 comments
- Build a Large Language Model (From Scratch) by Sebastian Raschka (Amazon) https://www.amazon.com/dp/1633437167/ 38 comments
- Practical Tips for Finetuning LLMs Using LoRA (Low-Rank Adaptation) https://magazine.sebastianraschka.com/p/practical-tips-for-finetuning-llms 37 comments
- LLM Training: RLHF and Its Alternatives https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives 14 comments
- LLM Research Papers: The 2024 List https://magazine.sebastianraschka.com/p/llm-research-papers-the-2024-list 11 comments
- Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch https://magazine.sebastianraschka.com/p/lora-and-dora-from-scratch 10 comments
- [2404.19756] KAN: Kolmogorov-Arnold Networks https://arxiv.org/abs/2404.19756 8 comments
- LLMs-from-scratch/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb at main · rasbt/LLMs-from-scratch · GitHub https://github.com/rasbt/LLMs-from-scratch/blob/main/ch07/04_preference-tuning-with-dpo/dpo-from-scratch.ipynb 5 comments
- HuggingFaceFW/fineweb · Datasets at Hugging Face https://huggingface.co/datasets/HuggingFaceFW/fineweb 4 comments
- New LLM Pre-training and Post-training Paradigms https://magazine.sebastianraschka.com/p/new-llm-pre-training-and-post-training 2 comments
- DeepSeek-V3/DeepSeek_V3.pdf at main · deepseek-ai/DeepSeek-V3 · GitHub https://github.com/deepseek-ai/DeepSeek-V3/blob/main/DeepSeek_V3.pdf 2 comments
- [2403.08763] Simple and Scalable Strategies to Continually Pre-train Large Language Models https://arxiv.org/abs/2403.08763 1 comment
- [2203.15556] Training Compute-Optimal Large Language Models https://arxiv.org/abs/2203.15556 0 comments
- GitHub - togethercomputer/RedPajama-Data: The RedPajama-Data repository contains code for preparing large datasets for training large language models. https://github.com/togethercomputer/RedPajama-Data 0 comments
- Tips for LLM Pretraining and Evaluating Reward Models https://magazine.sebastianraschka.com/p/tips-for-llm-pretraining-and-evaluating-rms 0 comments
- [2407.21783] The Llama 3 Herd of Models https://arxiv.org/abs/2407.21783 0 comments