- Collection of research papers relevant to AI Engineers (Large Language Models specifically) https://github.com/InterviewReady/ai-engineering-resources 0 comments (r/learnmachinelearning)
Linked pages
- [2402.17764] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits https://arxiv.org/abs/2402.17764 575 comments
- Language is primarily a tool for communication rather than thought (Fedorenko et al., 2024) https://gwern.net/doc/psychology/linguistics/2024-fedorenko.pdf 391 comments
- Introducing the Model Context Protocol \ Anthropic https://www.anthropic.com/news/model-context-protocol 269 comments
- [2402.09171] Automated Unit Test Improvement using Large Language Models at Meta https://arxiv.org/abs/2402.09171 188 comments
- [2305.13048] RWKV: Reinventing RNNs for the Transformer Era https://arxiv.org/abs/2305.13048 171 comments
- [1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer https://arxiv.org/abs/1701.06538 125 comments
- [2412.06769] Training Large Language Models to Reason in a Continuous Latent Space https://arxiv.org/abs/2412.06769 114 comments
- GitHub - openai/swarm: Experimental framework for building, orchestrating, and deploying multi-agent systems, managed by the OpenAI Solutions team https://github.com/openai/swarm 106 comments
- [2501.04682] Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought https://arxiv.org/abs/2501.04682 75 comments
- [2502.05171] Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach https://arxiv.org/abs/2502.05171 57 comments
- [2501.00663] Titans: Learning to Memorize at Test Time https://arxiv.org/abs/2501.00663 52 comments
- [2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752 42 comments
- [2006.16668] GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding https://arxiv.org/abs/2006.16668 35 comments
- [1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 25 comments
- Solving olympiad geometry without human demonstrations | Nature https://www.nature.com/articles/s41586-023-06747-5 23 comments
- [2412.09871] Byte Latent Transformer: Patches Scale Better Than Tokens https://arxiv.org/abs/2412.09871 22 comments
- [1712.05889] Ray: A Distributed Framework for Emerging AI Applications https://arxiv.org/abs/1712.05889 15 comments
- [2103.00020] Learning Transferable Visual Models From Natural Language Supervision https://arxiv.org/pdf/2103.00020.pdf 11 comments
- [2407.08608] FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision https://arxiv.org/abs/2407.08608 6 comments
- [1503.02531] Distilling the Knowledge in a Neural Network https://arxiv.org/abs/1503.02531 5 comments