- Collection of research papers relevant to AI Engineers (Large Language Models specifically) https://github.com/InterviewReady/ai-engineering-resources 0 comments (r/learnmachinelearning)
Linked pages
- [2402.17764] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits https://arxiv.org/abs/2402.17764 575 comments
- Language is primarily a tool for communication rather than thought (Fedorenko et al., 2024) https://gwern.net/doc/psychology/linguistics/2024-fedorenko.pdf 391 comments
- Introducing the Model Context Protocol \ Anthropic https://www.anthropic.com/news/model-context-protocol 269 comments
- [2402.09171] Automated Unit Test Improvement using Large Language Models at Meta https://arxiv.org/abs/2402.09171 188 comments
- [2305.13048] RWKV: Reinventing RNNs for the Transformer Era https://arxiv.org/abs/2305.13048 171 comments
- [1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer https://arxiv.org/abs/1701.06538 125 comments
- [2412.06769] Training Large Language Models to Reason in a Continuous Latent Space https://arxiv.org/abs/2412.06769 114 comments
- GitHub - openai/swarm: Experimental framework for building, orchestrating, and deploying multi-agent systems, managed by the OpenAI Solutions team https://github.com/openai/swarm 106 comments
- [2501.04682] Towards System 2 Reasoning in LLMs: Learning How to Think With Meta Chain-of-Thought https://arxiv.org/abs/2501.04682 75 comments
- [2502.05171] Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach https://arxiv.org/abs/2502.05171 57 comments
- [2501.00663] Titans: Learning to Memorize at Test Time https://arxiv.org/abs/2501.00663 52 comments
- [2312.00752] Mamba: Linear-Time Sequence Modeling with Selective State Spaces https://arxiv.org/abs/2312.00752 42 comments
- [2006.16668] GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding https://arxiv.org/abs/2006.16668 35 comments
- [1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 25 comments
- Solving olympiad geometry without human demonstrations | Nature https://www.nature.com/articles/s41586-023-06747-5 23 comments
- [2412.09871] Byte Latent Transformer: Patches Scale Better Than Tokens https://arxiv.org/abs/2412.09871 22 comments
- [1712.05889] Ray: A Distributed Framework for Emerging AI Applications https://arxiv.org/abs/1712.05889 15 comments
- [2103.00020] Learning Transferable Visual Models From Natural Language Supervision https://arxiv.org/pdf/2103.00020.pdf 11 comments
- [2407.08608] FlashAttention-3: Fast and Accurate Attention with Asynchrony and Low-precision https://arxiv.org/abs/2407.08608 6 comments
- [1503.02531] Distilling the Knowledge in a Neural Network https://arxiv.org/abs/1503.02531 5 comments