- [P] Curated a list of 70+ Research Papers for Serious Deep Dive https://github.com/JUSTSUJAY/ML-Research-Papers 9 comments machinelearning
Linked pages
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- [2302.04761] Toolformer: Language Models Can Teach Themselves to Use Tools https://arxiv.org/abs/2302.04761 153 comments
- [1706.03762] Attention Is All You Need https://arxiv.org/abs/1706.03762 145 comments
- [2305.14314] QLoRA: Efficient Finetuning of Quantized LLMs https://arxiv.org/abs/2305.14314 129 comments
- [1701.06538] Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer https://arxiv.org/abs/1701.06538 125 comments
- [2307.14334] Towards Generalist Biomedical AI https://arxiv.org/abs/2307.14334 87 comments
- [2305.11206] LIMA: Less Is More for Alignment https://arxiv.org/abs/2305.11206 44 comments
- ImageNet Classification with Deep Convolutional Neural Networks https://proceedings.neurips.cc/paper_files/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf 43 comments
- [2006.16668] GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding https://arxiv.org/abs/2006.16668 35 comments
- [2006.11239] Denoising Diffusion Probabilistic Models https://arxiv.org/pdf/2006.11239.pdf 25 comments
- [1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 25 comments
- [2310.11453] BitNet: Scaling 1-bit Transformers for Large Language Models https://arxiv.org/abs/2310.11453 21 comments
- [2103.14030] Swin Transformer: Hierarchical Vision Transformer using Shifted Windows https://arxiv.org/abs/2103.14030 20 comments
- [2303.09752] CoLT5: Faster Long-Range Transformers with Conditional Computation https://arxiv.org/abs/2303.09752 20 comments
- [2108.12409] Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation https://arxiv.org/abs/2108.12409 17 comments
- [2309.06180] Efficient Memory Management for Large Language Model Serving with PagedAttention https://arxiv.org/abs/2309.06180 16 comments
- [2205.01917] CoCa: Contrastive Captioners are Image-Text Foundation Models https://arxiv.org/abs/2205.01917 14 comments
- [2002.08909] REALM: Retrieval-Augmented Language Model Pre-Training https://arxiv.org/abs/2002.08909 13 comments
- [2112.04426] Improving language models by retrieving from trillions of tokens https://arxiv.org/pdf/2112.04426.pdf 12 comments
- [2108.07258] On the Opportunities and Risks of Foundation Models https://arxiv.org/abs/2108.07258 11 comments