Linking pages
- GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of autoregressive language models. https://github.com/EleutherAI/lm-evaluation-harness 0 comments
- Top 9 Libraries to Accelerate LLM Building - by Avi Chawla https://www.blog.aiport.tech/p/top-9-libraries-to-accelerate-llm 0 comments
Linked pages
- NVIDIA A100 | NVIDIA https://www.nvidia.com/en-us/data-center/a100/ 280 comments
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- [1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 25 comments
- [2112.04426] Improving language models by retrieving from trillions of tokens https://arxiv.org/abs/2112.04426 9 comments
- PyTorch | NVIDIA NGC https://ngc.nvidia.com/catalog/containers/nvidia:pytorch 3 comments
- GitHub - HazyResearch/flash-attention: Fast and memory-efficient exact attention https://github.com/HazyResearch/flash-attention 3 comments
- [2104.04473] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM https://arxiv.org/abs/2104.04473 1 comment
- [1910.10683] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer https://arxiv.org/abs/1910.10683 1 comment
- GitHub - jcpeterson/openwebtext: Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed. https://github.com/jcpeterson/openwebtext 0 comments