Linking pages
- GitHub - EleutherAI/lm-evaluation-harness: A framework for few-shot evaluation of autoregressive language models. https://github.com/EleutherAI/lm-evaluation-harness 0 comments
- Top 9 Libraries to Accelerate LLM Building - by Avi Chawla https://www.blog.aiport.tech/p/top-9-libraries-to-accelerate-llm 0 comments
Linked pages
- NVIDIA A100 | NVIDIA https://www.nvidia.com/en-us/data-center/a100/ 280 comments
- [2005.14165] Language Models are Few-Shot Learners https://arxiv.org/abs/2005.14165 201 comments
- [1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 25 comments
- [2112.04426] Improving language models by retrieving from trillions of tokens https://arxiv.org/abs/2112.04426 9 comments
- PyTorch | NVIDIA NGC https://ngc.nvidia.com/catalog/containers/nvidia:pytorch 3 comments
- GitHub - HazyResearch/flash-attention: Fast and memory-efficient exact attention https://github.com/HazyResearch/flash-attention 3 comments
- [2104.04473] Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM https://arxiv.org/abs/2104.04473 1 comment
- [1910.10683] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer https://arxiv.org/abs/1910.10683 1 comment
- GitHub - jcpeterson/openwebtext: Open clone of OpenAI's unreleased WebText dataset scraper. This version uses pushshift.io files instead of the API for speed. https://github.com/jcpeterson/openwebtext 0 comments