Linking pages
- GPT-2: 6-Month Follow-Up https://openai.com/blog/gpt-2-6-month-follow-up/ 96 comments
- Deep learning has a size problem. Shifting from state-of-the-art accuracy… | by Jameson Toole | Heartbeat https://heartbeat.fritz.ai/deep-learning-has-a-size-problem-ea601304cd8 46 comments
- Transformers are Graph Neural Networks https://thegradient.pub/transformers-are-graph-neural-networks/ 25 comments
- Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/ 23 comments
- Transformers are Graph Neural Networks | NTU Graph Deep Learning Lab https://graphdeeplearning.github.io/post/transformers-are-gnns/ 19 comments
- Can the planet really afford the exorbitant power demands of machine learning? | John Naughton | The Guardian https://www.theguardian.com/commentisfree/2019/nov/16/can-planet-afford-exorbitant-power-demands-of-machine-learning 2 comments
- Nvidia breaks records in training and inference for real-time conversational AI • TechCrunch https://techcrunch.com/2019/08/13/nvidia-breaks-records-in-training-and-inference-for-real-time-conversational-ai/ 0 comments
Linked pages
- [1810.04805] BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding https://arxiv.org/abs/1810.04805 25 comments
- GitHub - mattilyra/LSH: Locality Sensitive Hashing using MinHash in Python/Cython to detect near duplicate text documents https://github.com/mattilyra/LSH 2 comments
- Language Models are Unsupervised Multitask Learners (PDF) https://d4mucfpksywv.cloudfront.net/better-language-models/language_models_are_unsupervised_multitask_learners.pdf 1 comment
- [1811.02084] Mesh-TensorFlow: Deep Learning for Supercomputers https://arxiv.org/abs/1811.02084 0 comments
- [1811.06965] GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism https://arxiv.org/abs/1811.06965 0 comments
- [1604.06174] Training Deep Nets with Sublinear Memory Cost https://arxiv.org/abs/1604.06174 0 comments
Related searches:
Search whole site: site:nv-adlr.github.io
Search title: MegatronLM: Training Billion+ Parameter Language Models Using GPU Model Parallelism - NVIDIA ADLR