Hacker News
- GShard: Scaling giant models with conditional computation and automatic sharding https://arxiv.org/abs/2006.16668 35 comments
Linking pages
- Introducing Gemini 1.5, Google's next-generation AI model https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/ 715 comments
- Google Research: Themes from 2021 and Beyond – Google AI Blog https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html 52 comments
- How to Train Really Large Models on Many GPUs? | Lil'Log https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
- 10 Noteworthy AI Research Papers of 2023 https://magazine.sebastianraschka.com/p/10-ai-research-papers-2023 24 comments
- Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/ 23 comments
- GitHub - arpita8/Awesome-Mixture-of-Experts-Papers: Survey: A collection of AWESOME papers and resources on the latest research in Mixture of Experts. https://github.com/arpita8/Awesome-Mixture-of-Experts-Papers 17 comments
- GitHub - JUSTSUJAY/ML-Research-Papers https://github.com/JUSTSUJAY/ML-Research-Papers 10 comments
- Google wins MLPerf benchmark contest with fastest ML training supercomputer | Google Cloud Blog https://cloud.google.com/blog/products/ai-machine-learning/google-breaks-ai-performance-records-in-mlperf-with-worlds-fastest-training-supercomputer 1 comment
- How to Go beyond Data Parallelism and Model Parallelism: Starting from GShard | by OneFlow | Medium https://oneflow2020.medium.com/how-to-go-beyond-data-parallelism-and-model-parallelism-talking-from-gshard-a45e20c1975d 1 comment
- Alpa: Automated Model-Parallel Deep Learning – Google AI Blog https://ai.googleblog.com/2022/05/alpa-automated-model-parallel-deep.html 1 comment
- A decade in deep learning, and what's next https://blog.google/technology/ai/decade-deep-learning-and-whats-next/ 1 comment
- GitHub - tensorflow/lingvo: Lingvo https://github.com/tensorflow/lingvo 0 comments
- Google Research: Looking Back at 2020, and Forward to 2021 – Google AI Blog https://ai.googleblog.com/2021/01/google-research-looking-back-at-2020.html 0 comments
- Deep Learning applications for COVID-19 | Journal of Big Data | Full Text https://journalofbigdata.springeropen.com/articles/10.1186/s40537-020-00392-9 0 comments
- Jan 2021 Gwern.net Newsletter - Gwern.net Newsletter https://gwern.substack.com/p/jan-2021-gwernnet-newsletter 0 comments
- Google wins MLPerf benchmarks with TPU v4 | Google Cloud Blog https://cloud.google.com/blog/products/ai-machine-learning/google-wins-mlperf-benchmarks-with-tpu-v4 0 comments
- General and Scalable Parallelization for Neural Networks – Google AI Blog https://ai.googleblog.com/2021/12/general-and-scalable-parallelization.html 0 comments
- LIMoE: Learning Multiple Modalities with One Sparse Mixture-of-Experts Model – Google AI Blog https://ai.googleblog.com/2022/06/limoe-learning-multiple-modalities-with.html 0 comments
- Alpa: Automated Model-Parallel Deep Learning – Google AI Blog https://ai.googleblog.com/2022/05/alpa-automated-model-parallel-deep.html?m=1 0 comments
- TensorFlow DTensor: Unified API for Distributed Deep Network Training https://www.infoq.com/news/2022/05/tensorflow-dtensor/ 0 comments