Hacker News
- Google breaks the trillion-parameter ceiling with the Switch Transformer https://arxiv.org/abs/2101.03961 4 comments
Linking pages
- Non-determinism in GPT-4 is caused by Sparse MoE - 152334H https://152334h.github.io/blog/non-determinism-in-gpt-4/ 181 comments
- Google Gemini Eats The World – Gemini Smashes GPT-4 By 5X, The GPU-Poors https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini 113 comments
- Google Open-Sources Trillion-Parameter AI Language Model Switch Transformer https://www.infoq.com/news/2021/02/google-trillion-parameter-ai/ 95 comments
- GPT-4's Secret Has Been Revealed - by Alberto Romero https://thealgorithmicbridge.substack.com/p/gpt-4s-secret-has-been-revealed 62 comments
- Google Research: Themes from 2021 and Beyond – Google AI Blog https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html 52 comments
- Minority Voices 'Filtered' Out of Google Natural Language Processing Models - Unite.AI https://www.unite.ai/minority-voices-filtered-out-of-google-natural-language-processing-models/ 34 comments
- How to Train Really Large Models on Many GPUs? | Lil'Log https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
- Google trained a trillion-parameter AI language model | VentureBeat https://venturebeat.com/2021/01/12/google-trained-a-trillion-parameter-ai-language-model/ 30 comments
- GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. https://github.com/huggingface/transformers 26 comments
- Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/ 23 comments
- Wu Dao 2.0: A Monster of 1.75 Trillion Parameters | by Alberto Romero | Medium | Towards Data Science https://towardsdatascience.com/gpt-3-scared-you-meet-wu-dao-2-0-a-monster-of-1-75-trillion-parameters-832cd83db484 10 comments
- Chinese AI lab challenges Google, OpenAI with a model of 1.75 trillion parameters - PingWest https://en.pingwest.com/a/8693 6 comments
- GPT-4 architecture: what we can deduce from research literature | Kirill Gadjello's personal blog and website https://kir-gadjello.github.io/posts/gpt4-some-technical-hypotheses/ 6 comments
- Meta is building an AI supercomputer | CNN Business https://edition.cnn.com/2022/01/24/tech/meta-supercomputer/index.html 4 comments
- Code Interpreter == GPT 4.5 (w/ Simon Willison & Alex Volkov) https://www.latent.space/p/code-interpreter 4 comments
- Hosting Hugging Face models on AWS Lambda for serverless inference | AWS Compute Blog https://aws.amazon.com/blogs/compute/hosting-hugging-face-models-on-aws-lambda/ 3 comments
- Cerebras Proposes AI Megacluster with Billions of AI Compute Cores https://www.hpcwire.com/2022/09/14/cerebras-proposes-ai-megacluster-with-billions-of-ai-compute-cores/ 1 comment
- Extrapolating to Unnatural Language Processing with GPT-3’s In-context Learning: The Good, the Bad, and the Mysterious | SAIL Blog https://ai.stanford.edu/blog/in-context-learning/ 1 comment
- Google Brain’s Switch Transformer Language Model Packs 1.6-Trillion Parameters | Synced https://syncedreview.com/2021/01/14/google-brains-switch-transformer-language-model-packs-1-6-trillion-parameters/ 1 comment
- Why Release a Large Language Model? | EleutherAI Blog https://blog.eleuther.ai/why-release-a-large-language-model/ 0 comments