Google breaks the trillion-parameter ceiling with the Switch Transformer - discu.eu

Hacker News

Google breaks the trillion-parameter ceiling with the Switch Transformer https://arxiv.org/abs/2101.03961 4 comments 18/1/2021

Linking pages

Introducing Gemini 1.5, Google's next-generation AI model https://blog.google/technology/ai/google-gemini-next-generation-model-february-2024/ 715 comments
Non-determinism in GPT-4 is caused by Sparse MoE - 152334H https://152334h.github.io/blog/non-determinism-in-gpt-4/ 181 comments
Google Gemini Eats The World – Gemini Smashes GPT-4 By 5X, The GPU-Poors https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini 113 comments
Google Open-Sources Trillion-Parameter AI Language Model Switch Transformer https://www.infoq.com/news/2021/02/google-trillion-parameter-ai/ 95 comments
GitHub - VikParuchuri/marker: Convert PDF to markdown quickly with high accuracy https://github.com/VikParuchuri/marker 95 comments
GPT-4's Secret Has Been Revealed - by Alberto Romero https://thealgorithmicbridge.substack.com/p/gpt-4s-secret-has-been-revealed 62 comments
Google Research: Themes from 2021 and Beyond – Google AI Blog https://ai.googleblog.com/2022/01/google-research-themes-from-2021-and.html 52 comments
Minority Voices 'Filtered' Out of Google Natural Language Processing Models - Unite.AI https://www.unite.ai/minority-voices-filtered-out-of-google-natural-language-processing-models/ 34 comments
How to Train Really Large Models on Many GPUs? | Lil'Log https://lilianweng.github.io/posts/2021-09-25-train-large/ 33 comments
Google trained a trillion-parameter AI language model | VentureBeat https://venturebeat.com/2021/01/12/google-trained-a-trillion-parameter-ai-language-model/ 30 comments
GitHub - huggingface/transformers: 🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX. https://github.com/huggingface/transformers 26 comments
10 Noteworthy AI Research Papers of 2023 https://magazine.sebastianraschka.com/p/10-ai-research-papers-2023 24 comments
Techniques for Training Large Neural Networks https://openai.com/blog/techniques-for-training-large-neural-networks/ 23 comments
Wu Dao 2.0: A Monster of 1.75 Trillion Parameters | by Alberto Romero | Medium | Towards Data Science https://towardsdatascience.com/gpt-3-scared-you-meet-wu-dao-2-0-a-monster-of-1-75-trillion-parameters-832cd83db484 10 comments
Chinese AI lab challenges Google, OpenAI with a model of 1.75 trillion parameters - PingWest https://en.pingwest.com/a/8693 6 comments
GPT-4 architecture: what we can deduce from research literature | Kirill Gadjello's personal blog and website https://kir-gadjello.github.io/posts/gpt4-some-technical-hypotheses/ 6 comments
GitHub - pjlab-sys4nlp/llama-moe: ⛷️ LLaMA-MoE: Building Mixture-of-Experts from LLaMA with Continual Pre-training https://github.com/pjlab-sys4nlp/llama-moe 6 comments
Meta is building an AI supercomputer | CNN Business https://edition.cnn.com/2022/01/24/tech/meta-supercomputer/index.html 4 comments
Code Interpreter == GPT 4.5 (w/ Simon Willison & Alex Volkov) https://www.latent.space/p/code-interpreter 4 comments
Hosting Hugging Face models on AWS Lambda for serverless inference | AWS Compute Blog https://aws.amazon.com/blogs/compute/hosting-hugging-face-models-on-aws-lambda/ 3 comments
