- [D] Same param count for GPT4 from Nvidia GTC24 as the leak we got from Semianalysis https://www.semianalysis.com/p/gpt-4-architecture-infrastructure 10 comments machinelearning
Linking pages
- Non-determinism in GPT-4 is caused by Sparse MoE - 152334H https://152334h.github.io/blog/non-determinism-in-gpt-4/ 181 comments
- China AI & Semiconductors Rise: US Sanctions Have Failed https://www.semianalysis.com/p/china-ai-and-semiconductors-rise 122 comments
- Google Gemini Eats The World – Gemini Smashes GPT-4 By 5X, The GPU-Poors https://www.semianalysis.com/p/google-gemini-eats-the-world-gemini 113 comments
- Microsoft Infrastructure - AI & CPU Custom Silicon Maia 100, Athena, Cobalt 100 https://www.semianalysis.com/p/microsoft-infrastructure-ai-and-cpu?microsoft= 14 comments
- Why GPT-3.5 is (mostly) cheaper than Llama 2 https://www.cursor.so/blog/llama-inference 10 comments
- Microsoft Infrastructure - AI & CPU Custom Silicon Maia 100, Athena, Cobalt 100 https://www.semianalysis.com/p/microsoft-infrastructure-ai-and-cpu 5 comments
- 100k H100 Clusters: Power, Network Topology, Ethernet vs InfiniBand, Reliability, Failures, Checkpointing https://www.semianalysis.com/p/100000-h100-clusters-power-network 3 comments
- Is GPT-4 getting worse over time? https://www.aisnakeoil.com/p/is-gpt-4-getting-worse-over-time 1 comment
- Knowing Enough About MoE to Explain Dropped Tokens in GPT-4 - 152334H https://152334h.github.io/blog/knowing-enough-about-moe/ 1 comment
- Nvidia Blackwell Perf TCO Analysis - B100 vs B200 vs GB200NVL72 https://www.semianalysis.com/p/nvidia-blackwell-perf-tco-analysis 0 comments
- OpenAI Is Doomed - Et tu, Microsoft? https://www.semianalysis.com/p/openai-is-doomed-et-tu-microsoft 0 comments
- Leopold Aschenbrenner - China/US Super Intelligence Race, 2027 AGI, & The Return of History https://www.dwarkeshpatel.com/p/leopold-aschenbrenner 0 comments
Linked pages
- GitHub - NVIDIA/FasterTransformer: Transformer related optimization, including BERT, GPT https://github.com/NVIDIA/FasterTransformer/ 1 comment
- The AI Brick Wall – A Practical Limit For Scaling Dense Transformer Models, and How GPT 4 Will Break Past It https://www.semianalysis.com/p/the-ai-brick-wall-a-practical-limit 0 comments
- On Device AI – Double-Edged Sword https://www.semianalysis.com/p/on-device-ai-double-edged-sword 0 comments
Article title: GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE