- 100k H100 Clusters: Power, Network Topology, Ethernet vs InfiniBand, Reliability, Failures, Checkpointing https://www.semianalysis.com/p/100000-h100-clusters-power-network 2 comments hardware
Linking pages
- $2 H100s: How the GPU Bubble Burst - by Eugene Cheah https://www.latent.space/p/gpu-bubble 289 comments
- Calculating the Cost of a Google Deepmind Paper - 152334H https://152334h.github.io/blog/scaling-exponents/ 164 comments
- Multi-Datacenter Training: OpenAI's Ambitious Plan To Beat Google's Infrastructure https://www.semianalysis.com/p/multi-datacenter-training-openais 1 comment
- GB200 Hardware Architecture - Component Supply Chain & BOM https://www.semianalysis.com/p/gb200-hardware-architecture-and-component 0 comments
- Datacenter Anatomy Part 1: Electrical Systems https://www.semianalysis.com/p/datacenter-anatomy-part-1-electrical 0 comments
- The Future of Compute: NVIDIA's Crown is Slipping https://mohitdagarwal.substack.com/p/from-dominance-to-dilemma-nvidia 0 comments
- Meta’s Next Llama AI Models Are Training on a GPU Cluster ‘Bigger Than Anything’ Else | WIRED https://www.wired.com/story/meta-llama-ai-gpu-training/ 0 comments
Linked pages
- GPT-4 Architecture, Infrastructure, Training Dataset, Costs, Vision, MoE https://www.semianalysis.com/p/gpt-4-architecture-infrastructure 10 comments
- https://openai.com/index/openai-board-forms-safety-and-security-committee/ 3 comments
- Strangely, Matrix Multiplications on GPUs Run Faster When Given "Predictable" Data! [short] https://www.thonking.ai/p/strangely-matrix-multiplications 2 comments
- Nvidia’s InfiniBand Problem - Spectrum-X AI Fabric, Tomahawk-5, Jericho-3AI, Quantum-2 https://www.semianalysis.com/p/nvidias-infiniband-problem-qmx-ai 0 comments
- Accelerating PyTorch Model Training https://magazine.sebastianraschka.com/p/accelerating-pytorch-model-training 0 comments
- AI Datacenter Energy Dilemma - Race for AI Datacenter Space https://www.semianalysis.com/p/ai-datacenter-energy-dilemma-race 0 comments
- Nvidia’s Optical Boogeyman – NVL72, Infiniband Scale Out, 800G & 1.6T Ramp https://www.semianalysis.com/p/nvidias-optical-boogeyman-nvl72-infiniband 0 comments
- Nvidia Blackwell Perf TCO Analysis - B100 vs B200 vs GB200NVL72 https://www.semianalysis.com/p/nvidia-blackwell-perf-tco-analysis 0 comments