Linking pages
- Groq Inference Tokenomics: Speed, But At What Cost? https://www.semianalysis.com/p/groq-inference-tokenomics-speed-but 1 comment
- The Four Wars of the AI Stack (Dec 2023 Recap) https://www.latent.space/p/dec-2023 0 comments
- The Four Wars of the AI Stack (Dec 2023 Recap) https://www.latent.space/i/140396949/mixtral-sparks-a-gpuinference-war 0 comments
- Cloud Intelligence at the speed of 5000 tok/s - with Ce Zhang and Vipul Ved Prakash of Together AI https://www.latent.space/p/together 0 comments
- Nvidia Blackwell Perf TCO Analysis - B100 vs B200 vs GB200NVL72 https://www.semianalysis.com/p/nvidia-blackwell-perf-tco-analysis 0 comments
- OpenAI Is Doomed - Et tu, Microsoft? https://www.semianalysis.com/p/openai-is-doomed-et-tu-microsoft 0 comments
- Chips all the way down https://press.airstreet.com/p/chips-all-the-way-down 0 comments
- A Deep Dive on AI Inference Startups - by Kevin Zhang https://eastwind.substack.com/p/a-deep-dive-on-ai-inference-startups 0 comments
Linked pages
- Competitive performance claims and industry leadin... - AMD Community https://community.amd.com/t5/instinct-accelerators/competitive-performance-claims-and-industry-leading-inference/ba-p/652304 82 comments
- ByteDance is secretly using OpenAI’s tech to build a competitor - The Verge https://www.theverge.com/2023/12/15/24003151/bytedance-china-openai-microsoft-competitor-llm 58 comments
- Achieving Top Inference Performance with the NVIDIA H100 Tensor Core GPU and NVIDIA TensorRT-LLM | NVIDIA Technical Blog https://developer.nvidia.com/blog/achieving-top-inference-performance-with-the-nvidia-h100-tensor-core-gpu-and-nvidia-tensorrt-llm/ 29 comments
- Google Cloud Platform https://console.cloud.google.com/freetrial/signup/tos 6 comments
- GitHub - pytorch-labs/gpt-fast: Simple and efficient pytorch-native transformer text generation in <1000 LOC of python. https://github.com/pytorch-labs/gpt-fast 1 comment
- https://docs.endpoints.anyscale.com/ 0 comments
- GPU Cloud Economics Explained – The Hidden Truth https://www.semianalysis.com/p/gpu-cloud-economics-explained-the 0 comments
- Announcing Together Inference Engine – the fastest inference available https://www.together.ai/blog/together-inference-engine-v1 0 comments
Related searches:
Search whole site: site:www.semianalysis.com
Search title: Inference Race To The Bottom - Make It Up On Volume?