Hacker News
- DeepSeek V3 and the cost of frontier AI models https://www.interconnects.ai/p/deepseek-v3-and-the-actual-cost-of 9 comments
Linking pages
- AI research team claims to reproduce DeepSeek core technologies for $30 — relatively small R1-Zero model has remarkable problem-solving abilities | Tom's Hardware https://www.tomshardware.com/tech-industry/artificial-intelligence/ai-research-team-claims-to-reproduce-deepseek-core-technologies-for-usd30-relatively-small-r1-zero-model-has-remarkable-problem-solving-abilities 278 comments
- Making the U.S. the home for open-source AI https://www.interconnects.ai/p/making-the-us-the-home-for-open-source 3 comments
- Why DeepSeek Could Change What Silicon Valley Believes About A.I. - The New York Times https://www.nytimes.com/2025/01/28/technology/why-deepseek-could-change-what-silicon-valley-believes-about-ai.html 1 comment
- Mixture-of-Experts (MoE) LLMs - by Cameron R. Wolfe, Ph.D. https://cameronrwolfe.substack.com/p/moe-llms 0 comments
- DeepSeek: The Greatest Growth Hack of All Times meets its David in a Chinese Quant. https://centreforaileadership.org/resources/deepseeks_narrative_attack/ 0 comments
- Who’s Winning the AI War: 2025 (DeepSeek?) Edition https://weightythoughts.com/p/whos-winning-the-ai-war-2025-deepseek 0 comments
Linked pages
- [2404.19737] Better & Faster Large Language Models via Multi-token Prediction https://arxiv.org/abs/2404.19737 132 comments
- Why DeepSeek's new AI model thinks it's ChatGPT | TechCrunch https://techcrunch.com/2024/12/27/why-deepseeks-new-ai-model-thinks-its-chatgpt/ 43 comments
- DeepSeek https://chat.deepseek.com/ 11 comments
- https://openai.com/index/introducing-swe-bench-verified/ 10 comments
- deepseek-ai/DeepSeek-V3 · Hugging Face https://huggingface.co/deepseek-ai/DeepSeek-V3 9 comments
- Introducing Llama 3.1: Our most capable models to date https://ai.meta.com/blog/meta-llama-3-1/ 7 comments
- DeepSeek-V3/LICENSE-MODEL at main · deepseek-ai/DeepSeek-V3 · GitHub https://github.com/deepseek-ai/DeepSeek-V3/blob/main/LICENSE-MODEL 6 comments
- Attention? Attention! | Lil'Log https://lilianweng.github.io/posts/2018-06-24-attention/ 2 comments
- Scaling Laws – O1 Pro Architecture, Reasoning Training Infrastructure, Orion and Claude 3.5 Opus “Failures” – SemiAnalysis https://semianalysis.com/2024/12/11/scaling-laws-o1-pro-architecture-reasoning-training-infrastructure-orion-and-claude-3-5-opus-failures/ 2 comments
- [1911.02150] Fast Transformer Decoding: One Write-Head is All You Need https://arxiv.org/abs/1911.02150 1 comment
- [2203.15556] Training Compute-Optimal Large Language Models https://arxiv.org/abs/2203.15556 0 comments
- [2305.13245] GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints https://arxiv.org/abs/2305.13245 0 comments
- OLMoE and the hidden simplicity in training better foundation models https://www.interconnects.ai/p/olmoe-and-building-better-llms 0 comments
- DeepSeek-R1-Lite-Preview is now live: unleashing supercharged reasoning power! | DeepSeek API Docs https://api-docs.deepseek.com/news/news1120 0 comments
- deepseek-ai/DeepSeek-V3-Base · Hugging Face https://huggingface.co/deepseek-ai/DeepSeek-V3-Base 0 comments
- [2405.04434] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model https://arxiv.org/abs/2405.04434 0 comments
- [2412.19437] DeepSeek-V3 Technical Report https://arxiv.org/abs/2412.19437 0 comments