SWE-Bench Verified - discu.eu

Hacker News

SWE-Bench Verified https://openai.com/index/introducing-swe-bench-verified/ 10 comments 13/8/2024

Linking pages

Cutting-edge Chinese “reasoning” model rivals OpenAI o1—and it’s free to download - Ars Technica https://arstechnica.com/ai/2025/01/china-is-catching-up-with-americas-best-reasoning-ai-models/ 139 comments
The 2025 AI Engineering Reading List - Latent Space https://www.latent.space/p/2025-papers 69 comments
DeepSeek-V3 Technical Report https://arxiv.org/html/2412.19437v1 42 comments
DeepSeek V3 and the cost of frontier AI models https://www.interconnects.ai/p/deepseek-v3-and-the-actual-cost-of 9 comments
Is finetuning GPT4o worth it? - Latent Space https://www.latent.space/p/cosine 0 comments
Gru.ai Ranks First in OpenAI’s Latest SWE-Bench Verified Evaluation https://gru.ai/blog/Gru-Rank-First/ 0 comments
The new Claude 3.5 Sonnet, Computer Use, and Building SOTA Agents — with Erik Schluntz, Anthropic https://www.latent.space/p/claude-sonnet 0 comments
AI #97: 4 - by Zvi Mowshowitz - Don't Worry About the Vase https://thezvi.substack.com/p/ai-97-4 0 comments
Scaling Laws for LLMs: From GPT-3 to o3 https://cameronrwolfe.substack.com/p/llm-scaling-laws 0 comments
Modal Sandboxes are generally available | Modal Blog https://modal.com/blog/sandbox-launch 0 comments
Demystifying Reasoning Models - by Cameron R. Wolfe, Ph.D. https://cameronrwolfe.substack.com/p/demystifying-reasoning-models 0 comments
Introducing Docent | Transluce AI https://transluce.org/introducing-docent 0 comments
Google’s digging a moat - by Omar - Kilo Code blog https://blog.kilocode.ai/p/googles-digging-a-moat 0 comments
GitHub - THUDM/GLM-4: GLM-4 series: Open Multilingual Multimodal Chat LMs | 开源多语言多模态对话模型 https://github.com/THUDM/GLM-4 0 comments

