Linking pages
- AI leaderboards are no longer useful. It's time to switch to Pareto curves. https://www.aisnakeoil.com/p/ai-leaderboards-are-no-longer-useful 14 comments
- GitHub - mrconter1/PullRequestBenchmark: Evaluating LLMs performance in PR reviews as an indicator for their capability in creating PRs. https://github.com/mrconter1/PullRequestBenchmark 3 comments
- Sholto Douglas & Trenton Bricken - How to Build & Understand GPT-7's Mind https://www.dwarkeshpatel.com/p/sholto-douglas-trenton-bricken 3 comments
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues? | Princeton Language and Intelligence https://pli.princeton.edu/blog/2023/swe-bench-can-language-models-resolve-real-world-github-issues 1 comment
- GitHub - stitionai/devika: Devika is an Agentic AI Software Engineer that can understand high-level human instructions, break them down into steps, research relevant information, and write code to achieve the given objective. Devika aims to be a competitive open-source alternative to Devin by Cognition AI. https://github.com/stitionai/devika 1 comment
- We Can Beat Devin | Mender.AI https://mender.ai/blog/we-can-beat-devin/ 0 comments
- Introducing OpenDevin CodeAct 1.0, a new State-of-the-art in Coding Agents https://xwang.dev/blog/2024/opendevin-codeact-1.0-swebench/ 0 comments
- AI benchmarks should be like unit tests | Andy’s Notes https://andykonwinski.com/2024/05/08/ai-benchmarks-should-be-like-unit-tests.html 0 comments