Linking pages
- 𝜏-bench: Benchmarking AI Agents for the Real-World | Sierra https://sierra.ai/blog/benchmarking-ai-agents
- Language Agents: From Reasoning to Acting - Latent Space https://www.latent.space/p/shunyu
- The new Claude 3.5 Sonnet, Computer Use, and Building SOTA Agents — with Erik Schluntz, Anthropic https://www.latent.space/p/claude-sonnet
Related searches:
Search title: [2406.12045] $τ$-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains