Hacker News
Linked pages
- Humanity's Last Exam https://agi.safe.ai/ 40 comments
- https://lmarena.ai/ 18 comments
- [2107.03374] Evaluating Large Language Models Trained on Code https://arxiv.org/abs/2107.03374 8 comments
- [2210.03629] ReAct: Synergizing Reasoning and Acting in Language Models https://arxiv.org/abs/2210.03629#google 3 comments
- [2009.03300] Measuring Massive Multitask Language Understanding https://arxiv.org/abs/2009.03300 0 comments
- [2406.12045] τ-bench: A Benchmark for Tool-Agent-User Interaction in Real-World Domains https://arxiv.org/abs/2406.12045 0 comments
Related searches:
Search whole site: site:ysymyth.github.io
Search title: The Second Half – Shunyu Yao – 姚顺雨