Hacker News
Linked pages
- https://openai.com/index/learning-to-reason-with-llms/ 1525 comments
- [1712.01815] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm https://arxiv.org/abs/1712.01815 573 comments
- [2403.09629] Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking https://arxiv.org/abs/2403.09629 271 comments
- [2409.12917] Training Language Models to Self-Correct via Reinforcement Learning https://arxiv.org/abs/2409.12917 93 comments
- [2410.09918] Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces https://arxiv.org/abs/2410.09918 16 comments
- [2203.14465] STaR: Bootstrapping Reasoning With Reasoning https://arxiv.org/abs/2203.14465 5 comments
- [2312.06585] Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models https://arxiv.org/abs/2312.06585 5 comments
- [2410.08146] Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning https://arxiv.org/abs/2410.08146 5 comments
- [2112.00114] Show Your Work: Scratchpads for Intermediate Computation with Language Models https://arxiv.org/abs/2112.00114 3 comments
- [2112.09332] WebGPT: Browser-assisted question-answering with human feedback https://arxiv.org/abs/2112.09332 3 comments
- [2305.10601] Tree of Thoughts: Deliberate Problem Solving with Large Language Models https://arxiv.org/abs/2305.10601 3 comments
- [2305.20050] Let's Verify Step by Step https://arxiv.org/abs/2305.20050 3 comments
- [2407.13692] Prover-Verifier Games improve legibility of LLM outputs https://arxiv.org/abs/2407.13692 3 comments
- [2410.10630] Thinking LLMs: General Instruction Following with Thought Generation https://arxiv.org/abs/2410.10630 2 comments
- [2404.03683] Stream of Search (SoS): Learning to Search in Language https://arxiv.org/abs/2404.03683 1 comment
- [2404.17546] Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo https://arxiv.org/abs/2404.17546 1 comment
- [2408.03314] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters https://arxiv.org/abs/2408.03314 1 comment
- [2211.14275] Solving math word problems with process- and outcome-based feedback https://arxiv.org/abs/2211.14275#deepmind 0 comments
- [2406.16838] From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models https://arxiv.org/abs/2406.16838 0 comments
- [2407.21787] Large Language Monkeys: Scaling Inference Compute with Repeated Sampling https://arxiv.org/abs/2407.21787 0 comments
Source: GitHub - srush/awesome-o1: A bibliography and survey of the papers surrounding o1