Hacker News
Linked pages
- https://openai.com/index/learning-to-reason-with-llms/ 1525 comments
- [1712.01815] Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm https://arxiv.org/abs/1712.01815 573 comments
- [2403.09629] Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking https://arxiv.org/abs/2403.09629 271 comments
- [2409.12917] Training Language Models to Self-Correct via Reinforcement Learning https://arxiv.org/abs/2409.12917 93 comments
- [2410.09918] Dualformer: Controllable Fast and Slow Thinking by Learning with Randomized Reasoning Traces https://arxiv.org/abs/2410.09918 16 comments
- [2203.14465] STaR: Bootstrapping Reasoning With Reasoning https://arxiv.org/abs/2203.14465 5 comments
- [2312.06585] Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models https://arxiv.org/abs/2312.06585 5 comments
- [2410.08146] Rewarding Progress: Scaling Automated Process Verifiers for LLM Reasoning https://arxiv.org/abs/2410.08146 5 comments
- [2112.00114] Show Your Work: Scratchpads for Intermediate Computation with Language Models https://arxiv.org/abs/2112.00114 3 comments
- [2112.09332] WebGPT: Browser-assisted question-answering with human feedback https://arxiv.org/abs/2112.09332 3 comments
- [2305.10601] Tree of Thoughts: Deliberate Problem Solving with Large Language Models https://arxiv.org/abs/2305.10601 3 comments
- [2305.20050] Let's Verify Step by Step https://arxiv.org/abs/2305.20050 3 comments
- [2407.13692] Prover-Verifier Games improve legibility of LLM outputs https://arxiv.org/abs/2407.13692 3 comments
- [2410.10630] Thinking LLMs: General Instruction Following with Thought Generation https://arxiv.org/abs/2410.10630 2 comments
- [2404.03683] Stream of Search (SoS): Learning to Search in Language https://arxiv.org/abs/2404.03683 1 comment
- [2404.17546] Probabilistic Inference in Language Models via Twisted Sequential Monte Carlo https://arxiv.org/abs/2404.17546 1 comment
- [2408.03314] Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters https://arxiv.org/abs/2408.03314 1 comment
- [2211.14275] Solving math word problems with process- and outcome-based feedback https://arxiv.org/abs/2211.14275#deepmind 0 comments
- [2406.16838] From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models https://arxiv.org/abs/2406.16838 0 comments
- [2407.21787] Large Language Monkeys: Scaling Inference Compute with Repeated Sampling https://arxiv.org/abs/2407.21787 0 comments
Source: GitHub - srush/awesome-o1: A bibliography and survey of the papers surrounding o1