- Confusion of hyperparameters in ppo https://arxiv.org/abs/1707.06347 3 comments reinforcementlearning
Linking pages
- Competitive Self-Play https://blog.openai.com/competitive-self-play/ 138 comments
- MLGO: A Machine Learning Framework for Compiler Optimization – Google AI Blog http://ai.googleblog.com/2022/07/mlgo-machine-learning-framework-for.html 81 comments
- Finetuning Large Language Models - by Sebastian Raschka https://magazine.sebastianraschka.com/p/finetuning-large-language-models 72 comments
- Understanding Large Language Models - by Sebastian Raschka https://magazine.sebastianraschka.com/p/understanding-large-language-models 53 comments
- Reinforcement Learning with Prediction-Based Rewards https://blog.openai.com/reinforcement-learning-with-prediction-based-rewards/ 38 comments
- GitHub - andri27-ts/Reinforcement-Learning: Learn Deep Reinforcement Learning in 60 days! Lectures & Code in Python. Reinforcement Learning + Deep Learning https://github.com/andri27-ts/60_Days_RL_Challenge 22 comments
- GitHub - google-research/seed_rl: SEED RL: Scalable and Efficient Deep-RL with Accelerated Central Inference. Implements IMPALA and R2D2 algorithms in TF2 with SEED's architecture. https://github.com/google-research/seed_rl 20 comments
- GitHub - higgsfield/RL-Adventure-2: PyTorch0.4 implementation of: actor critic / proximal policy optimization / acer / ddpg / twin dueling ddpg / soft actor critic / generative adversarial imitation learning / hindsight experience replay https://github.com/higgsfield/RL-Adventure-2 20 comments
- GitHub - lcswillems/torch-ac: Recurrent and multi-process PyTorch implementation of deep reinforcement Actor-Critic algorithms A2C and PPO https://github.com/lcswillems/torch-ac 15 comments
- LLM Training: RLHF and Its Alternatives https://magazine.sebastianraschka.com/p/llm-training-rlhf-and-its-alternatives 14 comments
- Speeding Up Reinforcement Learning with a New Physics Simulation Engine – Google AI Blog https://ai.googleblog.com/2021/07/speeding-up-reinforcement-learning-with.html 13 comments
- Introducing SafeLife: Safety Benchmarks for Reinforcement Learning - Partnership on AI https://www.partnershiponai.org/safelife 12 comments
- baselines/baselines/ppo2 at master · openai/baselines · GitHub https://github.com/openai/baselines/tree/master/baselines/ppo2 12 comments
- Reinforcement Learning (PPO) with TorchRL Tutorial — torchrl main documentation https://pytorch.org/rl/tutorials/coding_ppo.html 11 comments
- GitHub - marload/DeepRL-TensorFlow2: 🐋 Simple implementations of various popular Deep Reinforcement Learning algorithms using TensorFlow2 https://github.com/marload/deep-rl-tf2 10 comments
- RAdam: A New State-of-the-Art Optimizer for RL? | by Chris Nota | Autonomous Learning Library | Medium https://medium.com/autonomous-learning-library/radam-a-new-state-of-the-art-optimizer-for-rl-442c1e830564 10 comments
- An autonomous laboratory for the accelerated synthesis of novel materials | Nature https://www.nature.com/articles/s41586-023-06734-w 10 comments
- GitHub - mlabonne/llm-course: Course to get into Large Language Models (LLMs) with roadmaps and Colab notebooks. https://github.com/mlabonne/llm-course 10 comments
- GitHub - keiohta/tf2rl: TensorFlow2 Reinforcement Learning https://github.com/keiohta/tf2rl 9 comments
- The 32 Implementation Details of Proximal Policy Optimization (PPO) Algorithm https://costa.sh/blog-the-32-implementation-details-of-ppo.html 9 comments