- A Closer Look at Invalid Action Masking in Policy Gradient Algorithms: https://costa.sh/blog-a-closer-look-at-invalid-action-masking-in-policy-gradient-algorithms.html (24 comments, reinforcementlearning)
- The 32 Implementation Details of Proximal Policy Optimization (PPO) Algorithm: https://costa.sh/blog-the-32-implementation-details-of-ppo.html (9 comments, reinforcementlearning)
- Understanding why there isn't a log probability in TRPO and PPO's objective: https://costa.sh/blog-understanding-why-there-isn't-a-log-probability-in-trpo-and-ppo's-objective.html (10 comments, reinforcementlearning)