site:costa.sh - discu.eu

A Closer Look at Invalid Action Masking in Policy Gradient Algorithms
https://costa.sh/blog-a-closer-look-at-invalid-action-masking-in-policy-gradient-algorithms.html
24 comments, 1/7/2020, reinforcementlearning

The 32 Implementation Details of Proximal Policy Optimization (PPO) Algorithm
https://costa.sh/blog-the-32-implementation-details-of-ppo.html
9 comments, 11/6/2020, reinforcementlearning

Understanding why there isn't a log probability in TRPO and PPO's objective
https://costa.sh/blog-understanding-why-there-isn't-a-log-probability-in-trpo-and-ppo's-objective.html
10 comments, 12/4/2020, reinforcementlearning