| Feature | PPO (Proximal Policy Optimization) | TRPO (Trust Region Policy Optimization) | DDPG (Deep Deterministic Policy Gradient) | A2C (Advantage Actor-Critic) | SAC (Soft Actor-Critic) |
|---|---|---|---|---|---|
| Algorithm Type | On-policy | On-policy | Off-policy | On-policy | Off-policy |
| Core Idea | Clipped surrogate objective | Trust region constraint (KL divergence) | Actor-Critic + Q-learning (for continuous actions) | Synchronous advantage estimation | Maximum entropy (exploration) + off-policy |
| Stability | Very stable | Very stable | Can be unstable | Stable, but can be sensitive to hyperparameters | Very stable |
| Sample Efficiency | Moderate | Moderate | High (due to replay buffer) | Moderate (on-policy) | High (off-policy, replay buffer) |
| Complexity | Simple to implement | Complex (requires conjugate gradient) | Moderate to Complex | Simple to implement | Moderate to Complex |
| Action Space | Both discrete & continuous | Both discrete & continuous | Continuous only | Both discrete & continuous | Primarily continuous (discrete variants exist) |
| Computation | Multiple epochs per batch | Solves constrained optimization per update | Requires target networks | Single update per rollout batch | Twin Q-networks, target networks, and entropy temperature tuning |
| Use Case | General-purpose, good baseline | High-stakes environments where stability is critical | Continuous control (robotics, autonomous driving) | Simpler on-policy tasks | Complex continuous control, real-world robotics |
| Hyperparameter Sensitivity | Low | Low | High | Medium | Medium to High |
| Exploration | Through entropy or noise | Through entropy or noise | Through noise (e.g., Ornstein-Uhlenbeck) | Through entropy bonus | Intrinsic (via entropy maximization) |
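
To make the "Core Idea" entry for PPO concrete, here is a minimal PyTorch sketch of the clipped surrogate objective. The function and argument names are placeholders rather than any particular library's API, and a full PPO update would also add a value-function loss and an entropy bonus:

```python
import torch

def ppo_clipped_loss(new_log_probs, old_log_probs, advantages, clip_eps=0.2):
    """Clipped surrogate policy loss (to be minimized by gradient descent)."""
    # Probability ratio r_t(theta) = pi_theta(a_t | s_t) / pi_theta_old(a_t | s_t)
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Clipping removes the incentive to move the policy more than clip_eps
    # away from the policy that collected the data.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the element-wise minimum of the two surrogates;
    # negate it so a standard optimizer can minimize.
    return -torch.min(unclipped, clipped).mean()
```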
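
As a rough illustration of the "Action Space" and "Use Case" rows, the same trade-off appears when choosing an algorithm in a library such as Stable Baselines3 (assumed installed here; the environments are standard Gymnasium examples, not part of the original comparison):

```python
# pip install stable-baselines3 gymnasium
from stable_baselines3 import PPO, SAC

# PPO: general-purpose on-policy baseline; handles discrete actions (CartPole).
ppo_model = PPO("MlpPolicy", "CartPole-v1")
ppo_model.learn(total_timesteps=10_000)

# SAC: off-policy with a replay buffer and entropy regularization;
# aimed at continuous control (Pendulum).
sac_model = SAC("MlpPolicy", "Pendulum-v1")
sac_model.learn(total_timesteps=10_000)
```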