Gist qgallouedec/4e3f6b80bc3b84cef95a9414db71d2a0, created May 9, 2020 17:41
```python
# Every-visit Monte Carlo update of Q over one episode
for t in range(len(states)):
    # Compute the discounted return Gt from time t
    # (recursively, G_t = rewards[t] + gamma * G_{t+1})
    Gt = compute_gain(rewards, t, gamma)
    # \delta_t = G_t - Q(S_t, A_t)
    delta_t = Gt - Q[states[t]][actions[t]]
    # Increment the visit count of the state-action pair
    N[states[t]][actions[t]] += 1
    # Move Q(S_t, A_t) toward G_t (incremental mean of the observed returns)
    # Q(S_t, A_t) += \frac{\delta_t}{N(S_t, A_t)}
    Q[states[t]][actions[t]] += delta_t / N[states[t]][actions[t]]
```
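The loop above is a fragment: `compute_gain`, `Q`, `N` and the episode lists are defined elsewhere. The sketch below makes it runnable under stated assumptions. The body of `compute_gain` is hypothetical (the gist only calls it) and assumes it returns the discounted sum of rewards from step `t` onward; `mc_update` and `returns_backward` are hypothetical names; `Q` and `N` are assumed to be nested `defaultdict`s so unseen pairs start at 0; the toy episode is made-up data.

```python
from collections import defaultdict


def compute_gain(rewards, t, gamma):
    # Hypothetical body: G_t = sum_k gamma**k * rewards[t + k]
    return sum(gamma**k * r for k, r in enumerate(rewards[t:]))


def mc_update(Q, N, states, actions, rewards, gamma):
    # Same every-visit update as the gist's loop, wrapped in a function
    for t in range(len(states)):
        Gt = compute_gain(rewards, t, gamma)
        delta_t = Gt - Q[states[t]][actions[t]]
        N[states[t]][actions[t]] += 1
        Q[states[t]][actions[t]] += delta_t / N[states[t]][actions[t]]


def returns_backward(rewards, gamma):
    # All returns in one backward pass: G_t = rewards[t] + gamma * G_{t+1}
    G, out = 0.0, [0.0] * len(rewards)
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        out[t] = G
    return out


# Assumed storage: Q[state][action] and N[state][action] default to 0
Q = defaultdict(lambda: defaultdict(float))
N = defaultdict(lambda: defaultdict(int))

# Made-up episode: the pair ("s0", 0) is visited twice
states = ["s0", "s1", "s0"]
actions = [0, 1, 0]
rewards = [1.0, 0.0, 2.0]
gamma = 0.9

mc_update(Q, N, states, actions, rewards, gamma)
print(Q["s0"][0])  # ≈ 2.31, the mean of the two returns 2.62 and 2.0
```

Calling `compute_gain` at every step costs O(T) per call, so the loop is O(T²) in the episode length. `returns_backward` computes every return in a single O(T) pass using the recursion noted in the loop's comment, and gives the same values.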
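Dividing $\delta_t$ by the visit count keeps $Q(S_t, A_t)$ equal to the average of all returns observed for that pair. Write $Q_{n-1}$ for the mean of the first $n-1$ returns of a pair and $G_n$ for the $n$-th one; here $n$ is $N(S_t, A_t)$ after the increment, $G_n$ is $G_t$, and $G_n - Q_{n-1}$ is $\delta_t$:

```latex
\begin{aligned}
Q_n &= \frac{1}{n}\sum_{i=1}^{n} G_i
     = \frac{1}{n}\Big(G_n + (n-1)\,Q_{n-1}\Big) \\
    &= Q_{n-1} + \frac{1}{n}\big(G_n - Q_{n-1}\big)
\end{aligned}
```

For $n = 1$ this gives $Q_1 = G_1$, so whatever initial value $Q$ held is overwritten on the first visit.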