@qgallouedec
Created May 9, 2020 17:41
for t in range(len(states)):
    # Compute the discounted return G_t from time t:
    # G_t = rewards[t] + gamma * G_{t+1}
    Gt = compute_gain(rewards, t, gamma)
    # delta_t = G_t - Q(S_t, A_t)
    delta_t = Gt - Q[states[t]][actions[t]]
    # Increment the visit counter of the state-action pair
    N[states[t]][actions[t]] += 1
    # Move Q toward G_t with an incremental mean:
    # Q(S_t, A_t) += delta_t / N(S_t, A_t)
    Q[states[t]][actions[t]] += delta_t / N[states[t]][actions[t]]
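
The loop above calls `compute_gain` and uses `Q` and `N` without defining them. The following is a minimal sketch of one way to fill those gaps, not the author's implementation: the body of `compute_gain`, the wrapper `mc_update`, the nested `defaultdict` tables, and the toy episode are all assumptions made for illustration.

```python
from collections import defaultdict


def compute_gain(rewards, t, gamma):
    """Discounted return from step t (hypothetical helper, assumed here)."""
    gain = 0.0
    # Accumulate backwards so that G_k = rewards[k] + gamma * G_{k+1}
    for reward in reversed(rewards[t:]):
        gain = reward + gamma * gain
    return gain


def mc_update(Q, N, states, actions, rewards, gamma):
    """Apply the every-visit incremental-mean update over one episode, in place."""
    for t in range(len(states)):
        Gt = compute_gain(rewards, t, gamma)
        delta_t = Gt - Q[states[t]][actions[t]]
        N[states[t]][actions[t]] += 1
        Q[states[t]][actions[t]] += delta_t / N[states[t]][actions[t]]


# Assumed table layout: unseen state-action pairs default to zero
Q = defaultdict(lambda: defaultdict(float))
N = defaultdict(lambda: defaultdict(int))

# Toy two-step episode (made-up data)
states = ["s0", "s1"]
actions = [0, 1]
rewards = [1.0, 2.0]
gamma = 0.5

mc_update(Q, N, states, actions, rewards, gamma)
print(Q["s0"][0], Q["s1"][1])
```

Recomputing the return at every step makes this quadratic in the episode length. Iterating over `t` in reverse and carrying the running return forward would compute the same returns in linear time, although a pair visited several times in one episode would then be averaged in a different order.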