Python Deep Learning

All the RL algorithms we have discussed so far try to learn the state- or action-value function. For example, in Q-learning we usually follow an ε-greedy policy, which has no parameters of its own (strictly speaking, it has one: ε) and relies on the value function instead. In this section, we'll discuss something new: how to approximate the policy itself with the help of policy gradient methods. We'll follow an approach similar to the one in the Value function approximation section of Chapter 8, Reinforcement Learning Theory.
There, we introduced a value approximation function, which is described by a set of parameters w (the neural network weights). Here, we'll introduce a parameterized policy π_θ, which is described by a set of parameters θ. As with value function approximation, θ could be the weights of a neural network.
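To make the contrast concrete, here is a minimal sketch in plain Python. It is not the book's implementation: the function names, the linear softmax form of the policy, and the example numbers are all assumptions chosen for illustration. The first function picks actions from value estimates, as ε-greedy does; the second maps a state directly to action probabilities through the parameters θ, where a neural network could stand in for the linear model.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Value-based selection: the policy is derived from Q estimates.

    The only policy parameter is epsilon; everything else comes from q_values.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit


def softmax_policy(theta, state):
    """Parameterized policy: return one probability per action for `state`.

    theta is a list of per-action weight vectors (a hypothetical linear
    model, assumed here for brevity).
    """
    # One preference (logit) per action: dot product of weights and features.
    logits = [sum(w * x for w, x in zip(weights, state)) for weights in theta]
    # Subtract the largest logit before exponentiating for numerical stability.
    largest = max(logits)
    exps = [math.exp(z - largest) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(theta, state, rng=random):
    """Draw an action index according to the policy's probabilities."""
    probs = softmax_policy(theta, state)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Illustrative values only: 3 actions, 2 state features.
theta = [[0.5, -0.2], [0.1, 0.3], [0.0, 0.0]]
state = [1.0, 2.0]
probs = softmax_policy(theta, state)
action = sample_action(theta, state)
```

In this sketch, changing θ changes the action probabilities directly, with no value function involved. That is the quantity a policy gradient method would adjust.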
Recall that we use the