Distributional Reinforcement Learning
Notes from the book "Distributional Reinforcement Learning" by Marc G. Bellemare, Will Dabney and Mark Rowland [pdf] [direct.mit.edu]
Reward - My thoughts #
- Reward is central
- When there is very sparse reward, most of the learning has to be done in the absence of reward
- i.e. understanding the world and its mechanics, exploring it, …
- in other words, the objective is to optimize for being able to optimize any reward function we are given later on
Probability Metrics #
- When the value function was a scalar, measuring error was just the absolute difference.
- But with probability distributions, we need a distance (a probability metric) to quantify the difference between the learned return distribution and the actual distribution (see the sketch below).
- Some probability metrics are better suited than others for distributional reinforcement learning, but none is a "natural" metric.
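As a concrete illustration (my own sketch, not from the book; the atoms and probabilities are made-up numbers), the 1-Wasserstein distance between two discrete return distributions can be computed with SciPy:

```python
# Sketch: comparing two discrete return distributions with a probability
# metric (the 1-Wasserstein distance). All numbers here are made up.
import numpy as np
from scipy.stats import wasserstein_distance

# Learned return distribution: atoms (support) and their probabilities
learned_atoms = np.array([-1.0, 0.0, 1.0, 2.0])
learned_probs = np.array([0.10, 0.20, 0.40, 0.30])

# Target ("actual") return distribution on the same support
target_atoms = np.array([-1.0, 0.0, 1.0, 2.0])
target_probs = np.array([0.05, 0.25, 0.45, 0.25])

d = wasserstein_distance(learned_atoms, target_atoms,
                         u_weights=learned_probs, v_weights=target_probs)
print(f"1-Wasserstein distance: {d:.4f}")

# For contrast, scalar value-based RL would only compare the means,
# which can agree even when the distributions differ.
mean_gap = abs(learned_atoms @ learned_probs - target_atoms @ target_probs)
print(f"|difference of means|:  {mean_gap:.4f}")
```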
Choices #
- Probability metric
- Representation of Distribution (sketched in code after this list):
- Categorical
- Quantile
- Loss function and Bootstrapping
- Behaviour: Optimize expected reward or also be risk-sensitive
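A minimal sketch of the two representations listed above (my own, with arbitrary initial values): the categorical representation fixes the support and learns probabilities, while the quantile representation fixes the probability levels and learns the atom locations.

```python
# Sketch of the two return-distribution representations; values are
# arbitrary initializations, not a trained agent.
import numpy as np

# Categorical (C51-style): fixed, evenly spaced support; probabilities are learned.
v_min, v_max, num_atoms = -10.0, 10.0, 51
support = np.linspace(v_min, v_max, num_atoms)       # fixed atom locations
probs = np.full(num_atoms, 1.0 / num_atoms)          # learned parameters (uniform init)
categorical_mean = support @ probs

# Quantile (QR-DQN-style): fixed quantile levels; atom locations are learned.
num_quantiles = 32
taus = (np.arange(num_quantiles) + 0.5) / num_quantiles   # fixed quantile midpoints
locations = np.zeros(num_quantiles)                        # learned parameters (zero init)
quantile_mean = locations.mean()                           # each atom carries weight 1/N

print(categorical_mean, quantile_mean)
```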
Martingale #
- A betting strategy popular in the 18th century
- Strategy: Double the ante until a profit is made
- The total winnings at time \(t\) are given by:
\begin{align} X_t = \begin{cases} +1 & \text{with probability } 1 - 2^{-t} \\ -(2^t - 1) & \text{with probability } 2^{-t} \end{cases} \end{align}
- So the gambler is assured to eventually make a profit (since the probability of being at a loss, \(2^{-t}\), goes to \(0\) as \(t \to \infty\))
- But the expected gain is zero: \(\mathbb{E}[X_t] = 0\) for every \(t\)
- So this is not a good strategy: the near-certain \(+1\) is offset by a rare loss of \(2^t - 1\), which grows without bound (simulated in the sketch below).
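A quick Monte Carlo sketch (my own, not from the book) that makes the distributional point concrete: after \(t\) rounds, almost every run ends with a profit of \(+1\), a rare run ends with a loss of \(2^t - 1\), and the sample mean is close to zero.

```python
# Sketch: simulating the martingale strategy over t rounds of a fair bet.
# The gambler doubles the stake after each loss and stops at the first win.
import numpy as np

rng = np.random.default_rng(0)
t, num_runs = 10, 100_000

# Round of the first win (geometric with p = 1/2, support 1, 2, ...).
first_win = rng.geometric(p=0.5, size=num_runs)

# Win within t rounds -> net +1; t straight losses -> net -(2^t - 1).
winnings = np.where(first_win <= t, 1.0, -(2.0 ** t - 1.0))

print("P(profit)     ~", np.mean(winnings > 0))   # close to 1 - 2^-t
print("mean winnings ~", winnings.mean())         # close to 0
print("worst outcome =", winnings.min())          # -(2^t - 1)
```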
Role of RL #
- RL provides a computational model of how an agent learns
- to predict reward (value estimation)
- and then, using those predictions, how it might best control its environment
Continue from Chapter 4.