Distributional Reinforcement Learning
Notes from the book "Distributional Reinforcement Learning" by Marc G. Bellemare, Will Dabney and Mark Rowland [pdf] [direct.mit.edu]
Reward - My thoughts #
- Reward is central
- When there is very sparse reward, most of the learning has to be done in the absence of reward
- i.e. understanding the world and its mechanics, exploring it, …
- in other words, the objective is to optimize for being able to optimize any reward function we are given later on
Probability Metrics #
- When the value function was a scalar, measuring error was just the absolute difference.
- But with probability distributions, we need a distance (a probability metric) to quantify the difference between the learned return distribution and the actual distribution (see the sketch below).
- Some probability metrics are better suited than others for distributional reinforcement learning, but none is a "natural" metric.
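As a concrete illustration (my own sketch, not from the book; the atoms and probabilities are made-up numbers), the 1-Wasserstein distance between two discrete return distributions can be computed with SciPy:

```python
# Sketch: comparing two discrete return distributions with a probability
# metric (the 1-Wasserstein distance). All numbers here are made up.
import numpy as np
from scipy.stats import wasserstein_distance

# Learned return distribution: atoms (support) and their probabilities
learned_atoms = np.array([-1.0, 0.0, 1.0, 2.0])
learned_probs = np.array([0.10, 0.20, 0.40, 0.30])

# Target ("actual") return distribution on the same support
target_atoms = np.array([-1.0, 0.0, 1.0, 2.0])
target_probs = np.array([0.05, 0.25, 0.45, 0.25])

d = wasserstein_distance(learned_atoms, target_atoms,
                         u_weights=learned_probs, v_weights=target_probs)
print(f"1-Wasserstein distance: {d:.4f}")

# For contrast, scalar value-based RL would only compare the means,
# which can agree even when the distributions differ.
mean_gap = abs(learned_atoms @ learned_probs - target_atoms @ target_probs)
print(f"|difference of means|:  {mean_gap:.4f}")
```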
Choices #
- Probability metric
- Representation of Distribution (sketched in code after this list):
- Categorical
- Quantile
- Loss function and Bootstrapping
- Behaviour: Optimize expected reward or also be risk-sensitive
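A minimal sketch of the two representations listed above (my own, with arbitrary initial values): the categorical representation fixes the support and learns probabilities, while the quantile representation fixes the probability levels and learns the atom locations.

```python
# Sketch of the two return-distribution representations; values are
# arbitrary initializations, not a trained agent.
import numpy as np

# Categorical (C51-style): fixed, evenly spaced support; probabilities are learned.
v_min, v_max, num_atoms = -10.0, 10.0, 51
support = np.linspace(v_min, v_max, num_atoms)       # fixed atom locations
probs = np.full(num_atoms, 1.0 / num_atoms)          # learned parameters (uniform init)
categorical_mean = support @ probs

# Quantile (QR-DQN-style): fixed quantile levels; atom locations are learned.
num_quantiles = 32
taus = (np.arange(num_quantiles) + 0.5) / num_quantiles   # fixed quantile midpoints
locations = np.zeros(num_quantiles)                        # learned parameters (zero init)
quantile_mean = locations.mean()                           # each atom carries weight 1/N

print(categorical_mean, quantile_mean)
```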
Martingale #
- A betting strategy popular in the 18th century
- Strategy: Double the ante until a profit is made
- The total winnings at time \(t\) are given by:
\begin{align} X_t = \begin{cases} +1 & \text{with probability } 1 - 2^{-t} \\ -(2^t - 1) & \text{with probability } 2^{-t} \end{cases} \end{align}
- So the gambler is assured to eventually make a profit (since the probability of being at a loss, \(2^{-t}\), goes to \(0\) as \(t \to \infty\))
- But the expected gain is zero: \(\mathbb{E}[X_t] = 0\) for every \(t\)
- So this is not a good strategy: the near-certain \(+1\) is offset by a rare loss of \(2^t - 1\), which grows without bound (simulated in the sketch below).
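A quick Monte Carlo sketch (my own, not from the book) that makes the distributional point concrete: after \(t\) rounds, almost every run ends with a profit of \(+1\), a rare run ends with a loss of \(2^t - 1\), and the sample mean is close to zero.

```python
# Sketch: simulating the martingale strategy over t rounds of a fair bet.
# The gambler doubles the stake after each loss and stops at the first win.
import numpy as np

rng = np.random.default_rng(0)
t, num_runs = 10, 100_000

# Round of the first win (geometric with p = 1/2, support 1, 2, ...).
first_win = rng.geometric(p=0.5, size=num_runs)

# Win within t rounds -> net +1; t straight losses -> net -(2^t - 1).
winnings = np.where(first_win <= t, 1.0, -(2.0 ** t - 1.0))

print("P(profit)     ~", np.mean(winnings > 0))   # close to 1 - 2^-t
print("mean winnings ~", winnings.mean())         # close to 0
print("worst outcome =", winnings.min())          # -(2^t - 1)
```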
Role of RL #
- RL provides a computational model of how an agent learns
- to predict reward (value estimation)
- and then, using those predictions, how it might best control its environment
Continue from Chapter 4.