Ian Osband
https://www.talkrl.com/episodes/ian-osband
Works on decision making under uncertainty in RL.
- His major contributions include:
  - Epistemic uncertainty [1]
  - Bootstrapped DQN
  - Randomized value functions
  - The bsuite benchmark
- The major challenge in RL is exploration and dealing with delayed consequences. Handling uncertainty, especially epistemic uncertainty (in contrast with aleatoric uncertainty), is therefore crucial.
- Bayesian methods for handling uncertainty are principled but computationally expensive. In contrast, deep learning approaches scale well but often have poor uncertainty estimates. Bridging the gap between the two is where the potential lies.
- Information theory provides an elegant framework for handling uncertainty in RL; see the paper "Reinforcement Learning, Bit by Bit". It introduces the following key concepts (a schematic form of the information ratio is sketched after this list):
  - Environment proxy
  - Learning target
  - Information ratio (expected regret paid vs. information gained about the learning target)
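A schematic form of the information ratio, in the spirit of information-directed sampling and "Reinforcement Learning, Bit by Bit" (notation simplified here; χ is the learning target and O_{t+1} the next observation, rather than the paper's exact episodic definition):

```latex
\[
  \Gamma_t \;=\; \frac{\left(\mathbb{E}_t\!\left[\text{regret at step } t\right]\right)^2}
                      {\mathbb{I}_t\!\left(\chi;\; O_{t+1}\right)}
\]
```

A small Γ_t means the agent only incurs regret while it is gaining information about its learning target; bounding the information ratio yields regret bounds.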
How different networks deal with uncertainty:
A normal network gives a marginal prediction, i.e. for each input, a single probability that it belongs to each class.
- Using such a network for control, as in greedy Q-learning (a point-estimate Q-network with some randomness added to the actions), is a very crude way to explore. This is called a dithering approach (using action randomness for exploration).
Such crude, unprincipled exploration fails on some problems. Take the Deep Sea problem: an N×N grid in which only one cell carries a reward. Any dithering approach (i.e. sampling actions randomly) needs on the order of 2^N episodes to learn anything.
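A minimal sketch of why dithering is exponential here, assuming a simplified Deep Sea (the agent descends one row per step and only the all-"right" action sequence reaches the rewarding corner; the real environment also adds a small cost for moving right, omitted here):

```python
import numpy as np

def run_random_episode(n: int, rng: np.random.Generator) -> float:
    """One episode of a uniformly random (dithering) policy on simplified Deep Sea."""
    column = 0
    for _ in range(n):                   # one left/right choice per row
        if rng.random() < 0.5:
            column += 1                  # move right, towards the rewarding corner
    return 1.0 if column == n else 0.0   # reward only if every choice was 'right'

rng = np.random.default_rng(0)
n, episodes = 10, 20_000
hits = sum(run_random_episode(n, rng) for _ in range(episodes))
# The random policy reaches the reward with probability 2**-n, so learning
# anything by dithering takes on the order of 2**n episodes.
print(f"successes: {hits:.0f} / {episodes} (expected ≈ {episodes / 2**n:.1f})")
```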
bsuite (Behaviour Suite for Reinforcement Learning) is a curated set of toy problems like Deep Sea, used to test whether algorithms are capable of deep exploration (among other core capabilities).
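A minimal sketch of running random actions on bsuite's Deep Sea task, assuming the open-source bsuite package (`pip install bsuite`) and its `load_from_id` helper; bsuite environments follow the dm_env interface:

```python
import numpy as np
import bsuite

env = bsuite.load_from_id('deep_sea/0')        # smallest Deep Sea instance
num_actions = env.action_spec().num_values
rng = np.random.default_rng(0)

total_return = 0.0
for _ in range(100):                           # 100 episodes of pure dithering
    timestep = env.reset()
    while not timestep.last():
        action = int(rng.integers(num_actions))
        timestep = env.step(action)
        total_return += timestep.reward
print("return over 100 random episodes:", total_return)
```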
Bayesian neural networks do better on these metrics, but they don't scale the way ordinary neural networks do.
Training an ensemble is a tradeoff: it gives much of the effect of a Bayesian neural network with less machinery, and it scales, but the compute cost is still high (roughly K times a single network for K members).
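A minimal sketch of a deep ensemble for epistemic uncertainty, assuming PyTorch; each member is an independently initialized MLP, and disagreement across members is read as epistemic uncertainty (names here are illustrative):

```python
import torch
import torch.nn as nn

def make_member(in_dim: int, out_dim: int) -> nn.Module:
    """One independently initialized MLP; in Bootstrapped DQN each member
    would also be trained on its own bootstrapped sample of the data."""
    return nn.Sequential(nn.Linear(in_dim, 50), nn.ReLU(), nn.Linear(50, out_dim))

ensemble = [make_member(4, 2) for _ in range(10)]             # K = 10 members

x = torch.randn(1, 4)                                         # a single input
with torch.no_grad():
    preds = torch.stack([member(x) for member in ensemble])   # shape (K, 1, 2)
mean_prediction = preds.mean(dim=0)                           # the ensemble's point prediction
epistemic_spread = preds.var(dim=0)                           # disagreement ≈ epistemic uncertainty
```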
Epinets: we need an approach that costs only slightly more than a single network but gives results as good as a Bayesian neural network (or a large ensemble). Epinets are one answer to this problem.
The idea: add a small additional network on top of the base network's features, conditioned on a random "epistemic index". Sampling different indices gives different plausible outputs, so the model produces joint predictions instead of only marginal ones, approximating Bayesian behaviour at a fraction of the cost.
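A minimal sketch of the epinet idea, assuming PyTorch; names are illustrative rather than the authors' ENN library, and the fixed prior component of the published epinet is omitted for brevity:

```python
import torch
import torch.nn as nn

class EpinetSketch(nn.Module):
    def __init__(self, in_dim=4, hidden=50, out_dim=2, index_dim=8):
        super().__init__()
        self.features = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.base_head = nn.Linear(hidden, out_dim)
        # small head that sees the features plus a random epistemic index z
        self.epinet = nn.Sequential(
            nn.Linear(hidden + index_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim))
        self.index_dim = index_dim

    def forward(self, x, z):
        phi = self.features(x)
        base = self.base_head(phi)
        # stop-gradient on features so the epinet doesn't disturb base training
        extra = self.epinet(torch.cat([phi.detach(), z], dim=-1))
        return base + extra

model = EpinetSketch()
x = torch.randn(1, 4)
zs = torch.randn(5, 1, model.index_dim)            # 5 epistemic index samples
samples = torch.stack([model(x, z) for z in zs])   # 5 plausible predictions for one input
```

Varying z plays the role of sampling from a posterior, at a cost only slightly above a single forward pass.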
Joint Predictive Distributions:
- Introduced in the paper "From Predictions to Decisions: The Importance of Joint Predictive Distributions".
It involves making multiple predictions simultaneously and exploiting the dependence between them to separate epistemic from aleatoric uncertainty,
i.e. it lets you distinguish what you don't yet know from what is pure chance (see the coin-flip sketch after this list).
- Epistemic neural networks (ENNs) are networks that can make joint predictions.
- Bayesian neural networks are also epistemic neural networks.
- The epinet is a specific type of epistemic neural network.
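A toy illustration (mine, echoing the coin-flip example often used to motivate joint predictions): two agents make identical marginal predictions for a single coin flip, but their joint predictions over two flips of the same coin differ, and only the joint reveals whether the uncertainty is epistemic (unknown bias) or aleatoric (a fair coin):

```python
import numpy as np

rng = np.random.default_rng(0)

# Agent A believes the coin is fair: flips are independent (pure aleatoric).
def two_flips_agent_a():
    return rng.random(2) < 0.5

# Agent B believes the coin is either always-heads or always-tails, 50/50:
# flips are perfectly correlated (pure epistemic).
def two_flips_agent_b():
    coin_is_heads = rng.random() < 0.5               # sample a belief about the coin
    return np.array([coin_is_heads, coin_is_heads])

for name, sampler in [("A (aleatoric)", two_flips_agent_a),
                      ("B (epistemic)", two_flips_agent_b)]:
    flips = np.array([sampler() for _ in range(10_000)])
    p_heads = flips[:, 0].mean()                     # the marginal: identical for A and B
    p_agree = (flips[:, 0] == flips[:, 1]).mean()    # the joint reveals the difference
    print(f"agent {name}: P(heads) ≈ {p_heads:.2f}, P(two flips agree) ≈ {p_agree:.2f}")
```

This is exactly the distinction that matters for decision making: an agent should keep gathering data in case B but not in case A.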
We can use epinets to do approximate Thompson sampling in RL (a sketch follows).
Exact Thompson sampling is tractable in bandits, but maintaining and sampling from an exact posterior is infeasible in deep RL, so approximations such as randomized value functions, ensembles, or epinets are used instead.
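A minimal sketch of how an ensemble (or an epinet with a sampled index) gives approximate Thompson sampling in RL; `env` is assumed to be any dm_env-style environment and `q_ensemble` a list of Q-networks mapping flattened observations to per-action values (both hypothetical, for illustration):

```python
import numpy as np
import torch

def run_thompson_episode(env, q_ensemble, rng: np.random.Generator) -> float:
    """One episode of ensemble-based approximate Thompson sampling
    (Bootstrapped-DQN style): sample one Q-network at the start of the
    episode and act greedily with respect to it until the episode ends."""
    member = q_ensemble[rng.integers(len(q_ensemble))]   # one "posterior sample"
    timestep = env.reset()                               # dm_env-style interface (assumption)
    episode_return = 0.0
    while not timestep.last():
        obs = torch.as_tensor(timestep.observation, dtype=torch.float32).flatten()
        with torch.no_grad():
            action = int(member(obs).argmax())           # greedy w.r.t. the sampled Q
        timestep = env.step(action)
        episode_return += timestep.reward
    return episode_return
```

Because the sampled Q-network is held fixed for the whole episode, the randomness is temporally consistent, which is what produces deep exploration rather than per-step dithering.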
1. Deep Exploration by Randomized Value Functions
- Thesis defence video:
- Thesis presentation: https://docs.google.com/presentation/d/1lis0yBGT-uIXnAsi0vlP3SuWD2svMErJWy_LYtfzMOA/
- Thesis: https://stacks.stanford.edu/file/rp457qc7612/iosband_thesis-augmented.pdf
Footnotes:
[1] Epistemic uncertainty is uncertainty due to lack of knowledge, in contrast with aleatoric uncertainty ("aleator" is Latin for dice player), which is due to inherent randomness.