2025-07-15

DQN

Table of Contents

1. Rainbow DQN

Paper: Rainbow: Combining Improvements in Deep Reinforcement Learning [pdf]

Various extensions to DQN have been proposed and implemented. This paper combines six of them and performs an ablation study.

1.1. Extensions

  1. Double Q-Learning: Decouple selection of the best action from estimation of its Q value: the online network picks the action, the target network evaluates it (see the sketches after this list).

    \[ \left( R_{t+1} + \gamma_{t+1} q_{\bar{\theta}} (S_{t+1}, \mathop{argmax}_{a'} q_{\theta}(S_{t+1}, a')) - q_{\theta}(S_{t}, A_t) \right)^2 \]

    where,

    • \(q_{\theta}\) is the online network being optimized
    • \(q_{\bar{\theta}}\) is the target network (a periodic copy of the online network)
  2. Prioritized Replay: Sample transitions with probability proportional 1 to the training loss (the absolute TD error).
  3. Dueling Network: Express the Q value in terms of a state-value function and an advantage function.
  4. Multi-Step Learning: Train on truncated n-step returns instead of single-step targets.
  5. Distributional RL: Keep track of the distribution of returns for each action instead of only its average return (i.e. the Q value).

    The learning objective is to match the target distribution of returns rather than its expectation.

  6. Noisy Nets: Add learnable noise to the Q network's linear layers, so \(\epsilon\)-greedy exploration is not required. The noise drives state-conditional exploration, and over time the network learns to scale it down, giving a form of self-annealing.
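
A minimal sketch of the double-Q target above (PyTorch-style, not the paper's code), assuming hypothetical online_net and target_net modules that map a batch of states to per-action Q values:

  import torch

  def double_q_target(reward, gamma, next_state, online_net, target_net):
      with torch.no_grad():
          # Online network selects the greedy next action ...
          best_action = online_net(next_state).argmax(dim=1, keepdim=True)
          # ... target network evaluates that action.
          next_q = target_net(next_state).gather(1, best_action).squeeze(1)
      # The squared difference to q_theta(S_t, A_t) gives the loss above.
      return reward + gamma * next_q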
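
A sketch of priority-proportional sampling, assuming the buffer keeps a tensor of recent absolute TD errors; the sum-tree bookkeeping and importance-sampling corrections used in practice are omitted, and omega is the exponent from footnote 1:

  import torch

  def sample_indices(td_errors, batch_size, omega=0.5, eps=1e-6):
      # Priority = (|TD error| + eps) ^ omega; eps keeps every transition sampleable.
      priorities = (td_errors.abs() + eps) ** omega
      probs = priorities / priorities.sum()
      # Indices are drawn with probability proportional to priority.
      return torch.multinomial(probs, batch_size, replacement=True)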
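
A sketch of a dueling head sitting on top of an assumed shared feature extractor; subtracting the mean advantage keeps the value/advantage decomposition identifiable:

  import torch.nn as nn

  class DuelingHead(nn.Module):
      def __init__(self, feature_dim, n_actions):
          super().__init__()
          self.value = nn.Linear(feature_dim, 1)               # V(s)
          self.advantage = nn.Linear(feature_dim, n_actions)   # A(s, a)

      def forward(self, features):
          v = self.value(features)
          a = self.advantage(features)
          # Q(s, a) = V(s) + A(s, a) - mean_a' A(s, a')
          return v + a - a.mean(dim=1, keepdim=True)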
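
A sketch of the truncated n-step return used as the regression target, assuming the next n rewards and a bootstrap value taken from the target network at step t+n:

  def n_step_return(rewards, gamma, bootstrap_value):
      # rewards = [R_{t+1}, ..., R_{t+n}], bootstrap_value = q(S_{t+n}, a*).
      g = bootstrap_value
      for r in reversed(rewards):
          g = r + gamma * g
      # g = sum_k gamma^k R_{t+k+1} + gamma^n * bootstrap_value
      return g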
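
In the categorical formulation the paper builds on (C51), the return distribution is a set of probabilities \(p_{\theta}(S_t, A_t)\) over a fixed support of atoms \(z\), and the objective can be written as a KL divergence to the projected target distribution:

\[ D_{\mathrm{KL}} \left( \Phi_z \left( R_{t+1} + \gamma_{t+1} z,\; p_{\bar{\theta}}(S_{t+1}, a^{*}_{t+1}) \right) \,\middle\|\, p_{\theta}(S_t, A_t) \right) \]

where \(\Phi_z\) projects the shifted atoms back onto the fixed support and \(a^{*}_{t+1}\) is the greedy next action.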
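
A sketch of a noisy linear layer; the paper uses factorised Gaussian noise, whereas this simplified version resamples independent Gaussian noise on every forward pass (the initialisation constants are illustrative):

  import torch
  import torch.nn as nn
  import torch.nn.functional as F

  class NoisyLinear(nn.Module):
      def __init__(self, in_dim, out_dim, sigma0=0.5):
          super().__init__()
          bound = in_dim ** -0.5
          self.mu_w = nn.Parameter(torch.empty(out_dim, in_dim).uniform_(-bound, bound))
          self.sigma_w = nn.Parameter(torch.full((out_dim, in_dim), sigma0 * bound))
          self.mu_b = nn.Parameter(torch.zeros(out_dim))
          self.sigma_b = nn.Parameter(torch.full((out_dim,), sigma0 * bound))

      def forward(self, x):
          eps_w = torch.randn_like(self.sigma_w)
          eps_b = torch.randn_like(self.sigma_b)
          # The sigmas are learned, so the network can shrink the noise over
          # training -- the self-annealing behaviour mentioned above.
          return F.linear(x, self.mu_w + self.sigma_w * eps_w,
                          self.mu_b + self.sigma_b * eps_b)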

1.2. Results

Rainbow DQN gave very good results: it learned from far fewer samples and also reached a much higher final reward.

The ablation study showed that the following techniques contributed the most (in decreasing order of importance):

  • Prioritized Replay: Removing this component caused a large drop in performance across all games.
  • Multi-Step Learning: Removing this component caused a large drop in performance across all games.
  • Distributional Q-Learning: Didn't matter much early in training, but in the later stages, once performance is near or above human level, agents lag behind without it.
  • Noisy Nets: Many games benefited from this technique, but some were negatively affected.

The other techniques didn't change the overall results much:

  • Dueling: For some games its impact was significant, but median performance across games didn't change much.
  • Double Q-Learning: The distributional value estimates are clipped to the distribution's support of \([-10, 10]\), which already limits the overestimation that double Q-learning corrects, so removing it made little difference. With a wider support range it might become important.

Footnotes:

1

Proportional to the loss value raised to a power \(\omega\) (a hyperparameter).

