Reinforcement Learning

Advanced Game Playing — Deep RL

Double Dueling DQN with prioritized experience replay (PER, backed by a SumTree). CartPole-v1 solved at episode 300 (MA-100 = 441.1, best eval 497.2/500). LunarLander-v3 solved at episode 207 (MA-100 = 202). The network has 134,275 parameters and uses LayerNorm.

CartPole solved: Episode 300
CartPole best eval: 497.2 / 500
LunarLander solved: Episode 207
Network params: 134,275

Dataset

CartPole-v1 and LunarLander-v3 (Gymnasium, the Farama Foundation's maintained successor to OpenAI Gym)

Approach

Double DQN + Dueling DQN + PER (SumTree) + soft target updates, all four combined in one agent

Tech Stack
Python, PyTorch 2.10, Gymnasium 1.2.0, CUDA, NumPy
Keywords
Double DQN, Dueling DQN, PER, SumTree, CartPole, LunarLander, Gymnasium
Deep Dive

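A Deep Q-Network agent that combines four established DQN improvements: Double DQN, a dueling architecture, prioritized experience replay (PER) backed by a SumTree, and soft target updates.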

Dueling DQN Architecture (134,275 params)

Input → Linear(256) → LayerNorm → ReLU
→ Value stream:     Linear(256→128) → ReLU → Linear(128→1)      = V(s)
→ Advantage stream: Linear(256→128) → ReLU → Linear(128→n_act)  = A(s,a)
→ Q(s,a) = V(s) + (A(s,a) − mean(A(s,a)))
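
The aggregation rule and the parameter count can be checked with a few lines of plain Python. This is a minimal sketch, not the project's code: `dueling_q`, `count_params`, and the `trunk_layers` argument are hypothetical names introduced here. It assumes CartPole-v1's 4-dimensional observation and 2 actions. With the single shared layer drawn above the count comes to 67,971; the quoted 134,275 is reproduced if the trunk has a second Linear(256→256) + LayerNorm block, which is an assumption and not something the diagram shows.

```python
def dueling_q(value, advantages):
    """Combine V(s) and A(s, a) into Q(s, a) using the mean-advantage baseline."""
    mean_adv = sum(advantages) / len(advantages)
    return [value + (a - mean_adv) for a in advantages]


def linear_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def count_params(obs_dim, n_actions, trunk_layers=1, hidden=256, stream=128):
    """Parameter count for the diagram; trunk_layers > 1 is a hypothetical variant."""
    total = 0
    n_in = obs_dim
    for _ in range(trunk_layers):
        total += linear_params(n_in, hidden)  # shared Linear layer
        total += 2 * hidden                   # LayerNorm scale and shift
        n_in = hidden
    # Value stream: Linear(hidden -> stream) -> ReLU -> Linear(stream -> 1)
    total += linear_params(hidden, stream) + linear_params(stream, 1)
    # Advantage stream: Linear(hidden -> stream) -> ReLU -> Linear(stream -> n_actions)
    total += linear_params(hidden, stream) + linear_params(stream, n_actions)
    return total


print(dueling_q(value=1.0, advantages=[0.5, -0.5]))          # [1.5, 0.5]
print(count_params(obs_dim=4, n_actions=2, trunk_layers=1))  # 67971
print(count_params(obs_dim=4, n_actions=2, trunk_layers=2))  # 134275
```

Subtracting the mean advantage makes the mean of Q(s,·) equal V(s), which keeps the two streams identifiable.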

4 Techniques Combined

Technique | What It Fixes
Double DQN | Q-target overestimation bias
Dueling DQN | Separate V(s) and A(s,a) estimation
PER (SumTree) | Sample high-TD-error transitions more often
Soft target updates (τ=0.005) | Stable Q-target convergence
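
To make the Double DQN row concrete, here is a hedged single-transition sketch in plain Python. The function names and the toy Q-values are illustrative assumptions, not taken from the project. The online network picks the next action and the target network scores it, whereas vanilla DQN lets the target network do both.

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """One-transition Double DQN target.

    q_online_next / q_target_next hold the Q-values of every action in the
    next state, from the online and target networks respectively.
    """
    if done:
        return reward  # no bootstrapping from a terminal state
    # The online network selects the action ...
    best_action = max(range(len(q_online_next)), key=q_online_next.__getitem__)
    # ... and the target network evaluates it.
    return reward + gamma * q_target_next[best_action]


def vanilla_dqn_target(reward, done, q_target_next, gamma=0.99):
    """Standard DQN target: the target network both selects and evaluates."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


# Toy values (made up): the two networks disagree about the best action.
q_online_next = [1.0, 2.0]
q_target_next = [3.0, 0.5]
print(round(double_dqn_target(1.0, False, q_online_next, q_target_next), 3))  # 1.495
print(round(vanilla_dqn_target(1.0, False, q_target_next), 3))                # 3.97
```

When the target network's largest value is an overestimate, the vanilla target inherits it; the Double DQN target only does so if the online network happens to prefer the same action.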

Results

Environment | Metric | Value
CartPole-v1 | Solved at episode | 300
CartPole-v1 | MA-100 reward | 441.1 / 500
CartPole-v1 | Best eval (20 ep) | 497.2 ± 12.2
LunarLander-v3 | Solved at episode | 207
LunarLander-v3 | MA-100 reward | 202 (threshold: 200)
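
The page does not say exactly how "solved at episode" was determined. A plausible reading, sketched below as an assumption, is the first episode at which the trailing 100-episode mean (MA-100) reaches the threshold. `first_solved_episode` is a hypothetical helper and the reward sequence is synthetic, not the project's training log.

```python
from collections import deque


def first_solved_episode(rewards, threshold, window=100):
    """Return the 1-based episode at which the trailing `window`-episode mean
    first reaches `threshold`, or None if it never does."""
    recent = deque(maxlen=window)
    running_sum = 0.0
    for episode, reward in enumerate(rewards, start=1):
        if len(recent) == window:
            running_sum -= recent[0]  # this value is evicted by the append below
        recent.append(reward)
        running_sum += reward
        if len(recent) == window and running_sum / window >= threshold:
            return episode
    return None


# Synthetic run: 150 episodes at reward 100, then 200 episodes at reward 250.
synthetic = [100.0] * 150 + [250.0] * 200
print(first_solved_episode(synthetic, threshold=200.0))  # 217
```

In the synthetic run the window needs 67 high-reward episodes before its mean crosses 200, so the agent is marked solved well after its per-episode reward first exceeds the threshold.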

PER SumTree: a binary sum tree (segment tree) giving O(log n) priority sampling and updates. β anneals from 0.4 to 1.0 over training, so the importance-sampling weights fully correct the bias from prioritized sampling by the end.
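
A compact sketch of the data structure and the β schedule, in plain Python. Several details are assumptions rather than facts from the project: the array layout, a power-of-two capacity, the linear shape of the β anneal, and all names (`SumTree`, `beta_at`, `importance_weights`). The weight formula is the usual PER form w_i = (N · P(i))^(-β), normalised by the batch maximum.

```python
import random


class SumTree:
    """Array-backed binary tree: leaves hold priorities, internal nodes hold sums.

    Assumes `capacity` is a power of two so leaves sit left to right in index order.
    """

    def __init__(self, capacity):
        self.capacity = capacity
        self.tree = [0.0] * (2 * capacity)  # node 1 is the root; leaves start at `capacity`

    def total(self):
        return self.tree[1]

    def update(self, index, priority):
        """Set the priority of leaf `index` and refresh its ancestors: O(log n)."""
        node = index + self.capacity
        self.tree[node] = priority
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def find(self, mass):
        """Return the leaf whose cumulative-priority interval contains `mass`: O(log n)."""
        node = 1
        while node < self.capacity:
            left = 2 * node
            if mass < self.tree[left]:
                node = left
            else:
                mass -= self.tree[left]
                node = left + 1
        return node - self.capacity


def beta_at(step, total_steps, beta_start=0.4, beta_end=1.0):
    """Linear anneal from beta_start to beta_end (the linear shape is an assumption)."""
    fraction = min(1.0, step / total_steps)
    return beta_start + fraction * (beta_end - beta_start)


def importance_weights(priorities, total, n_stored, beta):
    """w_i = (N * P(i)) ** -beta, divided by the largest weight in the batch."""
    weights = [(n_stored * p / total) ** -beta for p in priorities]
    max_w = max(weights)
    return [w / max_w for w in weights]


tree = SumTree(capacity=4)
for i, p in enumerate([1.0, 2.0, 3.0, 4.0]):
    tree.update(i, p)

rng = random.Random(0)
counts = [0, 0, 0, 0]
for _ in range(10_000):
    counts[tree.find(rng.random() * tree.total())] += 1
print(counts)  # roughly proportional to 1 : 2 : 3 : 4
print(importance_weights([1.0, 4.0], tree.total(), n_stored=4, beta=beta_at(1000, 1000)))
```

At β = 1 the rarely sampled low-priority transition gets the full weight and the frequently sampled one is scaled down, which is what cancels the sampling bias.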

Hyperparameters: lr=1e-4, γ=0.99, τ=0.005, buffer=100K, batch=64, ε annealed 1.0→0.01
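
To give a feel for τ=0.005, the sketch below applies the soft (Polyak) update to a single scalar "parameter" and counts how many updates the target needs to close half the gap to a fixed online value. It is a simplified illustration under that fixed-online assumption; `soft_update` and `steps_to_close_gap` are hypothetical helpers operating on plain lists, not network tensors.

```python
import math


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging: target <- tau * online + (1 - tau) * target, element-wise."""
    return [tau * o + (1.0 - tau) * t for t, o in zip(target_params, online_params)]


def steps_to_close_gap(fraction, tau=0.005):
    """Updates needed for the target to cover `fraction` of the gap to a fixed online value."""
    return math.ceil(math.log(1.0 - fraction) / math.log(1.0 - tau))


target = [0.0]
online = [1.0]  # held fixed here; in training the online network keeps moving
for _ in range(139):
    target = soft_update(target, online)

print(round(target[0], 3))      # 0.502
print(steps_to_close_gap(0.5))  # 139
```

After n updates the remaining gap is (1 − τ)^n, so with τ=0.005 the target lags the online network by a half-life of about 139 updates.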