How the AI works
The Breakout agent is trained with a Deep Q-Network (DQN). It moves the paddle to keep the ball alive and clear bricks, and learns a Q-value (an estimate of expected future reward) for each action from its own gameplay experience.
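To make "learning Q-values from experience" concrete, here is a minimal sketch of the one-step target that Q-learning methods such as DQN typically regress toward. The function names and the discount factor GAMMA are illustrative assumptions. This demo's actual network and hyperparameters are not described here.

```python
# Sketch of the one-step Q-learning target used by DQN-style training.
# GAMMA and all names below are assumptions for illustration only.
GAMMA = 0.99  # discount factor (assumed value)


def td_target(reward, next_q_values, done, gamma=GAMMA):
    """Return r + gamma * max_a' Q(s', a'), or just r if the episode ended."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)


def td_error(q_value, target):
    """Gap between the target and the current estimate for the action taken."""
    return target - q_value
```

During training, the network's estimate for the action actually taken is nudged to shrink this gap, one remembered transition at a time.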
State, actions, reward
- State: paddle position, ball position and velocity.
- Actions: move the paddle left, right, or stay.
- Reward: positive for breaking bricks, negative for losing the ball.
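The three items above could be represented roughly as follows. This is a sketch under assumptions: the field names, the action order, and the reward magnitudes (+1 per brick, -1 for a lost ball) are hypothetical, since only the signs of the rewards are stated.

```python
from dataclasses import dataclass

# Action order is an assumption; the list above only names the three moves.
ACTIONS = ("left", "stay", "right")


@dataclass(frozen=True)
class State:
    paddle_x: float  # paddle position (assumed to be horizontal only)
    ball_x: float    # ball position
    ball_y: float
    ball_vx: float   # ball velocity
    ball_vy: float

    def as_vector(self):
        """Flatten the state into the list a Q-network would take as input."""
        return [self.paddle_x, self.ball_x, self.ball_y, self.ball_vx, self.ball_vy]


def reward(bricks_broken, ball_lost, brick_reward=1.0, loss_penalty=-1.0):
    """Positive for each brick broken this step, negative if the ball was lost."""
    return bricks_broken * brick_reward + (loss_penalty if ball_lost else 0.0)
```

With this layout the network would take five numbers in and produce three Q-values out, one per action.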
Exploration vs exploitation
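An ε-greedy policy picks a random action with probability ε and the highest-valued action otherwise. Training starts with a high ε, so play is almost random, and ε decays over time so the agent increasingly exploits what it has learned. This balances trying new moves against using what works.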
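A minimal sketch of ε-greedy action selection with a decaying ε is shown below. The exponential schedule and its parameters (eps_start, eps_end, decay_steps) are assumptions chosen for illustration. The schedule used in this demo may differ.

```python
import math
import random


def epsilon_at(step, eps_start=1.0, eps_end=0.05, decay_steps=10_000):
    """Decay epsilon exponentially from eps_start toward eps_end (assumed schedule)."""
    return eps_end + (eps_start - eps_end) * math.exp(-step / decay_steps)


def select_action(q_values, epsilon, rng=random):
    """With probability epsilon explore (random index); otherwise exploit (argmax)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For example, select_action([0.1, 0.9, 0.3], epsilon=0.0) always returns index 1, the greedy choice, while epsilon=1.0 ignores the Q-values entirely.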
What you see on screen
The epsilon-decay curve and live Q-value bars show the shift from exploring to exploiting as the agent gets good at clearing the wall.