Breakout AI — DQN with ε-greedy

How the AI works

The Breakout agent is trained with a deep Q-network (DQN). It moves the paddle to keep the ball in play and clear bricks, learning a Q-value for each action from experience.
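To make "learning Q-values from experience" concrete, here is a minimal sketch of the standard one-step Q-learning target that a DQN regresses toward. The discount factor and the function names are assumptions for illustration; the page does not state the hyperparameters or code used by this demo.

```python
from typing import Sequence

GAMMA = 0.99  # assumed discount factor; not stated on the page


def td_target(reward: float, next_q_values: Sequence[float], done: bool,
              gamma: float = GAMMA) -> float:
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a')."""
    if done:
        # A terminal transition has no future return to bootstrap from.
        return reward
    return reward + gamma * max(next_q_values)


def td_error(q_taken: float, target: float) -> float:
    """Gap between the target and the current estimate for the action taken."""
    return target - q_taken
```

Training nudges the network's estimate for the chosen action toward `td_target`, so the error shrinks as the Q-values become consistent with the rewards the agent actually observes.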

State, actions, reward

State: paddle position, ball position and velocity.
Actions: move the paddle left, right, or stay.
Reward: positive for breaking bricks, negative for losing the ball.
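The three definitions above could be encoded roughly as follows. This is a sketch under stated assumptions: the field names, the action ordering, and the reward magnitudes (+1 per brick, -1 for a lost ball) are hypothetical, since the page only gives the sign of each reward.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import List


class Action(IntEnum):
    """The three paddle moves; the integer order is an arbitrary choice."""
    LEFT = 0
    STAY = 1
    RIGHT = 2


@dataclass(frozen=True)
class State:
    """Paddle position plus ball position and velocity (hypothetical field names)."""
    paddle_x: float
    ball_x: float
    ball_y: float
    ball_vx: float
    ball_vy: float

    def as_vector(self) -> List[float]:
        # Flat feature vector of the kind a Q-network would take as input.
        return [self.paddle_x, self.ball_x, self.ball_y, self.ball_vx, self.ball_vy]


BRICK_REWARD = 1.0        # assumed magnitude
BALL_LOST_PENALTY = -1.0  # assumed magnitude


def reward(bricks_broken: int, ball_lost: bool) -> float:
    """Positive for each brick broken this step, negative if the ball was lost."""
    r = BRICK_REWARD * bricks_broken
    if ball_lost:
        r += BALL_LOST_PENALTY
    return r
```

With this layout the network would map a 5-element state vector to three Q-values, one per `Action`.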

Exploration vs exploitation

An ε-greedy policy starts out almost random (high ε) and increasingly exploits the learned Q-values as ε decays, balancing trying new moves against using what already works.
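A minimal sketch of ε-greedy action selection with a decaying ε is shown here. The start value, floor, and per-episode multiplicative schedule are assumptions chosen for illustration; the demo's actual schedule is not given on the page.

```python
import random
from typing import Optional, Sequence

EPS_START = 1.0    # assumed: fully random at the start
EPS_END = 0.05     # assumed floor so some exploration always remains
EPS_DECAY = 0.995  # assumed multiplicative decay per episode


def epsilon_at(episode: int) -> float:
    """Exponentially decayed epsilon, clipped at EPS_END."""
    return max(EPS_END, EPS_START * EPS_DECAY ** episode)


def select_action(q_values: Sequence[float], epsilon: float,
                  rng: Optional[random.Random] = None) -> int:
    """With probability epsilon pick a random action, otherwise the greedy one."""
    source = rng if rng is not None else random
    if source.random() < epsilon:
        return source.randrange(len(q_values))  # explore
    # Exploit: index of the largest Q-value (first one wins on ties).
    return max(range(len(q_values)), key=lambda a: q_values[a])
```

For example, `select_action(q, epsilon_at(episode))` behaves almost randomly in early episodes and becomes almost always greedy once ε reaches its floor.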

What you see on screen

The epsilon-decay curve and the live Q-value bars show the shift from exploring to exploiting as the agent gets better at clearing the wall.
