How the AI works
The Pong AI is trained with DQN (Deep Q-Network) and self-play. Two agents compete, and each one trains against a frozen checkpoint of its opponent, so the difficulty rises automatically as both improve.
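Below is a minimal sketch of the frozen-checkpoint schedule in Python. `Agent`, `train_step`, `rounds` and `steps_per_round` are hypothetical names. Refreshing both snapshots at the start of every round is also an assumption, and the project may refresh its checkpoints on a different schedule. The actual DQN update is stubbed out so that only the bookkeeping shows: each agent learns against a copy of its opponent that stays fixed for the whole round.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical placeholder for a DQN agent; `version` counts training updates."""
    name: str
    version: int = 0
    opponents_seen: list = field(default_factory=list)

    def train_step(self, opponent: "Agent") -> None:
        # Stub: a real agent would play an episode against `opponent` and run a
        # DQN update. Here we only record which opponent version it faced.
        self.opponents_seen.append((opponent.name, opponent.version))
        self.version += 1


def self_play(a: Agent, b: Agent, rounds: int, steps_per_round: int) -> None:
    """Each agent trains against a frozen snapshot of the other (assumed schedule)."""
    for _ in range(rounds):
        # Snapshot both agents at the start of the round; snapshots never train.
        frozen_a = copy.deepcopy(a)
        frozen_b = copy.deepcopy(b)
        for _ in range(steps_per_round):
            a.train_step(frozen_b)  # a learns against a fixed b
            b.train_step(frozen_a)  # b learns against a fixed a
```

With `rounds=3, steps_per_round=2`, agent `a` trains against `b` at versions 0, 2 and 4, and never against a `b` that is changing underneath it. Holding the opponent fixed keeps each agent's learning problem stationary within a round, which is the usual reason for training against checkpoints rather than the live opponent.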
State, actions, reward
- State: paddle and ball positions and the ball's velocity.
- Actions: move the paddle up, down, or stay.
- Reward: +1 for scoring, -1 for conceding.
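The sketch below shows one way to encode these three pieces in Python, plus how the reward would feed a standard one-step DQN target. Several details are assumptions: the field names, the choice to include both paddles' vertical positions, a reward of 0 on steps where nobody scores, and `gamma=0.99`. In standard DQN, `next_q` would come from a separate target network.

```python
from dataclasses import dataclass
from enum import IntEnum


class Action(IntEnum):
    UP = 0
    DOWN = 1
    STAY = 2


@dataclass(frozen=True)
class State:
    # Field layout is an assumption; only the contents come from the description.
    my_paddle_y: float
    opp_paddle_y: float
    ball_x: float
    ball_y: float
    ball_vx: float
    ball_vy: float

    def as_vector(self) -> list[float]:
        """Flatten to the fixed-length input a Q-network would take."""
        return [self.my_paddle_y, self.opp_paddle_y,
                self.ball_x, self.ball_y, self.ball_vx, self.ball_vy]


def reward(scored: bool, conceded: bool) -> float:
    """+1 for scoring, -1 for conceding, 0 otherwise (the 0 case is assumed)."""
    if scored:
        return 1.0
    if conceded:
        return -1.0
    return 0.0


def td_target(r: float, next_q: list[float], done: bool, gamma: float = 0.99) -> float:
    """One-step DQN target: r if the point ended, else r + gamma * max_a' Q(s', a')."""
    return r if done else r + gamma * max(next_q)
```

Because the reward is sparse (nonzero only when a point ends), the discount `gamma` is what propagates credit back to the earlier paddle moves that set up the point.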
Why self-play matters
Against a fixed opponent, an agent can overfit by learning to exploit that one opponent's weaknesses instead of playing well in general. Self-play creates an ever-improving curriculum: as one side gets better, the other must too, pushing both toward strong, general play.
What you see on screen
You watch two learned policies rally against each other. There is no hand-coded paddle AI, just two networks that taught themselves the game.