How the AI works
The Snake agent is trained with a Deep Q-Network (DQN), a reinforcement-learning method. The agent learns a function Q(state, action) that estimates the total discounted future reward of taking each move from the current state, then picks the move with the highest estimated value.
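As a rough illustration, Q(state, action) is trained to predict a discounted sum of future rewards, r0 + γ·r1 + γ²·r2 + …, and acting greedily just means taking the index of the largest Q-value. The sketch below is not the project's code. The discount factor γ = 0.9, the Q-values, and the names `ACTIONS`, `discounted_return` and `greedy_action` are all hypothetical.

```python
from typing import Sequence

# Hypothetical action labels, in the same order as the three actions listed below.
ACTIONS = ("turn_left", "straight", "turn_right")


def discounted_return(rewards: Sequence[float], gamma: float = 0.9) -> float:
    """Sum of future rewards, each weighted by gamma**k (k = steps ahead).

    gamma = 0.9 is an assumed value; Q(state, action) learns to predict
    this kind of quantity on average.
    """
    total = 0.0
    for k, r in enumerate(rewards):
        total += (gamma ** k) * r
    return total


def greedy_action(q_values: Sequence[float]) -> int:
    """Index of the action with the highest estimated Q-value (first one wins ties)."""
    return max(range(len(q_values)), key=lambda i: q_values[i])


q = [0.12, 0.85, -0.40]  # made-up Q-values for a single state
print(ACTIONS[greedy_action(q)])  # prints straight
# A reward of 10 arriving two steps from now is worth 0.9**2 * 10, about 8.1, today.
print(discounted_return([0.0, 0.0, 10.0], gamma=0.9))
```

The discount is what makes the agent prefer reaching food sooner rather than later.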
State, actions, reward
- State: whether there is an immediate collision risk (a wall or the snake's own body) in each direction, the current heading, and the direction of the food relative to the snake's head.
- Actions: turn left, go straight, turn right.
- Reward: positive for eating, negative for dying, and a small shaping reward for moving toward the food.
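One common way to encode these three pieces is sketched below. It is a minimal sketch under several assumptions that the text does not state. It uses a grid in screen coordinates (y grows downward), with the snake stored head-first. Danger is measured relative to the heading (left, straight, right), giving an 11-feature binary state. The reward values +10, -10 and +0.1 are made up; the text only gives their signs. All function and constant names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[int, int]

# Screen coordinates: x grows right, y grows down. Headings in clockwise order,
# so a right turn is the next entry and a left turn is the previous one.
CLOCKWISE: List[Point] = [(0, -1), (1, 0), (0, 1), (-1, 0)]  # up, right, down, left
TURN_LEFT, STRAIGHT, TURN_RIGHT = 0, 1, 2  # hypothetical action indices

# Hypothetical reward values; only their signs come from the description.
EAT_REWARD, DEATH_REWARD, SHAPING_REWARD = 10.0, -10.0, 0.1


def turn(heading: Point, action: int) -> Point:
    """Map a relative action (left / straight / right) to a new absolute heading."""
    i = CLOCKWISE.index(heading)
    offset = {TURN_LEFT: -1, STRAIGHT: 0, TURN_RIGHT: 1}[action]
    return CLOCKWISE[(i + offset) % 4]


def is_collision(p: Point, body: List[Point], width: int, height: int) -> bool:
    """True if p is off the board or on the snake.

    Conservative simplification: the tail counts as an obstacle even though
    it would move out of the way on a non-eating step.
    """
    x, y = p
    return not (0 <= x < width and 0 <= y < height) or p in body


def encode_state(body: List[Point], heading: Point, food: Point,
                 width: int, height: int) -> List[int]:
    """11 binary features: 3 danger flags, 4-way heading one-hot, 4 food-direction flags."""
    hx, hy = body[0]  # body[0] is the head
    danger = []
    for action in (TURN_LEFT, STRAIGHT, TURN_RIGHT):
        dx, dy = turn(heading, action)
        danger.append(int(is_collision((hx + dx, hy + dy), body, width, height)))
    heading_onehot = [int(heading == d) for d in CLOCKWISE]
    fx, fy = food
    # Food direction flags in the same up, right, down, left order as CLOCKWISE.
    food_dir = [int(fy < hy), int(fx > hx), int(fy > hy), int(fx < hx)]
    return danger + heading_onehot + food_dir


def manhattan(a: Point, b: Point) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def reward(old_head: Point, new_head: Point, food: Point, ate: bool, died: bool) -> float:
    """Large terminal rewards plus a small shaping bonus for stepping toward the food."""
    if died:
        return DEATH_REWARD
    if ate:
        return EAT_REWARD
    return SHAPING_REWARD if manhattan(new_head, food) < manhattan(old_head, food) else 0.0


# Snake of length 3 heading right, food above the head on a 10x10 board.
print(encode_state([(5, 5), (4, 5), (3, 5)], (1, 0), (5, 2), 10, 10))
```

Using relative actions means the agent can never reverse into its own neck, which keeps the action space at three moves.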
How it learns
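Each experience (state, action, reward, next state, and whether the game ended) is stored in a replay buffer, and training samples random mini-batches from it, so the network does not learn from long runs of consecutive, highly correlated frames. An ε-greedy policy takes a random action with probability ε and the highest-valued action otherwise; ε starts high so the agent explores early and decreases over time so it exploits what it has learned later. A target network, a lagged copy of the Q-network, computes the learning targets, which keeps them from shifting with every update and stabilizes training.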
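The three mechanisms can be sketched framework-free as follows. Several pieces are assumptions rather than details from this project. ε follows a linear decay schedule, though exponential decay is also common. The discount is γ = 0.9. The target network is refreshed by a hard copy every 1,000 steps, whereas some implementations use soft (Polyak) updates instead. A Q-network is modeled as any function from a state to a list of Q-values. `Transition`, `ReplayBuffer`, `epsilon_by_step`, `td_targets` and `maybe_sync_target` are hypothetical names.

```python
import copy
import random
from collections import deque
from typing import Callable, Deque, List, NamedTuple, Sequence


class Transition(NamedTuple):
    state: tuple
    action: int
    reward: float
    next_state: tuple
    done: bool  # True if the game ended on this step


class ReplayBuffer:
    """Fixed-size FIFO store of past transitions; the oldest are dropped first."""

    def __init__(self, capacity: int) -> None:
        self.memory: Deque[Transition] = deque(maxlen=capacity)

    def push(self, t: Transition) -> None:
        self.memory.append(t)

    def sample(self, batch_size: int) -> List[Transition]:
        # Uniform sampling without replacement mixes old and new experience.
        return random.sample(list(self.memory), batch_size)

    def __len__(self) -> int:
        return len(self.memory)


def epsilon_by_step(step: int, start: float = 1.0, end: float = 0.05,
                    decay_steps: int = 10_000) -> float:
    """Assumed linear schedule: start -> end over decay_steps, then constant."""
    frac = min(step / decay_steps, 1.0)
    return start + frac * (end - start)


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore: uniform random action
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit


QFunction = Callable[[tuple], Sequence[float]]


def td_targets(batch: List[Transition], target_q: QFunction, gamma: float = 0.9) -> List[float]:
    """y = r if the game ended, else r + gamma * max_a' Q_target(s', a')."""
    ys = []
    for t in batch:
        if t.done:
            ys.append(t.reward)
        else:
            ys.append(t.reward + gamma * max(target_q(t.next_state)))
    return ys


def maybe_sync_target(step: int, online_params: dict, target_params: dict,
                      sync_every: int = 1_000) -> dict:
    """Assumed hard update: copy the online weights every sync_every steps."""
    if step % sync_every == 0:
        return copy.deepcopy(online_params)
    return target_params
```

The online Q-network is then trained to move Q(s, a) for the sampled actions toward these targets, for example with a mean-squared-error or Huber loss. Because the targets come from the lagged copy, they stay fixed between syncs instead of chasing the network's own latest predictions.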
What you see on screen
The Q-value chart updates every frame and shows the network's estimated value for each of the three actions, so you can watch those estimates shift as the agent learns to chase food and avoid walls.