Monte Carlo Tree Search: The Algorithm Behind AlphaGo

The Four Steps

1. Selection

Traverse the tree from root, choosing children using UCB1:

UCB1 = win_rate + C * sqrt(ln(parent_visits) / node_visits)

C is the exploration constant (typically √2).

2. Expansion

If the node is not terminal, add one or more child nodes.

3. Simulation (Rollout)

Play random moves until game over. Return win/loss/draw.

4. Backpropagation

Update win counts and visit counts all the way up to root.

Implementation for 2048

class MCTSNode:
    def __init__(self, state, parent=None, action=None):
        self.state = state
        self.parent = parent
        self.action = action
        self.children = []
        self.wins = 0
        self.visits = 0
    
    @property
    def ucb1(self):
        if self.visits == 0: return float('inf')
        return self.wins/self.visits + 1.414*np.sqrt(np.log(self.parent.visits)/self.visits)