The Four Steps
1. Selection
Traverse the tree from root, choosing children using UCB1:
UCB1 = win_rate + C * sqrt(ln(parent_visits) / node_visits)
C is the exploration constant (typically √2).
2. Expansion
If the node is not terminal, add one or more child nodes.
3. Simulation (Rollout)
Play random moves until game over. Return win/loss/draw.
4. Backpropagation
Update win counts and visit counts all the way up to root.
Implementation for 2048
class MCTSNode:
def __init__(self, state, parent=None, action=None):
self.state = state
self.parent = parent
self.action = action
self.children = []
self.wins = 0
self.visits = 0
@property
def ucb1(self):
if self.visits == 0: return float('inf')
return self.wins/self.visits + 1.414*np.sqrt(np.log(self.parent.visits)/self.visits)