Why Tic Tac Toe + Minimax?
Before I go into motivations, let's recap what Tic-Tac-Toe is:
Tic-tac-toe, noughts and crosses, or Xs and Os is a paper-and-pencil game for two players who take turns marking the spaces in a three-by-three grid, one with Xs and the other with Os.
It is a very small game: The state space is tiny, the action space is limited, and every outcome is zero-sum - one player's gain is the other's loss (or both draw). This makes it an ideal playground for exploring game-search algorithms.
This makes it a natural fit for an algorithm like Minimax. So, what is Minimax?
Minimax is a decision rule used in game-playing agents that assume both players play optimally. The maximizing player tries to choose moves that maximize the outcome, while the minimizing player tries to push the outcome in the opposite direction. In simple words, Minimax tries to pick moves that perform best under the worst-case opponent response.
For Tic-Tac-Toe, this is exactly what we want: an agent that explores all possible moves in the future, and picks a move that ensures it never loses.
The model for Tic-Tac-Toe
For simplicity, I used a flat array to represent the board. Let's call it game. In code, this looks something like:
game = [0, 1, 2, 3, 4, 5, 6, 7, 8]
This translates to a pretty nice grid:
0 | 1 | 2
---------
3 | 4 | 5
---------
6 | 7 | 8
There are 8 possible win conditions - a player wins by having the same marker (X or O) on all three cells of any of these:
- Rows: 012, 345, 678
- Columns: 036, 147, 258
- Diagonals: 048, 246
So, checking if a player has won is trivial: we just check if any of these conditions are satisfied.
Implementing the game
Now that we have a way to represent the board, the next step is to implement the game mechanics: making moves, checking whether a player has won, and undoing moves so the Minimax algorithm can simulate future states. These functions form the environment that the agent interacts with.
Small note: for our purposes, let's assume the game is a global state.
Making a move
To do this, we need to define a function that takes the desired move along with whose turn it is, and updates the board accordingly.
def make_move(move, turn):
    if turn == 0:
        game[int(move)] = 'X'
    else:
        game[int(move)] = 'O'
Checking for a win
As stated earlier, this is simply checking if any of the win conditions are satisfied. One small tweak: rather than returning True, we return the winning marker ('X' or 'O') - it is still truthy, and it lets us tell who won, which we'll need when scoring positions later. In code, this looks something like:
def check_win():
    # empty cells hold distinct indices, so three equal cells
    # can only mean a winning line
    # check all rows
    for i in range(0, 9, 3):
        if game[i] == game[i + 1] == game[i + 2]:
            return game[i]  # the winning marker, 'X' or 'O'
    # check all columns
    for i in range(3):
        if game[i] == game[i + 3] == game[i + 6]:
            return game[i]
    # check all diagonals
    if game[0] == game[4] == game[8]:
        return game[0]
    if game[2] == game[4] == game[6]:
        return game[2]
    return None
There is probably a more pythonic way to do this, but this works for now.
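One possibility, sketched here just for illustration, is to make the eight win conditions data instead of code and loop over them (assuming the same global game as before):

WIN_LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
             (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
             (0, 4, 8), (2, 4, 6)]              # diagonals

def check_win():
    # Return the winning marker ('X' or 'O') if any line is filled
    # with the same marker, otherwise None.
    for a, b, c in WIN_LINES:
        if game[a] == game[b] == game[c]:
            return game[a]
    return None

Both versions behave the same; the rest of the post sticks with the explicit one.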
Undoing a move
Similar to making a move, we do the inverse of the move we just made: we restore the cell to its previous value, which is simply its own index in the game array.
def undo_move(move):
    game[int(move)] = int(move)  # restore the empty-cell placeholder (its index)
Undoing a move is important because the Minimax algorithm simulates future game states. Instead of copying the board each time, we temporarily apply a move, evaluate it, and then restore the board to its previous state.
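For example, a quick round trip on a fresh board (the print calls are just for illustration):

make_move(4, 0)   # X takes the centre
print(game[4])    # 'X'
undo_move(4)
print(game)       # back to [0, 1, 2, 3, 4, 5, 6, 7, 8]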
Getting the list of available moves
While humans can visually see which cells are empty, the program needs an explicit list of legal moves. Minimax uses this list to decide which branches of the game tree to explore.
def get_available_moves():
    return [i for i in game if i != 'X' and i != 'O']
So for example, if the board looks like this:
X | 1 | 2
---------
3 | 4 | 5
---------
6 | 7 | 8
The available moves would be [1, 2, 3, 4, 5, 6, 7, 8]
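In code, that example looks like this (again assuming a fresh board):

make_move(0, 0)               # X takes the top-left corner
print(get_available_moves())  # [1, 2, 3, 4, 5, 6, 7, 8]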
Teaching the agent to play: The Minimax algorithm
With the game environment defined, we now have everything we need to let the agent "think" about moves. Minimax treats the game as a search problem: for the current board state, it recursively explores all possible moves, assuming both players play optimally. At the end of each simulated game, we assign a score to the outcome (10 for win, -10 for loss, 0 for draw), and then work our way back up the tree to find the move that maximizes the score.
Implementing Minimax
The Minimax algorithm involves two functions:
- evaluate()
- minimax(is_maximizing)
Evaluate
At the heart of the Minimax algorithm lies the evaluate function. It is responsible for 'evaluating' the board state and assigning a score to it, depending on the outcome:
- If the maximizing player, the agent (X), wins, return 10.
- If the minimizing player, the opponent (O), wins, return -10.
- If the game is a draw, return 0.
def evaluate():
    # X is the agent
    winner = check_win()
    if winner:
        return 10 if winner == 'X' else -10
    # Draw, or game is not over
    return 0
Pretty simple!
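As a quick sanity check, on a fresh board (moves chosen just for illustration):

make_move(0, 0); make_move(3, 1)  # X: 0, O: 3
make_move(1, 0); make_move(4, 1)  # X: 1, O: 4
make_move(2, 0)                   # X completes the top row
print(evaluate())                 # 10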
Minimax
This is the core of the Minimax algorithm. It is responsible for exploring the game tree, and finding the best move for the agent.
Note: Minimax doesn’t directly return the best move - it returns a score for each simulated move, and we choose the move with the highest score.
Since this is a recursive algorithm, it is imperative that we have a base case. This happens when:
- Someone has won the game
- The game is a draw
def minimax(depth, is_maximizing):
    # base case: if someone won, return the score.
    # if there are no moves left, we have a draw, so return 0.
    score = evaluate()
    moves = get_available_moves()
    if score != 0 or not moves:
        return score
Now comes the exploring part. Exploration depends on whether the current player is maximizing or minimizing. What this means is:
- If it is X's turn (maximizing), we choose the move that produces the highest score.
- If it is O's turn (minimizing), we choose the move that produces the lowest score.
This alternating behaviour is what allows Minimax to simulate two players who are both playing optimally.
After that, the implementation is just two symmetric loops. We iterate over every legal move, simulate it, recurse to get a score for the resulting position, then undo the move. The only difference is whether we keep the maximum score (X’s turn) or the minimum score (O’s turn). Note that we use math.inf as the starting value, so these snippets assume an import math at the top of the file:
    # follows after the base case
    if is_maximizing:
        best = -math.inf
        for move in moves:
            make_move(move, 0)  # X plays
            best = max(best, minimax(depth + 1, False))
            undo_move(move)
        return best
    else:
        best = math.inf
        for move in moves:
            make_move(move, 1)  # O plays
            best = min(best, minimax(depth + 1, True))
            undo_move(move)
        return best
That’s the full Minimax recursion: on X’s turn we take the max score over all possible moves, and on O’s turn we take the min score.
A quick note on depth
You might be wondering what depth is. depth represents how many moves deep we are in the recursion (i.e. how far we are from the current board state). We include it so the agent prefers to finish the game sooner.
In practice, we slightly adjust the win/loss scores using depth:
- For a win: 10 - depth -> earlier wins score higher.
- For a loss: -10 + depth -> later losses are "less bad" than immediate ones.
This doesn’t change whether a position is winning or losing - it just acts as a tie-breaker so the agent plays more decisively.
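To make that concrete, here is a sketch of a depth-aware evaluate - an optional refinement, not the version the earlier snippets use:

def evaluate(depth=0):
    winner = check_win()
    if winner == 'X':
        return 10 - depth   # earlier wins score higher
    if winner == 'O':
        return -10 + depth  # later losses are less bad
    return 0                # draw, or game is not over

minimax would then pass its depth along: score = evaluate(depth). The base case still works, since a depth-adjusted win or loss score is never 0.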
Choosing the actual move
Minimax returns a score, not the move itself. So to pick a move, we try every legal move for X, run Minimax on it, and keep the one with the highest score:
def find_move():
    best_val = -math.inf
    best_move = None
    for move in get_available_moves():
        make_move(move, 0)         # try X move
        score = minimax(0, False)  # now it's O's turn
        undo_move(move)
        if score > best_val:
            best_val = score
            best_move = move
    return best_move
At that point, the game loop is simple: X calls find_move(), O is the human input, and the board updates until check_win() or no moves remain.
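Here is a rough sketch of what that loop could look like (the print_board helper and the prompt string are my own additions for illustration, and there is no validation of the human's input):

def print_board():
    # print the board in the 3x3 grid layout shown earlier
    for i in range(0, 9, 3):
        print(' | '.join(str(c) for c in game[i:i + 3]))
        if i < 6:
            print('---------')

def play():
    turn = 0  # X (the agent) goes first
    while True:
        move = find_move() if turn == 0 else input("O's move (0-8): ")
        make_move(move, turn)
        print_board()
        if check_win():
            print('X wins!' if turn == 0 else 'O wins!')
            return
        if not get_available_moves():
            print("It's a draw!")
            return
        turn = 1 - turn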
Conclusion
At this point, we have everything: a game environment, a Minimax search that scores positions, and a find_move() function that turns those scores into an actual action.
Because Tic-Tac-Toe's state space is small, Minimax can explore the full game tree, which means this agent will never lose.
From here, the natural next step is adding alpha-beta pruning (for larger games) or building a nicer UI. I'm interested to see how I can use this knowledge to build something for Chess, for example.
Well, that's it for now. Thanks for reading!