GAME AI & REINFORCEMENT LEARNING

Anwesha Sinha
4 min read · Dec 12, 2020

Machine learning is broadly divided into three categories.

Supervised Learning: The model learns from labelled examples, using past data to predict outputs for new inputs. Ex: regression models, Random Forest.

Unsupervised Learning: It helps you find unknown patterns in unlabelled data. Ex: K-Means clustering, Principal Component Analysis (PCA).

Reinforcement Learning: It is concerned with how software agents ought to take actions in an environment in order to maximise a cumulative reward. Here the agent is trained by a reward-and-punishment mechanism: it is rewarded for correct moves and punished for wrong ones. In doing so, the agent tries to minimise wrong moves and maximise right ones. Because RL models learn through this continuous process of receiving rewards and punishments for every action taken, they can be trained to respond to unforeseen environments.
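
As a minimal sketch of this reward-and-punishment loop (the ToyEnv class and the random policy below are illustrative inventions, not part of any library), consider:

import random

class ToyEnv:
    # A toy environment: the agent guesses a hidden bit each step;
    # it is rewarded (+1) for a correct guess and punished (-1) otherwise.
    def __init__(self):
        self.target = random.randint(0, 1)
        self.steps = 0

    def step(self, action):
        self.steps += 1
        reward = 1 if action == self.target else -1
        done = self.steps >= 10
        return reward, done

env = ToyEnv()
total_reward, done = 0, False
while not done:
    action = random.choice([0, 1])   # placeholder policy: act at random
    reward, done = env.step(action)  # environment responds with reward or punishment
    total_reward += reward           # the agent's objective: maximise this cumulative reward
print(total_reward)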

Reinforcement Learning Workflow:

[Diagram: the reinforcement learning workflow]
  1. Create the Environment: First you need to define the environment within which the agent operates, including the interface between agent and environment (see the sketch after this list).
  2. Define the Reward: Specify the reward signal the agent uses to measure its performance against the task goals, and how this signal is calculated from the environment.
  3. Create the Agent: Next, create the agent, which consists of the policy and the training algorithm. The policy can be represented by, for example, a neural network or a lookup table.
  4. Train and Validate the Agent: Set up training options (such as when to start, pause, and stop training) and train the agent to tune its policy, then validate the trained policy.
  5. Deploy the Policy: Deploy the trained policy representation using, for example, generated C/C++ or CUDA code.
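
As a rough, hypothetical sketch of steps 1 and 2, here is what a minimal environment might look like in Python (the GridWorld class below is an illustrative invention following the common reset()/step() convention, not a real library API):

class GridWorld:
    # A tiny 1-D grid: the agent starts at cell 0 and must reach the last cell.
    def __init__(self, size=5):
        self.size = size
        self.position = 0

    def reset(self):
        # Step 1: the environment defines the agent's observations
        self.position = 0
        return self.position

    def step(self, action):
        # action: 0 = move left, 1 = move right
        move = 1 if action == 1 else -1
        self.position = max(0, min(self.size - 1, self.position + move))
        # Step 2: the reward signal is calculated from the environment's state
        reward = 1 if self.position == self.size - 1 else 0
        done = self.position == self.size - 1
        return self.position, reward, done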

Applications of Reinforcement Learning: robotics, games, and business strategy planning.

Let's understand how to build a game-playing agent via reinforcement learning, using Kaggle's ConnectX environment:

# Set up Kaggle's learntools exercise checker (provides the environment and hints)
from learntools.core import binder
binder.bind(globals())
from learntools.game_ai.ex1 import *

After creating the environment, let's define the role of our agent.

The basic algorithm behind our agent in this game is: select the winning move if one is available; otherwise, select a random move.

import numpy as np

# Gets board at next step if agent drops piece in selected column
def drop_piece(grid, col, piece, config):
    next_grid = grid.copy()
    for row in range(config.rows-1, -1, -1):
        if next_grid[row][col] == 0:
            break
    next_grid[row][col] = piece
    return next_grid

# Returns True if dropping piece in column results in game win
def check_winning_move(obs, config, col, piece):
    # Convert the board to a 2D grid
    grid = np.asarray(obs.board).reshape(config.rows, config.columns)
    next_grid = drop_piece(grid, col, piece, config)
    # horizontal
    for row in range(config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(next_grid[row, col:col+config.inarow])
            if window.count(piece) == config.inarow:
                return True
    # vertical
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns):
            window = list(next_grid[row:row+config.inarow, col])
            if window.count(piece) == config.inarow:
                return True
    # positive diagonal
    for row in range(config.rows-(config.inarow-1)):
        for col in range(config.columns-(config.inarow-1)):
            window = list(next_grid[range(row, row+config.inarow), range(col, col+config.inarow)])
            if window.count(piece) == config.inarow:
                return True
    # negative diagonal
    for row in range(config.inarow-1, config.rows):
        for col in range(config.columns-(config.inarow-1)):
            window = list(next_grid[range(row, row-config.inarow, -1), range(col, col+config.inarow)])
            if window.count(piece) == config.inarow:
                return True
    return False

The check_winning_move() function takes four required arguments: the first two (obs and config) are the observation and configuration objects that Kaggle passes to every agent, and:

  • col is any valid move
  • piece is either the agent's mark or the mark of its opponent.

The function returns True if dropping the piece in the provided column wins the game (for either the agent or its opponent), and otherwise returns False. To check if the agent can win in the next move, you should set piece=obs.mark.
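
For example, using the functions defined above, you might call it on a hand-built position like this (the Obs and Config namedtuples below are illustrative stand-ins for the objects Kaggle passes to your agent; in the actual exercise they are provided for you):

from collections import namedtuple

Config = namedtuple("Config", ["rows", "columns", "inarow"])
Obs = namedtuple("Obs", ["board", "mark"])

config = Config(rows=6, columns=7, inarow=4)
# Empty board except for three of player 1's pieces stacked in column 0
board = [0] * (config.rows * config.columns)
for row in range(3, config.rows):
    board[row * config.columns] = 1
obs = Obs(board=board, mark=1)

# Dropping a fourth piece in column 0 completes a vertical four-in-a-row
print(check_winning_move(obs, config, col=0, piece=obs.mark))  # True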

To complete this exercise, you need to define agent_q1() in the code cell below. To do this, you're encouraged to use the check_winning_move() function.

The drop_piece() function is called in the check_winning_move() function.

import random

def agent_q1(obs, config):
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    # Select a winning move if one is available
    for col in valid_moves:
        if check_winning_move(obs, config, col, obs.mark):
            return col
    # Otherwise, fall back to a random valid move
    return random.choice(valid_moves)

Agent 2

def agent_q2(obs, config):
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    # First, check if the agent can win on this move
    for col in valid_moves:
        if check_winning_move(obs, config, col, obs.mark):
            return col
    # Next, block the opponent's winning move, if any
    # (marks are 1 and 2, so obs.mark%2+1 is the opponent's mark)
    for col in valid_moves:
        if check_winning_move(obs, config, col, obs.mark%2+1):
            return col
    return random.choice(valid_moves)

Play against an agent

def my_agent(obs, config):
    # Same strategy as agent_q2: win if possible, block otherwise, else play randomly
    valid_moves = [col for col in range(config.columns) if obs.board[col] == 0]
    for col in valid_moves:
        if check_winning_move(obs, config, col, obs.mark):
            return col
    for col in valid_moves:
        if check_winning_move(obs, config, col, obs.mark%2+1):
            return col
    return random.choice(valid_moves)

Environment

from kaggle_environments import evaluate, make, utils

env = make("connectx", debug=True)
env.play([my_agent, None], width=500, height=450)
[Image: result of the game when you play against the agent. Source: Kaggle]
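
Beyond playing interactively, you can also use the evaluate() function imported above to pit agents against each other over several episodes. A minimal sketch (here scoring my_agent against the built-in "random" agent; in ConnectX a win for the first agent is reported as the reward pair [1, -1]):

outcomes = evaluate("connectx", [my_agent, "random"], num_episodes=10)
print("my_agent wins:", outcomes.count([1, -1]), "out of", len(outcomes), "episodes")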
