Step 2: Implement our components

In this step, we'll add a bit of logic to our components. The bootstrap created a few Python scripts for us:

  • player.py - the implementation of the player agent
  • env.py - the implementation of the environment
  • client.py - a simple Python-based command-line interface for our human player

Now, we just need to fill in the blanks to implement our project.

Implementation of the Agent

In a real project this is where you would implement your ML-based inference, but for now, let's use a bot that randomly selects a value as our agent. Update the agent as follows:

player.py

import cog_settings
from data_pb2 import PlayerAction, ROCK, PAPER, SCISSOR
import random

from cogment import Agent, GrpcServer

class Player(Agent):
    VERSIONS = {"player": "1.0.0"}
    actor_class = cog_settings.actor_classes.player

    def decide(self, observation):
        # Pick one of the three moves uniformly at random.
        choices = [ROCK, PAPER, SCISSOR]
        decision = random.choice(choices)

        action = PlayerAction()
        action.decision = decision

        print(f"Player decide {action.decision}")
        return action

    def reward(self, reward):
        print("Player reward")

    def end(self):
        print("Player end")


if __name__ == '__main__':
    server = GrpcServer(Player, cog_settings)
    server.serve()

Restart the player service by running docker-compose restart player.

All agents must implement a decide method, which takes an observation from the corresponding actor class's observation space, Observation, and produces an action from the matching action space, PlayerAction.
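
For a flavor of what swapping the random bot for real inference could look like, here is a minimal sketch where the move is sampled from a policy distribution. The decide_from_policy helper and its hard-coded policy_probs are our own illustrative stand-ins for a trained model's output, not part of the generated project:

import random

from data_pb2 import PlayerAction, ROCK, PAPER, SCISSOR


def decide_from_policy(observation, policy_probs=(0.4, 0.3, 0.3)):
    # policy_probs stands in for a trained model's output; the values
    # here are hard-coded purely for illustration.
    choices = [ROCK, PAPER, SCISSOR]
    decision = random.choices(choices, weights=policy_probs, k=1)[0]

    action = PlayerAction()
    action.decision = decision
    return action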

Implementation of the Environment

Update the environment implementation as follows:

env.py

import cog_settings
from data_pb2 import Observation, ROCK, PAPER, SCISSOR

from cogment import Environment, GrpcServer


class Env(Environment):
    VERSIONS = {"env": "1.0.0"}

    def start(self, config):
        print("environment starting")
        self.observation = Observation()
        self.observation.p1_score = 0
        self.observation.p2_score = 0
        return self.observation

    def update(self, actions):
        print("environment updating")

        p1_decision = actions.player[0].decision
        p2_decision = actions.player[1].decision

        print(f"p1 played {p1_decision} - p2 played {p2_decision}")

        if p1_decision == p2_decision:
            # Both players made the same move: the round is a draw.
            pass
        elif ((p1_decision == ROCK and p2_decision == SCISSOR) or
              (p1_decision == PAPER and p2_decision == ROCK) or
              (p1_decision == SCISSOR and p2_decision == PAPER)):
            # Player 1's move beats player 2's move.
            self.observation.p1_score += 1
        else:
            self.observation.p2_score += 1

        print(f"p1 score {self.observation.p1_score} - p2 score {self.observation.p2_score}")

        return self.observation

    def end(self):
        print("environment end")


if __name__ == "__main__":
    server = GrpcServer(Env, cog_settings)
    server.serve()

Restart the environment service by running docker-compose restart env.

All environments must implement an update method which takes a list of actions ordered by actor_class and returns a new observation.
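
Since the scoring logic inside update depends only on the two decisions, it can be factored out into a small pure function that is easy to unit test. Here is a sketch of our own (the BEATS table and round_winner helper are hypothetical names, not part of the generated project):

from data_pb2 import ROCK, PAPER, SCISSOR

# Each move beats exactly one other move.
BEATS = {ROCK: SCISSOR, PAPER: ROCK, SCISSOR: PAPER}


def round_winner(p1_decision, p2_decision):
    """Return 0 on a draw, 1 if player 1 wins, 2 if player 2 wins."""
    if p1_decision == p2_decision:
        return 0
    return 1 if BEATS[p1_decision] == p2_decision else 2

With a helper like this, update reduces to incrementing the score of whichever player round_winner reports.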

Implementation of the Client

When dealing with a human/AI interaction training project, a user interface is required for the human(s) to interact with the environment. Since RPS is a simple game, a simple Python-based command-line frontend will do.

client.py

import cog_settings
import cmd

from data_pb2 import PlayerAction, ROCK, PAPER, SCISSOR
from cogment.client import Connection


class RPSShell(cmd.Cmd):
    intro = "let's play some rock paper scissors with an AI agent! Type 'help' to display commands "
    prompt = '(RPS)'

    def __init__(self):
        super().__init__()
        # Connect to the orchestrator and join a trial as the human player.
        connection = Connection(cog_settings, "orchestrator:9000")
        self.trial = connection.start_trial(cog_settings.actor_classes.player)
        self.p1_score = 0
        self.p2_score = 0

    def do_rock(self, arg):
        'Play rock'
        self.update(ROCK)

    def do_paper(self, arg):
        'Play paper'
        self.update(PAPER)

    def do_scissors(self, arg):
        'Play scissors'
        self.update(SCISSOR)

    def do_quit(self, arg):
        'Quit the game'
        self.trial.end()
        return True

    def update(self, choice):
        # Send our action; the returned observation holds the updated scores.
        observation = self.trial.do_action(PlayerAction(decision=choice))

        decision = "it's a draw"
        if observation.p1_score > self.p1_score:
            decision = "AI won"
            self.p1_score = observation.p1_score
        elif observation.p2_score > self.p2_score:
            decision = "You won"
            self.p2_score = observation.p2_score

        print(decision)
        print(f"current score - you: {self.p2_score} - ai: {self.p1_score}")


if __name__ == '__main__':
    RPSShell().cmdloop()

The client's update method calls do_action on the trial instance, which sends the action to the orchestrator and returns the updated observation.
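
To see that round trip in isolation, here is a minimal non-interactive sketch built from the same calls used in client.py above (one round of rock, after which the trial ends):

import cog_settings
from data_pb2 import PlayerAction, ROCK
from cogment.client import Connection

# Connect to the orchestrator and join a trial as the human player.
connection = Connection(cog_settings, "orchestrator:9000")
trial = connection.start_trial(cog_settings.actor_classes.player)

# Send one action; the orchestrator returns the updated observation.
observation = trial.do_action(PlayerAction(decision=ROCK))
print(f"p1 score {observation.p1_score} - p2 score {observation.p2_score}")

trial.end()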

We now have everything we need to start using the client and play RPS against our bot. Before we do, both the environment and the player services, as well as the orchestrator, need to be restarted. Go back to the corresponding terminal windows, interrupt them if they are still running, and relaunch them in any order (if you started them in the background, docker-compose restart env player orchestrator should also work).

Play RPS against our bot agent

Since our client expects interactive input, we use docker-compose run instead of docker-compose up.

In a new terminal, start the client.

$ docker-compose run client
let's play some rock paper scissors with an AI agent! Type 'help' to display commands
(RPS)help

Documented commands (type help <topic>):
========================================
help  paper  quit  rock  scissors

(RPS)rock
it's a draw
current score - you: 0 - ai: 0
(RPS)scissors
You won
current score - you: 1 - ai: 0
(RPS)paper
it's a draw
current score - you: 1 - ai: 0
(RPS)paper
AI won
current score - you: 1 - ai: 1
(RPS)

help gives you the available commands, which are quit, rock, paper, and you guessed it, scissors.

Everything runs, but we can do much more. Let's go to Step 3!