Step 2: Implement our components

In this step, we'll add a bit of logic to our components. The bootstrap created a few Python scripts for us:

  • main.py, in the agents/agent directory - the implementation of the player agent
  • main.py, in the envs directory - the implementation of the environment
  • main.py, in the clients directory - a basic Python command-line interface for our human player

Now we just need to fill in the blanks to implement our project.

Implementation of the Agent

In a real project, this is where you would implement your ML-based inference. For now, let's just use a bot-based agent that randomly selects a move. Update agents/agent/main.py as follows:

agents/agent/main.py

import cog_settings
from data_pb2 import AgentAction, ROCK, PAPER, SCISSOR
import random

from cogment import Agent, GrpcServer

class Agent(Agent):
    VERSIONS = {"agent": "1.0.0"}
    actor_class = cog_settings.actor_classes.agent

    def decide(self, observation):
        choices = [ROCK, PAPER, SCISSOR]
        decision = random.choice(choices)

        action = AgentAction()
        action.decision = decision

        print(f"Player decide {action.decision}")
        return action

    def reward(self, reward):
        print("Player reward")

    def on_message(self, sender, msg):
        if msg:
            print(f'Agent {self.id_in_class} received message - {msg} from sender {sender}')

    def end(self):
        print("Player end")



if __name__ == '__main__':
    server = GrpcServer(Agent, cog_settings)
    server.serve()

Restart the agent by running docker-compose restart agent

All agents must implement a decide method which takes an input from the corresponding actor class’ observation space, Observation, and produces an action from the matching action space, AgentAction.
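As a rough illustration of the decide contract, the sketch below reproduces the random choice outside of Cogment so it can be run on its own. FakeAgentAction is a hypothetical stand-in for the generated AgentAction message, and the numeric values of ROCK, PAPER and SCISSOR are assumed from the order of the lookup list used in the environment code; the real values come from data_pb2.

```python
import random

# Assumed enum values (NONE would be 0); the real ones are generated in data_pb2.
ROCK, PAPER, SCISSOR = 1, 2, 3


class FakeAgentAction:
    """Hypothetical stand-in for the generated AgentAction message."""

    def __init__(self):
        self.decision = 0


def decide(observation, rng=random):
    """Ignore the observation and return an action holding a random move."""
    action = FakeAgentAction()
    action.decision = rng.choice([ROCK, PAPER, SCISSOR])
    return action


if __name__ == "__main__":
    # A seeded generator makes the sequence of moves repeatable between runs.
    seeded = random.Random(0)
    print([decide(None, seeded).decision for _ in range(5)])
```

Passing the random generator as a parameter is only a convenience for this sketch: it lets you seed the bot when you want repeatable games.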

Implementation of the Environment

Update the environment implementation in envs/main.py as follows:

envs/main.py

import cog_settings
from data_pb2 import Observation, ROCK, PAPER, SCISSOR

from cogment import Environment, GrpcServer

choice = ['NONE','ROCK','PAPER','SCISSOR']


class Env(Environment):
    VERSIONS = {"env": "1.0.0"}

    def start(self, config):
        print("environment starting")
        self.observation = Observation()
        self.observation.p1_score = 0
        self.observation.p2_score = 0

        obs_table = cog_settings.ObservationsTable(self.trial)
        for o in obs_table.all_observations():
            o.snapshot = self.observation

        return obs_table

    def update(self, actions):
        print("environment updating")

        p1_decision = actions.human[0].decision
        p2_decision = actions.agent[0].decision

        print(f"human played {choice[p1_decision]} - agent played {choice[p2_decision]}")

        if p1_decision == p2_decision:
            pass
        elif ((p1_decision == ROCK and p2_decision == SCISSOR) or
              (p1_decision == PAPER and p2_decision == ROCK) or
              (p1_decision == SCISSOR and p2_decision == PAPER)):
            self.observation.p1_score += 1
        else:
            self.observation.p2_score += 1

        print(f"human score {self.observation.p1_score} - agent score {self.observation.p2_score}")

        obs_table = cog_settings.ObservationsTable(self.trial)
        for o in obs_table.all_observations():
            o.snapshot = self.observation

        return obs_table

    def on_message(self, sender, msg):
        if msg:
            print(f'Environment received message - {msg} from sender {sender}')


    def end(self):
        print("environment end")


if __name__ == "__main__":
    server = GrpcServer(Env, cog_settings)
    server.serve()

Restart the environment by running docker-compose restart env

All environments must implement an update method which takes a list of actions ordered by actor_class and returns a new observation.
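The scoring rule inside update can also be read as a small pure function, which is easier to check in isolation. The sketch below is not part of the generated project: score_round and BEATS are hypothetical names, and the enum values are assumed from the order of the choice list above.

```python
# Assumed enum values (NONE would be 0); the real ones are generated in data_pb2.
ROCK, PAPER, SCISSOR = 1, 2, 3

# Each pair reads (winning move, losing move).
BEATS = {(ROCK, SCISSOR), (PAPER, ROCK), (SCISSOR, PAPER)}


def score_round(p1_decision, p2_decision, p1_score, p2_score):
    """Return the (p1_score, p2_score) pair after one round."""
    if p1_decision == p2_decision:
        # Draw: nobody scores.
        return p1_score, p2_score
    if (p1_decision, p2_decision) in BEATS:
        return p1_score + 1, p2_score
    # Every other combination counts for player 2, as in the else branch above.
    return p1_score, p2_score + 1


if __name__ == "__main__":
    print(score_round(ROCK, SCISSOR, 0, 0))
    print(score_round(ROCK, PAPER, 0, 0))
```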

Implementation of the Client

When dealing with a human / AI interaction training project, a user interface is required for the human(s) to interact with the environment. Since RPS is a simple game, a simple Python-based command-line frontend is enough. Modify clients/main.py as follows:

clients/main.py

import cog_settings
import cmd

from data_pb2 import HumanAction, ROCK, PAPER, SCISSOR
from cogment.client import Connection


class RPSShell(cmd.Cmd):
    intro = "let's play some rock paper scissors with an AI agent! Type 'help' to display commands "
    prompt = '(RPS)'

    def __init__(self):
        super().__init__()
        connection = Connection(cog_settings, "orchestrator:9000")
        self.trial = connection.start_trial(cog_settings.actor_classes.human)
        self.p1_score = 0
        self.p2_score = 0

    def do_rock(self, arg):
        'Play rock'
        self.update(ROCK)

    def do_paper(self, arg):
        'Play paper'
        self.update(PAPER)

    def do_scissors(self, arg):
        'Play scissors'
        self.update(SCISSOR)

    def do_quit(self, arg):
        'Quit the game'
        self.trial.end()
        return True

    def update(self, choice):
        observation = self.trial.do_action(HumanAction(decision=choice))

        decision = "it's a draw"
        if observation[0].p1_score > self.p1_score:
            decision = "You won"
            self.p1_score = observation[0].p1_score
        elif observation[0].p2_score > self.p2_score:
            decision = "AI won"
            self.p2_score = observation[0].p2_score

        print(decision)
        print(f"current score - you: {self.p1_score} - ai: {self.p2_score}")


if __name__ == '__main__':
    RPSShell().cmdloop()

The client's update method calls do_action on the trial instance, which sends the action to the orchestrator and returns the updated observation.
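Note that the client never sees the agent's move directly: it infers the outcome of a round by comparing the scores in the new observation with the scores it stored after the previous round. A minimal sketch of that comparison, where Scores is a hypothetical stand-in for the Observation message and round_result is an illustrative helper name:

```python
from collections import namedtuple

# Hypothetical stand-in for the Observation message (only the two score fields).
Scores = namedtuple("Scores", ["p1_score", "p2_score"])


def round_result(previous, current):
    """Infer who won the last round from the change in scores."""
    if current.p1_score > previous.p1_score:
        return "You won"
    if current.p2_score > previous.p2_score:
        return "AI won"
    # Neither score moved, so both players picked the same move.
    return "it's a draw"


if __name__ == "__main__":
    print(round_result(Scores(0, 0), Scores(1, 0)))
    print(round_result(Scores(1, 0), Scores(1, 0)))
```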

We now have everything we need to start the client and play RPS against our bot. Before we can do so, we need to restart the environment and player agent services, as well as the orchestrator. Go back to the corresponding terminal windows, interrupt the services if they are still running, and relaunch them (the order doesn't matter).

Play RPS against our bot agent

Since our client expects interactive input, we use docker-compose run instead of docker-compose up.

In a new terminal, start the client.

$ docker-compose run client
let's play some rock paper scissors with an AI agent! Type 'help' to display commands
(RPS)rock
You won
current score - you: 1 - ai: 0
(RPS)rock
You won
current score - you: 2 - ai: 0
(RPS)rock
AI won
current score - you: 2 - ai: 1
(RPS)quit

The help command lists the available commands, which are quit, rock, paper, and, you guessed it, scissors.
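These commands come from the standard library cmd module: every method named do_<name> on a cmd.Cmd subclass becomes a <name> command, and a method that returns True ends the loop. The sketch below shows the dispatch without starting a trial; DemoShell and its played list are hypothetical and only serve the illustration.

```python
import cmd
import io


class DemoShell(cmd.Cmd):
    """Hypothetical shell with two commands, used only to show the dispatch."""

    prompt = "(demo)"

    def __init__(self, stdout=None):
        super().__init__(stdout=stdout)
        self.played = []

    def do_rock(self, arg):
        "Play rock"
        self.played.append("rock")

    def do_quit(self, arg):
        "Quit the game"
        return True  # A true return value tells cmdloop to stop.


def command_names(shell):
    """List the commands a shell exposes, based on its do_ methods."""
    return sorted(n[3:] for n in shell.get_names() if n.startswith("do_"))


if __name__ == "__main__":
    out = io.StringIO()
    shell = DemoShell(stdout=out)
    print(command_names(shell))
    shell.onecmd("rock")   # Dispatches to do_rock.
    shell.onecmd("exit")   # No do_exit method: cmd reports an unknown syntax.
    print(shell.played, out.getvalue().strip())
```

A word with no matching do_ method is handled by the default handler, which writes an error message and keeps the shell running.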

Everything runs, but we can do much more. Let’s go to step 3!