Cogment High-Level API guide (Python)

Prerequisites

This document assumes the reader is familiar with the Cogment Fundamentals.

The high-level Cogment API expects users to declare a project's data structures using protocol buffers. The intricacies of protobufs are beyond the scope of this document; basic knowledge of the technology and its usage is assumed.

Basic familiarity with Docker is also a prerequisite.

The cogment.yaml file

An actor class is primarily defined by its observation space and action space and both MUST be configured in the cogment.yaml file.

The shape of these spaces is declared by using a protocol buffer message type. Observations and actions will simply be instances of the matching type.

For example, in the following configuration, driver and pedestrian share a common view of the environment but have different actions available to them.

import:
  proto:
    - city.proto

actors:
  driver:
    observation:
      space: city.Observation

    action:
      space: city.DriverAction

  pedestrian:
    observation:
      space: city.Observation

    action:
      space: city.PedestrianAction

[^1] The full list of options that can be configured within the cogment.yaml file is available in the Cogment documentation.

Compiling the cogment.yaml

In order to use the cogment.yaml file within Python scripts, it needs to be converted into a Python module. This is done by a tool called the “cogment cli”.

We recommend running it through the cogment/cli Docker image, which already has all the required dependencies set up.

$ docker run -v $(pwd):/data --rm cogment/cli --file /data/cogment.yaml --python_dir=/data

This will create a cog_settings.py module in the current directory. The /data path is the path within the container at which the current local directory is mounted.

The cogment cli will also compile the imported .proto files into Python modules in the same location, so there is no need to invoke protoc yourself.

Environment

Environments are implemented by a Python class that inherits from the cogment.Environment class.

This class will be instantiated once for each trial that is run on the project, and needs to implement two methods: start() and update().

The start() method will be called at the start of the trial, and an instance of the environment’s configuration type (if applicable) will be passed to it. start() must return the initial observation to be sent to the actors.

The update() method will be invoked repeatedly as the trial progresses, and the action of each actor participating in the trial will be passed to it. update() must return the new observations.
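The calling pattern described above can be sketched with a toy driver loop. run_trial() is a hypothetical helper that only imitates what the framework does (start() once, then update() once per step); it is not part of the SDK, and a dataclass stands in for the generated city_pb2.Observation.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class Observation:
    # Stand-in for the generated city_pb2.Observation; `tick` is hypothetical.
    tick: int = 0


class CityEnv:
    # Plays the role of a cogment.Environment subclass.
    def start(self, config):
        # Called once at the start of the trial.
        self.tick = 0
        return Observation(tick=self.tick)

    def update(self, actions):
        # Called repeatedly; advance one step and return the new observation.
        self.tick += 1
        return Observation(tick=self.tick)


def run_trial(env, num_steps):
    """Hypothetical harness: start() once, then update() num_steps times."""
    observations = [env.start(config=None)]
    for _ in range(num_steps):
        # No actors in this toy run, so every actor-class list is empty.
        actions = SimpleNamespace(driver=[], pedestrian=[])
        observations.append(env.update(actions))
    return observations
```

The environment keeps its own state between calls, which is why per-trial data such as `tick` is initialized in start() rather than shared at class level.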

In the common case where all actors within a trial share the same observation, a bare-minimum environment service would look like this:

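# env.py
import cog_settings
from city_pb2 import Observation

from cogment import Environment, GrpcServer


class CityEnv(Environment):
    def start(self, config):
        # Called once per trial; returns the initial observation.
        return Observation()

    def update(self, actions):
        # Called repeatedly; returns the observation that follows these actions.
        return Observation()


if __name__ == "__main__":
    # Serve the environment on port 9002.
    server = GrpcServer(CityEnv, cog_settings, 9002)
    server.serve()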

(TODO: add explanation of multi-observation API)

Interpreting Actions

The actions argument passed to the environment’s update() has one attribute for each actor class of the project, named accordingly. Each of these is itself a list of deserialized protobuf messages (one per actor of the actor class the attribute refers to).

The type of the objects in the list will be the one that was set as the action:space of that actor class.

def update(self, actions):
    action_a = actions.pedestrian[0]
    action_b = actions.pedestrian[1]

    car_action = actions.driver[0]
    ...
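To make the shape of `actions` concrete, the sketch below builds a stand-in with types.SimpleNamespace, using dataclasses in place of the generated protobuf messages. The fields `step` and `throttle` are hypothetical; the real fields come from city.proto.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class PedestrianAction:
    # Hypothetical field; the real message is defined in city.proto.
    step: str = "wait"


@dataclass
class DriverAction:
    # Hypothetical field; the real message is defined in city.proto.
    throttle: float = 0.0


# One attribute per actor class, each holding one action per actor of that class.
actions = SimpleNamespace(
    driver=[DriverAction(throttle=0.5)],
    pedestrian=[PedestrianAction("cross"), PedestrianAction("wait")],
)


def count_crossing(actions):
    """Count how many pedestrians chose to cross during this update."""
    return sum(1 for a in actions.pedestrian if a.step == "cross")
```

Iterating over the list, rather than indexing fixed positions, keeps update() working regardless of how many actors of each class the trial contains.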

N.B. Do not assume that an environment will be updated in the same thread that created it, nor that all of its updates will happen in the same thread. Similarly, the SDK does not perform any synchronization across environment instances: environments that share data amongst themselves must synchronize access on their own and should expect a potentially high level of contention.
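Because the SDK does not synchronize anything across environment instances, shared state needs its own guard. The sketch below assumes a hypothetical module-level counter shared by every trial and protects it with a threading.Lock; CityEnv here is a bare stand-in, not a cogment.Environment, and simulate() is a hypothetical helper that drives several instances concurrently.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SharedStats:
    """Hypothetical store shared by all environment instances."""

    def __init__(self):
        self._lock = threading.Lock()
        self.total_updates = 0

    def record_update(self):
        # Any instance may call this from any thread, so guard the write.
        with self._lock:
            self.total_updates += 1


STATS = SharedStats()


class CityEnv:
    # Plays the role of a cogment.Environment subclass.
    def update(self, actions):
        STATS.record_update()
        return None  # a real environment returns an Observation here


def simulate(num_envs, updates_per_env):
    """Drive several environments concurrently, as independent trials might be."""
    def run(env):
        for _ in range(updates_per_env):
            env.update(actions=None)

    with ThreadPoolExecutor(max_workers=4) as pool:
        # list() forces evaluation so worker exceptions are re-raised here.
        list(pool.map(run, [CityEnv() for _ in range(num_envs)]))
```

Keeping the critical section as small as possible matters here, since every trial's update() contends for the same lock.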

Agent

Agents look a lot like environments and inherit from cogment.Agent. They are also instantiated and served on demand, though several instances of the same agent Python class may be created for a single trial if the cogment.yaml specifies so.

The two methods the agent should implement are decide() and reward().

decide() chooses which action should be taken when faced with a given observation.

reward() notifies the agent that a judgment has been made on its past performance. The agent is free to do what it wants with that information.

Finally, the agent must announce to the SDK which actor class of the project it is implementing. This is done by setting the actor_class class property to the correct reference from the project's cog_settings module.

A typical agent would look like this:

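# agent.py
import cog_settings
from city_pb2 import PedestrianAction

from cogment import Agent, GrpcServer


class Pedestrian(Agent):
    # Tells the SDK which actor class of the project this agent implements.
    actor_class = cog_settings.actor_classes.pedestrian

    def decide(self, observation):
        # Choose an action in response to the observation.
        return PedestrianAction()

    def reward(self, reward):
        # This minimal agent ignores its rewards.
        pass


if __name__ == "__main__":
    server = GrpcServer(Pedestrian, cog_settings, 9001)
    server.serve()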
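Because decide() and reward() are plain methods, an agent's logic can be unit-tested without a running orchestrator. The sketch below swaps in dataclasses for the generated messages and drops the cogment.Agent base class; the `light_is_green` and `cross` fields are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # Stand-in for city_pb2.Observation; the field is hypothetical.
    light_is_green: bool = False


@dataclass
class PedestrianAction:
    # Stand-in for city_pb2.PedestrianAction; the field is hypothetical.
    cross: bool = False


class Pedestrian:
    # Plays the role of a cogment.Agent subclass.
    def __init__(self):
        self.rewards = []

    def decide(self, observation):
        # Only cross when the pedestrian light is green.
        return PedestrianAction(cross=observation.light_is_green)

    def reward(self, reward):
        # Keep every reward received; a learning agent might update its policy here.
        self.rewards.append(reward)
```

Tests can then call decide() directly with hand-built observations and check the returned action.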

Frontend

Unlike the agent and environment APIs, where the code gets invoked on demand by the Cogment framework, the frontend code sends requests to the orchestrator.

# client.py
import cog_settings

from city_pb2 import DriverAction
from cogment.client import Connection

# Create a connection to the Orchestrator serving this project
conn = Connection(cog_settings, "127.0.0.1:9000")

# Initiate a trial, playing as a driver
trial = conn.start_trial(cog_settings.actor_classes.driver)

# Perform actions, and get observations
for _ in range(4):
    observation = trial.do_action(DriverAction())

# cleanup
trial.end()
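Since the frontend interacts with a running trial through do_action() and end(), client logic can be exercised offline against a stub. FakeTrial and drive() are hypothetical names used for illustration only, not part of cogment.client, and the dataclass stands in for city_pb2.DriverAction with a hypothetical `throttle` field.

```python
from dataclasses import dataclass


@dataclass
class DriverAction:
    # Stand-in for city_pb2.DriverAction; the field is hypothetical.
    throttle: float = 0.0


class FakeTrial:
    """Hypothetical stand-in for the object returned by start_trial()."""

    def __init__(self):
        self.received = []
        self.ended = False

    def do_action(self, action):
        # Record the action and return a dummy observation (the step count).
        self.received.append(action)
        return len(self.received)

    def end(self):
        self.ended = True


def drive(trial, num_steps):
    """Client logic under test: send num_steps actions, then end the trial."""
    observation = None
    try:
        for _ in range(num_steps):
            observation = trial.do_action(DriverAction(throttle=1.0))
    finally:
        # Always release the trial, even if an action raises.
        trial.end()
    return observation
```

The try/finally is a design choice worth keeping in the real client as well, so that a failed action does not leave a trial running on the orchestrator.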

Feedback

Creating

Feedback can be generated by any of the three components (environment, agent, or frontend) through the trial object.

In the agent and environment, the trial object can be found as the trial property of the instance itself, whereas in the frontend, the object returned by start_trial() serves that purpose.

# In agent/environment
class Pedestrian(Agent):
    def foo(self):
        human_driver = self.trial.actors.driver[1]
        human_driver.add_feedback(time=0, value=-1, confidence=1)

# In client
trial = conn.start_trial(cog_settings.actor_classes.driver)
...
ai = trial.actors.pedestrian[0]
ai.add_feedback(time=0, value=-1, confidence=1)

Consuming

All the feedback destined to a given actor for a given point in time is combined by the framework into a single reward.
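This guide does not specify how the framework combines feedback into a reward. Purely as an illustration, the sketch below assumes a confidence-weighted mean per point in time; Cogment's actual rule may differ. The Feedback dataclass simply mirrors the keyword arguments passed to add_feedback() above, and combine() is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Feedback:
    # Mirrors the keyword arguments passed to add_feedback().
    time: int
    value: float
    confidence: float


def combine(feedbacks):
    """Group feedback by time and reduce each group to one reward.

    ASSUMPTION: a confidence-weighted mean; Cogment's actual rule may differ.
    """
    by_time = defaultdict(list)
    for fb in feedbacks:
        by_time[fb.time].append(fb)

    rewards = {}
    for time, group in by_time.items():
        total_confidence = sum(fb.confidence for fb in group)
        if total_confidence == 0:
            # No confident feedback at this time step: treat it as neutral.
            rewards[time] = 0.0
        else:
            weighted = sum(fb.value * fb.confidence for fb in group)
            rewards[time] = weighted / total_confidence
    return rewards
```

For example, feedback of -1 at confidence 1 and +1 at confidence 3 for the same time step yields (-1 + 3) / 4 = 0.5 under this assumed scheme.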

This reward is stored in the offline dataset, but the agent can also learn from it directly, live, while the trial is running.

# In agent
class Pedestrian(Agent):
    def reward(self, reward):
        print(f'receiving reward: {reward}')
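One simple way for an agent to learn from rewards live is to maintain a statistic such as their running mean, which a policy could later use as a baseline. The sketch assumes `reward` arrives as a plain number; the object actually passed to reward() may be a richer type, so adapt the access accordingly. RunningMean is a hypothetical helper, not part of the SDK.

```python
class RunningMean:
    """Incrementally updated mean: m_n = m_(n-1) + (r_n - m_(n-1)) / n."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0

    def update(self, value):
        self.count += 1
        self.mean += (value - self.mean) / self.count
        return self.mean


class Pedestrian:
    # Plays the role of a cogment.Agent subclass; only reward() is shown.
    def __init__(self):
        self.baseline = RunningMean()

    def reward(self, reward):
        # Update the average as each reward arrives, without storing them all.
        self.baseline.update(reward)
```

The incremental form uses constant memory, which suits long trials where rewards keep arriving for as long as the trial runs.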

Delta Encoding

By default, observations are sent whole from the environment to the actors. However, it is fairly common for only a small portion of an observation to change from one update to the next.

Cogment allows you to specify a separate data structure to encode partial observation updates. However, if you do so, you must provide a method that can apply the deltas to previous observations.

# delta.py
def apply_delta(previous_observation, delta):
    # Return the updated observation, more often
    # than not, this should be the previous
    # observation that was modified in-place.
    previous_observation.car_position = delta.new_car_pos
    return previous_observation

# cogment.yaml
import:
  proto:
    - city.proto
  python:
    - delta

actors:
  my_class:
    observation:
      space: city.Observation
      delta: city.ObservationDelta
      delta_apply_fn: 
        python: delta.apply_delta
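On the receiving side, each delta is folded into the last known observation. The sketch below mirrors apply_delta() from delta.py, with dataclasses standing in for the city.proto messages; the `weather` field and the replay() helper are hypothetical, and only serve to show that fields not covered by the delta carry over untouched.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    # Stand-in for city.Observation; `weather` is a hypothetical extra field.
    car_position: int = 0
    weather: str = "sunny"


@dataclass
class ObservationDelta:
    # Stand-in for city.ObservationDelta.
    new_car_pos: int = 0


def apply_delta(previous_observation, delta):
    # Same logic as delta.py: modify in place and return the observation.
    previous_observation.car_position = delta.new_car_pos
    return previous_observation


def replay(initial, deltas):
    """Rebuild the current observation by applying each delta in order."""
    observation = initial
    for delta in deltas:
        observation = apply_delta(observation, delta)
    return observation
```

Because apply_delta() mutates in place, the object returned by replay() is the same instance as `initial`; callers that need the earlier state should copy it first.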

[^1]: This shows only the relevant part of the full cogment.yaml