
Cogment Core Concepts Guide

Welcome to the Cogment core concepts guide. It contains information that is pertinent to both the high-level SDK and the low-level API.

Core Concepts

Cogment is built around concepts adapted from multi-agent systems (agents, environment), Markov decision processes (action and observation space) and reinforcement learning (trials, rewards).

Trials

Trials are what a Cogment deployment runs. They enable Actors to interact with their Environment. Trials are started by clients connecting to Cogment. A trial ends either when a client terminates it or by itself, for example once a specific state of the Environment is reached.

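During the trial:

  • The Environment generates Observations and sends them to the Actors.
  • The Actors choose Actions and send them to the Environment.
  • The Environment applies the Actions, usually updating its internal state, and can send Rewards to the Actors.
  • Actors and the Environment can send and receive direct messages.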

A trial is defined by the participating Actors and the host Environment. As a concept, Trials are quite close to Reinforcement Learning (RL)'s Episodes, i.e. all the states that come between an initial state and a terminal state. However, because Cogment can be used outside of an RL context, we prefer using the more generic term of Trial.
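The two ways a trial can end can be pictured with a small sketch in plain Python. It does not use the Cogment SDK: `run_trial`, `is_terminal`, `terminate_requested` and the counter used as state are hypothetical names chosen for this illustration only.

```python
from typing import Callable, Tuple


def run_trial(
    is_terminal: Callable[[int], bool],
    terminate_requested: Callable[[int], bool],
) -> Tuple[str, int]:
    """Toy trial whose state is a counter incremented at every tick.

    Hypothetical helper, not part of the Cogment SDK.
    Returns the reason the trial ended and the number of ticks played.
    """
    state = 0
    tick = 0
    while True:
        if terminate_requested(tick):  # a client asked for termination
            return "terminated_by_client", tick
        if is_terminal(state):  # the Environment reached a terminal state
            return "ended_by_environment", tick
        state += 1  # stand-in for one exchange of actions and observations
        tick += 1


# The trial ends by itself once the state reaches 5.
natural = run_trial(lambda state: state >= 5, lambda tick: False)
# A client requests termination at tick 2, before the terminal state is reached.
forced = run_trial(lambda state: state >= 5, lambda tick: tick == 2)
```

In both calls the sequence of states between the initial state and the end of the trial plays the role of an RL episode.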

Actors

Actors within a trial instantiate actor classes. An actor class is defined by the nature of the information its actors receive from the environment (their observation space) and by the actions they can perform (their action space).

In Cogment, the observation and action space are defined as typed data structures. In particular, Cogment uses protobuf as a format to specify these data structures. This typing defines both an interface contract between the Actors and the Environment and helps convey semantic information, thus facilitating the independent design and development of both.
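To illustrate how typed spaces act as an interface contract, here is a minimal sketch in which Python dataclasses stand in for protobuf messages. In an actual project these types would come from a protobuf definition; `DriverObservation`, `DriverAction` and `ActorClass` are hypothetical names, not part of Cogment.

```python
from dataclasses import dataclass


# Stand-ins for protobuf messages (assumption: a real project would define
# them in a .proto file and use the generated classes instead).
@dataclass(frozen=True)
class DriverObservation:
    position: float
    speed: float


@dataclass(frozen=True)
class DriverAction:
    acceleration: float


@dataclass(frozen=True)
class ActorClass:
    """Hypothetical description of an actor class by its two typed spaces."""

    name: str
    observation_space: type
    action_space: type

    def check_action(self, action: object) -> None:
        # The typed action space is the contract: anything else is rejected.
        if not isinstance(action, self.action_space):
            raise TypeError(
                f"{self.name} expects {self.action_space.__name__}, "
                f"got {type(action).__name__}"
            )


driver = ActorClass("driver", DriverObservation, DriverAction)
driver.check_action(DriverAction(acceleration=0.5))  # matches the action space

try:
    driver.check_action({"acceleration": 0.5})  # untyped dict, not a DriverAction
    rejected = False
except TypeError:
    rejected = True
```

Because both sides only depend on the declared types, the Actor and the Environment can be developed separately as long as they agree on these definitions.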

An Actor can be controlled either by a software Agent or by a Human. In either case, the process of generating actions based on observations remains the same, and the Environment treats them the same.

Environment

The Environment is the context within which the Trial takes place. The Environment receives the actions done by the actors, usually updates an internal state, and generates an observation for each Actor.

The Environment is the main integration point between Cogment and an external system, either a simulation or a real world system.

The cogment.yaml

At the heart of every Cogment project is a YAML file typically called cogment.yaml. Its primary role is to define the actor classes present within the project, including their action and observation spaces, as well as a default configuration for trials, including the number of actors participating in each trial and their class and implementation.
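The kind of information held in this file can be pictured with the plain Python structure below. This is only an illustration: the key names (`actor_classes`, `trial_params`, `action_space`, ...) and all values are assumptions made for this sketch, not the authoritative cogment.yaml schema, and `undeclared_actor_classes` is a hypothetical helper.

```python
# Illustrative stand-in for the content of a cogment.yaml file.
# Key names and values are assumptions for this sketch, not the real schema.
spec = {
    "actor_classes": [
        {
            "name": "driver",
            "action_space": "race.DriverAction",
            "observation_space": "race.DriverObservation",
        },
    ],
    "trial_params": {
        "actors": [
            {"name": "driver_1", "actor_class": "driver", "implementation": "random_driver"},
            {"name": "driver_2", "actor_class": "driver", "implementation": "human_driver"},
        ],
    },
}


def undeclared_actor_classes(spec: dict) -> list:
    """Return the names of trial actors whose class is not declared in the spec."""
    declared = {actor_class["name"] for actor_class in spec["actor_classes"]}
    return [
        actor["name"]
        for actor in spec["trial_params"]["actors"]
        if actor["actor_class"] not in declared
    ]


problems = undeclared_actor_classes(spec)
```

The sketch shows the two roles described above: declaring actor classes with their spaces, and describing which actors (name, class, implementation) take part in a trial by default.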

Architecture

Running trials with Cogment usually involves the deployment of a cluster of services and its clients. These components are either provided by the Cogment framework, depicted below in blue, or implemented for a particular project, depicted below in orange.

[Figure: Cogment Architecture - Simple]

User-implemented components use one of the Cogment SDKs or directly implement the underlying protocol. Components communicate using gRPC. Clients can also communicate in a web-friendly way using gRPC-Web and grpcwebproxy.

Orchestrator

The Orchestrator is the glue that binds everything together. It is responsible for running the trials and contacting other services as needed to ensure their execution.

The key aspect of Cogment's Orchestrator is its capacity to handle a number of network connections in parallel while remaining responsive.

Controller

The Controller is a key part of using Cogment. It initiates communication with the Orchestrator to control the execution of trials. It is responsible for starting Trials, retrieving and watching their state (including the end of the trial), and requesting trial termination.
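The division of responsibilities can be sketched with two toy classes in plain Python. Everything here is hypothetical (`ToyOrchestrator`, `ToyController`, `TrialState` and their methods are invented for the illustration and are not the Cogment API): the point is only that the controller initiates requests while the orchestrator owns the trial state.

```python
import itertools
from enum import Enum


class TrialState(Enum):
    RUNNING = "running"
    ENDED = "ended"


class ToyOrchestrator:
    """In-memory stand-in for the Orchestrator (hypothetical, not the Cogment API)."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.trials = {}

    def start_trial(self) -> str:
        trial_id = f"trial-{next(self._ids)}"
        self.trials[trial_id] = TrialState.RUNNING
        return trial_id

    def terminate_trial(self, trial_id: str) -> None:
        self.trials[trial_id] = TrialState.ENDED


class ToyController:
    """Initiates every request; the orchestrator holds the trial state."""

    def __init__(self, orchestrator: ToyOrchestrator) -> None:
        self._orchestrator = orchestrator

    def start_trial(self) -> str:
        return self._orchestrator.start_trial()

    def get_state(self, trial_id: str) -> TrialState:
        return self._orchestrator.trials[trial_id]

    def terminate_trial(self, trial_id: str) -> None:
        self._orchestrator.terminate_trial(trial_id)


controller = ToyController(ToyOrchestrator())
trial_id = controller.start_trial()
state_before = controller.get_state(trial_id)
controller.terminate_trial(trial_id)
state_after = controller.get_state(trial_id)
```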

Environment

The Environment implementation is accessed by the Orchestrator to run the Environment during Trials.

Using one of Cogment's SDKs, the Environment can be implemented as a function integrating a "state of the world" with the Trial. This function performs the following tasks during the Trial:

  • Generate Observations from the current state of the world, for example retrieving the visible objects from a 3D simulation.
  • Apply the Actions, thus updating the state of the world, for example changing the velocity of a moving vehicle in a race simulation.
  • Evaluate the performance of Actors and send them Rewards, for example by checking if a vehicle crossed the finish line in a race simulation.
  • Send and receive direct messages.
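The first three tasks above can be sketched with a toy race simulation in plain Python. This is not the Cogment SDK: `RaceEnvironment`, `Vehicle`, `FINISH_LINE` and the method names are hypothetical, and direct messages are left out.

```python
from dataclasses import dataclass

FINISH_LINE = 10.0  # hypothetical track length


@dataclass
class Vehicle:
    position: float = 0.0
    velocity: float = 0.0


class RaceEnvironment:
    """Toy race environment holding the "state of the world" (not the Cogment SDK)."""

    def __init__(self, actor_names):
        self.vehicles = {name: Vehicle() for name in actor_names}

    def observations(self):
        # Generate one observation per actor from the current state of the world.
        return {name: (v.position, v.velocity) for name, v in self.vehicles.items()}

    def apply_actions(self, actions):
        # Each action is a change of velocity; positions then advance by one step.
        for name, velocity_change in actions.items():
            vehicle = self.vehicles[name]
            vehicle.velocity += velocity_change
            vehicle.position += vehicle.velocity

    def rewards(self):
        # Reward 1.0 for vehicles that crossed the finish line, 0.0 otherwise.
        return {
            name: 1.0 if v.position >= FINISH_LINE else 0.0
            for name, v in self.vehicles.items()
        }


env = RaceEnvironment(["fast", "slow"])
for _ in range(4):
    env.apply_actions({"fast": 1.0, "slow": 0.5})
final_observations = env.observations()
final_rewards = env.rewards()
```

After four steps only the vehicle that accelerates harder has reached the finish line, so it is the only one receiving a non-zero reward.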

Actors

Actors can be implemented in two different ways, either as a service or as a client. Service Actor implementations are accessed by the Orchestrator during Trials, while Client Actor implementations join a Trial by initiating the communication with the Orchestrator. Client Actor implementations can reach a Cogment deployment through NAT traversal. This makes them particularly well-suited to implementing human-driven Actors, in web browsers for example.

Using one of Cogment's SDKs, an Actor can be implemented as a function handling the integration between a decision-making Actor (software agent or Human) and the Trial. This function performs the following tasks during the Trial:

  • Receive Observations and do Actions in response, for example vectorizing the retrieved observation, feeding it to a neural network and converting its output to an Action.
  • Receive Rewards, for example using them to update a neural network.
  • Send and receive direct messages.
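The observation-to-action and reward-handling tasks above can be sketched as follows, in plain Python and without the Cogment SDK. `LinearAgent` is a hypothetical class in which a weight vector stands in for a neural network, and the weight update is a deliberately crude reward-weighted nudge, not an actual learning algorithm.

```python
class LinearAgent:
    """Toy agent: a weight vector stands in for a neural network (hypothetical)."""

    def __init__(self, weights, learning_rate=0.5):
        self.weights = list(weights)
        self.learning_rate = learning_rate
        self.last_features = None

    def act(self, observation):
        # Vectorize the observation, score it, then convert the score to an action.
        features = [observation["distance"], observation["speed"]]
        self.last_features = features
        score = sum(w * x for w, x in zip(self.weights, features))
        return "accelerate" if score > 0 else "brake"

    def on_reward(self, reward):
        # Crude update: move the weights along the last features, scaled by the reward.
        if self.last_features is None:
            return
        self.weights = [
            w + self.learning_rate * reward * x
            for w, x in zip(self.weights, self.last_features)
        ]


agent = LinearAgent(weights=[1.0, -1.0])
action = agent.act({"distance": 4.0, "speed": 2.0})
agent.on_reward(1.0)
```

A human-driven Actor would follow the same shape, with `act` presenting the observation to the person and returning their input as the action.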

Please note that rewards can also be retrieved after the fact using an activity logger.
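Retrieving rewards after the fact can be pictured with a small in-memory logger. This is a sketch in plain Python under stated assumptions: `Sample` and `InMemoryDatalog` are hypothetical names and the fields of a sample are chosen for the illustration, they do not reflect the actual format of Cogment's logged data.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One tick of trial activity (hypothetical layout)."""

    tick_id: int
    observations: dict
    actions: dict
    rewards: dict


@dataclass
class InMemoryDatalog:
    """Stores every sample so it can be read back once the trial is over."""

    samples: list = field(default_factory=list)

    def on_sample(self, sample: Sample) -> None:
        self.samples.append(sample)

    def total_rewards(self) -> dict:
        # Sum the logged rewards per actor, after the fact.
        totals = {}
        for sample in self.samples:
            for actor, reward in sample.rewards.items():
                totals[actor] = totals.get(actor, 0.0) + reward
        return totals


datalog = InMemoryDatalog()
for tick_id, reward in enumerate([0.0, 0.5, 1.0]):
    datalog.on_sample(
        Sample(
            tick_id=tick_id,
            observations={"driver_1": {"position": float(tick_id)}},
            actions={"driver_1": "accelerate"},
            rewards={"driver_1": reward},
        )
    )
totals = datalog.total_rewards()
```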

Additional components

On top of the core components described above, a Cogment deployment can include these additional ones:

  • CLI is the command line tool used to bootstrap and build Cogment projects.
  • Datalog services can be used to listen to the activity during a trial (actions, observations, rewards, messages) in order to, for example, store these data for the offline training of AI agents. Trial Datastore is an out-of-the-box implementation of this.
  • Model Registry handles the storage and dispatch of AI models trained with Cogment and used by the Actors.
  • Pre Trial Hooks can be used to dynamically set up Trials from a given configuration, for example changing the number of Actors or pointing to other Environment or Actor implementations.
  • Metrics & Dashboard provide a solution to monitor and visualize various metrics from the services.
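As an illustration of what a pre trial hook does, the sketch below adapts default trial parameters to a requested configuration. It is plain Python with hypothetical names throughout: `pre_trial_hook`, the dictionary keys and the endpoint strings are assumptions for this example, not the Cogment SDK or its parameter format.

```python
import copy


def pre_trial_hook(trial_params: dict, trial_config: dict) -> dict:
    """Return trial parameters adapted to the requested configuration.

    Hypothetical helper: the default parameters are left untouched.
    """
    params = copy.deepcopy(trial_params)

    # Change the number of Actors by repeating the first one as a template.
    template = params["actors"][0]
    params["actors"] = [
        {**template, "name": f"{template['actor_class']}_{index + 1}"}
        for index in range(trial_config["actor_count"])
    ]

    # Optionally point to another Environment implementation.
    if "environment_endpoint" in trial_config:
        params["environment"]["endpoint"] = trial_config["environment_endpoint"]
    return params


default_params = {
    "environment": {"endpoint": "grpc://environment:9000"},
    "actors": [
        {"name": "driver_1", "actor_class": "driver", "implementation": "random_driver"},
    ],
}
adapted = pre_trial_hook(
    default_params,
    {"actor_count": 3, "environment_endpoint": "grpc://simulation:9000"},
)
actor_names = [actor["name"] for actor in adapted["actors"]]
```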

Components availability summary

The following table summarizes how each component can either be implemented or used out of the box.

| Component | Out-of-the-box | Python SDK | Javascript SDK | gRPC API |
|---|---|---|---|---|
| Orchestrator | cogment-orchestrator | | | ✅ implement Control API & Client Actor API |
| Controller | | use controller | use TrialController | ✅ use Control API |
| Environment | | register environment impl & serve | | ✅ implement Environment API |
| Actor (Service) | | register actor impl & serve | | ✅ implement Service Actor API |
| Actor (Client) | | register actor impl & join trial | register an ActorImplementation | ✅ use Client Actor API |
| CLI | cogment-cli | | | |
| Trial Datastore | cogment-trial-datastore | register datalog impl & serve | | ✅ implement Datalog API & Trial Datastore API (🚧) |
| Model Registry | cogment-model-registry | | | ✅ implement Model Registry API (🚧) |
| Pre Trial Hook | | register pre trial hook impl & serve | | ✅ implement Pre Trial Hook API |
| Metrics | cogment-metrics | | | |
| Dashboard | cogment-dashboard | | | |