Cogment Core Concepts Guide
Cogment is built around concepts adapted from multi-agent systems (agents, environment), Markov decision processes (action and observation space) and reinforcement learning (trials, rewards).
Trials are what a Cogment deployment runs. They enable Actors to interact with their Environment. Trials are started by clients connecting to Cogment. A trial ends either when a client terminates it or on its own, for example once a specific state of the Environment is reached.
During the trial:
- The Environment generates observations of its internal state and sends them to the actors.
- Given these observations, each actor might choose and take an action.
- The Environment receives the actions and updates its state.
- Rewards can be sent to the actors from either the environment or other actors.
- Actors receive rewards.
- The actors or the environment can send messages to actors or the environment.
- A log of the activity during the trial (observations, actions, rewards & messages) is produced and can be stored.
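The sequence above can be sketched as a plain Python loop. This is an illustration only and does not use the Cogment SDK: `CounterEnvironment`, `run_trial`, and the two actor policies are hypothetical names, and messages are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class CounterEnvironment:
    """Toy environment: its state is a counter that ends the trial at `target`."""
    target: int = 3
    state: int = 0

    def observe(self) -> int:
        # Every actor receives the same observation: the current counter value.
        return self.state

    def step(self, actions: dict) -> dict:
        # Apply all actions to the state, then reward actors that contributed.
        self.state += sum(actions.values())
        return {name: 1.0 if action > 0 else 0.0 for name, action in actions.items()}

    def done(self) -> bool:
        return self.state >= self.target


def run_trial(environment, actors, max_ticks=100):
    """Run one trial and return its activity log (one entry per tick)."""
    log = []
    for tick in range(max_ticks):
        observation = environment.observe()
        actions = {name: policy(observation) for name, policy in actors.items()}
        rewards = environment.step(actions)
        log.append({"tick": tick, "observation": observation,
                    "actions": actions, "rewards": rewards})
        if environment.done():  # the trial ends by itself
            break
    return log


actors = {"eager": lambda observation: 1, "idle": lambda observation: 0}
trial_log = run_trial(CounterEnvironment(target=3), actors)
```

Here the trial ends on its own once the counter reaches the target; the `max_ticks` bound plays the role of an external termination.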
A trial is defined by the participating Actors and the host Environment. As a concept, Trials are quite close to Reinforcement Learning (RL)'s Episodes, i.e. all the states that come between an initial state and a terminal state. However, because Cogment can be used outside of an RL context, we prefer using the more generic term of Trial.
Actors within a trial instantiate actor classes. An actor class is defined by the nature of the information its actors receive from the environment (their observation space) and by the actions they can perform (their action space).
In Cogment, the observation and action spaces are defined as typed data structures, specified using protobuf. This typing both defines an interface contract between the Actors and the Environment and helps convey semantic information, thus facilitating the independent design and development of both.
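To illustrate what a typed contract buys, here is a minimal sketch that uses Python dataclasses as a stand-in for the protobuf messages Cogment actually relies on. The `Observation`, `Action`, and `Move` types and the `choose_action` function are hypothetical and only show how both sides can be developed against the same types.

```python
from dataclasses import dataclass
from enum import Enum


class Move(Enum):
    LEFT = 0
    RIGHT = 1
    STAY = 2


@dataclass(frozen=True)
class Observation:
    """Observation space: what the environment tells the actor."""
    position: float
    velocity: float


@dataclass(frozen=True)
class Action:
    """Action space: what the actor is allowed to do."""
    move: Move


def choose_action(observation: Observation) -> Action:
    # The actor only depends on the typed contract, not on the environment's internals.
    if observation.position > 0:
        return Action(Move.LEFT)
    if observation.position < 0:
        return Action(Move.RIGHT)
    return Action(Move.STAY)
```

Because the actor and the environment only share these types, either side can be replaced without touching the other as long as the types stay the same.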
An Actor might be controlled either by a software Agent, or by a Human. Whichever the case, the process of generating actions based on observations remains the same, and the Environment treats them the same.
The Environment is the context within which the Trial takes place. The Environment receives the actions taken by the Actors, usually updates an internal state, and generates an observation for each Actor.
The Environment is the main integration point between Cogment and an external system, either a simulation or a real world system.
At the heart of every Cogment project is a YAML file typically called cogment.yaml. Its primary role is to define the actor classes present within the project, including their action and observation spaces, as well as a default configuration for trials, including the number of actors participating in each trial along with their class and implementation.
Running trials with Cogment usually involves the deployment of a cluster of services and its clients. These components are either provided by the Cogment framework, depicted below in blue, or implemented for a particular project, depicted below in orange.
User-implemented components use one of the Cogment SDKs or directly implement the underlying protocol. Components communicate using gRPC; clients can also communicate in a web-friendly way using gRPC-Web and grpcwebproxy.
The Orchestrator is the glue that binds everything together. It is responsible for running the trials and contacting other services as needed to ensure their execution.
The key aspect of Cogment's Orchestrator is its capacity to handle many network connections in parallel while remaining responsive.
The Controller is a key part of using Cogment: it initiates communication with the Orchestrator to control the execution of trials. It is responsible for starting Trials, retrieving and watching their state (including the end of the trial), and requesting trial termination.
The Environment implementation integrates a "state of the world" with the Trial. It performs the following tasks during the Trial:
- Generate Observations from the current state of the world, for example retrieving the visible objects from a 3D simulation.
- Apply the Actions, thus updating the state of the world, for example changing the velocity of a moving vehicle in a race simulation.
- Evaluate the performance of Actors and send them Rewards, for example by checking if a vehicle crossed the finish line in a race simulation.
- Send and receive direct messages.
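The first three tasks can be sketched for the race example as three plain Python functions. This does not use the Cogment SDK; `Vehicle`, `FINISH_LINE`, and the function names are hypothetical, and an action is assumed to be a single acceleration value.

```python
from dataclasses import dataclass

FINISH_LINE = 10.0  # hypothetical track length


@dataclass
class Vehicle:
    position: float = 0.0
    velocity: float = 0.0


def generate_observations(world):
    """Build one observation per actor from the current state of the world."""
    return {
        name: {
            "position": vehicle.position,
            "velocity": vehicle.velocity,
            "remaining": max(FINISH_LINE - vehicle.position, 0.0),
        }
        for name, vehicle in world.items()
    }


def apply_actions(world, actions, dt=1.0):
    """Treat each action as an acceleration: update velocity, then position."""
    for name, acceleration in actions.items():
        vehicle = world[name]
        vehicle.velocity += acceleration * dt
        vehicle.position += vehicle.velocity * dt


def evaluate_rewards(world):
    """Reward 1.0 to every actor whose vehicle has crossed the finish line."""
    return {
        name: 1.0 if vehicle.position >= FINISH_LINE else 0.0
        for name, vehicle in world.items()
    }
```

A trial tick would then call `generate_observations`, collect the actors' actions, call `apply_actions`, and finally `evaluate_rewards`.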
Actors can be implemented in two different ways, either as a service or as a client. Service Actor implementations are accessed by the Orchestrator during Trials, while Client Actor implementations join a Trial by initiating the communication with the Orchestrator. Client Actor implementations can reach a Cogment deployment through NAT traversal. This makes them particularly well-suited to implementing human-driven Actors, in web browsers for example.
Using one of Cogment's SDKs, Actors can be implemented as functions handling the integration between a decision-making Actor (software Agent or Human) and the Trial. This function performs the following tasks during the Trial:
- Receive Observations and do Actions in response, for example vectorizing the retrieved observation, feeding it to a neural network and converting its output to an Action.
- Receive Rewards, for example using them to update a neural network.
- Send and receive direct messages.
Please note that rewards can also be retrieved after the fact using an activity logger.
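The observation-to-action and reward-handling tasks can be sketched as follows. This is not the Cogment SDK: `LinearPolicyActor` is a hypothetical class, its one-layer linear model stands in for a neural network, and the update rule is deliberately naive. The observation is assumed to be a dictionary with `remaining` and `velocity` keys.

```python
class LinearPolicyActor:
    """Hypothetical actor: a linear score over two features picks one of two actions."""

    ACTIONS = ("brake", "accelerate")

    def __init__(self, learning_rate=0.1):
        self.weights = [0.0, 0.0]  # one weight per observation feature
        self.learning_rate = learning_rate
        self._last_features = None
        self._last_action_index = None

    @staticmethod
    def vectorize(observation):
        # Turn the structured observation into a flat feature vector.
        return [observation["remaining"], observation["velocity"]]

    def act(self, observation):
        features = self.vectorize(observation)
        score = sum(w * x for w, x in zip(self.weights, features))
        index = 1 if score >= 0 else 0  # convert the model output to an action
        self._last_features = features
        self._last_action_index = index
        return self.ACTIONS[index]

    def on_reward(self, reward):
        # Naive update: move the score further toward the action that was rewarded.
        if self._last_features is None:
            return
        direction = 1.0 if self._last_action_index == 1 else -1.0
        self.weights = [
            w + self.learning_rate * reward * direction * x
            for w, x in zip(self.weights, self._last_features)
        ]
```

A human-driven Actor would follow the same shape, with `act` rendering the observation in a user interface and returning the user's input instead of a model output.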
Additional optional services
Beyond the core services described above, a Cogment deployment can include these additional ones:
- Pre-trial hooks can be used to dynamically set up Trials from a given configuration, for example changing the number of Actors or pointing to other Environment or Actor implementations.
- Activity Logger can be used to listen to the activity during a trial (actions, observations, rewards, messages), for example to store this data for offline training of AI agents.
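As a rough illustration of the storage use case, the sketch below writes one JSON record per tick to a text stream and reads the records back for offline use. It does not implement Cogment's logging protocol; `JsonLinesActivityLogger`, `on_sample`, and `load_samples` are hypothetical names, and the record layout is an assumption.

```python
import io
import json


class JsonLinesActivityLogger:
    """Hypothetical logger: appends one JSON line per tick of trial activity."""

    def __init__(self, stream):
        self.stream = stream

    def on_sample(self, trial_id, tick, observations, actions, rewards, messages=()):
        record = {
            "trial_id": trial_id,
            "tick": tick,
            "observations": observations,
            "actions": actions,
            "rewards": rewards,
            "messages": list(messages),
        }
        self.stream.write(json.dumps(record) + "\n")


def load_samples(stream):
    """Read back every logged record, for example to build an offline training set."""
    return [json.loads(line) for line in stream if line.strip()]


buffer = io.StringIO()  # a file opened in text mode would work the same way
logger = JsonLinesActivityLogger(buffer)
logger.on_sample("trial-1", 0, {"car_a": 0.0}, {"car_a": 5.0}, {"car_a": 0.0})
logger.on_sample("trial-1", 1, {"car_a": 5.0}, {"car_a": 5.0}, {"car_a": 1.0})
buffer.seek(0)
samples = load_samples(buffer)
```

Storing rewards alongside observations and actions is what allows them to be retrieved after the fact rather than only during the trial.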