Terminology is an important part of understanding new concepts and learning to use new technology. The words used throughout our documentation may cause confusion if you are not familiar with how we use them; this glossary defines those terms for newcomers and seasoned developers alike. Some familiar terms carry additional caveats in the context of the Cogment Framework, generally for clarity.
An Actor is somebody or something that interacts with the environment by executing actions, taking observations, and receiving rewards (positive or negative) for those actions. An Actor can be an Agent (of any level of complexity and any type of flexibility, from bots to ML agents) or a human user.
Each Actor always belongs to a single Actor Class. An Actor Class is primarily defined by its associated Action Space, as a property of an environment. For example, pilot and passenger could be two different Actor Classes.
- An Action is an interaction an Actor performs on the environment. Actions are picked from the Action Space.
- A single element of an Action Space.
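As an illustration of how an Actor Class, its Action Space, and individual Actions relate, here is a minimal Python sketch. It is not the Cogment API: the names `PilotAction`, `PassengerAction`, `ActorClass`, and `is_valid` are hypothetical and exist only for this example, which reuses the pilot and passenger classes mentioned above.

```python
from dataclasses import dataclass
from enum import Enum


class PilotAction(Enum):
    """Hypothetical Action Space for a 'pilot' Actor Class."""
    CLIMB = "climb"
    DESCEND = "descend"
    HOLD = "hold"


class PassengerAction(Enum):
    """Hypothetical Action Space for a 'passenger' Actor Class."""
    SIT = "sit"
    CALL_CREW = "call_crew"


@dataclass(frozen=True)
class ActorClass:
    """Illustrative only: an Actor Class defined by its Action Space."""
    name: str
    action_space: type  # the Enum listing every allowed Action

    def is_valid(self, action) -> bool:
        # An Action is valid only if it is an element of this class's Action Space.
        return isinstance(action, self.action_space)


pilot = ActorClass("pilot", PilotAction)
passenger = ActorClass("passenger", PassengerAction)

print(pilot.is_valid(PilotAction.CLIMB))
print(pilot.is_valid(PassengerAction.SIT))
```

In this sketch, an Action is simply one member of the enumeration, and two Actor Classes differ because their Action Spaces differ.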
We usually call non-human Actors agents. Agents can rely on any sort of underlying decision-making system, whether or not it is able to learn.
- The environment is the set of rules defining how a trial evolves over time for any given use case. For example, to train a pilot agent, a flight simulation would be the environment. Actors can interact with the environment itself, or with each other through the environment, within the boundaries of the environment ruleset (i.e. how the environment can change, whether through its own rules or through the actions of Actors in it).
- A stateful instance of an environment.
An environment state is the specific set of conditions the environment is in at a given time (for example, when it is first instantiated). These conditions can be observable or not, and our Framework does not concern itself with the ones that are not.
These two elements combined are what we call the framework:
The interface, usually an app, that humans use to interact with the rest of the system; the software that turns humans into Actors.
Human / Artificial Intelligence Interaction Loop Training
We call Human / AI interaction loop training the fundamental paradigm our Framework was built for: a continuous loop between humans and agents where they learn from each other. It’s a way to train agents in an environment where direct human interactions, whether between humans, between humans and the environment, or between humans and agents, provide live data to the agents (first part of the loop), as well as a way for agents to interact with humans, either directly or through the environment (second part of the loop).
Messages can be sent from any actor or the environment to any actor or the environment. A message can be any protobuf class. This creates channels between any set of actors and the environment. These channels can be used for applications where communication between actors and the environment needs to happen outside of the standard observation and action spaces.
A model is a representation, usually a mathematical one in our context, of a concept, structure, system, or an aspect of the real world. It is usually a simplified and abstracted representation.
An observation delta is the difference between two observations. Usually, we encode deltas from the past to the future.
An observation transition is an observation delta between two consecutive observations.
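To make the idea of a delta concrete, here is a minimal Python sketch that treats observations as plain dictionaries. This representation, and the helper names `compute_delta` and `apply_delta`, are assumptions made for illustration; the Framework's actual observation encoding is not described here.

```python
def compute_delta(previous: dict, current: dict) -> dict:
    """Encode the difference from a past observation to a later one."""
    # Keys that are new or whose value changed between the two observations.
    changed = {
        key: value
        for key, value in current.items()
        if key not in previous or previous[key] != value
    }
    # Keys that were present in the past observation but are gone now.
    removed = [key for key in previous if key not in current]
    return {"changed": changed, "removed": removed}


def apply_delta(previous: dict, delta: dict) -> dict:
    """Rebuild the later observation from the past one and the delta."""
    result = {k: v for k, v in previous.items() if k not in delta["removed"]}
    result.update(delta["changed"])
    return result


# Two consecutive (hypothetical) observations: their delta is a transition.
obs_t0 = {"altitude": 1000, "speed": 250, "gear": "down"}
obs_t1 = {"altitude": 1200, "speed": 250}

delta = compute_delta(obs_t0, obs_t1)
rebuilt = apply_delta(obs_t0, delta)
print(delta)
print(rebuilt == obs_t1)
```

Because the delta is encoded from the past to the future, applying it to the earlier observation recovers the later one; only the fields that changed need to be stored or transmitted.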
The Orchestrator is the central piece of our framework; it’s an executable that handles several things:
- It circulates data flows between Actors and Environments.
- It dumps datasets in the chosen storage location.
- It compresses & encrypts data.
- It collates various reward sources (usually environment or actors) into a single reward for an Actor.
- It instantiates the trials.
A plugin or extension adds functionality to our core framework. We provide plugins that handle special features, such as Deployment, Dataset storage destinations, and Analytics, which one may or may not choose to use alongside the core framework, depending on one's specific needs.
A binary data format for serialized communication.
.proto files are used to specify the available data structures. You can learn more at https://developers.google.com/protocol-buffers/.
A sent reward is a measure of an Actor’s performance within the environment at a given tick. The reward can be sent by the environment and/or by a different Actor. Sent rewards go to the Orchestrator, which collates them before they are received by the target actor.
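The collation step can be pictured with a short Python sketch. The text above does not say how the Orchestrator combines rewards, so the confidence-weighted average used here is purely an assumption, and `SentReward`, its fields, and `collate` are hypothetical names rather than part of the Framework.

```python
from dataclasses import dataclass


@dataclass
class SentReward:
    """Hypothetical record of one reward sent to a target Actor at a tick."""
    sender: str              # e.g. the environment or another Actor
    value: float             # positive or negative
    confidence: float = 1.0  # assumed weight, not confirmed by the source


def collate(rewards: list) -> float:
    """Combine several sent rewards into a single reward (assumed rule)."""
    total_weight = sum(r.confidence for r in rewards)
    if total_weight == 0:
        # No rewards, or only zero-confidence ones: treat as neutral.
        return 0.0
    return sum(r.value * r.confidence for r in rewards) / total_weight


sent = [
    SentReward(sender="environment", value=1.0, confidence=1.0),
    SentReward(sender="human_reviewer", value=-1.0, confidence=0.5),
]
collated = collate(sent)
print(collated)
```

Whatever the actual rule is, the shape is the same: several senders contribute rewards for one Actor at one tick, and that Actor receives a single value.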
A reward function describes how an agent "ought" to behave, that is, which behaviours lead to Rewards. Note that in our case, Reward functions can be used to reward any Actor, whether human or not.
Reinforcement Learning (RL)
The problem one wants to solve.