Terminology is a very important part of understanding new concepts and learning how to use new technology. The words we use throughout our documentation can be confusing if you are not familiar with how we use them, so this glossary is intended for newcomers and seasoned developers alike. Some familiar terms carry additional caveats in their definitions in the context of the Cogment Framework (generally for clarity).
Actor
An Actor is someone or something that interacts with the environment by executing certain actions, taking observations, and receiving rewards (positive or negative) in return. An Actor can be an Agent (of any level of complexity and any type of flexibility, from bots to ML agents) or a human user.
Actor Class
Each Actor always belongs to a single Actor Class. An Actor Class is primarily defined by its associated Action Space, as a property of an environment. For example, pilot and passenger could be two different Actor Classes.
Action
- An Action is an interaction an Actor performs on the environment. Actions are picked from the Action Space of the Actor Class the Actor belongs to. For example, turning right.
- A single element of an Action Space.
Action Space
The set of all Actions an Actor can pick from. There is one Action Space per Actor Class.
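As a rough illustration, the relationship between Actor Classes, Action Spaces, and Actions can be sketched in plain Python. This is a hypothetical sketch, not the Cogment API; the `ActorClass` type and `pick_action` helper are invented for illustration:

```python
from dataclasses import dataclass
import random

# Hypothetical, framework-agnostic sketch (not the Cogment API):
# an Actor Class is defined by its Action Space, and every Actor of
# that class picks its Actions from that space.
@dataclass
class ActorClass:
    name: str
    action_space: list  # the set of Actions available to this class

pilot = ActorClass("pilot", ["turn_left", "turn_right", "climb", "descend"])
passenger = ActorClass("passenger", ["look_outside", "read", "sleep"])

def pick_action(actor_class: ActorClass) -> str:
    """An Actor picks an Action from its class's Action Space."""
    return random.choice(actor_class.action_space)
```

Note how there is exactly one Action Space per Actor Class, and every Action an Actor takes is a single element of that space.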
Agent
An Agent is a non-human Actor. It can be based on any sort of underlying decision-making system, whether it is able to learn or not.
Environment
- The environment is the set of rules defining how a trial evolves over time for any given use case. For example, to train a pilot agent, a flight simulation would be the environment. Actors can interact with the environment itself, or with each other through the environment, within the boundaries of the environment's ruleset (i.e. the ways an environment can change, whether through its own rules or through the actions of Actors within it).
- A stateful instance of an environment.
Environment State
An environment state is the specific set of conditions the environment is in at a specific time (for example, when it is first instantiated). These conditions can be observable or not, and our Framework does not concern itself with the ones that are not.
Feedback
A feedback is a measure of an Actor's performance within the environment. Feedback can be produced by the environment and/or by a different Actor. Feedbacks are sent to the Orchestrator, which collates them into rewards.
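One plausible way to collate several feedbacks into a single reward is a confidence-weighted average. The sketch below is a hypothetical illustration, not the Cogment API; the `collate_feedbacks` function and the `(value, confidence)` shape of a feedback are assumptions made for this example:

```python
# Hypothetical sketch (not the Cogment API): several feedback values,
# each with a confidence weight, are collated into a single reward for
# an Actor -- here, via a confidence-weighted average.
def collate_feedbacks(feedbacks):
    """feedbacks: list of (value, confidence) pairs."""
    total_confidence = sum(c for _, c in feedbacks)
    if total_confidence == 0:
        return 0.0
    return sum(v * c for v, c in feedbacks) / total_confidence

# One feedback from the environment, one from another Actor:
reward = collate_feedbacks([(1.0, 0.8), (-0.5, 0.2)])  # -> 0.7
```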
Framework
These two elements combined are what we call the Framework:
- Orchestrator
- SDKs
Front-end
The interface, usually an app, that humans use to interact with the rest of the system; the software that turns humans into Actors.
Human / Artificial Intelligence Interaction Loop Training
We call Human / AI interaction loop training the fundamental paradigm our Framework was built for: a continuous loop between humans and agents in which they learn from each other. It is a way to train agents in an environment where direct human interactions, whether between humans, between humans and the environment, or between humans and agents, provide live data to the agents (the first part of the loop), as well as a way for agents to interact with humans, either directly or through the environment (the second part of the loop).
Model
A model is a representation, usually a mathematical one in our context, of a concept, structure, system, or aspect of the real world. It is usually a simplified and abstracted representation.
Observation Delta
An observation delta is the difference between two observations. Usually, we encode deltas from the past to the future.
Observation Transition
An observation transition is an observation delta between two consecutive observations.
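To make deltas and transitions concrete, here is a hypothetical sketch (not the Cogment API) in which an observation is a flat dict and a delta keeps only the changed fields; it assumes fields are only added or changed between observations, never removed:

```python
# Hypothetical sketch (not the Cogment API): an observation as a flat
# dict, a delta encoded from the older observation to the newer one,
# and the delta applied to recover the newer observation.
def compute_delta(old_obs: dict, new_obs: dict) -> dict:
    """Keep only the fields that changed between two observations."""
    return {k: v for k, v in new_obs.items() if old_obs.get(k) != v}

def apply_delta(old_obs: dict, delta: dict) -> dict:
    return {**old_obs, **delta}

obs_t0 = {"altitude": 1000, "heading": 90, "speed": 250}
obs_t1 = {"altitude": 1200, "heading": 90, "speed": 250}

delta = compute_delta(obs_t0, obs_t1)        # {"altitude": 1200}
assert apply_delta(obs_t0, delta) == obs_t1  # a transition: t0 -> t1
```

Because `obs_t0` and `obs_t1` are consecutive here, this particular delta is also an observation transition.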
Orchestrator
The Orchestrator is the central piece of our Framework; it is an executable that handles several things:
- It circulates data flows between Actors (humans and agents) and environments.
- It dumps datasets in the chosen storage location.
- It compresses & encrypts data.
- It collates various reward sources (usually the environment or other actors) into a single reward for an Actor.
- It instantiates the trials.
Plugins
A plugin or extension adds functionality to our core Framework.
We provide plugins that handle special features such as Deployment, Dataset storage destinations, and Analytics, which one may or may not choose to use alongside the core Framework, depending on their specific needs.
Reward Function
A reward function describes how an agent "ought" to behave; that is, which behaviours lead to Rewards. Note that in our case, reward functions can be used to reward any Actor, whether human or not.
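As a toy illustration, a reward function for the pilot example might look like the following Python sketch. It is hypothetical, not the Cogment API, and the state fields and thresholds are invented:

```python
# Hypothetical sketch (not the Cogment API): a reward function maps what
# an Actor did in a given environment state to a scalar reward, and works
# the same way whether the Actor is an agent or a human.
def reward_function(state: dict, action: str) -> float:
    """Reward a pilot Actor for keeping the aircraft at a safe altitude."""
    if action == "climb" and state["altitude"] < 1000:
        return 1.0   # recovering from a low altitude is rewarded
    if state["altitude"] < 500:
        return -1.0  # flying dangerously low is penalized
    return 0.0
```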
Reinforcement Learning (RL)
A machine learning paradigm in which an agent learns, by trial and error, to pick the actions that maximize the rewards it receives from its environment.
Trial
A trial is a single run of a use case, with a beginning and an end, populated with a single instance of the use case's environment and its Actors.
Use Case
The problem one wants to solve.