gym_tl_tools.wrapper
Classes

| Class | Description |
|---|---|
| BaseVarValueInfoGenerator | Base class for generating variable values from the environment's observation and info. |
| RewardConfig | |
| RewardConfigDict | Configuration dictionary for the reward structure in the TLObservationReward wrapper. |
| TLObservationReward | A wrapper for Gymnasium environments that augments observations with the state of a temporal logic automaton, and computes rewards based on satisfaction of temporal logic (TL) specifications. |
| TLObservationRewardConfig | Configuration for the TLObservationReward wrapper. |
- class gym_tl_tools.wrapper.BaseVarValueInfoGenerator[source]
Bases: Generic[ObsType, ActType], ABC

Base class for generating variable values from the environment's observation and info.

This class should be subclassed to implement custom logic for extracting variable values that are used to evaluate atomic predicates in temporal logic formulas.

The get_var_values method should return a dictionary where keys are variable names used in predicate formulas (e.g., "d_robot_goal", "d_robot_obstacle") and values are their corresponding numerical values extracted from the environment's observation and info.
Example
```python
import numpy as np
from gym_tl_tools import BaseVarValueInfoGenerator

class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": obs.get("obstacle_distance", 0.0),
            "robot_speed": np.linalg.norm(obs["velocity"]) if "velocity" in obs else 0.0,
        }
```
- abstractmethod get_var_values(env: Env[ObsType, ActType], obs: ObsType, info: dict[str, Any]) → dict[str, Any][source]
Extract variable values from the environment’s observation and info.
This method should be implemented by subclasses to provide the variable values needed for evaluating atomic predicates in temporal logic formulas.
- Parameters:
  - env (Env[ObsType, ActType]) – The environment being wrapped.
  - obs (ObsType) – The current observation from the environment.
  - info (dict[str, Any]) – The info dictionary returned by the environment.
- Returns:
  A dictionary where keys are variable names used in predicate formulas and values are their corresponding numerical values.
- Return type:
  dict[str, Any]
- class gym_tl_tools.wrapper.RewardConfig(*, terminal_state_reward: float = 5, state_trans_reward_scale: float = 100, dense_reward: bool = False, dense_reward_scale: float = 0.01)[source]
Bases: BaseModel

- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
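A minimal sketch of constructing a reward configuration; the field names and defaults follow the constructor signature shown above:

```python
from gym_tl_tools.wrapper import RewardConfig

# Field names and defaults follow the constructor signature above.
reward_config = RewardConfig(
    terminal_state_reward=10.0,
    state_trans_reward_scale=100.0,
    dense_reward=True,
    dense_reward_scale=0.01,
)
```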
- class gym_tl_tools.wrapper.RewardConfigDict[source]
Bases: TypedDict

Configuration dictionary for the reward structure in the TLObservationReward wrapper. This dictionary is used to define how rewards are computed based on the automaton's state and the environment's info.
- terminal_state_reward
Reward given when the automaton reaches a terminal state in addition to the automaton state transition reward.
- Type:
  float
- state_trans_reward_scale
Scale factor for the reward based on the automaton’s state transition robustness. This is applied to the robustness computed from the automaton’s transition. If the transition leads to a trap state, the reward is set to be negative, scaled by this factor.
- Type:
  float
- dense_reward
Whether to use dense rewards (True) or sparse rewards (False). Dense rewards provide a continuous reward signal based on the robustness of the transition to the next non-trap state.
- Type:
  bool
- dense_reward_scale
Scale factor for the dense reward. This is applied to the computed dense reward based on the robustness to the next non-trap automaton’s state. If dense rewards are enabled, this factor scales the reward returned by the automaton.
- Type:
  float
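A minimal sketch of providing these settings as a plain dictionary, as expected by the reward_config argument of TLObservationReward; the keys follow the attributes listed above:

```python
from gym_tl_tools.wrapper import RewardConfigDict

# Keys follow the attributes documented above. As a TypedDict, this is
# written as a plain dict literal rather than instantiated as a class.
reward_config: RewardConfigDict = {
    "terminal_state_reward": 5.0,
    "state_trans_reward_scale": 100.0,
    "dense_reward": False,
    "dense_reward_scale": 0.01,
}
```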
- class gym_tl_tools.wrapper.TLObservationReward(env: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType], tl_spec: str, atomic_predicates: list[gym_tl_tools.automaton.Predicate] | list[dict[str, str]], var_value_info_generator: gym_tl_tools.wrapper.BaseVarValueInfoGenerator[gymnasium.core.ObsType, gymnasium.core.ActType], reward_config: gym_tl_tools.wrapper.RewardConfigDict = {'dense_reward': False, 'dense_reward_scale': 0.01, 'state_trans_reward_scale': 100, 'terminal_state_reward': 5}, early_termination: bool = True, parser: gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>, dict_aut_state_key: str = 'aut_state')[source]
Bases: ObservationWrapper[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[number]]], ActType, ObsType], RecordConstructorArgs, Generic[ObsType, ActType]

A wrapper for Gymnasium environments that augments observations with the state of a temporal logic automaton, and computes rewards based on satisfaction of temporal logic (TL) specifications.
This wrapper is designed for environments where the agent’s objective is specified using temporal logic (e.g., LTL). It integrates an automaton (from a TL formula) into the observation and reward structure, enabling RL agents to learn tasks with complex temporal requirements.
Usage
- Define Atomic Predicates:
Create a list of Predicate objects, each representing an atomic proposition in your TL formula. Each predicate has a name (used in the TL formula) and a formula (Boolean expression string). Example:
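A minimal sketch with illustrative predicate names and thresholds (the same predicates reappear in the full example below):

```python
from gym_tl_tools import Predicate

# Each predicate pairs a name (used in the TL formula) with a Boolean
# formula over variables supplied by the variable value generator.
atomic_predicates = [
    Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
    Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
]
```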
- Create a Variable Value Generator:
Implement a subclass of BaseVarValueInfoGenerator to extract variable values from the environment’s observation and info for evaluating the atomic predicates. Example:
```python
from gym_tl_tools import BaseVarValueInfoGenerator

class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        # Extract variables needed for predicate evaluation
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": info.get("d_robot_obstacle", float("inf")),
        }
```
- Specify the Temporal Logic Formula:
Write your TL specification as a string, using the names of your atomic predicates. Example:
```python
tl_spec = "F(goal_reached) & G(!obstacle_hit)"
```
- Wrap Your Environment:
Pass your environment, TL specification, atomic predicates, and variable value generator to TLObservationReward. Example:
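A minimal sketch, assuming env is an existing Gymnasium environment and tl_spec, atomic_predicates, and var_generator = MyVarValueGenerator() come from the previous steps:

```python
from gym_tl_tools import TLObservationReward

# env, tl_spec, atomic_predicates, and var_generator come from the
# previous steps.
wrapped_env = TLObservationReward(
    env,
    tl_spec=tl_spec,
    atomic_predicates=atomic_predicates,
    var_value_info_generator=var_generator,
)
```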
- Observation Structure:
The wrapper augments each observation with the current automaton state.
If the original observation space is a [Dict](https://gymnasium.farama.org/main/api/spaces/composite/#gymnasium.spaces.Dict), the automaton state is added as a new key (default: “aut_state”).
~~If the original observation space is a [Tuple](https://gymnasium.farama.org/main/api/spaces/composite/#gymnasium.spaces.Tuple), the automaton state is appended.~~
Otherwise, the observation is wrapped in a [Dict](https://gymnasium.farama.org/main/api/spaces/composite/#gymnasium.spaces.Dict) with keys “obs” and “aut_state”.
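A minimal sketch of inspecting the augmented observation when the base observation space is not a Dict, assuming the default dict_aut_state_key:

```python
obs, info = wrapped_env.reset()

# With a non-Dict base observation space, the wrapped observation is a dict
# holding the original observation under "obs" and the automaton state under
# the default key "aut_state".
base_obs = obs["obs"]
aut_state = obs["aut_state"]
```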
- Reward Calculation:
At each step, the wrapper computes the reward based on the automaton’s transition, which reflects progress toward (or violation of) the TL specification.
The automaton state is updated according to the values of the atomic predicates, which are evaluated using the variable values provided by the var_value_info_generator.
- Reset and Step:
On reset(), the automaton is reset to its initial state, and the initial observation is augmented.
On step(action), the automaton transitions based on the variable values extracted by the var_value_info_generator, and the reward is computed accordingly.
Notes
- The wrapper adds additional information to the info dict, including:
is_success: Whether the automaton has reached a goal state.
is_failure: Whether the automaton has reached a trap state.
is_aut_terminated: Whether the automaton has been terminated.
- Parameters:
  - env (gymnasium.Env) – The environment to wrap.
  - tl_spec (str) – The temporal logic specification (e.g., LTL formula) to be used for the automaton.
  - atomic_predicates (list[gym_tl_tools.Predicate]) – List of atomic predicates used in the TL formula. Each predicate has a name (used in the TL formula) and a formula (Boolean expression string for evaluation).
  - var_value_info_generator (BaseVarValueInfoGenerator[ObsType, ActType]) – An instance of a subclass of BaseVarValueInfoGenerator that extracts variable values from the environment's observation and info for evaluating atomic predicates.
  - reward_config (RewardConfigDict, optional) – Configuration for the reward structure (default values provided).
  - early_termination (bool, optional) – Whether to terminate episodes when the automaton reaches a terminal state (default: True).
  - parser (Parser, optional) – Parser for TL expressions (default: a new Parser instance).
  - dict_aut_state_key (str, optional) – Key for the automaton state in the observation dictionary (default: "aut_state").
Example
```python
from gym_tl_tools import Predicate, BaseVarValueInfoGenerator, TLObservationReward

# Define atomic predicates
atomic_predicates = [
    Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
    Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
]

# Create variable value generator
class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": info.get("d_robot_obstacle", float("inf")),
        }

var_generator = MyVarValueGenerator()
tl_spec = "F(goal_reached) & G(!obstacle_hit)"

wrapped_env = TLObservationReward(
    env,
    tl_spec=tl_spec,
    atomic_predicates=atomic_predicates,
    var_value_info_generator=var_generator,
)
obs, info = wrapped_env.reset()
done = False
while not done:
    action = wrapped_env.action_space.sample()
    obs, reward, terminated, truncated, info = wrapped_env.step(action)
    done = terminated or truncated
```
- forward_aut(obs: ObsType, info: dict[str, Any]) → None[source]
Forward the automaton based on the current observation and info.
This method updates the automaton state based on the current observation and info. It is useful for manually stepping the automaton without taking an action in the environment.
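A minimal sketch, assuming raw_obs and raw_info were obtained from the underlying environment and that raw_info contains the variable values needed by the predicates:

```python
# Manually advance the automaton from an observation/info pair obtained
# outside of wrapped_env.step(); raw_obs and raw_info are assumed to come
# from the underlying environment.
wrapped_env.forward_aut(raw_obs, raw_info)
if wrapped_env.is_aut_terminated:
    print("Automaton reached a terminal state")
```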
- property is_aut_terminated: bool
Check if the automaton is in a terminal state.
- Returns:
True if the automaton is in a terminal state, False otherwise.
- Return type:
  bool
- observation(observation: ObsType) → dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]][source]
Process the observation to include the automaton state.
- reset(*, seed: int | None = None, options: dict[str, Any] | None = {'forward_aut_on_reset': True, 'info': {}, 'obs': {}}) → tuple[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]], dict[str, Any]][source]
Reset the environment and return the initial observation.
- Parameters:
  - seed (int | None) – Seed passed to the underlying environment's reset().
  - options (dict[str, Any] | None) – Reset options; defaults to {'forward_aut_on_reset': True, 'info': {}, 'obs': {}}.
- Returns:
new_obs (dict[str,ObsType|int]) – The initial observation with the automaton state.
info (dict[str, Any]) – Additional information from the reset. Should contain the variable keys and values that define the atomic predicates.
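A minimal sketch of a reset call with explicit options; the option keys follow the defaults shown in the signature above:

```python
# Option keys follow the default options dict shown in the signature above.
obs, info = wrapped_env.reset(
    seed=0,
    options={"forward_aut_on_reset": True, "obs": {}, "info": {}},
)
```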
- step(action: ActType) → tuple[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]], SupportsFloat, bool, bool, dict[str, Any]][source]
Take a step in the environment with the given action.
- Parameters:
action (ActType) – The action to take in the environment.
- Returns:
new_obs (dict[str,ObsType|int]) – The new observation after taking the action.
reward (SupportsFloat) – The reward received from the environment.
terminated (bool) – Whether the episode has terminated.
truncated (bool) – Whether the episode has been truncated.
info (dict[str, Any]) – Additional information from the step. Should contain the variable keys and values that define the atomic predicates.
- class gym_tl_tools.wrapper.TLObservationRewardConfig(*, tl_spec: str = '', atomic_predicates: list[gym_tl_tools.automaton.Predicate], var_value_info_generator_cls: str, var_value_info_generator_args: dict[str, typing.Any] = {}, reward_config: gym_tl_tools.wrapper.RewardConfig = RewardConfig(terminal_state_reward=5, state_trans_reward_scale=100, dense_reward=False, dense_reward_scale=0.01), early_termination: bool = True, parser: gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>, dict_aut_state_key: str = 'aut_state')[source]
Bases: BaseModel, Generic[ObsType, ActType]

Configuration for the TLObservationReward wrapper.
This class defines the parameters used to configure the TLObservationReward wrapper, including the temporal logic specification, atomic predicates, variable value generator, reward configuration, and other settings.
- tl_spec
The temporal logic specification (e.g., LTL formula) to be used for the automaton.
- Type:
  str
- var_value_info_generator
An instance of a subclass of BaseVarValueInfoGenerator that extracts variable values from the environment’s observation and info.
- Type:
BaseVarValueInfoGenerator[ObsType, ActType]
- reward_config
Configuration for the reward structure.
- Type:
  RewardConfig
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- reward_config: RewardConfig
- serialize_model(info: SerializationInfo | None = None) → dict[str, Any][source]
Custom model serializer to handle atomic_predicates serialization with context.
- property var_value_info_generator: BaseVarValueInfoGenerator[ObsType, ActType]
Instantiate the variable value info generator based on the provided class and arguments.
- Returns:
An instance of the variable value info generator.
- Return type:
BaseVarValueInfoGenerator[ObsType, ActType]
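A minimal sketch of building a TLObservationRewardConfig and instantiating the generator from it; the import-path string passed to var_value_info_generator_cls is an assumption about how the generator class is referenced, so adjust it to your project:

```python
from gym_tl_tools import Predicate
from gym_tl_tools.wrapper import RewardConfig, TLObservationRewardConfig

# NOTE: the format of var_value_info_generator_cls (an import-path string)
# is an assumption; "my_package.MyVarValueGenerator" is a hypothetical class.
config = TLObservationRewardConfig(
    tl_spec="F(goal_reached) & G(!obstacle_hit)",
    atomic_predicates=[
        Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
        Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
    ],
    var_value_info_generator_cls="my_package.MyVarValueGenerator",
    var_value_info_generator_args={},
    reward_config=RewardConfig(dense_reward=True),
)

# The var_value_info_generator property instantiates the generator from the
# configured class and arguments.
generator = config.var_value_info_generator
```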