gym_tl_tools.wrapper
Classes

| Class | Description |
|---|---|
| BaseVarValueInfoGenerator | Base class for generating variable values from the environment's observation and info. |
| RewardConfig | |
| RewardConfigDict | Configuration dictionary for the reward structure in the TLObservationReward wrapper. |
| TLObservationReward | A wrapper for Gymnasium environments that augments observations with the state of a temporal logic automaton, and computes rewards based on satisfaction of temporal logic (TL) specifications. |
| TLObservationRewardConfig | Configuration for the TLObservationReward wrapper. |
- class gym_tl_tools.wrapper.BaseVarValueInfoGenerator[source]
Bases: Generic[ObsType, ActType], ABC

Base class for generating variable values from the environment's observation and info.

This class should be subclassed to implement custom logic for extracting variable values that are used to evaluate atomic predicates in temporal logic formulas.

The get_var_values method should return a dictionary where keys are variable names used in predicate formulas (e.g., "d_robot_goal", "d_robot_obstacle") and values are their corresponding numerical values extracted from the environment's observation and info.
Example
```python
import numpy as np
from gym_tl_tools import BaseVarValueInfoGenerator

class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": obs.get("obstacle_distance", 0.0),
            "robot_speed": np.linalg.norm(obs["velocity"]) if "velocity" in obs else 0.0,
        }
```
- abstractmethod get_var_values(env: Env[ObsType, ActType], obs: ObsType, info: dict[str, Any]) → dict[str, Any][source]
Extract variable values from the environment’s observation and info.
This method should be implemented by subclasses to provide the variable values needed for evaluating atomic predicates in temporal logic formulas.
- Parameters:
  - env (Env[ObsType, ActType]) – The environment being wrapped.
  - obs (ObsType) – The current observation from the environment.
  - info (dict[str, Any]) – The info dictionary returned by the environment.
- Returns:
  A dictionary where keys are variable names used in predicate formulas and values are their corresponding numerical values.
- Return type:
  dict[str, Any]
- class gym_tl_tools.wrapper.RewardConfig(*, terminal_state_reward: float = 5, state_trans_reward_scale: float = 100, dense_reward: bool = False, dense_reward_scale: float = 0.01)[source]
Bases: BaseModel

- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
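A minimal sketch of constructing a reward configuration; the field names and defaults follow the constructor signature shown above:

```python
from gym_tl_tools.wrapper import RewardConfig

# Field names and defaults follow the constructor signature above.
reward_config = RewardConfig(
    terminal_state_reward=10.0,
    state_trans_reward_scale=100.0,
    dense_reward=True,
    dense_reward_scale=0.01,
)
```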
- class gym_tl_tools.wrapper.RewardConfigDict[source]
Bases: TypedDict

Configuration dictionary for the reward structure in the TLObservationReward wrapper. This dictionary is used to define how rewards are computed based on the automaton's state and the environment's info.
- terminal_state_reward
Reward given when the automaton reaches a terminal state in addition to the automaton state transition reward.
- Type:
  float
- state_trans_reward_scale
Scale factor for the reward based on the automaton’s state transition robustness. This is applied to the robustness computed from the automaton’s transition. If the transition leads to a trap state, the reward is set to be negative, scaled by this factor.
- Type:
  float
- dense_reward
Whether to use dense rewards (True) or sparse rewards (False). Dense rewards provide a continuous reward signal based on the robustness of the transition to the next non-trap state.
- Type:
  bool
- dense_reward_scale
Scale factor for the dense reward. This is applied to the computed dense reward based on the robustness to the next non-trap automaton’s state. If dense rewards are enabled, this factor scales the reward returned by the automaton.
- Type:
  float
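A minimal sketch of providing these settings as a plain dictionary, as expected by the reward_config argument of TLObservationReward; the keys follow the attributes listed above:

```python
from gym_tl_tools.wrapper import RewardConfigDict

# Keys follow the attributes documented above. As a TypedDict, this is
# written as a plain dict literal rather than instantiated as a class.
reward_config: RewardConfigDict = {
    "terminal_state_reward": 5.0,
    "state_trans_reward_scale": 100.0,
    "dense_reward": False,
    "dense_reward_scale": 0.01,
}
```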
- class gym_tl_tools.wrapper.TLObservationReward(env: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType], tl_spec: str, atomic_predicates: list[gym_tl_tools.automaton.Predicate] | list[dict[str, str]], var_value_info_generator: gym_tl_tools.wrapper.BaseVarValueInfoGenerator[gymnasium.core.ObsType, gymnasium.core.ActType], reward_config: gym_tl_tools.wrapper.RewardConfigDict = {'dense_reward': False, 'dense_reward_scale': 0.01, 'state_trans_reward_scale': 100, 'terminal_state_reward': 5}, early_termination: bool = True, parser: gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>, dict_aut_state_key: str = 'aut_state')[source]
Bases: ObservationWrapper[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[number]]], ActType, ObsType], RecordConstructorArgs, Generic[ObsType, ActType]

A wrapper for Gymnasium environments that augments observations with the state of a temporal logic automaton, and computes rewards based on satisfaction of temporal logic (TL) specifications.
This wrapper is designed for environments where the agent’s objective is specified using temporal logic (e.g., LTL). It integrates an automaton (from a TL formula) into the observation and reward structure, enabling RL agents to learn tasks with complex temporal requirements.
Usage
- Define Atomic Predicates:
Create a list of Predicate objects, each representing an atomic proposition in your TL formula. Each predicate has a name (used in the TL formula) and a formula (Boolean expression string). Example:
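A minimal sketch with illustrative predicate names and thresholds (the same predicates reappear in the full example below):

```python
from gym_tl_tools import Predicate

# Each predicate pairs a name (used in the TL formula) with a Boolean
# formula over variables supplied by the variable value generator.
atomic_predicates = [
    Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
    Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
]
```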
- Create a Variable Value Generator:
Implement a subclass of BaseVarValueInfoGenerator to extract variable values from the environment’s observation and info for evaluating the atomic predicates. Example:
```python
from gym_tl_tools import BaseVarValueInfoGenerator

class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        # Extract variables needed for predicate evaluation
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": info.get("d_robot_obstacle", float("inf")),
        }
```
- Specify the Temporal Logic Formula:
Write your TL specification as a string, using the names of your atomic predicates. Example:
```python
tl_spec = "F(goal_reached) & G(!obstacle_hit)"
```
- Wrap Your Environment:
Pass your environment, TL specification, atomic predicates, and variable value generator to TLObservationReward. Example:
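A minimal sketch, assuming env is an existing Gymnasium environment and tl_spec, atomic_predicates, and var_generator = MyVarValueGenerator() come from the previous steps:

```python
from gym_tl_tools import TLObservationReward

# env, tl_spec, atomic_predicates, and var_generator come from the
# previous steps.
wrapped_env = TLObservationReward(
    env,
    tl_spec=tl_spec,
    atomic_predicates=atomic_predicates,
    var_value_info_generator=var_generator,
)
```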
- Observation Structure:
The wrapper augments each observation with the current automaton state.
If the original observation space is a [Dict](https://gymnasium.farama.org/main/api/spaces/composite/#gymnasium.spaces.Dict), the automaton state is added as a new key (default: “aut_state”).
~~If the original observation space is a [Tuple](https://gymnasium.farama.org/main/api/spaces/composite/#gymnasium.spaces.Tuple), the automaton state is appended.~~
Otherwise, the observation is wrapped in a [Dict](https://gymnasium.farama.org/main/api/spaces/composite/#gymnasium.spaces.Dict) with keys “obs” and “aut_state”.
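A minimal sketch of inspecting the augmented observation when the base observation space is not a Dict, assuming the default dict_aut_state_key:

```python
obs, info = wrapped_env.reset()

# With a non-Dict base observation space, the wrapped observation is a dict
# holding the original observation under "obs" and the automaton state under
# the default key "aut_state".
base_obs = obs["obs"]
aut_state = obs["aut_state"]
```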
- Reward Calculation:
At each step, the wrapper computes the reward based on the automaton’s transition, which reflects progress toward (or violation of) the TL specification.
The automaton state is updated according to the values of the atomic predicates, which are evaluated using the variable values provided by the var_value_info_generator.
- Reset and Step:
On reset(), the automaton is reset to its initial state, and the initial observation is augmented.
On step(action), the automaton transitions based on the variable values extracted by the var_value_info_generator, and the reward is computed accordingly.
Notes
- The wrapper adds additional information to the info dict, including:
is_success: Whether the automaton has reached a goal state.
is_failure: Whether the automaton has reached a trap state.
is_aut_terminated: Whether the automaton has been terminated.
- Parameters:
  - env (gymnasium.Env) – The environment to wrap.
  - tl_spec (str) – The temporal logic specification (e.g., LTL formula) to be used for the automaton.
  - atomic_predicates (list[gym_tl_tools.Predicate]) – List of atomic predicates used in the TL formula. Each predicate has a name (used in the TL formula) and a formula (Boolean expression string for evaluation).
  - var_value_info_generator (BaseVarValueInfoGenerator[ObsType, ActType]) – An instance of a subclass of BaseVarValueInfoGenerator that extracts variable values from the environment's observation and info for evaluating atomic predicates.
  - reward_config (RewardConfigDict, optional) – Configuration for the reward structure (default values provided).
  - early_termination (bool, optional) – Whether to terminate episodes when the automaton reaches a terminal state (default: True).
  - parser (Parser, optional) – Parser for TL expressions (default: a new Parser instance).
  - dict_aut_state_key (str, optional) – Key for the automaton state in the observation dictionary (default: "aut_state").
Example
```python
from gym_tl_tools import Predicate, BaseVarValueInfoGenerator, TLObservationReward

# Define atomic predicates
atomic_predicates = [
    Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
    Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
]

# Create variable value generator
class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": info.get("d_robot_obstacle", float("inf")),
        }

var_generator = MyVarValueGenerator()
tl_spec = "F(goal_reached) & G(!obstacle_hit)"

wrapped_env = TLObservationReward(
    env,
    tl_spec=tl_spec,
    atomic_predicates=atomic_predicates,
    var_value_info_generator=var_generator,
)
obs, info = wrapped_env.reset()
done = False
while not done:
    action = wrapped_env.action_space.sample()
    obs, reward, terminated, truncated, info = wrapped_env.step(action)
    done = terminated or truncated
```
- forward_aut(obs: ObsType, info: dict[str, Any]) → None[source]
Forward the automaton based on the current observation and info.
This method updates the automaton state based on the current observation and info. It is useful for manually stepping the automaton without taking an action in the environment.
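A minimal sketch, assuming raw_obs and raw_info were obtained from the underlying environment and that raw_info contains the variable values needed by the predicates:

```python
# Manually advance the automaton from an observation/info pair obtained
# outside of wrapped_env.step(); raw_obs and raw_info are assumed to come
# from the underlying environment.
wrapped_env.forward_aut(raw_obs, raw_info)
if wrapped_env.is_aut_terminated:
    print("Automaton reached a terminal state")
```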
- property is_aut_terminated: bool
Check if the automaton is in a terminal state.
- Returns:
True if the automaton is in a terminal state, False otherwise.
- Return type:
  bool
- observation(observation: ObsType) → dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]][source]
Process the observation to include the automaton state.
- reset(*, seed: int | None = None, options: dict[str, Any] | None = {'forward_aut_on_reset': True, 'info': {}, 'obs': {}}) → tuple[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]], dict[str, Any]][source]
Reset the environment and return the initial observation.
- Parameters:
  - seed (int | None) – Seed passed to the underlying environment's reset().
  - options (dict[str, Any] | None) – Reset options; defaults to {'forward_aut_on_reset': True, 'info': {}, 'obs': {}}.
- Returns:
new_obs (dict[str,ObsType|int]) – The initial observation with the automaton state.
info (dict[str, Any]) – Additional information from the reset. Should contain the variable keys and values that define the atomic predicates.
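A minimal sketch of a reset call with explicit options; the option keys follow the defaults shown in the signature above:

```python
# Option keys follow the default options dict shown in the signature above.
obs, info = wrapped_env.reset(
    seed=0,
    options={"forward_aut_on_reset": True, "obs": {}, "info": {}},
)
```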
- step(action: ActType) → tuple[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]], SupportsFloat, bool, bool, dict[str, Any]][source]
Take a step in the environment with the given action.
- Parameters:
action (ActType) – The action to take in the environment.
- Returns:
new_obs (dict[str,ObsType|int]) – The new observation after taking the action.
reward (SupportsFloat) – The reward received from the environment.
terminated (bool) – Whether the episode has terminated.
truncated (bool) – Whether the episode has been truncated.
info (dict[str, Any]) – Additional information from the step. Should contain the variable keys and values that define the atomic predicates.
- class gym_tl_tools.wrapper.TLObservationRewardConfig(*, tl_spec: str = '', atomic_predicates: list[gym_tl_tools.automaton.Predicate], var_value_info_generator_cls: str, var_value_info_generator_args: dict[str, typing.Any] = {}, reward_config: gym_tl_tools.wrapper.RewardConfig = RewardConfig(terminal_state_reward=5, state_trans_reward_scale=100, dense_reward=False, dense_reward_scale=0.01), early_termination: bool = True, parser: gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>, dict_aut_state_key: str = 'aut_state')[source]
Bases: BaseModel, Generic[ObsType, ActType]

Configuration for the TLObservationReward wrapper.
This class defines the parameters used to configure the TLObservationReward wrapper, including the temporal logic specification, atomic predicates, variable value generator, reward configuration, and other settings.
- tl_spec
The temporal logic specification (e.g., LTL formula) to be used for the automaton.
- Type:
  str
- var_value_info_generator
An instance of a subclass of BaseVarValueInfoGenerator that extracts variable values from the environment’s observation and info.
- Type:
BaseVarValueInfoGenerator[ObsType, ActType]
- reward_config
Configuration for the reward structure.
- Type:
  RewardConfig
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- reward_config: RewardConfig
- serialize_model(info: SerializationInfo | None = None) → dict[str, Any][source]
Custom model serializer to handle atomic_predicates serialization with context.
- property var_value_info_generator: BaseVarValueInfoGenerator[ObsType, ActType]
Instantiate the variable value info generator based on the provided class and arguments.
- Returns:
An instance of the variable value info generator.
- Return type:
BaseVarValueInfoGenerator[ObsType, ActType]
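A minimal sketch of building a TLObservationRewardConfig and instantiating the generator from it; the import-path string passed to var_value_info_generator_cls is an assumption about how the generator class is referenced, so adjust it to your project:

```python
from gym_tl_tools import Predicate
from gym_tl_tools.wrapper import RewardConfig, TLObservationRewardConfig

# NOTE: the format of var_value_info_generator_cls (an import-path string)
# is an assumption; "my_package.MyVarValueGenerator" is a hypothetical class.
config = TLObservationRewardConfig(
    tl_spec="F(goal_reached) & G(!obstacle_hit)",
    atomic_predicates=[
        Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
        Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
    ],
    var_value_info_generator_cls="my_package.MyVarValueGenerator",
    var_value_info_generator_args={},
    reward_config=RewardConfig(dense_reward=True),
)

# The var_value_info_generator property instantiates the generator from the
# configured class and arguments.
generator = config.var_value_info_generator
```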