gym_tl_tools.wrapper

Classes

BaseVarValueInfoGenerator()

Base class for generating variable values from the environment's observation and info.

RewardConfig(*[, terminal_state_reward, ...])

Configuration model for the reward structure used by the TLObservationReward wrapper.

RewardConfigDict

Configuration dictionary for the reward structure in the TLObservationReward wrapper.

TLObservationReward(env, ...)

A wrapper for Gymnasium environments that augments observations with the state of a temporal logic automaton, and computes rewards based on satisfaction of temporal logic (TL) specifications.

TLObservationRewardConfig(*, tl_spec, ...)

Configuration for the TLObservationReward wrapper.

class gym_tl_tools.wrapper.BaseVarValueInfoGenerator[source]

Bases: Generic[ObsType, ActType], ABC

Base class for generating variable values from the environment’s observation and info.

This class should be subclassed to implement custom logic for extracting variable values that are used to evaluate atomic predicates in temporal logic formulas.

The get_var_values method should return a dictionary where keys are variable names used in predicate formulas (e.g., “d_robot_goal”, “d_robot_obstacle”) and values are their corresponding numerical values extracted from the environment’s observation and info.

Example

```python
import numpy as np

from gym_tl_tools import BaseVarValueInfoGenerator


class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": obs.get("obstacle_distance", 0.0),
            "robot_speed": np.linalg.norm(obs["velocity"]) if "velocity" in obs else 0.0,
        }
```

abstractmethod get_var_values(env: Env[ObsType, ActType], obs: ObsType, info: dict[str, Any]) → dict[str, Any][source]

Extract variable values from the environment’s observation and info.

This method should be implemented by subclasses to provide the variable values needed for evaluating atomic predicates in temporal logic formulas.

Parameters:
  • env (Env[ObsType, ActType]) – The environment from which to extract variable values.

  • obs (ObsType) – The current observation from the environment.

  • info (dict[str, Any]) – The info dictionary containing additional information from the environment.

Returns:

A dictionary where keys are variable names used in predicate formulas and values are their corresponding numerical values.

Return type:

dict[str, Any]

class gym_tl_tools.wrapper.RewardConfig(*, terminal_state_reward: float = 5, state_trans_reward_scale: float = 100, dense_reward: bool = False, dense_reward_scale: float = 0.01)[source]

Bases: BaseModel

dense_reward: bool
dense_reward_scale: float
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

state_trans_reward_scale: float
terminal_state_reward: float
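
A minimal, illustrative sketch of constructing this model; the field values below are arbitrary, and the keyword-only fields and defaults are taken from the signature above:

```python
from gym_tl_tools.wrapper import RewardConfig

# Illustrative values; omitted fields fall back to the defaults shown above.
reward_config = RewardConfig(
    terminal_state_reward=10.0,
    state_trans_reward_scale=100.0,
    dense_reward=True,
    dense_reward_scale=0.01,
)
```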
class gym_tl_tools.wrapper.RewardConfigDict[source]

Bases: TypedDict

Configuration dictionary for the reward structure in the TLObservationReward wrapper. This dictionary is used to define how rewards are computed based on the automaton’s state and the environment’s info.

terminal_state_reward

Reward given when the automaton reaches a terminal state in addition to the automaton state transition reward.

Type:

float

state_trans_reward_scale

Scale factor for the reward based on the automaton's state-transition robustness. It is applied to the robustness computed from the automaton's transition. If the transition leads to a trap state, the reward is made negative and scaled by this factor.

Type:

float

dense_reward

Whether to use dense rewards (True) or sparse rewards (False). Dense rewards provide a continuous reward signal based on the robustness of the transition to the next non-trap state.

Type:

bool

dense_reward_scale

Scale factor for the dense reward. When dense rewards are enabled, it scales the reward computed from the robustness of the transition to the next non-trap automaton state.

Type:

float

dense_reward: bool
dense_reward_scale: float
state_trans_reward_scale: float
terminal_state_reward: float
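
A minimal sketch of a dictionary conforming to this TypedDict, with illustrative values; per the TLObservationReward signature below, such a dictionary can be passed as the reward_config argument:

```python
from gym_tl_tools.wrapper import RewardConfigDict

# Illustrative values; the keys mirror the attributes documented above.
reward_config: RewardConfigDict = {
    "terminal_state_reward": 5.0,
    "state_trans_reward_scale": 100.0,
    "dense_reward": False,
    "dense_reward_scale": 0.01,
}
```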
class gym_tl_tools.wrapper.TLObservationReward(env: gymnasium.core.Env[gymnasium.core.ObsType, gymnasium.core.ActType], tl_spec: str, atomic_predicates: list[gym_tl_tools.automaton.Predicate] | list[dict[str, str]], var_value_info_generator: gym_tl_tools.wrapper.BaseVarValueInfoGenerator[gymnasium.core.ObsType, gymnasium.core.ActType], reward_config: gym_tl_tools.wrapper.RewardConfigDict = {'dense_reward': False, 'dense_reward_scale': 0.01, 'state_trans_reward_scale': 100, 'terminal_state_reward': 5}, early_termination: bool = True, parser: gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>, dict_aut_state_key: str = 'aut_state')[source]

Bases: ObservationWrapper[dict[str, ObsType | int64 | ndarray[tuple[Any, …], dtype[number]]], ActType, ObsType], RecordConstructorArgs, Generic[ObsType, ActType]

A wrapper for Gymnasium environments that augments observations with the state of a temporal logic automaton, and computes rewards based on satisfaction of temporal logic (TL) specifications.

This wrapper is designed for environments where the agent’s objective is specified using temporal logic (e.g., LTL). It integrates an automaton (from a TL formula) into the observation and reward structure, enabling RL agents to learn tasks with complex temporal requirements.

Usage

  1. Define Atomic Predicates:

    Create a list of Predicate objects, each representing an atomic proposition in your TL formula. Each predicate has a name (used in the TL formula) and a formula (Boolean expression string). Example:

    ```python
    from gym_tl_tools import Predicate

    atomic_predicates = [
        Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
        Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
    ]
    ```

  2. Create a Variable Value Generator:

    Implement a subclass of BaseVarValueInfoGenerator to extract variable values from the environment’s observation and info for evaluating the atomic predicates. Example:

    ```python
    from gym_tl_tools import BaseVarValueInfoGenerator

    class MyVarValueGenerator(BaseVarValueInfoGenerator):
        def get_var_values(self, env, obs, info):
            # Extract variables needed for predicate evaluation
            return {
                "d_robot_goal": info.get("d_robot_goal", float("inf")),
                "d_robot_obstacle": info.get("d_robot_obstacle", float("inf")),
            }

    var_generator = MyVarValueGenerator()
    ```

  3. Specify the Temporal Logic Formula:

    Write your TL specification as a string, using the names of your atomic predicates. Example:

    ```python
    tl_spec = "F(goal_reached) & G(!obstacle_hit)"
    ```

  4. Wrap Your Environment:

    Pass your environment, TL specification, atomic predicates, and variable value generator to TLObservationReward. Example:

    ```python
    from gym_tl_tools import TLObservationReward

    wrapped_env = TLObservationReward(
        env,
        tl_spec=tl_spec,
        atomic_predicates=atomic_predicates,
        var_value_info_generator=var_generator,
    )
    ```

  5. Observation Structure:

    The wrapped observation is a dictionary that combines the original observation with the current automaton state; the automaton state is stored under the dict_aut_state_key key (default: "aut_state").

  6. Reward Calculation:
    • At each step, the wrapper computes the reward based on the automaton’s transition, which reflects progress toward (or violation of) the TL specification.

    • The automaton state is updated according to the values of the atomic predicates, which are evaluated using the variable values provided by the var_value_info_generator.

  7. Reset and Step:
    • On reset(), the automaton is reset to its initial state, and the initial observation is augmented.

    • On step(action), the automaton transitions based on the variable values extracted by the var_value_info_generator, and the reward is computed accordingly.

Notes

  • The wrapper adds additional information to the info dict (illustrated in the sketch after this list), including:
    • is_success: Whether the automaton has reached a goal state.

    • is_failure: Whether the automaton has reached a trap state.

    • is_aut_terminated: Whether the automaton is in a terminal state.
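
A minimal sketch of reading these flags from the info dict during a rollout, assuming wrapped_env is an environment already wrapped with TLObservationReward:

```python
obs, info = wrapped_env.reset()
done = False
while not done:
    action = wrapped_env.action_space.sample()
    obs, reward, terminated, truncated, info = wrapped_env.step(action)
    done = terminated or truncated

# Flags added to info by the wrapper.
if info.get("is_success"):
    print("TL specification satisfied: automaton reached a goal state.")
elif info.get("is_failure"):
    print("TL specification violated: automaton reached a trap state.")
```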

Parameters:
  • env (gymnasium.Env) – The environment to wrap.

  • tl_spec (str) – The temporal logic specification (e.g., LTL formula) to be used for the automaton.

  • atomic_predicates (list[gym_tl_tools.Predicate]) – List of atomic predicates used in the TL formula. Each predicate has a name (used in the TL formula) and a formula (Boolean expression string for evaluation).

  • var_value_info_generator (BaseVarValueInfoGenerator[ObsType, ActType]) – An instance of a subclass of BaseVarValueInfoGenerator that extracts variable values from the environment's observation and info for evaluating atomic predicates.

  • reward_config (RewardConfigDict, optional) – Configuration for the reward structure (default values provided).

  • early_termination (bool, optional) – Whether to terminate episodes when the automaton reaches a terminal state (default: True).

  • parser (Parser, optional) – Parser for TL expressions (default: a new Parser instance).

  • dict_aut_state_key (str, optional) – Key for the automaton state in the observation dictionary (default: "aut_state").

Example

```python
from gym_tl_tools import Predicate, BaseVarValueInfoGenerator, TLObservationReward

# Define atomic predicates
atomic_predicates = [
    Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
    Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
]

# Create variable value generator
class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": info.get("d_robot_obstacle", float("inf")),
        }

var_generator = MyVarValueGenerator()
tl_spec = "F(goal_reached) & G(!obstacle_hit)"

# `env` is an existing gymnasium.Env instance whose observation/info provide
# the variables referenced by the predicate formulas.
wrapped_env = TLObservationReward(
    env,
    tl_spec=tl_spec,
    atomic_predicates=atomic_predicates,
    var_value_info_generator=var_generator,
)

obs, info = wrapped_env.reset()
done = False
while not done:
    action = wrapped_env.action_space.sample()
    obs, reward, terminated, truncated, info = wrapped_env.step(action)
    done = terminated or truncated
```

forward_aut(obs: ObsType, info: dict[str, Any]) → None[source]

Forward the automaton based on the current observation and info.

This method updates the automaton state based on the current observation and info. It is useful for manually stepping the automaton without taking an action in the environment.

Parameters:
  • obs (ObsType) – The current observation from the environment.

  • info (dict[str, Any]) – The info dictionary containing variable values for the atomic predicates.
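
A minimal, illustrative sketch of stepping the automaton manually; wrapped_env is assumed to be a TLObservationReward instance, and obs/info come from the wrapped environment:

```python
# Advance the automaton from the current observation and info
# without taking an action in the wrapped environment.
obs, info = wrapped_env.reset()
wrapped_env.forward_aut(obs, info)
print(wrapped_env.is_aut_terminated)  # True if the automaton is in a terminal state
```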

property is_aut_terminated: bool

Check if the automaton is in a terminal state.

Returns:

True if the automaton is in a terminal state, False otherwise.

Return type:

bool

observation(observation: ObsType) → dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]][source]

Process the observation to include the automaton state.

Parameters:

observation (ObsType) – The original observation from the environment.

Returns:

new_obs – The processed observation with the automaton state appended.

Return type:

dict[str, ObsType | int]

reset(*, seed: int | None = None, options: dict[str, Any] | None = {'forward_aut_on_reset': True, 'info': {}, 'obs': {}}) → tuple[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]], dict[str, Any]][source]

Reset the environment and return the initial observation.

Parameters:
  • seed (int | None, optional) – Random seed for reproducibility.

  • options (dict[str, Any] | None, optional) – Additional options for resetting the environment.

Returns:

  • new_obs (dict[str,ObsType|int]) – The initial observation with the automaton state.

  • info (dict[str, Any]) – Additional information from the reset. Should contain the variable keys and values that define the atomic predicates.
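
A sketch of a typical reset call; the option key shown mirrors the default options in the signature above, while the remaining default keys ('obs', 'info') are left untouched here because their semantics are not documented on this page:

```python
# Illustrative reset; the seed value is arbitrary.
obs, info = wrapped_env.reset(seed=42, options={"forward_aut_on_reset": True})
assert "aut_state" in obs  # automaton state stored under dict_aut_state_key
```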

step(action: ActType) → tuple[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]], SupportsFloat, bool, bool, dict[str, Any]][source]

Take a step in the environment with the given action.

Parameters:

action (ActType) – The action to take in the environment.

Returns:

  • new_obs (dict[str,ObsType|int]) – The new observation after taking the action.

  • reward (SupportsFloat) – The reward received from the environment.

  • terminated (bool) – Whether the episode has terminated.

  • truncated (bool) – Whether the episode has been truncated.

  • info (dict[str, Any]) – Additional information from the step. Should contain the variable keys and values that define the atomic predicates.

class gym_tl_tools.wrapper.TLObservationRewardConfig(*, tl_spec: str = '', atomic_predicates: list[gym_tl_tools.automaton.Predicate], var_value_info_generator_cls: str, var_value_info_generator_args: dict[str, typing.Any] = {}, reward_config: gym_tl_tools.wrapper.RewardConfig = RewardConfig(terminal_state_reward=5, state_trans_reward_scale=100, dense_reward=False, dense_reward_scale=0.01), early_termination: bool = True, parser: gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>, dict_aut_state_key: str = 'aut_state')[source]

Bases: BaseModel, Generic[ObsType, ActType]

Configuration for the TLObservationReward wrapper.

This class defines the parameters used to configure the TLObservationReward wrapper, including the temporal logic specification, atomic predicates, variable value generator, reward configuration, and other settings.

tl_spec

The temporal logic specification (e.g., LTL formula) to be used for the automaton.

Type:

str

atomic_predicates

List of atomic predicates used in the TL formula.

Type:

list[Predicate]

var_value_info_generator

An instance of a subclass of BaseVarValueInfoGenerator that extracts variable values from the environment’s observation and info.

Type:

BaseVarValueInfoGenerator[ObsType, ActType]

reward_config

Configuration for the reward structure.

Type:

RewardConfig

early_termination

Whether to terminate episodes when the automaton reaches a terminal state.

Type:

bool

parser

Parser for TL expressions.

Type:

Parser

dict_aut_state_key

Key for the automaton state in the observation dictionary.

Type:

str

atomic_predicates: list[Predicate]
dict_aut_state_key: str
early_termination: bool
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

parser: Parser
reward_config: RewardConfig
serialize_model(info: SerializationInfo | None = None) → dict[str, Any][source]

Custom model serializer to handle atomic_predicates serialization with context.

tl_spec: str
property var_value_info_generator: BaseVarValueInfoGenerator[ObsType, ActType]

Instantiate the variable value info generator based on the provided class and arguments.

Returns:

An instance of the variable value info generator.

Return type:

BaseVarValueInfoGenerator[ObsType, ActType]

var_value_info_generator_args: dict[str, Any]
var_value_info_generator_cls: str
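
A minimal, illustrative sketch of building this config and instantiating its generator; var_value_info_generator_cls is assumed to be a string identifying the generator class (its exact format, e.g. an import path, is not documented on this page), and "my_package.generators.MyVarValueGenerator" is a hypothetical example:

```python
from gym_tl_tools import Predicate
from gym_tl_tools.wrapper import RewardConfig, TLObservationRewardConfig

config = TLObservationRewardConfig(
    tl_spec="F(goal_reached) & G(!obstacle_hit)",
    atomic_predicates=[
        Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
        Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
    ],
    # Hypothetical class reference string; see the note above about its format.
    var_value_info_generator_cls="my_package.generators.MyVarValueGenerator",
    var_value_info_generator_args={},
    reward_config=RewardConfig(dense_reward=True),
)

# The var_value_info_generator property instantiates the generator
# from the class string and the keyword arguments above.
generator = config.var_value_info_generator
```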