API Reference

This page contains the complete API reference for gym-tl-tools.

Core Classes

class gym_tl_tools.Automaton(tl_spec: str, atomic_predicates: list[~gym_tl_tools.automaton.Predicate], parser: ~gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>)[source]

Bases: object

property is_goaled: bool

Check if the current state of the automaton is a goal state.

Returns:

True if the current state is a goal state, False otherwise.

Return type:

bool

property is_terminated: bool

Check if the automaton is in a terminal state (goal or trap).

Returns:

True if the automaton is in a terminal state, False otherwise.

Return type:

bool

property is_trapped: bool

Check if the current state of the automaton is a trap state.

Returns:

True if the current state is a trap state, False otherwise.

Return type:

bool

reset(seed: int | None = None) None[source]

Reset the automaton to its initial state.

step(var_value_dict: dict[str, float], terminal_state_reward: float = 5, state_trans_reward_scale: float = 100, dense_reward: bool = False, dense_reward_scale: float = 0.01) tuple[float, int][source]

Step the automaton to the next state based on the current predicate values. This function updates the current state of the automaton based on the values of variables defining atomic predicates provided in var_value_dict.

Parameters:
  • var_value_dict (dict[str, float]) – A dictionary mapping the variable names used in the atomic predicate definitions to their current values, e.g. {"d_robot_goal": 3.0, "d_robot_obstacle": 1.0}.

  • terminal_state_reward (float, default 5) – Reward given when the automaton reaches a terminal state, in addition to the state transition reward.

  • state_trans_reward_scale (float, default 100) – Scale factor for the reward based on the robustness of the automaton's state transition.

  • dense_reward (bool, default False) – Whether to use dense rewards based on the robustness of the transition to the next non-trap state.

  • dense_reward_scale (float, default 0.01) – Scale factor for the dense reward.

Returns:

  • reward (float) – The reward for the current step.

  • new_state (int) – The new automaton state.
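
A minimal sketch of stepping an automaton directly; the predicates, specification, and variable values below are illustrative assumptions, not part of the library:

```python
from gym_tl_tools import Automaton, Predicate

# Illustrative predicates and TL specification (assumed names).
predicates = [
    Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
    Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
]
automaton = Automaton("F(goal_reached) & G(!obstacle_hit)", predicates)
automaton.reset()

# Advance the automaton with the current variable values; returns the step
# reward and the new automaton state.
reward, new_state = automaton.step({"d_robot_goal": 3.0, "d_robot_obstacle": 2.5})
```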

tl_reward(ap_rob_dict: dict[str, float], curr_aut_state: int, terminal_state_reward: float = 5, state_trans_reward_scale: float = 100, dense_reward: bool = False, dense_reward_scale: float = 0.01) tuple[float, int][source]

Calculate the reward of the step from a given automaton.

Parameters:
  • ap_rob_dict (dict[str, float]) – A dictionary mapping the names of atomic predicates to their robustness values.

  • curr_aut_state (int) – The current state of the automaton.

  • dense_reward (bool) – If True, the reward is calculated based on the maximum robustness of non-trap transitions.

  • terminal_state_reward (float) – The reward given when the automaton reaches a terminal state.

Returns:

  • reward (float) – The reward of the step based on the MDP and automaton states.

  • next_aut_state (int) – The resultant automaton state after the step.
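
A sketch of computing the reward from predicate robustness values for a given automaton state, reusing the automaton built above; the state index and robustness values are illustrative assumptions:

```python
# Robustness of each atomic predicate (assumed values), keyed by predicate name.
ap_rob_dict = {"goal_reached": 0.7, "obstacle_hit": -1.2}

# Compute the reward and the resulting automaton state, starting from state 0
# (assumed here to be a valid, non-terminal state of this automaton).
reward, next_aut_state = automaton.tl_reward(ap_rob_dict, curr_aut_state=0)
```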

transition_robustness(transitions: list[Transition], ap_rob_dict: dict[str, float]) tuple[list[RobNextStatePair], list[RobNextStatePair], list[RobNextStatePair], list[RobNextStatePair], list[RobNextStatePair]][source]

Calculate the robustness of transitions in the automaton. This function computes the robustness values for each transition based on the conditions of the transitions and the robustness values of the atomic predicates.

Parameters:
  • transitions (list[Transition]) – A list of transitions in the automaton.

  • ap_rob_dict (dict[str, float]) – A dictionary mapping the names of atomic predicates to their robustness values.

Returns:

  • pos_trap_rob_trans_pairs (list[RobNextStatePair]) – A list of pairs of positive robustness values and transitions that lead to trap states.

  • pos_non_trap_rob_trans_pairs (list[RobNextStatePair]) – A list of pairs of positive robustness values and transitions that do not lead to trap states.

  • zero_trap_rob_trans_pairs (list[RobNextStatePair]) – A list of pairs of zero robustness values and transitions that lead to trap states.

  • zero_non_trap_rob_trans_pairs (list[RobNextStatePair]) – A list of pairs of zero robustness values and transitions that do not lead to trap states.

class gym_tl_tools.BaseVarValueInfoGenerator[source]

Bases: Generic[ObsType, ActType], ABC

Base class for generating variable values from the environment’s observation and info.

This class should be subclassed to implement custom logic for extracting variable values that are used to evaluate atomic predicates in temporal logic formulas.

The get_var_values method should return a dictionary where keys are variable names used in predicate formulas (e.g., “d_robot_goal”, “d_robot_obstacle”) and values are their corresponding numerical values extracted from the environment’s observation and info.

Example

```python
import numpy as np

from gym_tl_tools import BaseVarValueInfoGenerator


class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": obs.get("obstacle_distance", 0.0),
            "robot_speed": np.linalg.norm(obs["velocity"]) if "velocity" in obs else 0.0,
        }
```

abstractmethod get_var_values(env: Env[ObsType, ActType], obs: ObsType, info: dict[str, Any]) dict[str, Any][source]

Extract variable values from the environment’s observation and info.

This method should be implemented by subclasses to provide the variable values needed for evaluating atomic predicates in temporal logic formulas.

Parameters:
  • env (Env[ObsType, ActType]) – The environment from which to extract variable values.

  • obs (ObsType) – The current observation from the environment.

  • info (dict[str, Any]) – The info dictionary containing additional information from the environment.

Returns:

A dictionary where keys are variable names used in predicate formulas and values are their corresponding numerical values.

Return type:

dict[str, Any]

class gym_tl_tools.Parser[source]

Bases: object

evaluate(parsed_tokens: list[str], var_dict: dict[str, float]) float[source]

Evaluate a parsed token list (in Reverse Polish Notation): scan the tokens from the start, pushing values onto a stack and applying the corresponding operation whenever a symbol is encountered.

get_func(s: str) Callable[[...], Any][source]
get_priority(s: str) int[source]
is_num(s: str) bool[source]
is_parentheses(s: str, **kwargs) bool[source]
is_symbol(s: Any) bool[source]
is_var(s: str, var_names: list[str]) bool[source]
parse(token_list: list[str], var_names: list[str]) list[str][source]

Convert the token list to Reverse Polish Notation.

tl2rob(spec: str, var_dict: dict[str, float]) float[source]

Parse a TL spec and compute its robustness value.

Parameters:
  • spec (str) – The TL spec to be parsed. e.g. “psi_1 & psi_2 | !psi_3”.

  • var_dict (dict[str, float]) – A dictionary mapping the variable names used in the TL spec to their current values. The keys should match the variable or atomic predicate names appearing in the spec, e.g. {"psi_1": 3.0, "psi_2": 1.0, "psi_3": 0.5}.

Returns:

robustness – The robustness value of the TL spec given the variable values. A positive value indicates satisfaction, while a negative value indicates violation.

Return type:

float

tokenize(spec: str) list[str][source]
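
A minimal sketch of using the parser, both end-to-end via tl2rob and broken into its stages (tokenize, parse to Reverse Polish Notation, evaluate). The values are illustrative, and the assumption that tl2rob composes these three methods is inferred from the docstrings above:

```python
from gym_tl_tools import Parser

parser = Parser()

# End-to-end: robustness of a spec given variable values (illustrative numbers).
rob = parser.tl2rob("psi_1 & psi_2 | !psi_3", {"psi_1": 1.0, "psi_2": -0.5, "psi_3": 2.0})

# The same computation split into stages.
tokens = parser.tokenize("psi_1 & psi_2")
rpn = parser.parse(tokens, ["psi_1", "psi_2"])
value = parser.evaluate(rpn, {"psi_1": 1.0, "psi_2": -0.5})
```
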
class gym_tl_tools.ParserSymbol(*, priority: int, func: Callable[[...], Any])[source]

Bases: BaseModel

Represents a symbol in the parser with its priority and function.

priority

The priority of the symbol, used for parsing precedence.

Type:

int

func

The function that implements the operation of the symbol. It should take the appropriate number of arguments and return a value.

Type:

Callable[…, Any]

func: Callable[[...], Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

priority: int
class gym_tl_tools.Predicate(*, name: str, formula: str)[source]

Bases: BaseModel

Represents a predicate in a TL formula.

name

The name of the atomic predicate. e.g. “psi_goal_robot”.

Type:

str

formula

The formula of the atomic predicate, which can be a Boolean expression. e.g. “d_robot_goal < 5”.

Type:

str

formula: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

name: str
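
A short example of defining a predicate, using the name and formula shown in the attribute descriptions above:

```python
from gym_tl_tools import Predicate

# Atomic predicate: its name is referenced in TL formulas; its formula is a
# Boolean expression over environment variables.
psi_goal_robot = Predicate(name="psi_goal_robot", formula="d_robot_goal < 5")
```
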
class gym_tl_tools.RewardConfig(*, terminal_state_reward: float = 5, state_trans_reward_scale: float = 100, dense_reward: bool = False, dense_reward_scale: float = 0.01)[source]

Bases: BaseModel

dense_reward: bool
dense_reward_scale: float
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

state_trans_reward_scale: float
terminal_state_reward: float
class gym_tl_tools.RewardConfigDict[source]

Bases: TypedDict

Configuration dictionary for the reward structure in the TLObservationReward wrapper. This dictionary is used to define how rewards are computed based on the automaton’s state and the environment’s info.

terminal_state_reward

Reward given when the automaton reaches a terminal state in addition to the automaton state transition reward.

Type:

float

state_trans_reward_scale

Scale factor for the reward based on the automaton’s state transition robustness. This is applied to the robustness computed from the automaton’s transition. If the transition leads to a trap state, the reward is set to be negative, scaled by this factor.

Type:

float

dense_reward

Whether to use dense rewards (True) or sparse rewards (False). Dense rewards provide a continuous reward signal based on the robustness of the transition to the next non-trap state.

Type:

bool

dense_reward_scale

Scale factor for the dense reward. This is applied to the computed dense reward based on the robustness to the next non-trap automaton’s state. If dense rewards are enabled, this factor scales the reward returned by the automaton.

Type:

float

dense_reward: bool
dense_reward_scale: float
state_trans_reward_scale: float
terminal_state_reward: float
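
A minimal sketch of building a reward configuration dictionary with the documented keys for passing to the TLObservationReward wrapper; the values mirror the documented defaults:

```python
from gym_tl_tools import RewardConfigDict

# Adjust per task; these are the documented default values.
reward_config: RewardConfigDict = {
    "terminal_state_reward": 5,
    "state_trans_reward_scale": 100,
    "dense_reward": False,
    "dense_reward_scale": 0.01,
}
```
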
class gym_tl_tools.TLObservationReward(env: ~gymnasium.core.Env[~gymnasium.core.ObsType, ~gymnasium.core.ActType], tl_spec: str, atomic_predicates: list[~gym_tl_tools.automaton.Predicate] | list[dict[str, str]], var_value_info_generator: ~gym_tl_tools.wrapper.BaseVarValueInfoGenerator[~gymnasium.core.ObsType, ~gymnasium.core.ActType], reward_config: ~gym_tl_tools.wrapper.RewardConfigDict = {'dense_reward': False, 'dense_reward_scale': 0.01, 'state_trans_reward_scale': 100, 'terminal_state_reward': 5}, early_termination: bool = True, parser: ~gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>, dict_aut_state_key: str = 'aut_state')[source]

Bases: ObservationWrapper[dict[str, ObsType | int64 | ndarray[tuple[Any, …], dtype[number]]], ActType, ObsType], RecordConstructorArgs, Generic[ObsType, ActType]

A wrapper for Gymnasium environments that augments observations with the state of a temporal logic automaton, and computes rewards based on satisfaction of temporal logic (TL) specifications.

This wrapper is designed for environments where the agent’s objective is specified using temporal logic (e.g., LTL). It integrates an automaton (from a TL formula) into the observation and reward structure, enabling RL agents to learn tasks with complex temporal requirements.

Usage

  1. Define Atomic Predicates:

    Create a list of Predicate objects, each representing an atomic proposition in your TL formula. Each predicate has a name (used in the TL formula) and a formula (Boolean expression string). Example:

    ```python
    from gym_tl_tools import Predicate

    atomic_predicates = [
        Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
        Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
    ]
    ```

  2. Create a Variable Value Generator:

    Implement a subclass of BaseVarValueInfoGenerator to extract variable values from the environment’s observation and info for evaluating the atomic predicates. Example:

    ```python
    from gym_tl_tools import BaseVarValueInfoGenerator

    class MyVarValueGenerator(BaseVarValueInfoGenerator):
        def get_var_values(self, env, obs, info):
            # Extract variables needed for predicate evaluation
            return {
                "d_robot_goal": info.get("d_robot_goal", float("inf")),
                "d_robot_obstacle": info.get("d_robot_obstacle", float("inf")),
            }

    var_generator = MyVarValueGenerator()
    ```

  3. Specify the Temporal Logic Formula:

    Write your TL specification as a string, using the names of your atomic predicates. Example:

    ```python
    tl_spec = "F(goal_reached) & G(!obstacle_hit)"
    ```

  4. Wrap Your Environment:

    Pass your environment, TL specification, atomic predicates, and variable value generator to TLObservationReward. Example:

    ```python
    from gym_tl_tools import TLObservationReward

    wrapped_env = TLObservationReward(
        env,
        tl_spec=tl_spec,
        atomic_predicates=atomic_predicates,
        var_value_info_generator=var_generator,
    )
    ```

  5. Observation Structure:

    The wrapped observation is a dictionary containing the original observation together with the current automaton state, stored under the dict_aut_state_key key (default: "aut_state").

  6. Reward Calculation:
    • At each step, the wrapper computes the reward based on the automaton’s transition, which reflects progress toward (or violation of) the TL specification.

    • The automaton state is updated according to the values of the atomic predicates, which are evaluated using the variable values provided by the var_value_info_generator.

  7. Reset and Step:
    • On reset(), the automaton is reset to its initial state, and the initial observation is augmented.

    • On step(action), the automaton transitions based on the variable values extracted by the var_value_info_generator, and the reward is computed accordingly.

Notes

  • The wrapper adds additional information to the info dict, including:
    • is_success: Whether the automaton has reached a goal state.

    • is_failure: Whether the automaton has reached a trap state.

    • is_aut_terminated: Whether the automaton has reached a terminal state (goal or trap).

Parameters:
  • env (gymnasium.Env) – The environment to wrap.

  • tl_spec (str) – The temporal logic specification (e.g., LTL formula) to be used for the automaton.

  • atomic_predicates (list[gym_tl_tools.Predicate]) – List of atomic predicates used in the TL formula. Each predicate has a name (used in the TL formula) and a formula (Boolean expression string for evaluation).

  • var_value_info_generator (BaseVarValueInfoGenerator[ObsType, ActType]) – An instance of a subclass of BaseVarValueInfoGenerator that extracts variable values from the environment's observation and info for evaluating atomic predicates.

  • reward_config (RewardConfigDict, optional) – Configuration for the reward structure (default values provided).

  • early_termination (bool, optional) – Whether to terminate episodes when the automaton reaches a terminal state (default: True).

  • parser (Parser, optional) – Parser for TL expressions (default: a new Parser instance).

  • dict_aut_state_key (str, optional) – Key for the automaton state in the observation dictionary (default: "aut_state").

Example

```python
from gym_tl_tools import Predicate, BaseVarValueInfoGenerator, TLObservationReward

# Define atomic predicates
atomic_predicates = [
    Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
    Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
]

# Create variable value generator
class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": info.get("d_robot_obstacle", float("inf")),
        }

var_generator = MyVarValueGenerator()
tl_spec = "F(goal_reached) & G(!obstacle_hit)"

wrapped_env = TLObservationReward(
    env,
    tl_spec=tl_spec,
    atomic_predicates=atomic_predicates,
    var_value_info_generator=var_generator,
)

obs, info = wrapped_env.reset()
done = False
while not done:
    action = wrapped_env.action_space.sample()
    obs, reward, terminated, truncated, info = wrapped_env.step(action)
    done = terminated or truncated
```

forward_aut(obs: ObsType, info: dict[str, Any]) None[source]

Forward the automaton based on the current observation and info.

This method updates the automaton state based on the current observation and info. It is useful for manually stepping the automaton without taking an action in the environment.

Parameters:
  • obs (ObsType) – The current observation from the environment.

  • info (dict[str, Any]) – The info dictionary containing variable values for the atomic predicates.
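
A brief sketch of manually advancing the automaton from an observation/info pair obtained outside of step(); wrapped_env, obs, and info are assumed to come from an already wrapped environment as in the example above:

```python
# Advance the automaton without taking an environment action.
wrapped_env.forward_aut(obs, info)
```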

property is_aut_terminated: bool

Check if the automaton is in a terminal state.

Returns:

True if the automaton is in a terminal state, False otherwise.

Return type:

bool

observation(observation: ObsType) dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]][source]

Process the observation to include the automaton state.

Parameters:

observation (ObsType) – The original observation from the environment.

Returns:

new_obs – The processed observation with the automaton state appended.

Return type:

dict[str,ObsType|int]

reset(*, seed: int | None = None, options: dict[str, Any] | None = {'forward_aut_on_reset': True, 'info': {}, 'obs': {}}) tuple[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]], dict[str, Any]][source]

Reset the environment and return the initial observation.

Parameters:
  • seed (int | None, optional) – Random seed for reproducibility.

  • options (dict[str, Any] | None, optional) – Additional options for resetting the environment.

Returns:

  • new_obs (dict[str,ObsType|int]) – The initial observation with the automaton state.

  • info (dict[str, Any]) – Additional information from the reset. Should contain the variable keys and values that define the atomic predicates.

step(action: ActType) tuple[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]], SupportsFloat, bool, bool, dict[str, Any]][source]

Take a step in the environment with the given action.

Parameters:

action (ActType) – The action to take in the environment.

Returns:

  • new_obs (dict[str,ObsType|int]) – The new observation after taking the action.

  • reward (SupportsFloat) – The reward received from the environment.

  • terminated (bool) – Whether the episode has terminated.

  • truncated (bool) – Whether the episode has been truncated.

  • info (dict[str, Any]) – Additional information from the step. Should contain the variable keys and values that define the atomic predicates.

class gym_tl_tools.TLObservationRewardConfig(*, tl_spec: str = '', atomic_predicates: list[~gym_tl_tools.automaton.Predicate], var_value_info_generator_cls: str, var_value_info_generator_args: dict[str, ~typing.Any] = {}, reward_config: ~gym_tl_tools.wrapper.RewardConfig = RewardConfig(terminal_state_reward=5, state_trans_reward_scale=100, dense_reward=False, dense_reward_scale=0.01), early_termination: bool = True, parser: ~gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>, dict_aut_state_key: str = 'aut_state')[source]

Bases: BaseModel, Generic[ObsType, ActType]

Configuration for the TLObservationReward wrapper.

This class defines the parameters used to configure the TLObservationReward wrapper, including the temporal logic specification, atomic predicates, variable value generator, reward configuration, and other settings.

tl_spec

The temporal logic specification (e.g., LTL formula) to be used for the automaton.

Type:

str

atomic_predicates

List of atomic predicates used in the TL formula.

Type:

list[Predicate]

var_value_info_generator

An instance of a subclass of BaseVarValueInfoGenerator that extracts variable values from the environment’s observation and info.

Type:

BaseVarValueInfoGenerator[ObsType, ActType]

reward_config

Configuration for the reward structure.

Type:

RewardConfigDict

early_termination

Whether to terminate episodes when automaton reaches terminal states.

Type:

bool

parser

Parser for TL expressions.

Type:

Parser

dict_aut_state_key

Key for the automaton state in the observation dictionary.

Type:

str

atomic_predicates: list[Predicate]
dict_aut_state_key: str
early_termination: bool
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

parser: Parser
reward_config: RewardConfig
serialize_model(info: SerializationInfo | None = None) dict[str, Any][source]

Custom model serializer to handle atomic_predicates serialization with context.

tl_spec: str
property var_value_info_generator: BaseVarValueInfoGenerator[ObsType, ActType]

Instantiate the variable value info generator based on the provided class and arguments.

Returns:

An instance of the variable value info generator.

Return type:

BaseVarValueInfoGenerator[ObsType, ActType]

var_value_info_generator_args: dict[str, Any]
var_value_info_generator_cls: str
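
A sketch of constructing the configuration model. The dotted-path format assumed for var_value_info_generator_cls is an assumption; the reference only states that it is a string identifying the generator class:

```python
from gym_tl_tools import Predicate, TLObservationRewardConfig

config = TLObservationRewardConfig(
    tl_spec="F(goal_reached) & G(!obstacle_hit)",
    atomic_predicates=[
        Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
        Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
    ],
    # Assumed to be an importable path to a BaseVarValueInfoGenerator subclass.
    var_value_info_generator_cls="my_package.generators.MyVarValueGenerator",
)
```
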
gym_tl_tools.replace_special_characters(spec: str) str[source]

Replace special characters in the spec with underscores so that it can be used as a file name.

Parameters:

spec (str) – spec to be replaced. The spec can contain special characters like spaces, &, |, etc. These characters are replaced with underscores to create a valid file name.

Returns:

replaced_spec – spec with special characters replaced by underscores

Return type:

str

Examples

```python
spec = replace_special_characters("F(psi_1 | psi_2) & G(!psi_3)")
print(spec)  # Output: "F(psi_1_or_psi_2)_and_G(!psi_3)"
```

TLObservationReward Wrapper

class gym_tl_tools.TLObservationReward(env: ~gymnasium.core.Env[~gymnasium.core.ObsType, ~gymnasium.core.ActType], tl_spec: str, atomic_predicates: list[~gym_tl_tools.automaton.Predicate] | list[dict[str, str]], var_value_info_generator: ~gym_tl_tools.wrapper.BaseVarValueInfoGenerator[~gymnasium.core.ObsType, ~gymnasium.core.ActType], reward_config: ~gym_tl_tools.wrapper.RewardConfigDict = {'dense_reward': False, 'dense_reward_scale': 0.01, 'state_trans_reward_scale': 100, 'terminal_state_reward': 5}, early_termination: bool = True, parser: ~gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>, dict_aut_state_key: str = 'aut_state')[source]

Bases: ObservationWrapper[dict[str, ObsType | int64 | ndarray[tuple[Any, …], dtype[number]]], ActType, ObsType], RecordConstructorArgs, Generic[ObsType, ActType]

A wrapper for Gymnasium environments that augments observations with the state of a temporal logic automaton, and computes rewards based on satisfaction of temporal logic (TL) specifications.

This wrapper is designed for environments where the agent’s objective is specified using temporal logic (e.g., LTL). It integrates an automaton (from a TL formula) into the observation and reward structure, enabling RL agents to learn tasks with complex temporal requirements.

Usage

  1. Define Atomic Predicates:

    Create a list of Predicate objects, each representing an atomic proposition in your TL formula. Each predicate has a name (used in the TL formula) and a formula (Boolean expression string). Example:

    ```python
    from gym_tl_tools import Predicate

    atomic_predicates = [
        Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
        Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
    ]
    ```

  2. Create a Variable Value Generator:

    Implement a subclass of BaseVarValueInfoGenerator to extract variable values from the environment’s observation and info for evaluating the atomic predicates. Example:

    ```python
    from gym_tl_tools import BaseVarValueInfoGenerator

    class MyVarValueGenerator(BaseVarValueInfoGenerator):
        def get_var_values(self, env, obs, info):
            # Extract variables needed for predicate evaluation
            return {
                "d_robot_goal": info.get("d_robot_goal", float("inf")),
                "d_robot_obstacle": info.get("d_robot_obstacle", float("inf")),
            }

    var_generator = MyVarValueGenerator()
    ```

  3. Specify the Temporal Logic Formula:

    Write your TL specification as a string, using the names of your atomic predicates. Example:

    ```python
    tl_spec = "F(goal_reached) & G(!obstacle_hit)"
    ```

  4. Wrap Your Environment:

    Pass your environment, TL specification, atomic predicates, and variable value generator to TLObservationReward. Example:

    ```python
    from gym_tl_tools import TLObservationReward

    wrapped_env = TLObservationReward(
        env,
        tl_spec=tl_spec,
        atomic_predicates=atomic_predicates,
        var_value_info_generator=var_generator,
    )
    ```

  5. Observation Structure:

    The wrapped observation is a dictionary containing the original observation together with the current automaton state, stored under the dict_aut_state_key key (default: "aut_state").

  6. Reward Calculation:
    • At each step, the wrapper computes the reward based on the automaton’s transition, which reflects progress toward (or violation of) the TL specification.

    • The automaton state is updated according to the values of the atomic predicates, which are evaluated using the variable values provided by the var_value_info_generator.

  7. Reset and Step:
    • On reset(), the automaton is reset to its initial state, and the initial observation is augmented.

    • On step(action), the automaton transitions based on the variable values extracted by the var_value_info_generator, and the reward is computed accordingly.

Notes

  • The wrapper adds additional information to the info dict, including:
    • is_success: Whether the automaton has reached a goal state.

    • is_failure: Whether the automaton has reached a trap state.

    • is_aut_terminated: Whether the automaton has reached a terminal state (goal or trap).

Parameters:
  • env (gymnasium.Env) – The environment to wrap.

  • tl_spec (str) – The temporal logic specification (e.g., LTL formula) to be used for the automaton.

  • atomic_predicates (list[gym_tl_tools.Predicate]) – List of atomic predicates used in the TL formula. Each predicate has a name (used in the TL formula) and a formula (Boolean expression string for evaluation).

  • var_value_info_generator (BaseVarValueInfoGenerator[ObsType, ActType]) – An instance of a subclass of BaseVarValueInfoGenerator that extracts variable values from the environment's observation and info for evaluating atomic predicates.

  • reward_config (RewardConfigDict, optional) – Configuration for the reward structure (default values provided).

  • early_termination (bool, optional) – Whether to terminate episodes when the automaton reaches a terminal state (default: True).

  • parser (Parser, optional) – Parser for TL expressions (default: a new Parser instance).

  • dict_aut_state_key (str, optional) – Key for the automaton state in the observation dictionary (default: "aut_state").

Example

```python
from gym_tl_tools import Predicate, BaseVarValueInfoGenerator, TLObservationReward

# Define atomic predicates
atomic_predicates = [
    Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
    Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
]

# Create variable value generator
class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": info.get("d_robot_obstacle", float("inf")),
        }

var_generator = MyVarValueGenerator()
tl_spec = "F(goal_reached) & G(!obstacle_hit)"

wrapped_env = TLObservationReward(
    env,
    tl_spec=tl_spec,
    atomic_predicates=atomic_predicates,
    var_value_info_generator=var_generator,
)

obs, info = wrapped_env.reset()
done = False
while not done:
    action = wrapped_env.action_space.sample()
    obs, reward, terminated, truncated, info = wrapped_env.step(action)
    done = terminated or truncated
```

classmethod __class_getitem__(params)

Parameterizes a generic class.

At least, parameterizing a generic class is the main thing this method does. For example, for some generic class Foo, this is called when we do Foo[int] - there, with cls=Foo and params=int.

However, note that this method is also called when defining generic classes in the first place with class Foo(Generic[T]): ….

__enter__()

Support with-statement for the environment.

__exit__(*args: Any)

Support with-statement for the environment and closes the environment.

__repr__()

Returns the string representation of the wrapper.

__str__()

Returns the wrapper name and the env representation string.

property action_space: Space[ActType] | Space[WrapperActType]

Return the Env action_space unless overwritten then the wrapper action_space is used.

classmethod class_name() str

Returns the class name of the wrapper.

close()

Closes the wrapper and env.

forward_aut(obs: ObsType, info: dict[str, Any]) None[source]

Forward the automaton based on the current observation and info.

This method updates the automaton state based on the current observation and info. It is useful for manually stepping the automaton without taking an action in the environment.

Parameters:
  • obs (ObsType) – The current observation from the environment.

  • info (dict[str, Any]) – The info dictionary containing variable values for the atomic predicates.

get_wrapper_attr(name: str) Any

Gets an attribute from the wrapper and lower environments if name doesn’t exist in this object.

Parameters:

name – The variable name to get

Returns:

The variable with name in wrapper or lower environments

has_wrapper_attr(name: str) bool

Checks if the given attribute is within the wrapper or its environment.

property is_aut_terminated: bool

Check if the automaton is in a terminal state.

Returns:

True if the automaton is in a terminal state, False otherwise.

Return type:

bool

property metadata: dict[str, Any]

Returns the Env metadata.

property np_random: Generator

Returns the Env np_random attribute.

property np_random_seed: int | None

Returns the base environment’s np_random_seed.

observation(observation: ObsType) dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]][source]

Process the observation to include the automaton state.

Parameters:

observation (ObsType) – The original observation from the environment.

Returns:

new_obs – The processed observation with the automaton state appended.

Return type:

dict[str,ObsType|int]

property observation_space: Space[ObsType] | Space[WrapperObsType]

Return the Env observation_space unless overwritten then the wrapper observation_space is used.

render() RenderFrame | list[RenderFrame] | None

Uses the render() of the env that can be overwritten to change the returned data.

property render_mode: str | None

Returns the Env render_mode.

reset(*, seed: int | None = None, options: dict[str, Any] | None = {'forward_aut_on_reset': True, 'info': {}, 'obs': {}}) tuple[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]], dict[str, Any]][source]

Reset the environment and return the initial observation.

Parameters:
  • seed (int | None, optional) – Random seed for reproducibility.

  • options (dict[str, Any] | None, optional) – Additional options for resetting the environment.

Returns:

  • new_obs (dict[str,ObsType|int]) – The initial observation with the automaton state.

  • info (dict[str, Any]) – Additional information from the reset. Should contain the variable keys and values that define the atomic predicates.

set_wrapper_attr(name: str, value: Any, *, force: bool = True) bool

Sets an attribute on this wrapper or lower environment if name is already defined.

Parameters:
  • name – The variable name

  • value – The new variable value

  • force – Whether to create the attribute on this wrapper if it does not exist on the lower environment, instead of raising an exception.

Returns:

If the variable has been set in this or a lower wrapper.

property spec: EnvSpec | None

Returns the Env spec attribute with the WrapperSpec if the wrapper inherits from EzPickle.

step(action: ActType) tuple[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]], SupportsFloat, bool, bool, dict[str, Any]][source]

Take a step in the environment with the given action.

Parameters:

action (ActType) – The action to take in the environment.

Returns:

  • new_obs (dict[str,ObsType|int]) – The new observation after taking the action.

  • reward (SupportsFloat) – The reward received from the environment.

  • terminated (bool) – Whether the episode has terminated.

  • truncated (bool) – Whether the episode has been truncated.

  • info (dict[str, Any]) – Additional information from the step. Should contain the variable keys and values that define the atomic predicates.

property unwrapped: Env[ObsType, ActType]

Returns the base environment of the wrapper.

This will be the bare gymnasium.Env environment, underneath all layers of wrappers.

classmethod wrapper_spec(**kwargs: Any) WrapperSpec

Generates a WrapperSpec for the wrappers.

Variable Value Generator

class gym_tl_tools.BaseVarValueInfoGenerator[source]

Bases: Generic[ObsType, ActType], ABC

Base class for generating variable values from the environment’s observation and info.

This class should be subclassed to implement custom logic for extracting variable values that are used to evaluate atomic predicates in temporal logic formulas.

The get_var_values method should return a dictionary where keys are variable names used in predicate formulas (e.g., “d_robot_goal”, “d_robot_obstacle”) and values are their corresponding numerical values extracted from the environment’s observation and info.

Example

```python
import numpy as np

from gym_tl_tools import BaseVarValueInfoGenerator


class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": obs.get("obstacle_distance", 0.0),
            "robot_speed": np.linalg.norm(obs["velocity"]) if "velocity" in obs else 0.0,
        }
```

abstractmethod get_var_values(env: Env[ObsType, ActType], obs: ObsType, info: dict[str, Any]) dict[str, Any][source]

Extract variable values from the environment’s observation and info.

This method should be implemented by subclasses to provide the variable values needed for evaluating atomic predicates in temporal logic formulas.

Parameters:
  • env (Env[ObsType, ActType]) – The environment from which to extract variable values.

  • obs (ObsType) – The current observation from the environment.

  • info (dict[str, Any]) – The info dictionary containing additional information from the environment.

Returns:

A dictionary where keys are variable names used in predicate formulas and values are their corresponding numerical values.

Return type:

dict[str, Any]

Configuration Classes

class gym_tl_tools.TLObservationRewardConfig(*, tl_spec: str = '', atomic_predicates: list[~gym_tl_tools.automaton.Predicate], var_value_info_generator_cls: str, var_value_info_generator_args: dict[str, ~typing.Any] = {}, reward_config: ~gym_tl_tools.wrapper.RewardConfig = RewardConfig(terminal_state_reward=5, state_trans_reward_scale=100, dense_reward=False, dense_reward_scale=0.01), early_termination: bool = True, parser: ~gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>, dict_aut_state_key: str = 'aut_state')[source]

Bases: BaseModel, Generic[ObsType, ActType]

Configuration for the TLObservationReward wrapper.

This class defines the parameters used to configure the TLObservationReward wrapper, including the temporal logic specification, atomic predicates, variable value generator, reward configuration, and other settings.

tl_spec

The temporal logic specification (e.g., LTL formula) to be used for the automaton.

Type:

str

atomic_predicates

List of atomic predicates used in the TL formula.

Type:

list[Predicate]

var_value_info_generator

An instance of a subclass of BaseVarValueInfoGenerator that extracts variable values from the environment’s observation and info.

Type:

BaseVarValueInfoGenerator[ObsType, ActType]

reward_config

Configuration for the reward structure.

Type:

RewardConfigDict

early_termination

Whether to terminate episodes when automaton reaches terminal states.

Type:

bool

parser

Parser for TL expressions.

Type:

Parser

dict_aut_state_key

Key for the automaton state in the observation dictionary.

Type:

str

atomic_predicates: list[Predicate]
dict_aut_state_key: str
early_termination: bool
model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

parser: Parser
reward_config: RewardConfig
serialize_model(info: SerializationInfo | None = None) dict[str, Any][source]

Custom model serializer to handle atomic_predicates serialization with context.

tl_spec: str
property var_value_info_generator: BaseVarValueInfoGenerator[ObsType, ActType]

Instantiate the variable value info generator based on the provided class and arguments.

Returns:

An instance of the variable value info generator.

Return type:

BaseVarValueInfoGenerator[ObsType, ActType]

var_value_info_generator_args: dict[str, Any]
var_value_info_generator_cls: str
class gym_tl_tools.RewardConfig(*, terminal_state_reward: float = 5, state_trans_reward_scale: float = 100, dense_reward: bool = False, dense_reward_scale: float = 0.01)[source]

Bases: BaseModel

dense_reward: bool
dense_reward_scale: float
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

state_trans_reward_scale: float
terminal_state_reward: float

Automaton and Predicates

class gym_tl_tools.Automaton(tl_spec: str, atomic_predicates: list[~gym_tl_tools.automaton.Predicate], parser: ~gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>)[source]

Bases: object

property is_goaled: bool

Check if the current state of the automaton is a goal state.

Returns:

True if the current state is a goal state, False otherwise.

Return type:

bool

property is_terminated: bool

Check if the automaton is in a terminal state (goal or trap).

Returns:

True if the automaton is in a terminal state, False otherwise.

Return type:

bool

property is_trapped: bool

Check if the current state of the automaton is a trap state.

Returns:

True if the current state is a trap state, False otherwise.

Return type:

bool

reset(seed: int | None = None) None[source]

Reset the automaton to its initial state.

step(var_value_dict: dict[str, float], terminal_state_reward: float = 5, state_trans_reward_scale: float = 100, dense_reward: bool = False, dense_reward_scale: float = 0.01) tuple[float, int][source]

Step the automaton to the next state based on the current predicate values. This function updates the current state of the automaton based on the values of variables defining atomic predicates provided in var_value_dict.

Parameters:
  • var_value_dict (dict[str, float]) – A dictionary mapping the variable names used in the atomic predicate definitions to their current values, e.g. {"d_robot_goal": 3.0, "d_robot_obstacle": 1.0}.

  • terminal_state_reward (float, default 5) – Reward given when the automaton reaches a terminal state, in addition to the state transition reward.

  • state_trans_reward_scale (float, default 100) – Scale factor for the reward based on the robustness of the automaton's state transition.

  • dense_reward (bool, default False) – Whether to use dense rewards based on the robustness of the transition to the next non-trap state.

  • dense_reward_scale (float, default 0.01) – Scale factor for the dense reward.

Returns:

  • reward (float) – The reward for the current step.

  • new_state (int) – The new automaton state.

tl_reward(ap_rob_dict: dict[str, float], curr_aut_state: int, terminal_state_reward: float = 5, state_trans_reward_scale: float = 100, dense_reward: bool = False, dense_reward_scale: float = 0.01) tuple[float, int][source]

Calculate the reward of the step from a given automaton.

Parameters:
  • ap_rob_dict (dict[str, float]) – A dictionary mapping the names of atomic predicates to their robustness values.

  • curr_aut_state (int) – The current state of the automaton.

  • dense_reward (bool) – If True, the reward is calculated based on the maximum robustness of non-trap transitions.

  • terminal_state_reward (float) – The reward given when the automaton reaches a terminal state.

Returns:

  • reward (float) – The reward of the step based on the MDP and automaton states.

  • next_aut_state (int) – The resultant automaton state after the step.

transition_robustness(transitions: list[Transition], ap_rob_dict: dict[str, float]) tuple[list[RobNextStatePair], list[RobNextStatePair], list[RobNextStatePair], list[RobNextStatePair], list[RobNextStatePair]][source]

Calculate the robustness of transitions in the automaton. This function computes the robustness values for each transition based on the conditions of the transitions and the robustness values of the atomic predicates.

Parameters:
  • transitions (list[Transition]) – A list of transitions in the automaton.

  • ap_rob_dict (dict[str, float]) – A dictionary mapping the names of atomic predicates to their robustness values.

Returns:

  • pos_trap_rob_trans_pairs (list[RobNextStatePair]) – A list of pairs of positive robustness values and transitions that lead to trap states.

  • pos_non_trap_rob_trans_pairs (list[RobNextStatePair]) – A list of pairs of positive robustness values and transitions that do not lead to trap states.

  • zero_trap_rob_trans_pairs (list[RobNextStatePair]) – A list of pairs of zero robustness values and transitions that lead to trap states.

  • zero_non_trap_rob_trans_pairs (list[RobNextStatePair]) – A list of pairs of zero robustness values and transitions that do not lead to trap states.

class gym_tl_tools.Predicate(*, name: str, formula: str)[source]

Bases: BaseModel

Represents a predicate in a TL formula.

name

The name of the atomic predicate. e.g. “psi_goal_robot”.

Type:

str

formula

The formula of the atomic predicate, which can be a Boolean expression. e.g. “d_robot_goal < 5”.

Type:

str

formula: str
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

name: str

Parser

class gym_tl_tools.Parser[source]

Bases: object

evaluate(parsed_tokens: list[str], var_dict: dict[str, float]) float[source]

Evaluate a parsed token list (in Reverse Polish Notation): scan the tokens from the start, pushing values onto a stack and applying the corresponding operation whenever a symbol is encountered.

get_func(s: str) Callable[[...], Any][source]
get_priority(s: str) int[source]
is_num(s: str) bool[source]
is_parentheses(s: str, **kwargs) bool[source]
is_symbol(s: Any) bool[source]
is_var(s: str, var_names: list[str]) bool[source]
parse(token_list: list[str], var_names: list[str]) list[str][source]

Convert the token list to Reverse Polish Notation.

tl2rob(spec: str, var_dict: dict[str, float]) float[source]

Parse a TL spec and compute its robustness value.

Parameters:
  • spec (str) – The TL spec to be parsed. e.g. “psi_1 & psi_2 | !psi_3”.

  • var_dict (dict[str, float]) – A dictionary mapping the variable names used in the TL spec to their current values. The keys should match the variable or atomic predicate names appearing in the spec, e.g. {"psi_1": 3.0, "psi_2": 1.0, "psi_3": 0.5}.

Returns:

robustness – The robustness value of the TL spec given the variable values. A positive value indicates satisfaction, while a negative value indicates violation.

Return type:

float

tokenize(spec: str) list[str][source]
class gym_tl_tools.ParserSymbol(*, priority: int, func: Callable[[...], Any])[source]

Bases: BaseModel

Represents a symbol in the parser with its priority and function.

priority

The priority of the symbol, used for parsing precedence.

Type:

int

func

The function that implements the operation of the symbol. It should take the appropriate number of arguments and return a value.

Type:

Callable[…, Any]

func: Callable[[...], Any]
model_config: ClassVar[ConfigDict] = {}

Configuration for the model; should be a dictionary conforming to pydantic's ConfigDict.

priority: int

Utility Functions

gym_tl_tools.replace_special_characters(spec: str) str[source]

Replace special characters in the spec with underscores so that it can be used as a file name.

Parameters:

spec (str) – spec to be replaced. The spec can contain special characters like spaces, &, |, etc. These characters are replaced with underscores to create a valid file name.

Returns:

replaced_spec – spec with special characters replaced by underscores

Return type:

str

Examples

```python
spec = replace_special_characters("F(psi_1 | psi_2) & G(!psi_3)")
print(spec)  # Output: "F(psi_1_or_psi_2)_and_G(!psi_3)"
```

Type Definitions

gym_tl_tools.RewardConfigDict = <class 'gym_tl_tools.wrapper.RewardConfigDict'>[source]

Configuration dictionary for the reward structure in the TLObservationReward wrapper. This dictionary is used to define how rewards are computed based on the automaton’s state and the environment’s info.

terminal_state_reward

Reward given when the automaton reaches a terminal state in addition to the automaton state transition reward.

Type:

float

state_trans_reward_scale

Scale factor for the reward based on the automaton’s state transition robustness. This is applied to the robustness computed from the automaton’s transition. If the transition leads to a trap state, the reward is set to be negative, scaled by this factor.

Type:

float

dense_reward

Whether to use dense rewards (True) or sparse rewards (False). Dense rewards provide a continuous reward signal based on the robustness of the transition to the next non-trap state.

Type:

bool

dense_reward_scale

Scale factor for the dense reward. This is applied to the computed dense reward based on the robustness to the next non-trap automaton’s state. If dense rewards are enabled, this factor scales the reward returned by the automaton.

Type:

float