API Reference
This page contains the complete API reference for gym-tl-tools.
TLObservationReward Wrapper
- class gym_tl_tools.TLObservationReward(env: ~gymnasium.core.Env[~gymnasium.core.ObsType, ~gymnasium.core.ActType], tl_spec: str, atomic_predicates: list[~gym_tl_tools.automaton.Predicate] | list[dict[str, str]], var_value_info_generator: ~gym_tl_tools.wrapper.BaseVarValueInfoGenerator[~gymnasium.core.ObsType, ~gymnasium.core.ActType], reward_config: ~gym_tl_tools.wrapper.RewardConfigDict = {'dense_reward': False, 'dense_reward_scale': 0.01, 'state_trans_reward_scale': 100, 'terminal_state_reward': 5}, early_termination: bool = True, parser: ~gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>, dict_aut_state_key: str = 'aut_state')[source]
Bases:
ObservationWrapper[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[number]]], ActType, ObsType], RecordConstructorArgs, Generic[ObsType, ActType]
A wrapper for Gymnasium environments that augments observations with the state of a temporal logic automaton and computes rewards based on satisfaction of temporal logic (TL) specifications.
This wrapper is designed for environments where the agent’s objective is specified using temporal logic (e.g., LTL). It integrates an automaton (from a TL formula) into the observation and reward structure, enabling RL agents to learn tasks with complex temporal requirements.
Usage
- Define Atomic Predicates:
Create a list of Predicate objects, each representing an atomic proposition in your TL formula. Each predicate has a name (used in the TL formula) and a formula (Boolean expression string). Example:
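A minimal sketch (the predicate names and formulas below are illustrative and mirror the full example later in this entry):
```python
from gym_tl_tools import Predicate

atomic_predicates = [
    Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
    Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
]
```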
- Create a Variable Value Generator:
Implement a subclass of BaseVarValueInfoGenerator to extract variable values from the environment’s observation and info for evaluating the atomic predicates. Example:
```python
from gym_tl_tools import BaseVarValueInfoGenerator

class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        # Extract variables needed for predicate evaluation
        return {
            "d_robot_goal": info.get("d_robot_goal", float('inf')),
            "d_robot_obstacle": info.get("d_robot_obstacle", float('inf')),
        }
```
- Specify the Temporal Logic Formula:
Write your TL specification as a string, using the names of your atomic predicates. Example:
```python
tl_spec = "F(goal_reached) & G(!obstacle_hit)"
```
- Wrap Your Environment:
Pass your environment, TL specification, atomic predicates, and variable value generator to TLObservationReward. Example:
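A minimal sketch, assuming env, tl_spec, atomic_predicates, and var_generator are defined as in the previous steps:
```python
from gym_tl_tools import TLObservationReward

wrapped_env = TLObservationReward(
    env,
    tl_spec=tl_spec,
    atomic_predicates=atomic_predicates,
    var_value_info_generator=var_generator,
)
```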
- Observation Structure:
The wrapper augments each observation with the current automaton state.
If the original observation space is a [Dict](https://gymnasium.farama.org/main/api/spaces/composite/#gymnasium.spaces.Dict), the automaton state is added as a new key (default: “aut_state”).
~~If the original observation space is a [Tuple](https://gymnasium.farama.org/main/api/spaces/composite/#gymnasium.spaces.Tuple), the automaton state is appended.~~
Otherwise, the observation is wrapped in a [Dict](https://gymnasium.farama.org/main/api/spaces/composite/#gymnasium.spaces.Dict) with keys “obs” and “aut_state”.
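For illustration, a sketch of reading the augmented observation (the default key names described above are assumed):
```python
obs, info = wrapped_env.reset()
aut_state = obs["aut_state"]  # automaton state added by the wrapper
# If the original observation space is not a Dict, the original
# observation is available under obs["obs"].
```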
- Reward Calculation:
At each step, the wrapper computes the reward based on the automaton’s transition, which reflects progress toward (or violation of) the TL specification.
The automaton state is updated according to the values of the atomic predicates, which are evaluated using the variable values provided by the var_value_info_generator.
- Reset and Step:
On reset(), the automaton is reset to its initial state, and the initial observation is augmented.
On step(action), the automaton transitions based on the variable values extracted by the var_value_info_generator, and the reward is computed accordingly.
Notes
- The wrapper adds additional information to the info dict, including:
is_success: Whether the automaton has reached a goal state.
is_failure: Whether the automaton has reached a trap state.
is_aut_terminated: Whether the automaton has reached a terminal state (goal or trap).
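For example, these flags can be checked after each step (a sketch; the surrounding loop and action are illustrative):
```python
obs, reward, terminated, truncated, info = wrapped_env.step(action)
if info["is_success"]:
    print("TL specification satisfied: automaton reached a goal state")
elif info["is_failure"]:
    print("TL specification violated: automaton reached a trap state")
```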
- Parameters:
env (gymnasium.Env) – The environment to wrap.
tl_spec (str) – The temporal logic specification (e.g., LTL formula) to be used for the automaton.
atomic_predicates (list[gym_tl_tools.Predicate]) – List of atomic predicates used in the TL formula. Each predicate has a name (used in the TL formula) and a formula (Boolean expression string for evaluation).
var_value_info_generator (BaseVarValueInfoGenerator[ObsType, ActType]) – An instance of a subclass of BaseVarValueInfoGenerator that extracts variable values from the environment’s observation and info for evaluating atomic predicates.
reward_config (RewardConfigDict, optional) – Configuration for the reward structure (default values provided).
early_termination (bool, optional) – Whether to terminate episodes when the automaton reaches a terminal state (default: True).
parser (Parser, optional) – Parser for TL expressions (default: a new Parser instance).
dict_aut_state_key (str, optional) – Key for the automaton state in the observation dictionary (default: "aut_state").
Example
```python
from gym_tl_tools import Predicate, BaseVarValueInfoGenerator, TLObservationReward

# Define atomic predicates
atomic_predicates = [
    Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
    Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
]

# Create variable value generator
class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float('inf')),
            "d_robot_obstacle": info.get("d_robot_obstacle", float('inf')),
        }

var_generator = MyVarValueGenerator()
tl_spec = "F(goal_reached) & G(!obstacle_hit)"

wrapped_env = TLObservationReward(
    env,
    tl_spec=tl_spec,
    atomic_predicates=atomic_predicates,
    var_value_info_generator=var_generator,
)

obs, info = wrapped_env.reset()
done = False
while not done:
    action = wrapped_env.action_space.sample()
    obs, reward, terminated, truncated, info = wrapped_env.step(action)
    done = terminated or truncated
```
- classmethod __class_getitem__(params)
Parameterizes a generic class.
At least, parameterizing a generic class is the main thing this method does. For example, for some generic class Foo, this is called when we do Foo[int] - there, with cls=Foo and params=int.
However, note that this method is also called when defining generic classes in the first place with class Foo(Generic[T]): ….
- __enter__()
Support with-statement for the environment.
- __repr__()
Returns the string representation of the wrapper.
- __str__()
Returns the wrapper name and the env representation string.
- property action_space: Space[ActType] | Space[WrapperActType]
Return the Env action_space unless overwritten, in which case the wrapper action_space is used.
- close()
Closes the wrapper and env.
- forward_aut(obs: ObsType, info: dict[str, Any]) None[source]
Forward the automaton based on the current observation and info.
This method updates the automaton state based on the current observation and info. It is useful for manually stepping the automaton without taking an action in the environment.
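A short sketch of manually forwarding the automaton. Note that obs and info here come from the underlying environment (i.e., obs is of the wrapped environment's ObsType, not the augmented dictionary):
```python
# obs, info obtained from the underlying environment, not from wrapped_env.step()
wrapped_env.forward_aut(obs, info)  # updates the automaton state, returns None
```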
- get_wrapper_attr(name: str) Any
Gets an attribute from the wrapper and lower environments if name doesn’t exist in this object.
- Parameters:
name – The variable name to get
- Returns:
The variable with name in wrapper or lower environments
- has_wrapper_attr(name: str) bool
Checks if the given attribute is within the wrapper or its environment.
- property is_aut_terminated: bool
Check if the automaton is in a terminal state.
- Returns:
True if the automaton is in a terminal state, False otherwise.
- Return type:
bool
- property np_random_seed: int | None
Returns the base environment’s np_random_seed.
- observation(observation: ObsType) dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]][source]
Process the observation to include the automaton state.
- property observation_space: Space[ObsType] | Space[WrapperObsType]
Return the Env observation_space unless overwritten, in which case the wrapper observation_space is used.
- render() RenderFrame | list[RenderFrame] | None
Uses the render() of the env, which can be overwritten to change the returned data.
- property render_mode: str | None
Returns the Env render_mode.
- reset(*, seed: int | None = None, options: dict[str, Any] | None = {'forward_aut_on_reset': True, 'info': {}, 'obs': {}}) tuple[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]], dict[str, Any]][source]
Reset the environment and return the initial observation.
- Parameters:
- Returns:
new_obs (dict[str,ObsType|int]) – The initial observation with the automaton state.
info (dict[str, Any]) – Additional information from the reset. Should contain the variable keys and values that define the atomic predicates.
- set_wrapper_attr(name: str, value: Any, *, force: bool = True) bool
Sets an attribute on this wrapper or lower environment if name is already defined.
- Parameters:
name – The variable name
value – The new variable value
force – Whether to create the attribute on this wrapper if it does not exist on the lower environment, instead of raising an exception
- Returns:
If the variable has been set in this or a lower wrapper.
- property spec: EnvSpec | None
Returns the Env spec attribute with the WrapperSpec if the wrapper inherits from EzPickle.
- step(action: ActType) tuple[dict[str, ObsType | int64 | ndarray[tuple[Any, ...], dtype[_ScalarT]]], SupportsFloat, bool, bool, dict[str, Any]][source]
Take a step in the environment with the given action.
- Parameters:
action (ActType) – The action to take in the environment.
- Returns:
new_obs (dict[str,ObsType|int]) – The new observation after taking the action.
reward (SupportsFloat) – The reward received from the environment.
terminated (bool) – Whether the episode has terminated.
truncated (bool) – Whether the episode has been truncated.
info (dict[str, Any]) – Additional information from the step. Should contain the variable keys and values that define the atomic predicates.
- property unwrapped: Env[ObsType, ActType]
Returns the base environment of the wrapper.
This will be the bare gymnasium.Env environment, underneath all layers of wrappers.
- classmethod wrapper_spec(**kwargs: Any) WrapperSpec
Generates a WrapperSpec for the wrappers.
Variable Value Generator
- class gym_tl_tools.BaseVarValueInfoGenerator[source]
Bases:
Generic[ObsType, ActType], ABC
Base class for generating variable values from the environment’s observation and info.
This class should be subclassed to implement custom logic for extracting variable values that are used to evaluate atomic predicates in temporal logic formulas.
The get_var_values method should return a dictionary where keys are variable names used in predicate formulas (e.g., “d_robot_goal”, “d_robot_obstacle”) and values are their corresponding numerical values extracted from the environment’s observation and info.
Example
```python
import numpy as np
from gym_tl_tools import BaseVarValueInfoGenerator

class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float('inf')),
            "d_robot_obstacle": obs.get("obstacle_distance", 0.0),
            "robot_speed": np.linalg.norm(obs["velocity"]) if "velocity" in obs else 0.0,
        }
```
- abstractmethod get_var_values(env: Env[ObsType, ActType], obs: ObsType, info: dict[str, Any]) dict[str, Any][source]
Extract variable values from the environment’s observation and info.
This method should be implemented by subclasses to provide the variable values needed for evaluating atomic predicates in temporal logic formulas.
- Parameters:
env (Env[ObsType, ActType]) – The environment from which the values are extracted.
obs (ObsType) – The current observation from the environment.
info (dict[str, Any]) – The info dictionary returned by the environment.
- Returns:
A dictionary where keys are variable names used in predicate formulas and values are their corresponding numerical values.
- Return type:
dict[str, Any]
Configuration Classes
- class gym_tl_tools.TLObservationRewardConfig(*, tl_spec: str = '', atomic_predicates: list[~gym_tl_tools.automaton.Predicate], var_value_info_generator_cls: str, var_value_info_generator_args: dict[str, ~typing.Any] = {}, reward_config: ~gym_tl_tools.wrapper.RewardConfig = RewardConfig(terminal_state_reward=5, state_trans_reward_scale=100, dense_reward=False, dense_reward_scale=0.01), early_termination: bool = True, parser: ~gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>, dict_aut_state_key: str = 'aut_state')[source]
Bases:
BaseModel, Generic[ObsType, ActType]
Configuration for the TLObservationReward wrapper.
This class defines the parameters used to configure the TLObservationReward wrapper, including the temporal logic specification, atomic predicates, variable value generator, reward configuration, and other settings.
- tl_spec
The temporal logic specification (e.g., LTL formula) to be used for the automaton.
- Type:
str
- var_value_info_generator
An instance of a subclass of BaseVarValueInfoGenerator that extracts variable values from the environment’s observation and info.
- Type:
BaseVarValueInfoGenerator[ObsType, ActType]
- reward_config
Configuration for the reward structure.
- Type:
RewardConfig
- model_config: ClassVar[ConfigDict] = {'arbitrary_types_allowed': True}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- reward_config: RewardConfig
- serialize_model(info: SerializationInfo | None = None) dict[str, Any][source]
Custom model serializer to handle atomic_predicates serialization with context.
- property var_value_info_generator: BaseVarValueInfoGenerator[ObsType, ActType]
Instantiate the variable value info generator based on the provided class and arguments.
- Returns:
An instance of the variable value info generator.
- Return type:
BaseVarValueInfoGenerator[ObsType, ActType]
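A hypothetical construction sketch. The dotted import path passed to var_value_info_generator_cls and the empty var_value_info_generator_args are assumptions about how the generator class is resolved and instantiated; check the implementation before relying on this format:
```python
from gym_tl_tools import Predicate, RewardConfig, TLObservationRewardConfig

config = TLObservationRewardConfig(
    tl_spec="F(goal_reached) & G(!obstacle_hit)",
    atomic_predicates=[
        Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
        Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
    ],
    var_value_info_generator_cls="my_package.MyVarValueGenerator",  # assumed import path
    var_value_info_generator_args={},
    reward_config=RewardConfig(dense_reward=False),
)
```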
- class gym_tl_tools.RewardConfig(*, terminal_state_reward: float = 5, state_trans_reward_scale: float = 100, dense_reward: bool = False, dense_reward_scale: float = 0.01)[source]
Bases:
BaseModel
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
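A sketch of overriding the defaults shown in the signature above:
```python
from gym_tl_tools import RewardConfig

reward_config = RewardConfig(
    terminal_state_reward=5,
    state_trans_reward_scale=100,
    dense_reward=True,        # switch to dense, robustness-based rewards
    dense_reward_scale=0.01,
)
```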
Automaton and Predicates
- class gym_tl_tools.Automaton(tl_spec: str, atomic_predicates: list[~gym_tl_tools.automaton.Predicate], parser: ~gym_tl_tools.parser.Parser = <gym_tl_tools.parser.Parser object>)[source]
Bases:
object
- property is_goaled: bool
Check if the current state of the automaton is a goal state.
- Returns:
True if the current state is a goal state, False otherwise.
- Return type:
bool
- property is_terminated: bool
Check if the automaton is in a terminal state (goal or trap).
- Returns:
True if the automaton is in a terminal state, False otherwise.
- Return type:
bool
- property is_trapped: bool
Check if the current state of the automaton is a trap state.
- Returns:
True if the current state is a trap state, False otherwise.
- Return type:
bool
- step(var_value_dict: dict[str, float], terminal_state_reward: float = 5, state_trans_reward_scale: float = 100, dense_reward: bool = False, dense_reward_scale: float = 0.01) tuple[float, int][source]
Step the automaton to the next state based on the current predicate values. This function updates the current state of the automaton based on the values of variables defining atomic predicates provided in var_value_dict.
- Parameters:
var_value_dict (dict[str, float]) – A dictionary mapping the names of the variables that define the atomic predicates to their current values.
terminal_state_reward (float) – The reward given when the automaton reaches a terminal state.
state_trans_reward_scale (float) – Scale factor for the reward based on the automaton’s state transition robustness.
dense_reward (bool) – If True, the reward is calculated based on the maximum robustness of non-trap transitions.
dense_reward_scale (float) – Scale factor for the dense reward.
- Returns:
reward (float) – The reward for the current step.
new_state (int) – The new automaton state.
- tl_reward(ap_rob_dict: dict[str, float], curr_aut_state: int, terminal_state_reward: float = 5, state_trans_reward_scale: float = 100, dense_reward: bool = False, dense_reward_scale: float = 0.01) tuple[float, int][source]
Calculate the reward of the step from a given automaton.
- Parameters:
ap_rob_dict (dict[str, float]) – A dictionary mapping the names of atomic predicates to their robustness values.
curr_aut_state (int) – The current state of the automaton.
dense_reward (bool) – If True, the reward is calculated based on the maximum robustness of non-trap transitions.
terminal_state_reward (float) – The reward given when the automaton reaches a terminal state.
- Returns:
reward (float) – The reward of the step based on the MDP and automaton states.
next_aut_state (int) – The resultant automaton state after the step.
- transition_robustness(transitions: list[Transition], ap_rob_dict: dict[str, float]) tuple[list[RobNextStatePair], list[RobNextStatePair], list[RobNextStatePair], list[RobNextStatePair], list[RobNextStatePair]][source]
Calculate the robustness of transitions in the automaton. This function computes the robustness values for each transition based on the conditions of the transitions and the robustness values of the atomic predicates.
- Parameters:
transitions (list[Transition]) – A list of transitions in the automaton.
ap_rob_dict (dict[str, float]) – A dictionary mapping the names of atomic predicates to their robustness values.
- Returns:
pos_trap_rob_trans_pairs (list[RobNextStatePair]) – A list of pairs of positive robustness values and transitions that lead to trap states.
pos_non_trap_rob_trans_pairs (list[RobNextStatePair]) – A list of pairs of positive robustness values and transitions that do not lead to trap states.
zero_trap_rob_trans_pairs (list[RobNextStatePair]) – A list of pairs of zero robustness values and transitions that lead to trap states.
zero_non_trap_rob_trans_pairs (list[RobNextStatePair]) – A list of pairs of zero robustness values and transitions that do not lead to trap states.
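A minimal usage sketch (the predicate names and variable values are illustrative, and it is assumed the automaton starts in its initial state after construction):
```python
from gym_tl_tools import Automaton, Predicate

automaton = Automaton(
    tl_spec="F(goal_reached) & G(!obstacle_hit)",
    atomic_predicates=[
        Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
        Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
    ],
)

# Advance the automaton with the current values of the predicate variables.
reward, new_state = automaton.step({"d_robot_goal": 0.5, "d_robot_obstacle": 3.0})
print(reward, new_state, automaton.is_goaled, automaton.is_trapped)
```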
- class gym_tl_tools.Predicate(*, name: str, formula: str)[source]
Bases:
BaseModel
Represents a predicate in a TL formula.
- formula
The formula of the atomic predicate, which can be a Boolean expression. e.g. “d_robot_goal < 5”.
- Type:
str
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Parser
- class gym_tl_tools.Parser[source]
Bases:
object
- evaluate(parsed_tokens: list[str], var_dict: dict[str, float]) float[source]
Evaluate the parsed tokens from the start, taking operands from a stack and computing whenever an operator symbol is encountered.
- parse(token_list: list[str], var_names: list[str]) list[str][source]
Convert the token list to Reverse Polish Notation.
- tl2rob(spec: str, var_dict: dict[str, float]) float[source]
Parse a TL spec and compute its robustness value.
- Parameters:
spec (str) – The TL spec to be parsed, e.g. "psi_1 & psi_2 | !psi_3".
var_dict (dict[str, float]) – A dictionary mapping the variable names used in the TL spec to their current values. The keys should match the names of the atomic predicates defined in the spec, e.g. {"psi_1": 3.0, "psi_2": 1.0, "psi_3": 0.5}.
- Returns:
robustness – The robustness value of the TL spec given the variable values. A positive value indicates satisfaction, while a negative value indicates violation.
- Return type:
float
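For illustration, a sketch of computing the robustness of a spec directly (the variable names and values are made up):
```python
from gym_tl_tools import Parser

parser = Parser()
robustness = parser.tl2rob("psi_1 & psi_2 | !psi_3", {"psi_1": 3.0, "psi_2": 1.0, "psi_3": 0.5})
print(robustness)  # positive: satisfied, negative: violated
```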
- class gym_tl_tools.ParserSymbol(*, priority: int, func: Callable[[...], Any])[source]
Bases:
BaseModel
Represents a symbol in the parser with its priority and function.
- func
The function that implements the operation of the symbol. It should take the appropriate number of arguments and return a value.
- Type:
Callable[…, Any]
- model_config: ClassVar[ConfigDict] = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
Utility Functions
- gym_tl_tools.replace_special_characters(spec: str) str[source]
Replace special characters in the spec with underscores so that it can be used as a file name.
- Parameters:
spec (str) – The spec to convert. It can contain special characters such as spaces, &, |, etc., which are replaced to create a valid file name.
- Returns:
replaced_spec – The spec with special characters replaced by underscores.
- Return type:
str
Examples
```python
from gym_tl_tools import replace_special_characters

spec = replace_special_characters("F(psi_1 | psi_2) & G(!psi_3)")
print(spec)  # Output: "F(psi_1_or_psi_2)_and_G(!psi_3)"
```
Type Definitions
- gym_tl_tools.RewardConfigDict = <class 'gym_tl_tools.wrapper.RewardConfigDict'>[source]
Configuration dictionary for the reward structure in the TLObservationReward wrapper. This dictionary is used to define how rewards are computed based on the automaton’s state and the environment’s info.
- terminal_state_reward
Reward given when the automaton reaches a terminal state, in addition to the automaton state transition reward.
- Type:
float
- state_trans_reward_scale
Scale factor for the reward based on the automaton’s state transition robustness. This is applied to the robustness computed from the automaton’s transition. If the transition leads to a trap state, the reward is set to be negative, scaled by this factor.
- Type:
float
- dense_reward
Whether to use dense rewards (True) or sparse rewards (False). Dense rewards provide a continuous reward signal based on the robustness of the transition to the next non-trap state.
- Type: