Quick Start Guide ================ This guide will walk you through the basic usage of gym-tl-tools to wrap your Gymnasium environments with temporal logic rewards. Basic Concepts -------------- **Temporal Logic (TL)**: A formal specification language that allows you to express properties about sequences of states over time. **Atomic Predicates**: Basic propositions that can be evaluated as true or false based on the current state. **Automaton**: A finite state machine derived from your temporal logic formula that tracks progress toward goal satisfaction. Step-by-Step Usage ------------------ 1. Define Atomic Predicates ~~~~~~~~~~~~~~~~~~~~~~~~~~~ Create predicates that represent the basic conditions in your environment: .. code-block:: python from gym_tl_tools import Predicate atomic_predicates = [ Predicate(name="goal_reached", formula="d_robot_goal < 1.0"), Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"), Predicate(name="safe_speed", formula="robot_speed < 2.0"), ] 2. Create a Variable Value Generator ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Implement a class that extracts variables from your environment's observations and info: .. code-block:: python from gym_tl_tools import BaseVarValueInfoGenerator class MyVarValueGenerator(BaseVarValueInfoGenerator): def get_var_values(self, env, obs, info): return { "d_robot_goal": info.get("d_robot_goal", float('inf')), "d_robot_obstacle": info.get("d_robot_obstacle", float('inf')), "robot_speed": np.linalg.norm(obs.get("velocity", [0, 0])), } 3. Specify Your Temporal Logic Formula ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Write your specification using the predicate names: .. code-block:: python # Eventually reach goal while always avoiding obstacles and maintaining safe speed tl_spec = "F(goal_reached) & G(!obstacle_hit & safe_speed)" 4. Wrap Your Environment ~~~~~~~~~~~~~~~~~~~~~~~~ Apply the wrapper to your Gymnasium environment: .. code-block:: python from gym_tl_tools import TLObservationReward import gymnasium as gym env = gym.make("YourEnv-v0") wrapped_env = TLObservationReward( env, tl_spec=tl_spec, atomic_predicates=atomic_predicates, var_value_info_generator=MyVarValueGenerator(), ) 5. Use the Wrapped Environment ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The wrapped environment can be used like any Gymnasium environment: .. code-block:: python obs, info = wrapped_env.reset() print(f"Initial observation keys: {obs.keys()}") # Output: dict_keys(['obs', 'aut_state']) done = False total_reward = 0 while not done: action = wrapped_env.action_space.sample() obs, reward, terminated, truncated, info = wrapped_env.step(action) total_reward += reward done = terminated or truncated # Check automaton status if info.get("is_success"): print("Goal achieved!") elif info.get("is_failure"): print("Specification violated!") Understanding the Output ------------------------ **Observation Structure**: - Original observations are preserved under the `"obs"` key - Automaton state is added under the `"aut_state"` key (configurable) **Reward Structure**: - Rewards are based on automaton transitions and robustness values - Positive rewards indicate progress toward goal satisfaction - Negative rewards indicate violations or movement toward failure states **Info Dictionary Additions**: - `is_success`: Boolean indicating if the automaton reached a goal state - `is_failure`: Boolean indicating if the automaton reached a trap state - `is_aut_terminated`: Boolean indicating if the automaton episode has ended Temporal Logic Operators ------------------------- The parser supports the following operators: - `&`: Logical AND (conjunction) - `|`: Logical OR (disjunction) - `!`: Logical NOT (negation) - `F`: Eventually (future) - `G`: Always (globally) - `->`: Implication - `<-`: Reverse implication Example formulas: .. code-block:: python # Eventually reach goal "F(goal_reached)" # Always avoid obstacles "G(!obstacle_hit)" # Eventually reach goal while always avoiding obstacles "F(goal_reached) & G(!obstacle_hit)" # If close to obstacle, then slow down "G(close_to_obstacle -> slow_speed)"