Quick Start Guide

This guide will walk you through the basic usage of gym-tl-tools to wrap your Gymnasium environments with temporal logic rewards.

Basic Concepts

Temporal Logic (TL): A formal specification language that allows you to express properties about sequences of states over time.

Atomic Predicates: Basic propositions that can be evaluated as true or false based on the current state.

Automaton: A finite state machine derived from your temporal logic formula that tracks progress toward goal satisfaction.
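To make the automaton idea concrete, here is a minimal hand-written two-state machine tracking the formula F(goal_reached). This is only an illustration of the concept, not the library's internal representation:

```python
# Hand-rolled automaton for F(goal_reached):
# state 0 = "still searching", state 1 = "goal satisfied" (accepting).
transitions = {
    (0, False): 0,  # goal not yet reached: stay in the initial state
    (0, True): 1,   # goal reached: move to the accepting state
    (1, False): 1,  # F(...) stays satisfied once it has held
    (1, True): 1,
}

def step_automaton(state: int, goal_reached: bool) -> int:
    """Advance the automaton one step on a predicate valuation."""
    return transitions[(state, goal_reached)]

state = 0
for goal_reached in [False, False, True, False]:
    state = step_automaton(state, goal_reached)
print(state)  # -> 1: the accepting state is absorbing
```

gym-tl-tools builds an automaton like this automatically from your formula; your code only ever sees its current state in the observation.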

Step-by-Step Usage

1. Define Atomic Predicates

Create predicates that represent the basic conditions in your environment:

from gym_tl_tools import Predicate

atomic_predicates = [
    Predicate(name="goal_reached", formula="d_robot_goal < 1.0"),
    Predicate(name="obstacle_hit", formula="d_robot_obstacle < 1.0"),
    Predicate(name="safe_speed", formula="robot_speed < 2.0"),
]
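Predicate formulas are ordinary comparison expressions over variable names. As a rough illustration of their semantics (this is not gym-tl-tools' API; the library parses formulas internally), a formula can be evaluated against a dictionary of variable values:

```python
# Illustrative only: evaluate a predicate formula such as "d_robot_goal < 1.0"
# against a dictionary of variable values.
def eval_predicate(formula: str, var_values: dict) -> bool:
    # eval() is used purely for demonstration on trusted, hand-written formulas.
    return bool(eval(formula, {"__builtins__": {}}, dict(var_values)))

print(eval_predicate("d_robot_goal < 1.0", {"d_robot_goal": 0.4}))  # True
print(eval_predicate("d_robot_goal < 1.0", {"d_robot_goal": 2.5}))  # False
```

The variable names used in formulas (here d_robot_goal) must match the keys returned by your variable value generator in the next step.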

2. Create a Variable Value Generator

Implement a class that extracts variables from your environment’s observations and info:

import numpy as np

from gym_tl_tools import BaseVarValueInfoGenerator

class MyVarValueGenerator(BaseVarValueInfoGenerator):
    def get_var_values(self, env, obs, info):
        return {
            "d_robot_goal": info.get("d_robot_goal", float('inf')),
            "d_robot_obstacle": info.get("d_robot_obstacle", float('inf')),
            "robot_speed": np.linalg.norm(obs.get("velocity", [0, 0])),
        }
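To see the shape of the data the generator produces, here is the same logic with a stand-in base class and math.hypot in place of np.linalg.norm, so the snippet runs without gym-tl-tools installed (in real code you subclass BaseVarValueInfoGenerator as above):

```python
import math

class MyVarValueGenerator:  # stand-in: subclass BaseVarValueInfoGenerator in real code
    def get_var_values(self, env, obs, info):
        vx, vy = obs.get("velocity", [0.0, 0.0])
        return {
            "d_robot_goal": info.get("d_robot_goal", float("inf")),
            "d_robot_obstacle": info.get("d_robot_obstacle", float("inf")),
            "robot_speed": math.hypot(vx, vy),  # np.linalg.norm in the real class
        }

gen = MyVarValueGenerator()
values = gen.get_var_values(
    env=None,
    obs={"velocity": [3.0, 4.0]},
    info={"d_robot_goal": 0.8},
)
print(values["robot_speed"])  # -> 5.0
```

Every key in the returned dictionary must cover the variables referenced by your predicate formulas; here the defaults fall back to infinity when a distance is missing from info.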

3. Specify Your Temporal Logic Formula

Write your specification using the predicate names:

# Eventually reach goal while always avoiding obstacles and maintaining safe speed
tl_spec = "F(goal_reached) & G(!obstacle_hit & safe_speed)"

4. Wrap Your Environment

Apply the wrapper to your Gymnasium environment:

from gym_tl_tools import TLObservationReward
import gymnasium as gym

env = gym.make("YourEnv-v0")
wrapped_env = TLObservationReward(
    env,
    tl_spec=tl_spec,
    atomic_predicates=atomic_predicates,
    var_value_info_generator=MyVarValueGenerator(),
)

5. Use the Wrapped Environment

The wrapped environment can be used like any Gymnasium environment:

obs, info = wrapped_env.reset()
print(f"Initial observation keys: {obs.keys()}")
# Output: dict_keys(['obs', 'aut_state'])

done = False
total_reward = 0
while not done:
    action = wrapped_env.action_space.sample()
    obs, reward, terminated, truncated, info = wrapped_env.step(action)
    total_reward += reward
    done = terminated or truncated

    # Check automaton status
    if info.get("is_success"):
        print("Goal achieved!")
    elif info.get("is_failure"):
        print("Specification violated!")

Understanding the Output

Observation Structure:

  • Original observations are preserved under the "obs" key

  • The automaton state is added under the "aut_state" key (configurable)

Reward Structure:

  • Rewards are based on automaton transitions and robustness values

  • Positive rewards indicate progress toward goal satisfaction

  • Negative rewards indicate violations or movement toward failure states

Info Dictionary Additions:

  • is_success: Boolean indicating that the automaton reached a goal state

  • is_failure: Boolean indicating that the automaton reached a trap state

  • is_aut_terminated: Boolean indicating that the automaton episode has ended
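The robustness mentioned above follows the usual signal-temporal-logic convention: for a predicate of the form var < threshold, robustness is the signed margin threshold - var, positive when the predicate holds and negative otherwise. A minimal sketch, assuming that convention (check the library's documentation for its exact reward shaping):

```python
# Signed margin (robustness) of a predicate of the form "var < threshold":
# positive when satisfied, negative when violated,
# magnitude = distance to the decision boundary.
def robustness_lt(value: float, threshold: float) -> float:
    return threshold - value

print(robustness_lt(0.5, 1.0))  # 0.5: "d_robot_goal < 1.0" holds with margin 0.5
print(robustness_lt(3.0, 1.0))  # -2.0: violated by 2.0
```

A robustness-shaped reward therefore gives the agent a dense gradient toward satisfying each predicate, rather than a sparse signal only at the moment of satisfaction or violation.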

Temporal Logic Operators

The parser supports the following operators:

  • &: Logical AND (conjunction)

  • |: Logical OR (disjunction)

  • !: Logical NOT (negation)

  • F: Eventually (future)

  • G: Always (globally)

  • ->: Implication

  • <-: Reverse implication

Example formulas:

# Eventually reach goal
"F(goal_reached)"

# Always avoid obstacles
"G(!obstacle_hit)"

# Eventually reach goal while always avoiding obstacles
"F(goal_reached) & G(!obstacle_hit)"

# If close to obstacle, then slow down
"G(close_to_obstacle -> slow_speed)"