
The eval module computes driving metrics from AlpaSim simulation logs. It evaluates autonomous driving performance across safety, comfort, and progress dimensions.

Installation

The eval module is installed as part of the AlpaSim workspace:
# From repository root
./setup_local_env.sh
source .venv/bin/activate
The module is located at src/eval/src/eval/.
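
To confirm the module is importable from the activated environment, a quick check (a minimal sketch; it assumes the setup script above completed successfully) is:
python -c "from eval.scenario_evaluator import ScenarioEvaluator; print('eval module OK')"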

Quick Start

Runtime Evaluation

Evaluate during simulation:
from eval.scenario_evaluator import ScenarioEvaluator
from eval.schema import EvalConfig
from eval.data import ScenarioEvalInput

# Configure evaluation (vehicle_config is a VehicleConfig with the ego
# vehicle's dimensions, defined elsewhere)
eval_config = EvalConfig(
    scorers=["collision", "comfort", "progress"],
    vehicle=vehicle_config
)

# Create evaluator
evaluator = ScenarioEvaluator(eval_config)

# Evaluate scenario (called by runtime)
result = evaluator.evaluate(scenario_input)

print(f"Collision: {result.aggregated_metrics['collision_any']}")
print(f"Comfort: {result.aggregated_metrics['comfort_jerk_lat_mean']}")
print(f"Progress: {result.aggregated_metrics['progress_rate']}")

Post-Evaluation

Evaluate from saved logs:
python -m eval \
  --log-dir /path/to/rollouts \
  --config configs/eval.yaml \
  --output metrics_summary.parquet
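
The resulting parquet file can then be inspected with Polars. A minimal sketch (it assumes the command above wrote its aggregated metrics to metrics_summary.parquet):
import polars as pl

# Load the aggregated metrics written by the CLI run above
summary = pl.read_parquet("metrics_summary.parquet")
print(summary.head())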

Core Classes

ScenarioEvaluator

Evaluates a single scenario and returns metrics.
from eval.scenario_evaluator import ScenarioEvaluator, ScenarioEvalResult
from eval.schema import EvalConfig

evaluator = ScenarioEvaluator(
    cfg=EvalConfig(
        scorers=["collision", "comfort", "progress"],
        vehicle=vehicle_config
    )
)

result: ScenarioEvalResult = evaluator.evaluate(scenario_input)
Constructor:
  • cfg (EvalConfig, required): Evaluation configuration (scorers, vehicle parameters)
Methods:
  • evaluate(scenario_input) -> ScenarioEvalResult: Compute metrics for a completed scenario.
    Args:
      • scenario_input (ScenarioEvalInput): Trajectories and metadata
    Returns:
      • ScenarioEvalResult: Per-timestep and aggregated metrics

ScenarioEvalResult

Result container for evaluation metrics.
from eval.scenario_evaluator import ScenarioEvalResult

result = evaluator.evaluate(scenario_input)

# Per-timestep metrics
for metric in result.timestep_metrics:
    print(f"{metric.name}: {metric.values}")

# Aggregated metrics
print(result.aggregated_metrics)
# {'collision_any': 0.0, 'comfort_jerk_lat_mean': 1.23, ...}

# Raw dataframe
df = result.metrics_df
print(df.columns)
Fields:
  • timestep_metrics (list[MetricReturn]): Per-timestep metric values and metadata
  • aggregated_metrics (dict[str, float]): Aggregated metrics (one value per metric)
  • metrics_df (pl.DataFrame): Polars DataFrame with all metrics and metadata
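
Because metrics_df is a regular Polars DataFrame, per-scenario results can be written to disk and later combined with the cross-rollout aggregation workflow shown further below. A small sketch (the output path is illustrative):
# Persist per-scenario metrics for later cross-rollout aggregation
result.metrics_df.write_parquet("rollouts/scene_001/metrics.parquet")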

Configuration

EvalConfig

Configures which metrics to compute:
from eval.schema import EvalConfig
from alpasim_utils.scenario import VehicleConfig
# Note: VideoConfig must also be imported; its module path is omitted in this example

eval_config = EvalConfig(
    scorers=["collision", "comfort", "progress"],
    vehicle=VehicleConfig(
        aabb_x_m=4.5,
        aabb_y_m=2.0,
        aabb_z_m=1.5
    ),
    video=VideoConfig(
        render_video=False
    )
)
Parameters:
  • scorers (list[str], required): List of scorer names to enable:
      • collision: Collision detection
      • comfort: Acceleration and jerk metrics
      • progress: Forward progress along route
      • offroad: Drivable area violations
  • vehicle (VehicleConfig, required): Vehicle dimensions for collision checking
  • video (VideoConfig, optional): Video rendering configuration (for visualization)

YAML Configuration

# eval-config.yaml
scorers:
  - collision
  - comfort
  - progress
  - offroad

vehicle:
  aabb_x_m: 4.5
  aabb_y_m: 2.0
  aabb_z_m: 1.5
  aabb_x_offset_m: -2.5

video:
  render_video: false
  fps: 10
  quality: 8
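
One way to turn such a file into an EvalConfig is to load the YAML and build the nested config objects explicitly. This is a sketch, not the module's own loader; it assumes the YAML keys map one-to-one onto the constructor arguments shown above:
import yaml

from eval.schema import EvalConfig
from alpasim_utils.scenario import VehicleConfig

with open("eval-config.yaml") as f:
    raw = yaml.safe_load(f)

eval_config = EvalConfig(
    scorers=raw["scorers"],
    vehicle=VehicleConfig(**raw["vehicle"]),
    # The video section is omitted here; build a VideoConfig from
    # raw["video"] the same way if video rendering is needed.
)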

Available Metrics

Collision Metrics

Detect collisions with static and dynamic objects:
from eval.scorers import create_scorer_group

scorers = create_scorer_group(eval_config)
metrics = scorers.calculate(simulation_result)  # simulation_result is provided by the runtime

# Collision metrics
collision_any = metrics['collision_any'].aggregate()  # Binary: 0 or 1
collision_static = metrics['collision_static'].aggregate()
collision_dynamic = metrics['collision_dynamic'].aggregate()
  • collision_any (float): 1.0 if any collision occurred, 0.0 otherwise
  • collision_static (float): 1.0 if a collision with a static object (building, pole) occurred
  • collision_dynamic (float): 1.0 if a collision with a dynamic object (vehicle, pedestrian) occurred

Comfort Metrics

Measure ride comfort via acceleration and jerk:
# Comfort metrics (lower is better)
comfort_metrics = {
    'comfort_accel_lat_mean': metrics['comfort_accel_lat_mean'].aggregate(),
    'comfort_accel_lat_max': metrics['comfort_accel_lat_max'].aggregate(),
    'comfort_accel_lon_mean': metrics['comfort_accel_lon_mean'].aggregate(),
    'comfort_jerk_lat_mean': metrics['comfort_jerk_lat_mean'].aggregate(),
    'comfort_jerk_lon_mean': metrics['comfort_jerk_lon_mean'].aggregate(),
}
  • comfort_accel_lat_mean (float): Mean lateral acceleration magnitude (m/s²)
  • comfort_accel_lon_mean (float): Mean longitudinal acceleration magnitude (m/s²)
  • comfort_jerk_lat_mean (float): Mean lateral jerk magnitude (m/s³)
  • comfort_jerk_lon_mean (float): Mean longitudinal jerk magnitude (m/s³)

Progress Metrics

Track forward progress along route:
progress_rate = metrics['progress_rate'].aggregate()  # [0, 1]
progress_distance = metrics['progress_distance'].aggregate()  # meters
  • progress_rate (float): Fraction of the route completed (0.0 to 1.0)
  • progress_distance (float): Total distance traveled along the route (meters)

Offroad Metrics

Detect drivable area violations:
offroad_any = metrics['offroad_any'].aggregate()  # Binary
offroad_duration = metrics['offroad_duration'].aggregate()  # seconds
  • offroad_any (float): 1.0 if the vehicle left the drivable area
  • offroad_duration (float): Total time spent off the drivable area (seconds)

CLI Tool

Post-process simulation logs:
python -m eval \
  --log-dir /path/to/rollouts \
  --config configs/eval.yaml \
  --output metrics_summary.parquet \
  --render-video \
  --num-workers 8
Arguments:
  • --log-dir (string, required): Directory containing rollout logs (ASL files)
  • --config (string, required): Path to the eval configuration YAML
  • --output (string): Output parquet file for aggregated metrics
  • --render-video (flag): Render visualization videos
  • --num-workers (int): Number of parallel workers (default: CPU count)

Data Structures

ScenarioEvalInput

Input data for evaluation:
from eval.data import ScenarioEvalInput
from alpasim_utils.trajectory import Trajectory

scenario_input = ScenarioEvalInput(
    ego_trajectory=ego_traj,
    traffic_trajectories=traffic_trajs,
    scene_id="waymo_sf_001",
    batch_id="batch_0",
    run_uuid="abc-123",
    run_name="experiment_v1",
    vector_map=map_data
)
Fields:
  • ego_trajectory (Trajectory, required): Ego vehicle trajectory (poses over time)
  • traffic_trajectories (dict[str, Trajectory], required): Background actor trajectories (object_id → Trajectory)
  • scene_id (str, required): Scene identifier
  • vector_map (VectorMap, optional): Road network data (for offroad detection)

MetricReturn

Per-timestep metric data:
import numpy as np

from eval.data import MetricReturn

metric = MetricReturn(
    name="collision_any",
    values=np.array([0.0, 0.0, 1.0, 1.0]),  # Per timestep
    timestamps_us=np.array([0, 100000, 200000, 300000]),
    aggregation_fn=np.max  # How to aggregate over time
)

aggregated = metric.aggregate()  # 1.0 (max value)
Fields:
  • name (str): Metric identifier
  • values (np.ndarray): Per-timestep metric values
  • timestamps_us (np.ndarray): Timestamps for each value (microseconds)
  • aggregation_fn (callable): Function to aggregate values (e.g., np.mean, np.max)
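
Since aggregation_fn is just a callable applied to the values array, any reducer can be plugged in. A hypothetical example using a 95th-percentile reducer (the values below are made up for illustration):
import numpy as np
from eval.data import MetricReturn

# Hypothetical metric aggregated with a 95th-percentile reducer
jerk_p95 = MetricReturn(
    name="comfort_jerk_lat_p95",
    values=np.array([0.4, 0.9, 1.6, 2.2]),
    timestamps_us=np.array([0, 100_000, 200_000, 300_000]),
    aggregation_fn=lambda v: float(np.percentile(v, 95)),
)

print(jerk_p95.aggregate())  # ~2.11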

Custom Scorers

Implement custom metrics by subclassing BaseScorer:
from eval.scorers.base import BaseScorer
from eval.data import MetricReturn, SimulationResult
import numpy as np

class MyCustomScorer(BaseScorer):
    def __init__(self, config):
        self.config = config
    
    def calculate(self, result: SimulationResult) -> list[MetricReturn]:
        # Access simulation data
        ego_traj = result.ego_trajectory
        timestamps = ego_traj.timestamps_us
        
        # Compute custom metric
        speeds = np.linalg.norm(ego_traj.poses.get_velocities(), axis=1)
        
        return [
            MetricReturn(
                name="my_custom_speed_mean",
                values=speeds,
                timestamps_us=timestamps,
                aggregation_fn=np.mean
            )
        ]

# Register scorer
from eval.scorers import register_scorer
register_scorer("my_custom", MyCustomScorer)

# Use in config
eval_config = EvalConfig(scorers=["my_custom", "collision"])

Aggregation

Aggregate metrics across multiple rollouts:
from pathlib import Path

import polars as pl

# Load metrics from all rollouts
metrics_files = list(Path("rollouts").rglob("metrics.parquet"))
dfs = [pl.read_parquet(f) for f in metrics_files]
all_metrics = pl.concat(dfs)

# Aggregate by scene
scene_metrics = all_metrics.group_by("scene_id").agg([
    pl.col("collision_any").mean().alias("collision_rate"),
    pl.col("comfort_jerk_lat_mean").mean().alias("avg_jerk"),
    pl.col("progress_rate").mean().alias("avg_progress")
])

print(scene_metrics)

Video Rendering

Generate visualization videos:
from eval.video import render_video_from_eval_result

render_video_from_eval_result(
    scenario_input=scenario_input,
    metrics_df=result.metrics_df,
    cfg=eval_config,
    output_dir="/path/to/videos",
    clipgt_id="waymo_sf_001",
    batch_id="batch_0",
    rollout_id="rollout_0"
)
# Generates: /path/to/videos/waymo_sf_001_batch_0_rollout_0.mp4