The eval module computes driving metrics from AlpaSim simulation logs. It evaluates autonomous driving performance across safety, comfort, and progress dimensions.
Installation
The eval module is installed as part of the AlpaSim workspace:
# From repository root
./setup_local_env.sh
source .venv/bin/activate
The module is located at src/eval/src/eval/.
Quick Start
Runtime Evaluation
Evaluate during simulation:
from eval.scenario_evaluator import ScenarioEvaluator
from eval.schema import EvalConfig
from eval.data import ScenarioEvalInput
# Configure evaluation; vehicle_config is a VehicleConfig (see Configuration below)
eval_config = EvalConfig(
    scorers=["collision", "comfort", "progress"],
    vehicle=vehicle_config
)
# Create evaluator
evaluator = ScenarioEvaluator(eval_config)
# Evaluate scenario (called by runtime)
result = evaluator.evaluate(scenario_input)
print(f"Collision: {result.aggregated_metrics['collision_any']}")
print(f"Comfort: {result.aggregated_metrics['comfort_jerk_lat_mean']}")
print(f"Progress: {result.aggregated_metrics['progress_rate']}")
Post-Evaluation
Evaluate from saved logs:
python -m eval \
--log-dir /path/to/rollouts \
--config configs/eval.yaml \
--output metrics_summary.parquet
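The output parquet can be inspected directly with Polars. A minimal sketch; the exact column set depends on the enabled scorers, and the names below are the metric names used elsewhere on this page:
import polars as pl

# Load the aggregated metrics written by `python -m eval`
df = pl.read_parquet("metrics_summary.parquet")
print(df.select(["scene_id", "collision_any", "progress_rate"]))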
Core Classes
ScenarioEvaluator
Evaluates a single scenario and returns metrics.
from eval.scenario_evaluator import ScenarioEvaluator, ScenarioEvalResult
from eval.schema import EvalConfig
evaluator = ScenarioEvaluator(
    cfg=EvalConfig(
        scorers=["collision", "comfort", "progress"],
        vehicle=vehicle_config
    )
)
result: ScenarioEvalResult = evaluator.evaluate(scenario_input)
Constructor:
cfg (EvalConfig): Evaluation configuration (scorers, vehicle parameters)
Methods:
evaluate(scenario_input)
Compute metrics for a completed scenario.
Args:
scenario_input (ScenarioEvalInput): Trajectories and metadata
Returns:
ScenarioEvalResult: Per-timestep and aggregated metrics
ScenarioEvalResult
Result container for evaluation metrics.
from eval.scenario_evaluator import ScenarioEvalResult
result = evaluator.evaluate(scenario_input)
# Per-timestep metrics
for metric in result.timestep_metrics:
    print(f"{metric.name}: {metric.values}")
# Aggregated metrics
print(result.aggregated_metrics)
# {'collision_any': 0.0, 'comfort_jerk_lat_mean': 1.23, ...}
# Raw dataframe
df = result.metrics_df
print(df.columns)
Attributes:
timestep_metrics: Per-timestep metric values and metadata
aggregated_metrics: Aggregated metrics (one value per metric)
metrics_df: Polars DataFrame with all metrics and metadata
Configuration
EvalConfig
Configures which metrics to compute:
from eval.schema import EvalConfig, VideoConfig  # VideoConfig import path assumed alongside EvalConfig
from alpasim_utils.scenario import VehicleConfig

eval_config = EvalConfig(
    scorers=["collision", "comfort", "progress"],
    vehicle=VehicleConfig(
        aabb_x_m=4.5,
        aabb_y_m=2.0,
        aabb_z_m=1.5
    ),
    video=VideoConfig(
        render_video=False
    )
)
scorers: List of scorer names to enable:
  collision: Collision detection
  comfort: Acceleration and jerk metrics
  progress: Forward progress along route
  offroad: Drivable area violations
vehicle: Vehicle dimensions for collision checking
video: Video rendering configuration (for visualization)
YAML Configuration
# eval-config.yaml
scorers:
  - collision
  - comfort
  - progress
  - offroad
vehicle:
  aabb_x_m: 4.5
  aabb_y_m: 2.0
  aabb_z_m: 1.5
  aabb_x_offset_m: -2.5
video:
  render_video: false
  fps: 10
  quality: 8
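The CLI parses this file for you. If you need to load it manually, here is a minimal sketch, assuming EvalConfig's fields mirror the YAML keys (an assumption, not a documented API):
import yaml
from eval.schema import EvalConfig

# Parse the YAML and construct the config; nested keys (vehicle, video)
# are assumed to be coerced by EvalConfig itself.
with open("eval-config.yaml") as f:
    raw = yaml.safe_load(f)
eval_config = EvalConfig(**raw)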
Available Metrics
Collision Metrics
Detect collisions with static and dynamic objects:
from eval.scorers import create_scorer_group
scorers = create_scorer_group(eval_config)
# simulation_result: a SimulationResult from a completed rollout
metrics = scorers.calculate(simulation_result)
# Collision metrics
collision_any = metrics['collision_any'].aggregate() # Binary: 0 or 1
collision_static = metrics['collision_static'].aggregate()
collision_dynamic = metrics['collision_dynamic'].aggregate()
collision_any: 1.0 if any collision occurred, 0.0 otherwise
collision_static: 1.0 if the collision was with a static object (building, pole)
collision_dynamic: 1.0 if the collision was with a dynamic object (vehicle, pedestrian)
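Collision checks are based on the vehicle's axis-aligned bounding box from VehicleConfig. For intuition, a simplified 2-D overlap test; this is an illustrative sketch, not the module's actual geometry code, which would also account for heading:
def aabb_overlap_2d(cx_a, cy_a, len_a, wid_a, cx_b, cy_b, len_b, wid_b):
    """True if two axis-aligned boxes (center, full length/width) overlap."""
    return (2 * abs(cx_a - cx_b) <= len_a + len_b and
            2 * abs(cy_a - cy_b) <= wid_a + wid_b)

# Ego (aabb_x_m=4.5, aabb_y_m=2.0) vs. an actor 3 m ahead, 1 m to the left
print(aabb_overlap_2d(0.0, 0.0, 4.5, 2.0, 3.0, 1.0, 4.5, 2.0))  # True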
Comfort Metrics
Measure ride comfort via acceleration and jerk:
# Comfort metrics (lower is better)
comfort_metrics = {
    'comfort_accel_lat_mean': metrics['comfort_accel_lat_mean'].aggregate(),
    'comfort_accel_lat_max': metrics['comfort_accel_lat_max'].aggregate(),
    'comfort_accel_lon_mean': metrics['comfort_accel_lon_mean'].aggregate(),
    'comfort_jerk_lat_mean': metrics['comfort_jerk_lat_mean'].aggregate(),
    'comfort_jerk_lon_mean': metrics['comfort_jerk_lon_mean'].aggregate(),
}
comfort_accel_lat_mean: Mean lateral acceleration magnitude (m/s²)
comfort_accel_lat_max: Maximum lateral acceleration magnitude (m/s²)
comfort_accel_lon_mean: Mean longitudinal acceleration magnitude (m/s²)
comfort_jerk_lat_mean: Mean lateral jerk magnitude (m/s³)
comfort_jerk_lon_mean: Mean longitudinal jerk magnitude (m/s³)
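Jerk is the time derivative of acceleration. A sketch of the arithmetic behind comfort_jerk_lat_mean using NumPy finite differences; illustrative only, the module's internal computation may differ:
import numpy as np

# Per-timestep lateral acceleration (m/s²) and timestamps (s)
accel_lat = np.array([0.1, 0.3, 0.2, -0.1])
t_s = np.array([0.0, 0.1, 0.2, 0.3])

jerk_lat = np.gradient(accel_lat, t_s)           # finite-difference jerk (m/s³)
comfort_jerk_lat_mean = np.abs(jerk_lat).mean()  # mean magnitude, as reported
print(comfort_jerk_lat_mean)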
Progress Metrics
Track forward progress along route:
progress_rate = metrics['progress_rate'].aggregate() # [0, 1]
progress_distance = metrics['progress_distance'].aggregate() # meters
progress_rate: Fraction of route completed (0.0 to 1.0)
progress_distance: Total distance traveled along route (meters)
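Conceptually, progress is measured along the route rather than as raw displacement. An illustrative arc-length sketch, not the module's implementation:
import numpy as np

route_length_m = 120.0
# Ego positions projected onto the route as arc-length coordinates (m)
arc_positions_m = np.array([0.0, 15.0, 42.0, 90.0])

progress_distance = arc_positions_m[-1] - arc_positions_m[0]  # 90.0 m
progress_rate = progress_distance / route_length_m            # 0.75
print(progress_rate, progress_distance)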
Offroad Metrics
Detect drivable area violations:
offroad_any = metrics['offroad_any'].aggregate() # Binary
offroad_duration = metrics['offroad_duration'].aggregate() # seconds
offroad_any: 1.0 if the vehicle left the drivable area
offroad_duration: Total time spent off the drivable area (seconds)
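offroad_duration turns per-timestep flags into time. A sketch of the arithmetic, assuming uniform timesteps; illustrative only:
import numpy as np

offroad_flags = np.array([0.0, 1.0, 1.0, 0.0])    # 1.0 while off the drivable area
timestamps_us = np.array([0, 100_000, 200_000, 300_000])

dt_s = np.diff(timestamps_us).mean() / 1e6        # timestep in seconds (0.1 s)
offroad_duration = offroad_flags.sum() * dt_s     # ≈ 0.2 s off-road
print(offroad_duration)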
Command-Line Interface
Post-process simulation logs from the command line:
python -m eval \
--log-dir /path/to/rollouts \
--config configs/eval.yaml \
--output metrics_summary.parquet \
--render-video \
--num-workers 8
Arguments:
--log-dir: Directory containing rollout logs (ASL files)
--config: Path to eval configuration YAML
--output: Output parquet file for aggregated metrics
--render-video: Render visualization videos
--num-workers: Number of parallel workers (default: CPU count)
Data Structures
ScenarioEvalInput
Input data for evaluation:
from eval.data import ScenarioEvalInput
from alpasim_utils.trajectory import Trajectory
scenario_input = ScenarioEvalInput(
    ego_trajectory=ego_traj,
    traffic_trajectories=traffic_trajs,
    scene_id="waymo_sf_001",
    batch_id="batch_0",
    run_uuid="abc-123",
    run_name="experiment_v1",
    vector_map=map_data
)
ego_trajectory: Ego vehicle trajectory (poses over time)
traffic_trajectories (dict[str, Trajectory], required): Background actor trajectories (object_id → Trajectory)
vector_map: Road network data (for offroad detection)
MetricReturn
Per-timestep metric data:
from eval.data import MetricReturn
import numpy as np

metric = MetricReturn(
    name="collision_any",
    values=np.array([0.0, 0.0, 1.0, 1.0]),  # Per timestep
    timestamps_us=np.array([0, 100000, 200000, 300000]),
    aggregation_fn=np.max  # How to aggregate over time
)
aggregated = metric.aggregate()  # 1.0 (max value)
values: Per-timestep metric values
timestamps_us: Timestamps for each value (microseconds)
aggregation_fn: Function to aggregate values (e.g., np.mean, np.max)
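Because aggregation_fn is just a callable that reduces the values array, any reduction works. A sketch with a fraction-of-time style metric; the name offroad_fraction is hypothetical, for illustration only:
import numpy as np
from eval.data import MetricReturn

flags = np.array([0.0, 0.0, 1.0, 1.0, 0.0])  # binary per-timestep condition
metric = MetricReturn(
    name="offroad_fraction",                 # hypothetical metric name
    values=flags,
    timestamps_us=np.arange(5) * 100_000,
    aggregation_fn=np.mean                   # mean of binary flags = fraction of time
)
print(metric.aggregate())                    # 0.4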
Custom Scorers
Implement custom metrics by subclassing BaseScorer:
from eval.scorers.base import BaseScorer
from eval.data import MetricReturn, SimulationResult
import numpy as np
class MyCustomScorer(BaseScorer):
    def __init__(self, config):
        self.config = config

    def calculate(self, result: SimulationResult) -> list[MetricReturn]:
        # Access simulation data
        ego_traj = result.ego_trajectory
        timestamps = ego_traj.timestamps_us
        # Compute custom metric: per-timestep speed magnitude
        speeds = np.linalg.norm(ego_traj.poses.get_velocities(), axis=1)
        return [
            MetricReturn(
                name="my_custom_speed_mean",
                values=speeds,
                timestamps_us=timestamps,
                aggregation_fn=np.mean
            )
        ]
# Register scorer
from eval.scorers import register_scorer
register_scorer("my_custom", MyCustomScorer)
# Use in config
eval_config = EvalConfig(scorers=["my_custom", "collision"])
Aggregation
Aggregate metrics across multiple rollouts:
import polars as pl
from pathlib import Path

# Load metrics from all rollouts
metrics_files = list(Path("rollouts").rglob("metrics.parquet"))
dfs = [pl.read_parquet(f) for f in metrics_files]
all_metrics = pl.concat(dfs)

# Aggregate by scene
scene_metrics = all_metrics.group_by("scene_id").agg([
    pl.col("collision_any").mean().alias("collision_rate"),
    pl.col("comfort_jerk_lat_mean").mean().alias("avg_jerk"),
    pl.col("progress_rate").mean().alias("avg_progress")
])
print(scene_metrics)
Video Rendering
Generate visualization videos:
from eval.video import render_video_from_eval_result
render_video_from_eval_result(
    scenario_input=scenario_input,
    metrics_df=result.metrics_df,
    cfg=eval_config,
    output_dir="/path/to/videos",
    clipgt_id="waymo_sf_001",
    batch_id="batch_0",
    rollout_id="rollout_0"
)
# Generates: /path/to/videos/waymo_sf_001_batch_0_rollout_0.mp4