AlpaSim provides comprehensive evaluation capabilities to assess driving quality, safety, and performance. The evaluation system computes detailed metrics and generates visualization videos to help understand autonomous driving behavior.
Evaluation Modes
AlpaSim supports two evaluation modes:
- In-Runtime Evaluation (Default)
- Separate Job Evaluation
By default, evaluation runs within the runtime after each rollout completes. This provides immediate feedback and is suitable for most use cases.
No additional configuration needed - this is the default mode.
Metrics Computation
AlpaSim computes multiple categories of metrics to evaluate driving performance:

Safety Metrics
Binary metrics indicating pass (0) or fail (1):

Collision Metrics
collision_at_fault: Driver caused a collision (front/lateral impact)
collision_rear: Rear-end collision (not at fault)
collision_front: Front collision detection
collision_lateral: Side collision detection
collision_any: Any collision occurred

These metrics are computed by analyzing vehicle trajectories and detecting overlaps between the ego vehicle and other agents.
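The overlap test can be sketched as follows. This is a simplified illustration, not AlpaSim's actual implementation: it uses axis-aligned bounding boxes per timestep, whereas a production check would use oriented vehicle polygons and fault attribution. The helper names are hypothetical.

```python
def boxes_overlap(ego, other):
    """Each box is (min_x, min_y, max_x, max_y) in world coordinates.
    Two boxes overlap unless one lies entirely to one side of the other."""
    return not (ego[2] < other[0] or other[2] < ego[0]
                or ego[3] < other[1] or other[3] < ego[1])

def collision_any(ego_track, agent_tracks):
    """Return 1 if the ego box overlaps any agent box at any timestep.
    ego_track: list of boxes over time; agent_tracks: one such list per agent."""
    for t, ego_box in enumerate(ego_track):
        for track in agent_tracks:
            if boxes_overlap(ego_box, track[t]):
                return 1
    return 0
```

Distinguishing front, lateral, and rear impacts would additionally compare the contact point against the ego heading, which is omitted here.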
Road Compliance
offroad: Vehicle drove off the designated road surface
offroad_or_collision_at_fault: Combined metric for any critical safety violation

Computed using the vehicle polygon and road geometry from the map data.
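The underlying geometry check can be sketched with a standard ray-casting point-in-polygon test. This is a stand-in for AlpaSim's actual map-geometry code, and the helper names are hypothetical; it flags the vehicle as offroad if any corner of its polygon leaves the road surface.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: poly is a list of (x, y) vertices.
    Counts crossings of a ray going in the +x direction from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def offroad(vehicle_corners, road_poly):
    """1 if any vehicle corner is outside the road polygon, else 0."""
    if all(point_in_polygon(x, y, road_poly) for x, y in vehicle_corners):
        return 0
    return 1
```

Real road surfaces are unions of lane polygons rather than a single convex shape, so the production check would iterate over the map's drivable-area geometry.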
Performance Metrics
Continuous metrics measuring driving quality:

Trajectory Deviation
dist_to_gt_trajectory: Maximum distance from ground truth path (meters)
- Lower is better
- Indicates how closely the driver follows expected routes
- Aggregated using MAX over time (worst deviation during the drive)
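The MAX-over-time aggregation above can be sketched as follows. For simplicity this measures distance to the nearest ground-truth point rather than to the nearest path segment; the function name mirrors the metric but is not AlpaSim's actual API.

```python
import math

def dist_to_gt_trajectory(ego_xy, gt_xy):
    """Worst-case deviation: for each ego position, find the distance to
    the nearest ground-truth point, then take the maximum over time."""
    return max(
        min(math.hypot(ex - gx, ey - gy) for gx, gy in gt_xy)
        for ex, ey in ego_xy
    )
```

A point-to-segment distance would be tighter when ground-truth samples are sparse, but the aggregation (min over the path, max over time) is the same.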
Progress Metrics
progress: Absolute distance traveled along the route
progress_rel: Relative progress compared to ground truth
duration_frac_20s: Fraction of the 20s drive completed before any failure
- 1.0 = completed full 20s without issues
- Less than 1.0 = failed early (collision, off-road, or excessive deviation)
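A sketch of how these three values relate, assuming trajectories are sampled as (x, y) points and the drive length is known in frames. The function and argument names are illustrative, not AlpaSim's actual API.

```python
import math

def arc_length(xy):
    """Cumulative distance along a polyline of (x, y) points."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(xy, xy[1:]))

def progress_metrics(ego_xy, gt_xy, frames_survived, total_frames):
    """progress: distance driven; progress_rel: ratio vs. ground truth;
    duration_frac_20s: fraction of the drive completed before any failure."""
    progress = arc_length(ego_xy)
    gt_progress = arc_length(gt_xy)
    return {
        "progress": progress,
        "progress_rel": progress / gt_progress if gt_progress > 0 else 0.0,
        "duration_frac_20s": frames_survived / total_frames,
    }
```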
Plan Quality (MinADE)
Minimum Average Displacement Error at various time horizons. Measures how accurately the predicted trajectory matches the actual trajectory at different prediction horizons.
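MinADE is conventionally computed as the minimum, over a set of candidate trajectories, of the mean L2 displacement from the actual trajectory up to a given horizon. A minimal sketch (function name and call shape are illustrative, not AlpaSim's actual API):

```python
import math

def min_ade(candidates, actual, horizon):
    """Min over candidate trajectories of the average L2 displacement
    from the actual trajectory over the first `horizon` steps."""
    def ade(traj):
        return sum(math.hypot(px - ax, py - ay)
                   for (px, py), (ax, ay) in zip(traj[:horizon], actual[:horizon])) / horizon
    return min(ade(traj) for traj in candidates)
```

Evaluating this at several horizons (e.g. short vs. long lookahead) gives the per-horizon values the metric tables report.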
Plan Deviation
Measures deviation from the planned trajectory, i.e. how well the vehicle follows its own planned path.
Distance Between Incidents
avg_dist_between_incidents: Average kilometers traveled per incident (collision or offroad)
- Higher is better
- Measures safety over distance
- Excludes rear-end collisions not caused by the driver
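The computation reduces to total distance over incident count. The sketch below assumes not-at-fault rear-end collisions have already been filtered out of the count, and treats an incident-free drive as returning the full distance driven; both conventions are assumptions, not confirmed AlpaSim behavior.

```python
def avg_dist_between_incidents(total_km, incidents):
    """Average kilometers per incident. `incidents` should already exclude
    rear-end collisions not caused by the driver (assumed upstream filter).
    With zero incidents, the full distance is returned as a lower bound."""
    if incidents == 0:
        return total_km
    return total_km / incidents
```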
Safety Monitor
safety_monitor_triggered: Indicates if safety interventions were required

Video Generation
AlpaSim generates evaluation videos with multiple layout options:

Video Layouts
- DEFAULT Layout
- REASONING_OVERLAY Layout
- Both Layouts
The default layout provides a comprehensive debug view with three panels:
- BEV (Bird’s Eye View) map: Top-down view showing:
  - Road lanes and edges
  - Ego vehicle position
  - Traffic agents
  - Planned trajectories
  - Ground truth ghost vehicle
- Camera view: Front camera feed with optional trajectory overlays
- Metrics table: Real-time metric values
Video Configuration Options
Performance Analysis
AlpaSim automatically generates performance metrics and visualizations:

Metrics Plot
After each simulation, a comprehensive performance visualization is generated at {log_dir}/metrics/metrics_plot.png.
Metrics Plot Components
3x3 Grid Layout:

Row 1: RPC Performance
- RPC Duration histogram: Total time from call start to coroutine resumption
- RPC Blocking histogram: Event loop scheduler delay
- RPC Queue Depth histogram: Service saturation levels
- Rollout Duration histogram: Total time per rollout
- Step Duration histogram: Time per simulation step
- Service Configuration table: Replica counts and capacity
- CPU Utilization boxplots: Per-service CPU usage
- GPU Utilization boxplots: GPU compute usage
- GPU Memory boxplots: Memory usage with capacity line
- Async worker idle percentage: Runtime idle time
- Sim seconds per rollout: Wallclock time per simulation
Performance Metrics File
Raw performance data is stored in {log_dir}/metrics/metrics.prom in Prometheus text format.
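Files in Prometheus text exposition format can be read back with a few lines of Python for ad-hoc analysis. The sketch below handles only the simplest subset (unlabeled metric lines), and the metric name in the example is illustrative, not one of AlpaSim's actual metric names.

```python
def parse_prom(text):
    """Parse a minimal subset of the Prometheus text exposition format
    (metric lines without labels) into a name -> float dict."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, value = line.rsplit(" ", 1)
        metrics[name] = float(value)
    return metrics
```

For labeled metrics or histograms, a full parser such as the official `prometheus_client` library's text-format support is the safer choice.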