AlpaSim provides comprehensive evaluation capabilities to assess driving quality, safety, and performance. The evaluation system computes detailed metrics and generates visualization videos to help understand autonomous driving behavior.
Evaluation Modes
AlpaSim supports two evaluation modes:
- In-Runtime Evaluation (Default)
- Separate Job Evaluation
By default, evaluation runs within the runtime after each rollout completes. This provides immediate feedback and is suitable for most use cases.
No additional configuration needed - this is the default mode.
Metrics Computation
AlpaSim computes multiple categories of metrics to evaluate driving performance:

Safety Metrics
Binary metrics indicating pass (0) or fail (1):

Collision Metrics
collision_at_fault: Driver caused a collision (front/lateral impact)
collision_rear: Rear-end collision (not at fault)
collision_front: Front collision detection
collision_lateral: Side collision detection
collision_any: Any collision occurred

These metrics are computed by analyzing vehicle trajectories and detecting overlaps between the ego vehicle and other agents.
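The overlap test can be sketched as follows. This is a simplified illustration, not AlpaSim's actual implementation: it uses axis-aligned bounding boxes per timestep, whereas a production check would use oriented vehicle polygons and fault attribution. The helper names are hypothetical.

```python
def boxes_overlap(ego, other):
    """Each box is (min_x, min_y, max_x, max_y) in world coordinates.
    Two boxes overlap unless one lies entirely to one side of the other."""
    return not (ego[2] < other[0] or other[2] < ego[0]
                or ego[3] < other[1] or other[3] < ego[1])

def collision_any(ego_track, agent_tracks):
    """Return 1 if the ego box overlaps any agent box at any timestep.
    ego_track: list of boxes over time; agent_tracks: one such list per agent."""
    for t, ego_box in enumerate(ego_track):
        for track in agent_tracks:
            if boxes_overlap(ego_box, track[t]):
                return 1
    return 0
```

Distinguishing front, lateral, and rear impacts would additionally compare the contact point against the ego heading, which is omitted here.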
Road Compliance
offroad: Vehicle drove off the designated road surface
offroad_or_collision_at_fault: Combined metric for any critical safety violation

Computed using the vehicle polygon and road geometry from the map data.
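The underlying geometry check can be sketched with a standard ray-casting point-in-polygon test. This is a stand-in for AlpaSim's actual map-geometry code, and the helper names are hypothetical; it flags the vehicle as offroad if any corner of its polygon leaves the road surface.

```python
def point_in_polygon(x, y, poly):
    """Ray-casting test: poly is a list of (x, y) vertices.
    Counts crossings of a ray going in the +x direction from (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line at y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def offroad(vehicle_corners, road_poly):
    """1 if any vehicle corner is outside the road polygon, else 0."""
    if all(point_in_polygon(x, y, road_poly) for x, y in vehicle_corners):
        return 0
    return 1
```

Real road surfaces are unions of lane polygons rather than a single convex shape, so the production check would iterate over the map's drivable-area geometry.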
Performance Metrics
Continuous metrics measuring driving quality:

Trajectory Deviation
dist_to_gt_trajectory: Maximum distance from ground truth path (meters)
- Lower is better
- Indicates how closely the driver follows expected routes
- Aggregated using MAX over time (worst deviation during the drive)
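The MAX-over-time aggregation above can be sketched as follows. For simplicity this measures distance to the nearest ground-truth point rather than to the nearest path segment; the function name mirrors the metric but is not AlpaSim's actual API.

```python
import math

def dist_to_gt_trajectory(ego_xy, gt_xy):
    """Worst-case deviation: for each ego position, find the distance to
    the nearest ground-truth point, then take the maximum over time."""
    return max(
        min(math.hypot(ex - gx, ey - gy) for gx, gy in gt_xy)
        for ex, ey in ego_xy
    )
```

A point-to-segment distance would be tighter when ground-truth samples are sparse, but the aggregation (min over the path, max over time) is the same.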
Progress Metrics
progress: Absolute distance traveled along the route
progress_rel: Relative progress compared to ground truth
duration_frac_20s: Fraction of the 20s drive completed before any failure
- 1.0 = completed full 20s without issues
- Less than 1.0 = failed early (collision, off-road, or excessive deviation)
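A sketch of how these three values relate, assuming trajectories are sampled as (x, y) points and the drive length is known in frames. The function and argument names are illustrative, not AlpaSim's actual API.

```python
import math

def arc_length(xy):
    """Cumulative distance along a polyline of (x, y) points."""
    return sum(math.hypot(x2 - x1, y2 - y1)
               for (x1, y1), (x2, y2) in zip(xy, xy[1:]))

def progress_metrics(ego_xy, gt_xy, frames_survived, total_frames):
    """progress: distance driven; progress_rel: ratio vs. ground truth;
    duration_frac_20s: fraction of the drive completed before any failure."""
    progress = arc_length(ego_xy)
    gt_progress = arc_length(gt_xy)
    return {
        "progress": progress,
        "progress_rel": progress / gt_progress if gt_progress > 0 else 0.0,
        "duration_frac_20s": frames_survived / total_frames,
    }
```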
Plan Quality (MinADE)
Minimum Average Displacement Error at various time horizons. Measures how accurately the predicted trajectory matches the actual trajectory at different prediction horizons.
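MinADE is conventionally computed as the minimum, over a set of candidate trajectories, of the mean L2 displacement from the actual trajectory up to a given horizon. A minimal sketch (function name and call shape are illustrative, not AlpaSim's actual API):

```python
import math

def min_ade(candidates, actual, horizon):
    """Min over candidate trajectories of the average L2 displacement
    from the actual trajectory over the first `horizon` steps."""
    def ade(traj):
        return sum(math.hypot(px - ax, py - ay)
                   for (px, py), (ax, ay) in zip(traj[:horizon], actual[:horizon])) / horizon
    return min(ade(traj) for traj in candidates)
```

Evaluating this at several horizons (e.g. short vs. long lookahead) gives the per-horizon values the metric tables report.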
Plan Deviation
Measures deviation from the planned trajectory, i.e. how well the vehicle follows its own planned path.
Distance Between Incidents
avg_dist_between_incidents: Average kilometers traveled per incident (collision or offroad)
- Higher is better
- Measures safety over distance
- Excludes rear-end collisions not caused by the driver
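The computation reduces to total distance over incident count. The sketch below assumes not-at-fault rear-end collisions have already been filtered out of the count, and treats an incident-free drive as returning the full distance driven; both conventions are assumptions, not confirmed AlpaSim behavior.

```python
def avg_dist_between_incidents(total_km, incidents):
    """Average kilometers per incident. `incidents` should already exclude
    rear-end collisions not caused by the driver (assumed upstream filter).
    With zero incidents, the full distance is returned as a lower bound."""
    if incidents == 0:
        return total_km
    return total_km / incidents
```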
Safety Monitor
safety_monitor_triggered: Indicates if safety interventions were required

Video Generation
AlpaSim generates evaluation videos with multiple layout options:

Video Layouts
- DEFAULT Layout
- REASONING_OVERLAY Layout
- Both Layouts
The default layout provides a comprehensive debug view with three panels:
- BEV (Bird’s Eye View) map: Top-down view showing:
  - Road lanes and edges
  - Ego vehicle position
  - Traffic agents
  - Planned trajectories
  - Ground truth ghost vehicle
- Camera view: Front camera feed with optional trajectory overlays
- Metrics table: Real-time metric values
Video Configuration Options
Performance Analysis
AlpaSim automatically generates performance metrics and visualizations:

Metrics Plot
After each simulation, a comprehensive performance visualization is generated at {log_dir}/metrics/metrics_plot.png.
Metrics Plot Components
3x3 Grid Layout:

Row 1: RPC Performance
- RPC Duration histogram: Total time from call start to coroutine resumption
- RPC Blocking histogram: Event loop scheduler delay
- RPC Queue Depth histogram: Service saturation levels
- Rollout Duration histogram: Total time per rollout
- Step Duration histogram: Time per simulation step
- Service Configuration table: Replica counts and capacity
- CPU Utilization boxplots: Per-service CPU usage
- GPU Utilization boxplots: GPU compute usage
- GPU Memory boxplots: Memory usage with capacity line
- Async worker idle percentage: Runtime idle time
- Sim seconds per rollout: Wallclock time per simulation
Performance Metrics File
Raw performance data is stored in {log_dir}/metrics/metrics.prom in Prometheus text format.
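Files in Prometheus text exposition format can be read back with a few lines of Python for ad-hoc analysis. The sketch below handles only the simplest subset (unlabeled metric lines), and the metric name in the example is illustrative, not one of AlpaSim's actual metric names.

```python
def parse_prom(text):
    """Parse a minimal subset of the Prometheus text exposition format
    (metric lines without labels) into a name -> float dict."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        name, value = line.rsplit(" ", 1)
        metrics[name] = float(value)
    return metrics
```

For labeled metrics or histograms, a full parser such as the official `prometheus_client` library's text-format support is the safer choice.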