BATON: A Multimodal Benchmark for Bidirectional Automation Transition Observation in Naturalistic Driving
Yuhang Wang, Yiyao Xu, Chaoyun Yang, Lingyao Li, Jingran Sun, Hao Zhou
Don't rely on camera-only systems to predict driver transitions. Integrate vehicle telemetry and route context: they capture complementary signals that vision alone misses. Design takeover warnings with a longer lead time than handover prompts.
Drivers must decide when to engage driving automation and when to take back control—a judgment that imposes steep cognitive load and creates safety risks from both over-reliance and delayed intervention.
Method: Synchronized 136.6 hours of naturalistic driving across 127 drivers, capturing front-view video, in-cabin video, CAN bus signals, radar, and GPS context around each control transition. Visual input alone fell short: front-view video missed driver state, in-cabin video missed road context. Adding CAN and route signals substantially improved prediction over video-only baselines. Takeover events developed gradually and benefited from longer observation horizons; handover events depended on immediate cues, an asymmetry with direct HMI implications.
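The method above can be sketched in code. This is a minimal illustration, not the BATON implementation: the modality names, feature shapes, and horizon values are hypothetical, chosen only to show late fusion by concatenation and an asymmetric observation window (longer for gradually developing takeovers, shorter for cue-driven handovers).

```python
from dataclasses import dataclass

@dataclass
class Frame:
    """Hypothetical per-modality feature vectors for one timestep."""
    front_video: list   # road-scene features
    cabin_video: list   # driver-state features
    can_bus: list       # speed, steering, pedal signals
    route: list         # GPS / map context

def fuse(frame: Frame) -> list:
    """Late fusion by concatenation: each modality contributes
    complementary signals the others miss."""
    return frame.front_video + frame.cabin_video + frame.can_bus + frame.route

# Illustrative asymmetric horizons in seconds (values are assumptions,
# not from the paper): takeover intent builds gradually, so it gets a
# longer window; handover depends on immediate cues.
HORIZON_S = {"takeover": 8.0, "handover": 2.0}

def observation_window(frames: list, hz: float, event: str) -> list:
    """Return fused features for the trailing window sized by event type."""
    n = int(HORIZON_S[event] * hz)
    return [fuse(f) for f in frames[-n:]]
```

A model trained on these windows would see 4x more temporal context before a takeover than before a handover, mirroring the asymmetry the benchmark reports.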
Caveats: Benchmark tasks are defined, but baseline performance leaves substantial room for improvement. Real-time deployment latency was not tested.
Reflections: What prediction horizon minimizes false alarms while maintaining safety margins for takeover events? · Do transition patterns generalize across vehicle types and automation levels (L2 vs L3)? · Can driver-specific models trained on individual transition history outperform population-level baselines?