Coopernaut: End-to-End Driving with Cooperative Perception for Networked Vehicles

Jiaxun Cui*1    Hang Qiu*2    Dian Chen1    Peter Stone1,3    Yuke Zhu1   

1The University of Texas at Austin    2Stanford University    3Sony AI   

CVPR 2022

Paper | Code | Dataset | Bibtex

Optical sensors and learning algorithms for autonomous vehicles have dramatically advanced in the past few years. Nonetheless, the reliability of today's autonomous vehicles is hindered by limited line-of-sight sensing and the brittleness of data-driven methods in handling extreme situations. With recent developments in telecommunication technology, cooperative perception with vehicle-to-vehicle communication has become a promising paradigm for enhancing autonomous driving in dangerous or emergency situations. We introduce COOPERNAUT, an end-to-end learning model that uses cross-vehicle perception for vision-based cooperative driving. Our model encodes LiDAR information into compact point-based representations that can be transmitted as messages between vehicles via realistic wireless channels. To evaluate our model, we develop AutoCastSim, a network-augmented driving simulation framework with example accident-prone scenarios. Our experiments on AutoCastSim suggest that our cooperative perception driving models achieve a 40% improvement in average success rate over egocentric driving models in these challenging situations, with a bandwidth requirement 5 times smaller than that of the prior work V2VNet.

Coopernaut Overview

We introduce Coopernaut, an end-to-end point-based model that uses cross-vehicle perception for vision-based cooperative driving. Our model encodes LiDAR information into compact point-based representations that can be transmitted between vehicles via realistic wireless channels. It consists of three components: a Point Encoder that extracts critical information locally for sharing, a Representation Aggregator that merges multi-vehicle messages, and a Control Module that reasons over the joint representation. The message produced by the encoder contains 128 keypoint coordinates and their corresponding features. Each received message is spatially transformed into the ego frame; the ego vehicle then merges the messages and performs max voxel pooling on the joint representation. Finally, the Aggregator synthesizes the joint representation from all neighbors as well as the ego vehicle itself before passing it to the Control Module, which generates control decisions.
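The message-handling steps above can be illustrated with a minimal NumPy sketch: a sender's keypoints are rigidly transformed into the ego frame, then features of keypoints falling in the same voxel are max-pooled. The function names, the 2D simplification of the transform, and the voxel size are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def make_transform(yaw, tx, ty):
    """2D rigid transform (rotation about z plus translation), a
    simplification of the full sender-to-ego pose transform."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0.0, 0.0, 1.0]])

def to_ego_frame(keypoints, T_sender_to_ego):
    """Map (N, 2) keypoint coordinates from the sender's frame
    into the ego vehicle's frame using homogeneous coordinates."""
    homo = np.hstack([keypoints, np.ones((len(keypoints), 1))])
    return (homo @ T_sender_to_ego.T)[:, :2]

def max_voxel_pool(points, feats, voxel_size=1.0):
    """Max-pool the features of points that fall into the same voxel,
    merging overlapping observations from multiple vehicles."""
    idx = np.floor(points / voxel_size).astype(np.int64)
    pooled = {}
    for v, f in zip(map(tuple, idx), feats):
        pooled[v] = np.maximum(pooled[v], f) if v in pooled else f
    keys = list(pooled)
    centers = (np.array(keys) + 0.5) * voxel_size  # voxel centers
    return centers, np.stack([pooled[k] for k in keys])
```

In this sketch, pooling keeps the joint representation's size bounded by the number of occupied voxels rather than the number of incoming messages, which is one way to keep downstream computation independent of how many neighbors are transmitting.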

AutoCastSim Framework

Example scenarios: Overtaking · Left Turn · Red Light Violation
We developed AutoCastSim, a framework that offers network-augmented autonomous driving simulation on top of CARLA. It allows custom designs of various traffic scenarios for training and evaluating autonomous driving models, and the simulated vehicles can be configured with realistic wireless communications. It also provides a path-planning-based oracle expert with access to privileged environment information, which generates action supervision for imitation learning. Above are three example challenging traffic scenarios we designed in AutoCastSim as the evaluation benchmark for Coopernaut. We have made AutoCastSim open-source; you can download the simulation framework here.

Qualitative Results

Here we provide qualitative side-by-side comparisons between the No V2V Sharing driving model, which makes driving decisions solely based on line-of-sight sensing, and Coopernaut, our model that makes decisions based on the augmented field of view from cooperative perception. Please click on the thumbnails to switch to a specific scenario.
<b>Scenario 6: Overtaking.</b>
      The controlled ego car must make a lane-change maneuver on a two-way road with a dashed yellow centerline when a truck is stuck in front of it. Our model avoids collisions by acting less aggressively and appropriately yielding to oncoming vehicles.
<b>Scenario 8: Left Turn.</b>
      The red car is going straight in the opposite direction, occluded behind the orange truck. Our model avoids collision by properly yielding to the red car before turning left, even in this partially observable situation.
<b>Scenario 10: Red Light Violation.</b>
      The controlled vehicle is going straight through an intersection on a green light. Coopernaut identifies the abnormal behavior of the collider (red car) and proactively brakes hard to avoid the potential collision.

Driving Dataset

Example scenarios: Overtaking · Left Turn · Red Light Violation
We provide a driving dataset for imitation learning on our benchmark. You can download the dataset here. Alternatively, you can collect your own dataset by running the data collection scripts provided in the public GitHub repository Coopernaut. The kick-start dataset consists of 3 scenarios, each of which has a Train Set and a Validation Set. The Train Set of a scenario contains on average 12 trajectories, 3 of them accident-prone and 9 normal driving trajectories.


If you are interested in citing AutoCastSim or Coopernaut in your work, we encourage you to use the following BibTeX:
@inproceedings{cui2022coopernaut,
    title = {Coopernaut: End-to-End Driving with Cooperative Perception for Networked Vehicles},
    author = {Jiaxun Cui and Hang Qiu and Dian Chen and Peter Stone and Yuke Zhu},
    booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    year = {2022}
}