In recent years, accurate 3D detection has come to play an important role in many applications, with autonomous driving as a typical example. This paper aims to design an accurate 3D detector that takes both LiDAR point clouds and RGB images as inputs, motivated by the fact that LiDAR and camera each have their own merits. A novel deep end-to-end learnable two-stream architecture, CrossFusion Net, is designed to exploit features from both LiDAR point clouds and RGB images through a hierarchical fusion structure. Specifically, CrossFusion Net uses a bird's eye view (BEV) representation of the point clouds, obtained through projection. The feature maps of the two streams are then fused through the newly introduced CrossFusion (CF) layer.
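As an illustration of the BEV projection step only, the following minimal sketch bins LiDAR points into a ground-plane grid and keeps the maximum height and point count per cell. The function name `points_to_bev`, the grid ranges, the cell size, and the choice of height and count channels are assumptions made for this example; the abstract does not specify how CrossFusion Net encodes its BEV input, and a practical pipeline would typically use an array library rather than nested lists.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]  # (x forward, y left, z up), in metres


def points_to_bev(
    points: Sequence[Point],
    x_range: Tuple[float, float] = (0.0, 4.0),   # assumed extent, illustrative only
    y_range: Tuple[float, float] = (-2.0, 2.0),  # assumed extent, illustrative only
    cell: float = 1.0,                           # assumed cell size in metres
) -> Tuple[List[List[float]], List[List[int]]]:
    """Project 3D points onto a BEV grid.

    Returns (height, count): per-cell maximum z and number of points.
    Empty cells have height 0.0 and count 0.
    """
    n_rows = int((x_range[1] - x_range[0]) / cell)
    n_cols = int((y_range[1] - y_range[0]) / cell)
    height = [[0.0] * n_cols for _ in range(n_rows)]
    count = [[0] * n_cols for _ in range(n_rows)]

    for x, y, z in points:
        # Discard points that fall outside the grid extent.
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue
        # Map metric coordinates to integer cell indices.
        row = min(int((x - x_range[0]) / cell), n_rows - 1)
        col = min(int((y - y_range[0]) / cell), n_cols - 1)
        # Keep the highest point seen so far in this cell.
        if count[row][col] == 0 or z > height[row][col]:
            height[row][col] = z
        count[row][col] += 1

    return height, count


if __name__ == "__main__":
    cloud = [(0.5, -1.5, 0.2), (0.7, -1.2, 1.0), (3.9, 1.9, -0.5), (5.0, 0.0, 1.0)]
    height_map, count_map = points_to_bev(cloud)
    print(height_map[0][0], count_map[0][0])
```

The resulting grids can be stacked as image-like channels, which is what allows a 2D convolutional stream to process the point cloud alongside the RGB stream.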