Research on a deep neural network-based multimodal target detection algorithm and its application to point cloud data

Niya Dong1, Yi Lin1
1College of Communication and Information Engineering, Chongqing College of Mobile Communication, Chongqing, 401520, China

Abstract

To address the poor point cloud data fusion of traditional MLP-based models, this paper proposes a multimodal 3D target detection network built on Kolmogorov-Arnold Networks (KANs). A KANDyVFE encoder incorporating a fusion layer is designed with KANs as the backbone, and a self-attention mechanism is used to dynamically fuse point cloud features. Two 3D target detection datasets, KITTI and Waymo Open, are selected to evaluate the algorithm's performance through controlled experiments, and ablation experiments verify the effectiveness of the KANDyVFE encoder and the self-attention fusion module. The proposed algorithm achieves 80.72% 3D mAP and 80.23% 3D mAPH on the Waymo Open dataset at LEVEL_1, outperforming the closest competitor, BtcDet, by 2.14% and 2.17%, respectively, and it achieves comparably strong performance at LEVEL_2. Without the KANDyVFE encoder module, the 3D mAP and 3D mAPH are only 72.36% and 74.35%, respectively, whereas adding both the KANDyVFE encoder and the self-attention fusion module raises them to 91.33% and 92.09%. The experimental results validate the effectiveness of KANs for point cloud applications, and the ablation experiments further demonstrate the performance improvement contributed by the designed modules.

Keywords: multimodal target detection; point cloud features; KANs; KANDyVFE; self-attention mechanism
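The two ingredients named in the abstract, a KAN-style feature encoder and self-attention fusion of modality features, can be sketched minimally as follows. This is an illustrative sketch only, not the paper's KANDyVFE implementation: the names `KANLayer` and `self_attention_fuse`, the RBF basis for the edge functions, and the toy dimensions are all assumptions introduced for clarity.

```python
import numpy as np

class KANLayer:
    """KAN idea: a learnable univariate function phi_ij on each input-output
    edge (here, a sum of RBFs on a fixed grid), summed over inputs.
    Hypothetical minimal sketch, not the paper's KANDyVFE encoder."""

    def __init__(self, d_in, d_out, grid_size=8, seed=0):
        rng = np.random.default_rng(seed)
        self.grid = np.linspace(-2.0, 2.0, grid_size)            # shared RBF centers
        self.coeffs = rng.normal(scale=0.1, size=(d_in, d_out, grid_size))

    def __call__(self, x):                                       # x: (N, d_in)
        basis = np.exp(-(x[:, :, None] - self.grid) ** 2)        # (N, d_in, G)
        # y_j = sum_i phi_ij(x_i), with phi_ij a coefficient-weighted RBF sum
        return np.einsum("nig,iog->no", basis, self.coeffs)      # (N, d_out)

def self_attention_fuse(tokens):
    """Single-head dot-product self-attention over modality tokens (M, d),
    letting each modality feature attend to all others before fusion."""
    d = tokens.shape[-1]
    scores = tokens @ tokens.T / np.sqrt(d)                      # (M, M)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                           # row-wise softmax
    return w @ tokens                                            # fused tokens

# Toy usage: encode 4-dim features from two "modalities", then fuse them.
layer = KANLayer(d_in=4, d_out=16)
lidar_feat = layer(np.random.default_rng(1).normal(size=(1, 4)))
image_feat = layer(np.random.default_rng(2).normal(size=(1, 4)))
fused = self_attention_fuse(np.vstack([lidar_feat, image_feat]))
print(fused.shape)
```

The sketch only conveys the structural difference from an MLP: instead of fixed activations on nodes, a KAN places learnable univariate functions on edges, and the attention step replaces naive concatenation with a data-dependent weighting of the modality features.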