A Strip Pooling Attention Network for Urban Scene Image Segmentation

Yaojie Zhang1, Yanling Li1
1Department of Computer Science, Changzhi University, Changzhi, Shanxi, 046011, China

Abstract

Urban scene image segmentation is a crucial task in computer vision. To address the large parameter count and insufficient segmentation accuracy of the traditional DeepLabV3+ model, an improved lightweight DeepLabV3+ model is designed. The overall performance of the model is improved by replacing the Xception backbone network with MobileNetV2, introducing a strip pooling module and a densely connected atrous spatial pyramid module into the ASPP, and using the GD-FAM multi-feature fusion module in the fusion stage. Experiments on the Cityscapes dataset show that, compared with the traditional DeepLabV3+ model, the proposed method increases the IoU of urban scene categories such as pedestrians, cyclists, and poles by 3.1%, 4.41%, and 6.74%, respectively, so its segmentation results are clearly better than those of the compared models. With MobileNetV2 as the backbone, the mIoU is 4.91% higher than that of the baseline model, and the loss curve shows that training converges after roughly 100 iterations. In summary, the overall segmentation performance of the improved model is significantly better.

Keywords: image segmentation, urban scenes, improved DeepLabV3+ model, strip pooling, GD-FAM
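To make the core idea of the strip pooling attention named in the title concrete, the following is a minimal NumPy sketch. It is a hypothetical illustration, not the paper's exact module: the convolutions, normalization, and channel mixing of the real strip pooling block are omitted, and only the essential steps are shown, i.e. pooling the feature map along horizontal and vertical strips, combining the strips, and using a sigmoid gate to re-weight the input features.

```python
import numpy as np

def strip_pooling_attention(x):
    """Illustrative strip pooling attention for a feature map x of shape (C, H, W).

    Hypothetical sketch: real implementations insert 1-D convolutions and
    normalization on each strip; here we keep only the pooling/gating core.
    """
    # Horizontal strip pooling: average over the width -> (C, H, 1)
    h_strip = x.mean(axis=2, keepdims=True)
    # Vertical strip pooling: average over the height -> (C, 1, W)
    v_strip = x.mean(axis=1, keepdims=True)
    # Broadcasting the two strips back to (C, H, W) captures long-range
    # context along each row and each column at every position
    combined = h_strip + v_strip
    # A sigmoid turns the combined strips into an attention map in (0, 1)
    gate = 1.0 / (1.0 + np.exp(-combined))
    # Re-weight the original features with the attention map
    return x * gate

x = np.random.rand(4, 8, 8).astype(np.float32)
y = strip_pooling_attention(x)
print(y.shape)  # (4, 8, 8)
```

Because each output position is gated by statistics of its entire row and column, elongated urban structures such as poles and lane markings receive context that a square pooling window of the same cost cannot provide, which is the motivation for using strip pooling in this setting.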