Research on Semantic Segmentation Methods for RGB-D Urban Scenes in the Context of Artificial Intelligence

Abstract

To address the difficulty of capturing intrinsic relationships between objects and of segmenting mirror-like surfaces in semantic segmentation of urban scenes from multi-modal data, this study integrates color images, depth information, and thermal images and proposes two models: a network that combines modal memory sharing with form complementarity, and a hierarchical auxiliary fusion network. Compared with existing advanced urban scene semantic segmentation methods, the proposed method performed strongly, achieving a mean pixel accuracy and mean intersection over union above 80% across different object classes. In addition, by strengthening contextual associations, the method produced clearer and more complete segmentation results with smoother edges. Even for classes that are similar in distance, shape, and brightness, such as "vegetation" and "sidewalk", the method maintained high accuracy. The proposed approach handles the complexity of urban scenes effectively, providing a new solution for multi-modal semantic segmentation of urban scenes.

Keywords: RGB-D, Knowledge distillation, Modal adaptation, Urban scenes, Semantic segmentation