Abstract: [Purposes] This study aims to address the loss of original image information and the degradation of image resolution, and to improve the accuracy of semantic segmentation of images. [Methods] A pyramid scene parsing network model with a dual-pyramid structure and an attention module was proposed and applied to the semantic segmentation of images. Firstly, MobileNetV2 was used as the backbone to extract features from the original image. Secondly, the feature map was fed into pyramid pooling module 1 to capture context information. Then, the attention module was used to emphasize important features and to combine them with shallow information, producing an intermediate feature map. The intermediate feature map was fed into pyramid pooling module 2 to fuse local and global information. Finally, the original image was segmented using the resulting rich shallow and deep information. [Findings] On the PASCAL VOC 2007 dataset, the mean pixel accuracy (MPA) and mean intersection over union (MIoU) reach 85.64% and 78.12%, respectively, which are 4.95 and 12.31 percentage points higher than those of the pyramid scene parsing network. [Conclusions] The proposed model can effectively alleviate the problems of information loss and resolution degradation in image segmentation.
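As a rough illustration of the pyramid pooling operation that both pooling modules in the described pipeline rely on, the sketch below pools a 2D feature map at several scales, upsamples each level back, and stacks the levels for fusion. This is a minimal pure-Python sketch of the generic PSPNet-style operation; the function names and the bin sizes (1, 2, 3, 6) are illustrative assumptions, not the authors' implementation.

```python
def adaptive_avg_pool(fmap, bins):
    """Average-pool a 2D feature map (list of rows) into a bins x bins grid."""
    h, w = len(fmap), len(fmap[0])
    out = []
    for i in range(bins):
        r0, r1 = i * h // bins, (i + 1) * h // bins
        row = []
        for j in range(bins):
            c0, c1 = j * w // bins, (j + 1) * w // bins
            vals = [fmap[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(vals) / len(vals))
        out.append(row)
    return out


def pyramid_pool(fmap, bin_sizes=(1, 2, 3, 6)):
    """One pyramid pooling module: pool at several scales, upsample each
    pooled grid back to the input size (nearest neighbour), and return the
    input together with all upsampled levels for channel-wise fusion."""
    h, w = len(fmap), len(fmap[0])
    levels = [fmap]
    for b in bin_sizes:
        pooled = adaptive_avg_pool(fmap, b)
        # Nearest-neighbour upsampling back to the original resolution.
        up = [[pooled[i * b // h][j * b // w] for j in range(w)]
              for i in range(h)]
        levels.append(up)
    return levels
```

The coarsest level (a single bin) captures global context, while the finer levels preserve local detail; in the described model this multi-scale fusion is applied twice, before and after the attention module.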