Journal of Changsha University of Science & Technology (Natural Science)
Two-structure pyramid scene parsing network with attention module
Affiliation:

(School of Mathematics and Statistics, Changsha University of Science & Technology, Changsha 410114, China)

Corresponding author:

LIANG Xiaolin (born 1965) (ORCID: 0000-0002-1338-2947), male, associate professor, mainly engaged in research on big data analysis. E-mail: liang@csust.edu.cn

CLC number:

TP18

Fund project:

Natural Science Foundation of Hunan Province (2021JJ30734); Changsha University of Science & Technology graduate "Practical Innovation and Entrepreneurship Ability Improvement Program" project (CLSJCX22124)

    Abstract:

    [Purposes] This study aims to mitigate the loss of original image information and the degradation of image resolution, and thereby to improve the accuracy of image semantic segmentation. [Methods] A two-structure pyramid scene parsing network model with an attention module is proposed and used for the semantic segmentation of images. First, a MobileNet V2 module extracts the backbone features of the original image. Second, the feature map is fed into pyramid pooling module 1 to obtain context information. Third, the attention module focuses on the important features and integrates the shallow information to produce an intermediate feature map. Fourth, the intermediate feature map is fed into pyramid pooling module 2 to fuse local and global information. Finally, the original image is segmented using the rich shallow and deep information. [Findings] In experiments on the PASCAL VOC 2007 dataset, the mean pixel accuracy (MPA) and mean intersection over union (MIoU) reach 85.64% and 78.12%, respectively, which are 4.95 and 12.31 percentage points higher than those of the pyramid scene parsing network. [Conclusions] The proposed model effectively addresses the problems of information loss and resolution degradation in image segmentation.
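The pyramid pooling step named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the bin sizes (1, 2, 3, 6) are an assumption borrowed from the original pyramid scene parsing network, nearest-neighbour upsampling stands in for the interpolation a real network would use, the per-branch convolutions are omitted, and the function names `adaptive_avg_pool`, `upsample_nearest`, and `pyramid_pool` are hypothetical.

```python
import numpy as np


def adaptive_avg_pool(x, bins):
    """Average-pool a (C, H, W) array down to (C, bins, bins)."""
    c, h, w = x.shape
    out = np.empty((c, bins, bins), dtype=float)
    for i in range(bins):
        # Window rows: floor of the start, ceiling of the end.
        r0, r1 = (i * h) // bins, -((-(i + 1) * h) // bins)
        for j in range(bins):
            c0, c1 = (j * w) // bins, -((-(j + 1) * w) // bins)
            out[:, i, j] = x[:, r0:r1, c0:c1].mean(axis=(1, 2))
    return out


def upsample_nearest(x, h, w):
    """Resize a (C, bh, bw) array to (C, h, w) by nearest-neighbour lookup."""
    _, bh, bw = x.shape
    rows = (np.arange(h) * bh) // h
    cols = (np.arange(w) * bw) // w
    return x[:, rows[:, None], cols[None, :]]


def pyramid_pool(x, bin_sizes=(1, 2, 3, 6)):
    """Concatenate the input with its pooled-and-upsampled copies.

    bin_sizes is an assumed default, not a value taken from the article.
    """
    _, h, w = x.shape
    branches = [upsample_nearest(adaptive_avg_pool(x, b), h, w) for b in bin_sizes]
    return np.concatenate([x] + branches, axis=0)


# Toy feature map: 2 channels, 6 x 6 pixels.
features = np.arange(2 * 6 * 6, dtype=float).reshape(2, 6, 6)
fused = pyramid_pool(features)
print(fused.shape)  # (10, 6, 6): 2 input channels + 2 channels per bin size
```

The coarsest branch (one bin) carries a global average of each channel, while finer branches keep progressively more local detail, which is how the module combines global context with local information.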
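For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute MPA and MIoU from a pixel-level confusion matrix. It is illustrative only: the function names and the toy label maps are hypothetical, and the article's exact evaluation protocol (for example, how ignored or absent classes are handled) is not stated in the abstract.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels; rows are ground-truth classes, columns are predictions."""
    y_true = np.asarray(y_true, dtype=np.int64).ravel()
    y_pred = np.asarray(y_pred, dtype=np.int64).ravel()
    idx = y_true * num_classes + y_pred
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)


def mpa_and_miou(cm):
    """Return (mean pixel accuracy, mean intersection over union)."""
    tp = np.diag(cm).astype(float)      # correctly labelled pixels per class
    gt = cm.sum(axis=1)                 # ground-truth pixels per class
    pred = cm.sum(axis=0)               # predicted pixels per class
    union = gt + pred - tp
    with np.errstate(divide="ignore", invalid="ignore"):
        acc = tp / gt                   # per-class pixel accuracy
        iou = tp / union                # per-class IoU
    # Classes that never occur yield NaN and are skipped (an assumption).
    return np.nanmean(acc), np.nanmean(iou)


# Toy 2 x 3 label maps with three classes (hypothetical data).
truth = [[0, 0, 1], [1, 2, 2]]
guess = [[0, 1, 1], [1, 2, 0]]
mpa, miou = mpa_and_miou(confusion_matrix(truth, guess, num_classes=3))
print(round(mpa, 4), round(miou, 4))  # 0.6667 0.5
```

MIoU penalises false positives as well as misses, so it is typically lower than MPA for the same predictions, consistent with the 78.12% versus 85.64% reported above.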

Cite this article

LIANG Xiaolin, WANG Xinyi, HUANG Yajuan, et al. Two-structure pyramid scene parsing network with attention module[J]. Journal of Changsha University of Science & Technology (Natural Science), 2024, 21(5): 104-112.

History
  • Received: 2022-11-07
  • Published online: 2024-11-23