Journal of Changsha University of Science & Technology (Natural Science)
Maintenance optimization of multi-component system based on multi-agent reinforcement learning
Authors:

ZHOU Yifan, GUO Kai, LI Bangcheng

Affiliation:

(School of Mechanical Engineering, Southeast University, Nanjing 211189, China)

Corresponding author:

ZHOU Yifan (born 1981) (ORCID: 0000-0002-2898-0632), male, professor; research interests: reliability and maintenance optimization.

CLC number:

TH17

Fund project:

National Natural Science Foundation of China (72071044)


    Abstract:

    [Purposes] This paper investigates the effectiveness of multi-agent reinforcement learning algorithms for the maintenance optimization of multi-component production systems, and the feasibility of applying domain knowledge of maintenance optimization in reinforcement learning. [Methods] The maintenance decision-making process of the production system is modeled as a Markov decision process (MDP) and solved with a shaped reward distributed Q-learning (SR-DQL) algorithm. Domain knowledge of maintenance optimization is introduced into reinforcement learning through the design of the agents and through reward shaping. [Findings] The proposed SR-DQL algorithm is validated on a production system with five production units and four inventory buffers. Compared with the commonly used Q-learning algorithm, SR-DQL increases the average revenue by 6%. The average revenue obtained with SR-DQL is also higher than that obtained with the distributed Q-learning and deep reinforcement learning algorithms. [Conclusions] Multi-agent reinforcement learning can effectively handle the maintenance optimization problem of large-scale production systems, and adding reward shaping improves the performance of the algorithm and yields a better maintenance policy.
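    The abstract does not spell out the SR-DQL update rule, so the following is only a generic sketch of the two ingredients it names, not the authors' algorithm. It assumes the classic optimistic update for distributed Q-learning (each agent stores values for its own action only and never lowers them) and potential-based reward shaping of the form r + gamma * Phi(s') - Phi(s). All names here (`shaped_reward`, `OptimisticAgent`, the state labels, the potential values) are hypothetical.

```python
from collections import defaultdict


def shaped_reward(reward, potential_s, potential_next, gamma):
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s) (assumed form)."""
    return reward + gamma * potential_next - potential_s


class OptimisticAgent:
    """One agent of a distributed Q-learner; it indexes Q by its own action only."""

    def __init__(self, n_actions, gamma=0.95):
        self.n_actions = n_actions
        self.gamma = gamma
        # (state, own_action) -> value; unseen entries start at 0.0
        self.q = defaultdict(float)

    def best_value(self, state):
        """Largest stored value over this agent's own actions in `state`."""
        return max(self.q[(state, a)] for a in range(self.n_actions))

    def update(self, state, action, reward, next_state):
        """Optimistic rule: raise the stored value if the new target is larger."""
        target = reward + self.gamma * self.best_value(next_state)
        self.q[(state, action)] = max(self.q[(state, action)], target)
        return self.q[(state, action)]


# Hypothetical usage: one agent, two actions (e.g. "keep running" / "maintain").
agent = OptimisticAgent(n_actions=2, gamma=0.9)
r = shaped_reward(1.0, potential_s=0.0, potential_next=2.0, gamma=0.9)
agent.update("s0", 1, r, "s1")    # stored value rises to the shaped target
agent.update("s0", 1, 0.5, "s1")  # a smaller target leaves it unchanged
```

    The optimistic rule is normally justified for deterministic environments with non-negative rewards, so a stochastic production system such as the one described above would likely need a modified update; the sketch only illustrates the general mechanism.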

Cite this article

ZHOU Yifan, GUO Kai, LI Bangcheng. Maintenance optimization of multi-component system based on multi-agent reinforcement learning[J]. Journal of Changsha University of Science & Technology (Natural Science), 2023, 20(2): 27-34.

History
  • Received: 2022-10-11
  • Published online: 2023-05-16