Maintenance optimization of multi-component system based on multi-agent reinforcement learning

doi:10.19951/j.cnki.1672-9331.20221011001

Journal of Changsha University of Science & Technology (Natural Science)

2023, Vol. 20, No. 2, pp. 27-34

Authors: ZHOU Yifan, GUO Kai, LI Bangcheng
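Affiliation: School of Mechanical Engineering, Southeast University, Nanjing 211189, Jiangsu, China
Corresponding author: ZHOU Yifan (born 1981; ORCID: 0000-0002-2898-0632), male, professor; research interests: reliability and maintenance optimization.
CLC number: TH17
Funding: National Natural Science Foundation of China (72071044)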

Abstract:

[Purposes] This paper investigates the effectiveness of multi-agent reinforcement learning algorithms for the maintenance optimization of multi-component production systems, and the feasibility of applying domain knowledge of maintenance optimization to reinforcement learning. [Methods] The maintenance decision-making process of the production system was modeled as a Markov decision process (MDP), which was solved by a shaped reward distributed Q-learning (SR-DQL) algorithm. Domain knowledge of maintenance optimization was introduced into reinforcement learning through the design of the agents and through reward shaping. [Findings] The proposed SR-DQL algorithm was validated on a production system with five production units and four inventory buffers. Compared with the Q-learning algorithm, SR-DQL increased the average revenue by 6%. The average revenue obtained by SR-DQL was also higher than that obtained by the distributed Q-learning algorithm and by a deep reinforcement learning algorithm. [Conclusions] Multi-agent reinforcement learning can effectively handle the maintenance optimization problem of large-scale production systems, and adding reward shaping can improve the performance of the algorithm and yield a better maintenance policy.
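
To make the idea of reward shaping in Q-learning concrete, the following is a minimal sketch, not the SR-DQL algorithm of the paper. It assumes a single production unit with four hypothetical degradation states, invented revenues and maintenance costs, and potential-based shaping of the form r + gamma * phi(s') - phi(s), which may differ from the shaping the authors use. All names (`step`, `potential`, `shaped_reward`, `train`, `greedy_policy`) and all numbers are illustrative assumptions.

```python
import random

N_STATES = 4                 # assumed toy model: 0 = as good as new, 3 = failed
OPERATE, MAINTAIN = 0, 1
GAMMA = 0.9                  # discount factor (assumption)
REVENUE = [10.0, 8.0, 5.0]   # hypothetical revenue per period in states 0..2


def step(state, action, rng):
    """Toy single-unit model: return (reward, next_state)."""
    if action == MAINTAIN:
        # Corrective maintenance of a failed unit costs more than preventive.
        cost = 20.0 if state == N_STATES - 1 else 5.0
        return -cost, 0
    if state == N_STATES - 1:
        # A failed unit left unrepaired produces nothing and incurs a loss.
        return -10.0, state
    # An operating unit degrades by one state with probability 0.3.
    next_state = state + 1 if rng.random() < 0.3 else state
    return REVENUE[state], next_state


def potential(state):
    """Hypothetical potential function: healthier states are worth more."""
    return -5.0 * state


def shaped_reward(reward, state, next_state, gamma=GAMMA):
    """Potential-based shaping: r + gamma * phi(s') - phi(s)."""
    return reward + gamma * potential(next_state) - potential(state)


def train(episodes=2000, horizon=50, alpha=0.1, epsilon=0.2, seed=0):
    """Tabular Q-learning with epsilon-greedy exploration on shaped rewards."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = rng.randrange(N_STATES)      # random start covers all states
        for _ in range(horizon):
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max((OPERATE, MAINTAIN), key=lambda a: q[state][a])
            reward, next_state = step(state, action, rng)
            target = (shaped_reward(reward, state, next_state)
                      + GAMMA * max(q[next_state]))
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Best action per state under the learned Q-table."""
    return [max((OPERATE, MAINTAIN), key=lambda a: row[a]) for row in q]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Potential-based shaping is used here because it leaves the optimal policy of the underlying MDP unchanged while altering the learning signal. In a multi-agent variant, each production unit would hold its own Q-table over its local actions; that extension is outside this sketch.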

Cite this article

ZHOU Yifan, GUO Kai, LI Bangcheng. Maintenance optimization of multi-component system based on multi-agent reinforcement learning[J]. Journal of Changsha University of Science & Technology (Natural Science), 2023, 20(2): 27-34.

History

Received: 2022-10-11
Published online: 2023-05-16
