1. Control Theory and Guidance Technology Research Center, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
2. Sichuan Institute of Aerospace System Engineering, Chengdu 610100, Sichuan, China
[ "杨烨峰(1994-),男,黑龙江佳木斯人,博士研究生,2017年于哈尔滨工业大学获得学士学位,研究方向为强化学习控制、自适应控制及飞行器控制。E-mail:18B904013@stu.hit.edu.cn" ]
BAN Xiao-jun (1978-), male, born in Weinan, Shaanxi; professor and doctoral supervisor. He received his B.S. degree from Harbin Engineering University in 2001, and his M.S. and Ph.D. degrees from Harbin Institute of Technology in 2003 and 2006, respectively. He is currently the deputy director for teaching of the Control Theory and Guidance Technology Research Center, School of Astronautics, Harbin Institute of Technology. His research interests include reinforcement learning control, fuzzy control theory and applications, robust gain-scheduled control theory and applications, system identification theory and applications, electromechanical servo control system design, and flight vehicle control. E-mail: banxiaojun@hit.edu.cn
Received: 2019-07-08
Accepted: 2019-08-14
Published in print: 2019-11-15
杨烨峰, 邓凯, 左英琦, 等. PILCO框架对飞行姿态模拟器系统的参数设计与优化[J]. 光学 精密工程, 2019,27(11):2365-2373. DOI: 10.3788/OPE.20192711.2365.
Ye-feng YANG, Kai DENG, Ying-qi ZUO, et al. Parameter design and optimization of a flight attitude simulator system based on PILCO framework[J]. Optics and precision engineering, 2019, 27(11): 2365-2373. DOI: 10.3788/OPE.20192711.2365.
The proportional-integral-derivative (PID) controller is the most widely used control method in flight vehicle control, but tuning its parameters is often very tedious. To enable a flight attitude simulator control system to optimize its PID controller parameters autonomously and thereby achieve stable control, this paper applies the Probabilistic Inference for Learning Control (PILCO) algorithm from reinforcement learning to optimize the PID parameters. First, a probabilistic dynamics model of the system is fitted from input-output data; next, the current PID controller is assessed by policy evaluation; finally, the controller is optimized by policy improvement. In experiments with a sampling frequency of 100 Hz and 8 s of data collected per rollout, the control performance met the requirements and the PID parameters converged after 10 episodes of offline training. The PILCO-optimized flight attitude simulator showed good robustness in fixed-point experiments, indicating that PILCO can optimize PID controller parameters and has great potential for solving nonlinear control and parameter optimization problems.
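The abstract describes the PILCO workflow only at a high level. The Python sketch below is a hedged illustration (not the authors' implementation) of the three steps it names: fitting a Gaussian-process dynamics model from input-output data, evaluating a candidate set of PID gains by rolling the learned model forward, and improving the gains by minimizing the predicted cost. The state layout, the quadratic cost, the deterministic mean-only rollout, and helper names such as fit_dynamics and expected_cost are assumptions; the actual PILCO algorithm propagates full Gaussian state distributions and uses analytic gradients of the expected cost.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from scipy.optimize import minimize

DT = 0.01        # 100 Hz sampling, as in the experiment
HORIZON = 800    # 8 s of data per rollout

def fit_dynamics(states, actions, next_states):
    """Fit a GP model x_{t+1} = f(x_t, u_t) from logged data (one GP per state dimension)."""
    X = np.hstack([states, actions])
    kernel = RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel(1e-4)
    return [GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, next_states[:, i])
            for i in range(next_states.shape[1])]

def pid_action(gains, err, err_int, err_prev):
    """Discrete-time PID law with gains (kp, ki, kd)."""
    kp, ki, kd = gains
    return kp * err + ki * err_int + kd * (err - err_prev) / DT

def expected_cost(gains, gps, x0, target):
    """Policy evaluation: roll the learned model forward under the PID policy
    and accumulate a quadratic tracking cost (mean prediction only, for brevity)."""
    x, err_int, err_prev, cost = x0.copy(), 0.0, 0.0, 0.0
    for _ in range(HORIZON):
        err = target - x[0]                      # assume x[0] is the tracked attitude angle
        err_int += err * DT
        u = np.clip(pid_action(gains, err, err_int, err_prev), -1.0, 1.0)
        xu = np.hstack([x, u]).reshape(1, -1)
        x = np.array([gp.predict(xu)[0] for gp in gps])
        cost += err ** 2
        err_prev = err
    return cost

def improve_policy(gains0, gps, x0, target):
    """Policy improvement: search for PID gains that minimize the predicted cost."""
    res = minimize(expected_cost, gains0, args=(gps, x0, target), method="Nelder-Mead")
    return res.x
```

In use, these pieces would sit inside an episodic loop: run the current controller on the hardware for 8 s at 100 Hz, append the logged transitions to the dataset, refit the dynamics model, and re-optimize the gains; the paper reports that roughly 10 such offline training episodes were enough for the gains to converge.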