1. Control Theory and Guidance Technology Research Center, Harbin Institute of Technology, Harbin 150001, Heilongjiang, China
2. Sichuan Institute of Aerospace System Engineering, Chengdu 610100, Sichuan, China
[ "杨烨峰(1994-),男,黑龙江佳木斯人,博士研究生,2017年于哈尔滨工业大学获得学士学位,研究方向为强化学习控制、自适应控制及飞行器控制。E-mail:18B904013@stu.hit.edu.cn" ]
BAN Xiao-jun (1978-), male, born in Weinan, Shaanxi; professor and doctoral supervisor. He received his B.S. degree from Harbin Engineering University in 2001, and his M.S. and Ph.D. degrees from Harbin Institute of Technology in 2003 and 2006, respectively. He is currently the deputy director for teaching of the Control Theory and Guidance Technology Research Center, School of Astronautics, Harbin Institute of Technology. His research interests include reinforcement learning control, fuzzy control theory and applications, robust gain-scheduled control theory and applications, system identification theory and applications, electromechanical servo control system design, and flight vehicle control. E-mail: banxiaojun@hit.edu.cn
Received: 2019-07-08
Accepted: 2019-08-14
Published in print: 2019-11-15
杨烨峰, 邓凯, 左英琦, 等. PILCO框架对飞行姿态模拟器系统的参数设计与优化[J]. 光学 精密工程, 2019,27(11):2365-2373. DOI: 10.3788/OPE.20192711.2365.
Ye-feng YANG, Kai DENG, Ying-qi ZUO, et al. Parameter design and optimization of a flight attitude simulator system based on PILCO framework[J]. Optics and precision engineering, 2019, 27(11): 2365-2373. DOI: 10.3788/OPE.20192711.2365.
The proportional-integral-derivative (PID) controller is the most widely used control method in flight vehicle control, but tuning its parameters is often very tedious. To enable a flight attitude simulator control system to optimize its PID controller parameters autonomously and thereby achieve stable control, this paper applies the Probabilistic Inference for Learning Control (PILCO) algorithm from reinforcement learning to optimize the PID parameters. First, a probabilistic dynamics model of the system is fitted from input-output data; next, the current PID controller is assessed by policy evaluation; finally, the controller is optimized by policy improvement. In experiments with a sampling frequency of 100 Hz and 8 s of data collected per rollout, the control performance met the requirements and the PID parameters converged after 10 episodes of offline training. The PILCO-optimized flight attitude simulator showed good robustness in fixed-point experiments, indicating that PILCO can optimize PID controller parameters and has great potential for solving nonlinear control and parameter optimization problems.
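The abstract describes the PILCO workflow only at a high level. The Python sketch below is a hedged illustration (not the authors' implementation) of the three steps it names: fitting a Gaussian-process dynamics model from input-output data, evaluating a candidate set of PID gains by rolling the learned model forward, and improving the gains by minimizing the predicted cost. The state layout, the quadratic cost, the deterministic mean-only rollout, and helper names such as fit_dynamics and expected_cost are assumptions; the actual PILCO algorithm propagates full Gaussian state distributions and uses analytic gradients of the expected cost.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel
from scipy.optimize import minimize

DT = 0.01        # 100 Hz sampling, as in the experiment
HORIZON = 800    # 8 s of data per rollout

def fit_dynamics(states, actions, next_states):
    """Fit a GP model x_{t+1} = f(x_t, u_t) from logged data (one GP per state dimension)."""
    X = np.hstack([states, actions])
    kernel = RBF(length_scale=np.ones(X.shape[1])) + WhiteKernel(1e-4)
    return [GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, next_states[:, i])
            for i in range(next_states.shape[1])]

def pid_action(gains, err, err_int, err_prev):
    """Discrete-time PID law with gains (kp, ki, kd)."""
    kp, ki, kd = gains
    return kp * err + ki * err_int + kd * (err - err_prev) / DT

def expected_cost(gains, gps, x0, target):
    """Policy evaluation: roll the learned model forward under the PID policy
    and accumulate a quadratic tracking cost (mean prediction only, for brevity)."""
    x, err_int, err_prev, cost = x0.copy(), 0.0, 0.0, 0.0
    for _ in range(HORIZON):
        err = target - x[0]                      # assume x[0] is the tracked attitude angle
        err_int += err * DT
        u = np.clip(pid_action(gains, err, err_int, err_prev), -1.0, 1.0)
        xu = np.hstack([x, u]).reshape(1, -1)
        x = np.array([gp.predict(xu)[0] for gp in gps])
        cost += err ** 2
        err_prev = err
    return cost

def improve_policy(gains0, gps, x0, target):
    """Policy improvement: search for PID gains that minimize the predicted cost."""
    res = minimize(expected_cost, gains0, args=(gps, x0, target), method="Nelder-Mead")
    return res.x
```

In use, these pieces would sit inside an episodic loop: run the current controller on the hardware for 8 s at 100 Hz, append the logged transitions to the dataset, refit the dynamics model, and re-optimize the gains; the paper reports that roughly 10 such offline training episodes were enough for the gains to converge.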