Supervised optimal control in complex continuous systems with trajectory imitation and reinforcement learning.

Journal: Scientific Reports
Abstract

Supervisory control theory (SCT) is widely used as a safeguard mechanism in the control of discrete event systems (DESs). In complex continuous systems, the supervised control problem of preventing a system's behavior from violating specifications is quite different: high-dimensional continuous state and action spaces make automaton languages unsuitable for describing specification information, which keeps the control of real physical systems challenging. Reinforcement learning (RL) automatically learns complex decisions through trial and error, but it requires precisely designed reward functions built on domain knowledge. For complex scenarios where such a reward function cannot be constructed, or only sparse rewards are available, we propose a novel supervised optimal control framework based on trajectory imitation (TI) and reinforcement learning (RL). First, behavior cloning (BC) is adopted to pre-train the policy model on a small number of human demonstrations. Second, generative adversarial imitation learning (GAIL) is applied to capture the implicit characteristics of the demonstration data. After the primary and implicit features are extracted in these steps, a Demo-based RL algorithm is designed that adds the demonstration data to the RL replay buffer and augments the loss function, pushing system performance toward its full potential. Finally, the proposed method is validated through multiple simulation experiments on object relocation and tool-use tasks with dexterous multifingered hands. On the more complex tool-use task, the proposed approach reduces convergence time by 19.7% compared with the latest method. For both tasks, the resulting policies display natural movements and show higher robustness than the baseline model.
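The Demo-based RL step summarized above mixes human demonstrations into the replay buffer and augments the training loss with an imitation term. The sketch below is a minimal, illustrative Python example of that general idea, not the authors' implementation; the names (DemoReplayBuffer, demo_ratio, lambda_bc) and the fixed demonstration sampling ratio are assumptions made for illustration.

```python
import random
from collections import deque

class DemoReplayBuffer:
    """Illustrative replay buffer that mixes agent transitions with
    permanently stored human demonstrations (hypothetical sketch)."""

    def __init__(self, capacity, demo_transitions, demo_ratio=0.25):
        self.agent_buffer = deque(maxlen=capacity)   # transitions from the current policy
        self.demo_buffer = list(demo_transitions)     # demonstrations are never evicted
        self.demo_ratio = demo_ratio                  # fraction of each batch drawn from demos

    def add(self, transition):
        self.agent_buffer.append(transition)

    def sample(self, batch_size):
        n_demo = min(int(batch_size * self.demo_ratio), len(self.demo_buffer))
        n_agent = min(batch_size - n_demo, len(self.agent_buffer))
        batch = random.sample(self.demo_buffer, n_demo)
        batch += random.sample(list(self.agent_buffer), n_agent)
        return batch


def augmented_loss(rl_loss, bc_loss, lambda_bc=0.1):
    # Augmented objective: the standard RL loss plus a behavior-cloning term
    # evaluated on demonstration samples; lambda_bc is an assumed weight.
    return rl_loss + lambda_bc * bc_loss
```

In this kind of scheme, keeping demonstrations permanently in the buffer and weighting the imitation term against the RL objective are the key design choices; the specific ratio and weight used by the paper are not stated in the abstract.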

Authors
Yingjun Liu, Fuchun Liu, Renwei Huang