Work by Hesameddin Mohammadi, Armin Zare, Mahdi Soltanolkotabi, and Mihailo R. Jovanovic, IEEE TAC 2020 (under review) / CDC 2019
Keywords: Data-driven control, gradient descent, gradient-flow dynamics, linear quadratic regulator, model-free control,
nonconvex optimization, Polyak-Lojasiewicz inequality, random search method, reinforcement learning, sample complexity
Summary
This work extends the convergence results of Fazel et al. on policy gradient methods for discrete-time systems to the case of continuous-time linear dynamics, while also significantly reducing the required number of cost function evaluations and the simulation time. These improvements were made possible by novel proof techniques, which include 1) relating the gradient-flow dynamics associated with the nonconvex formulation to those of a convex reparameterization, and 2) relaxing strict bounds on the gradient estimation error to probabilistic guarantees of high correlation between the gradient and its estimate. This echoes the notion that policy gradient is indeed “nothing more than random search,” albeit a random search with compelling convergence properties.
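To make the random-search idea concrete, here is a minimal sketch (not the authors' code) of a two-point random-search policy update on a toy continuous-time LQR problem. The system matrices, initial gain, step size, smoothing radius, and the forward-Euler approximation of the infinite-horizon cost are all illustrative assumptions made for this sketch.

```python
import numpy as np

# Toy continuous-time system xdot = A x + B u and quadratic cost weights (assumed).
A = np.array([[0.0, 1.0], [-1.0, 0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)

def lqr_cost(K, x0=np.array([1.0, 1.0]), T=10.0, dt=1e-2):
    """Approximate the infinite-horizon LQR cost of the feedback u = -K x
    by forward-Euler simulation over a finite horizon (an assumption of this sketch)."""
    x, J = x0.copy(), 0.0
    for _ in range(int(T / dt)):
        u = -K @ x
        J += (x @ Q @ x + u @ R @ u) * dt
        x = x + (A @ x + B @ u) * dt
    return J

def two_point_gradient_estimate(K, r=0.05):
    """Estimate grad J(K) from two cost evaluations along a random direction U
    of unit Frobenius norm -- the kind of random-search oracle the summary alludes to."""
    U = np.random.randn(*K.shape)
    U /= np.linalg.norm(U)
    d = K.size
    return d / (2.0 * r) * (lqr_cost(K + r * U) - lqr_cost(K - r * U)) * U

# Gradient-descent-style policy update driven only by cost evaluations.
K = np.array([[1.0, 1.0]])   # an initial stabilizing gain (assumed)
alpha = 1e-3                 # step size (illustrative)
for _ in range(200):
    K = K - alpha * two_point_gradient_estimate(K)

print("final gain K:", K, "cost:", lqr_cost(K))
```

The point of the sketch is that the update never touches the model matrices directly; it only queries the cost along random perturbations of the gain, which is why guarantees on how well (and how often) the resulting estimate correlates with the true gradient drive the sample-complexity results described above.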
Dr. Zare recently joined UT Dallas as a faculty member and we look forward to working with him!
Read the paper on arXiv here.