Convergence and sample complexity of gradient methods for the model-free linear quadratic regulator problem

Work by Hesameddin Mohammadi, Armin Zare, Mahdi Soltanolkotabi, and Mihailo R. Jovanovic, IEEE TAC 2020 (under review) / CDC 2019

Keywords: Data-driven control, gradient descent, gradient-flow dynamics, linear quadratic regulator, model-free control, nonconvex optimization, Polyak-Lojasiewicz inequality, random search method, reinforcement learning, sample complexity

Summary

This work extends the convergence results of Fazel et al. for policy gradient methods on discrete-time systems to the case of continuous-time linear dynamics, while also significantly improving the required number of cost function evaluations and simulation time. These improvements rest on two novel proof techniques: 1) relating the gradient-flow dynamics associated with the nonconvex formulation to those of a convex reparameterization, and 2) relaxing strict bounds on the gradient estimation error to probabilistic guarantees of high correlation between the gradient and its estimate. This echoes the notion that indeed “policy gradient is nothing more than random search”, albeit a random search with compelling convergence properties.
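To make the random-search idea concrete, here is a minimal Python sketch of a two-point random-search gradient estimate for the continuous-time LQR cost J(K). This is not the authors' exact algorithm or constants; the system matrices, smoothing radius, sample count, and step size below are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's exact method or constants):
# two-point random-search updates of a state-feedback gain K for the
# continuous-time LQR cost J(K).
import numpy as np
from scipy.linalg import solve_continuous_lyapunov


def lqr_cost(K, A, B, Q, R):
    """Cost J(K) = trace(P) for u = -Kx, assuming E[x0 x0^T] = I."""
    A_cl = A - B @ K
    if np.max(np.linalg.eigvals(A_cl).real) >= 0:
        return np.inf  # K is not stabilizing
    # Solve the Lyapunov equation A_cl^T P + P A_cl + Q + K^T R K = 0
    P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
    return np.trace(P)


def random_search_step(K, A, B, Q, R, r=1e-2, N=20, alpha=1e-3, rng=None):
    """One two-point random-search update of the feedback gain K."""
    rng = np.random.default_rng() if rng is None else rng
    m, n = K.shape
    grad_est = np.zeros_like(K)
    for _ in range(N):
        U = rng.standard_normal((m, n))
        U /= np.linalg.norm(U)  # random direction on the unit Frobenius sphere
        # Two cost evaluations per direction; r should be small enough that
        # the perturbed gains remain stabilizing.
        delta = lqr_cost(K + r * U, A, B, Q, R) - lqr_cost(K - r * U, A, B, Q, R)
        grad_est += (m * n) * delta / (2 * r * N) * U
    return K - alpha * grad_est


# Toy example: double-integrator dynamics with an assumed stabilizing initial gain.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q, R = np.eye(2), np.eye(1)
K = np.array([[1.0, 1.0]])
for _ in range(200):
    K = random_search_step(K, A, B, Q, R)
print("final gain:", K, "cost:", lqr_cost(K, A, B, Q, R))
```

Each update only requires evaluating the closed-loop cost at randomly perturbed gains, which is exactly the model-free, simulation-based access pattern whose sample complexity the paper analyzes.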

Dr. Zare recently joined UT Dallas as a faculty member and we look forward to working with him!

Read the paper on arXiv here.