Convergence and sample complexity of gradient methods for the model-free linear quadratic regulator problem

Work by Hesameddin Mohammadi, Armin Zare, Mahdi Soltanolkotabi, and Mihailo R. Jovanovic, IEEE TAC 2020 (under review) / CDC 2019

Keywords: Data-driven control, gradient descent, gradient-flow dynamics, linear quadratic regulator, model-free control, nonconvex optimization, Polyak-Lojasiewicz inequality, random search method, reinforcement learning, sample complexity

Summary

This work extends the convergence results of Fazel et al. for policy gradient methods on discrete-time systems to the case of continuous-time linear dynamics, while also significantly reducing the required number of cost function evaluations and simulation time. These improvements were made possible by novel proof techniques, which include 1) relating the gradient-flow dynamics associated with the nonconvex formulation to those of a convex reparameterization, and 2) relaxing strict bounds on the gradient estimation error to probabilistic guarantees of high correlation between the gradient and its estimate. This echoes the notion that indeed “policy gradient is nothing more than random search,” albeit a random search with compelling convergence properties.
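To make the random search concrete, here is a minimal sketch of a two-point (zeroth-order) gradient estimate driving gradient descent on the LQR cost. For simplicity it uses a toy discrete-time system and an exact Lyapunov-equation cost oracle in place of the simulated rollouts a truly model-free method would use; all matrices, step sizes, and sample counts are illustrative, not from the paper (which treats continuous-time dynamics).

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

# Toy discrete-time system (illustrative only, not from the paper)
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

def lqr_cost(K):
    """Infinite-horizon LQR cost of the gain K (u = -K x), averaged over
    unit-covariance initial states; infinite if K is destabilizing."""
    Acl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(Acl))) >= 1.0:
        return np.inf
    P = solve_discrete_lyapunov(Acl.T, Q + K.T @ R @ K)
    return np.trace(P)

def gradient_estimate(K, r=1e-2, num_samples=100):
    """Smoothed two-point (zeroth-order) estimate of grad J(K)."""
    d, g = K.size, np.zeros_like(K)
    for _ in range(num_samples):
        U = np.random.randn(*K.shape)
        U /= np.linalg.norm(U)                   # random direction on the sphere
        Jp, Jm = lqr_cost(K + r * U), lqr_cost(K - r * U)
        if np.isfinite(Jp) and np.isfinite(Jm):  # skip destabilizing probes
            g += (d / (2 * r)) * (Jp - Jm) * U
    return g / num_samples

K = np.array([[0.5, 1.0]])                       # a stabilizing initial gain
print("initial cost:", lqr_cost(K))
for _ in range(100):
    K_next = K - 1e-4 * gradient_estimate(K)
    if np.isfinite(lqr_cost(K_next)):            # keep iterates stabilizing
        K = K_next
print("final cost:", lqr_cost(K))
```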

Dr. Zare recently joined UT Dallas as a faculty member and we look forward to working with him!

Read the paper on arXiv here.

Sparse LQR Synthesis via Information Regularization

Work by Jeb Stefan and Takashi Tanaka, CDC 2019

Discussion by Ben Gravell, January 31, 2020

Keywords: Linear quadratic regulator, information theory, regularization, matrix inequality, iterative semidefinite programming

Summary

Researchers from UT Austin formulate a problem of jointly optimizing the quadratic cost of a linear system and an information-theoretic communication cost which accounts for limited channel capacity. It is demonstrated empirically that this optimization can be solved with an iterative semidefinite program (SDP), and that the communication cost acts as a regularizer on the control gains, which in some cases promotes sparsity. This is similar to our own work on learning sparse control; we would love to see how data-driven approaches could augment information-theoretic notions in the case of unknown dynamics.
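For a flavor of how such a regularized synthesis can be posed as an SDP, here is a sketch of a standard discrete-time H2/LQR LMI formulation in CVXPY with an l1 penalty on the variable L (where K = L W⁻¹) as a crude convex surrogate for sparse gains. This is not the authors' information-theoretic formulation; the system, the weight gamma, and the penalty are all illustrative.

```python
import cvxpy as cp
import numpy as np

# Illustrative discrete-time system; gamma weights a sparsity surrogate
# standing in for the paper's communication cost.
A = np.array([[1.0, 0.2], [0.0, 1.0]])
B = np.array([[0.0], [0.2]])
Q = np.eye(2)
n, m = B.shape
gamma = 0.1

W = cp.Variable((n, n), symmetric=True)  # closed-loop state covariance
L = cp.Variable((m, n))                  # gain parameterized as K = L W^{-1}
X = cp.Variable((m, m), symmetric=True)  # epigraph of the input cost (R = I)

M = A @ W + B @ L
# Schur complements of: W >= M W^{-1} M^T + I  and  X >= L W^{-1} L^T;
# symmetrize so CVXPY recognizes the PSD arguments as symmetric.
lmi1 = cp.bmat([[W - np.eye(n), M], [M.T, W]])
lmi2 = cp.bmat([[X, L], [L.T, W]])
constraints = [(lmi1 + lmi1.T) / 2 >> 0, (lmi2 + lmi2.T) / 2 >> 0]

objective = cp.Minimize(cp.trace(Q @ W) + cp.trace(X) + gamma * cp.sum(cp.abs(L)))
cp.Problem(objective, constraints).solve(solver=cp.SCS)

K = L.value @ np.linalg.inv(W.value)     # closed loop: x+ = (A + B K) x + w
print("gain K:", np.round(K, 3))
```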

The paper link is forthcoming once the CDC proceedings are published. The author's website is here.

MakeSense: Automated Sensor Design for Proprioceptive Soft Robots

Work by Javier Tapia, Espen Knoop, Mojmir Mutny, Miguel A. Otaduy, and Moritz Bacher, 2020

Keywords: Sensor design, soft robotics, pose estimation, optimization

Summary

Researchers at Disney have created a method for optimized sensor selection for soft robots. A large number of candidate fabricable sensors are generated virtually and then culled down to a small set that gives good pose estimation in simulation. The method is verified experimentally by fabricating physical soft robots with highly elastic strain sensors embedded in a flexible polymer. Although our own research is geared toward theoretical analysis of sensor and actuator selection for well-defined networked linear systems, we found this study a fascinating tangent.
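The culling step amounts to a sensor-selection problem; below is a minimal greedy-selection sketch under the simplifying assumption (ours, not the paper's) that each candidate sensor gives a linear reading of the pose and candidates are scored by an A-optimality proxy. The dimensions, sensing model, and score are all hypothetical.

```python
import numpy as np

# Hypothetical linear sensing model: candidate sensor i reads s_i = h_i^T q
# for pose parameters q; H stacks the candidate rows h_i^T.
rng = np.random.default_rng(0)
num_candidates, pose_dim, budget = 200, 10, 12
H = rng.standard_normal((num_candidates, pose_dim))

def score(rows):
    """A-optimality proxy: trace of the (ridge-regularized) inverse Gram
    matrix of the selected rows; smaller means better pose recovery."""
    Hs = H[rows]
    return np.trace(np.linalg.inv(Hs.T @ Hs + 1e-6 * np.eye(pose_dim)))

selected = []
for _ in range(budget):
    remaining = [i for i in range(num_candidates) if i not in selected]
    best = min(remaining, key=lambda i: score(selected + [i]))
    selected.append(best)
print("chosen sensors:", sorted(selected))
```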

Watch the video and read the paper here.

Automatic Repair of Convex Optimization Problems

Work by Shane Barratt, Guillermo Angeris, Stephen Boyd, 2020

Keywords: Convex optimization, feasibility, design

Summary

This work considers a meta-optimization setting in which the original convex problem is infeasible, unbounded, or pathological (bad), and the problem data are perturbed so that it becomes feasible, bounded, and nonpathological (good); the size of the parameter perturbation is itself the quantity minimized by the meta-optimization. The authors give examples from control and economic theory, showing that their method can be used as a design tool, e.g., slightly changing the mass properties of an aerospace vehicle so that a constrained trajectory planning problem becomes feasible.
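As a toy illustration of the idea, the sketch below repairs an infeasible pair of linear constraints by perturbing a single parameter b as little as possible. Here the perturbation enters the feasibility program convexly, so a single CVXPY solve suffices; the paper develops machinery for far more general perturbations of convex problem data. The numbers are invented.

```python
import cvxpy as cp
import numpy as np

# Hypothetical infeasible problem: x >= 1 (elementwise) forces a @ x >= 2,
# but the constraint a @ x <= b has b = 1, so no x satisfies both.
a = np.array([1.0, 1.0])
b = 1.0

# Repair: perturb b by delta, minimizing |delta|, so that the perturbed
# problem becomes feasible.
x = cp.Variable(2)
delta = cp.Variable()
prob = cp.Problem(cp.Minimize(cp.abs(delta)), [x >= 1, a @ x <= b + delta])
prob.solve()
print("minimal perturbation of b:", round(float(delta.value), 4))  # ~1.0
```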

Read the paper on arXiv here.

Guaranteed Margins for LQG Regulators

Work by John C. Doyle, IEEE TAC 1978

Keywords: Robust control, optimal control, robustness margins, linear quadratic Gaussian, linear quadratic regulator, Kalman filter

Summary

In this classic paper with the famously short three-word abstract, “There are none,” it is shown by counterexample that linear quadratic Gaussian (LQG) regulators do not possess any guaranteed gain or phase stability margins. This stands in stark contrast to linear quadratic regulators (LQR), which were shown a year prior by Safonov and Athans to possess impressive guaranteed margins of at least 6 dB in gain and 60 degrees in phase.
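For the curious, the counterexample is small enough to check numerically. The sketch below follows the commonly reproduced form of Doyle's example; treat the exact matrices and weights as our reconstruction rather than verbatim from the paper. It tests closed-loop stability when a scalar gain perturbation m is inserted at the plant input (m = 1 is nominal); the stable window around m = 1 shrinks as q and sigma grow.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Doyle-style counterexample (our reconstruction; values illustrative)
A = np.array([[1.0, 1.0], [0.0, 1.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
q = sigma = 1000.0
Q = q * np.ones((2, 2))       # state cost
Wn = sigma * np.ones((2, 2))  # process noise intensity
R = V = np.eye(1)             # control cost / measurement noise

# LQR gain K and Kalman gain L from the two Riccati equations
P = solve_continuous_are(A, B, Q, R)
K = np.linalg.solve(R, B.T @ P)
S = solve_continuous_are(A.T, C.T, Wn, V)
L = S @ C.T @ np.linalg.inv(V)

def stable(m):
    """Closed-loop stability with scalar gain perturbation m at the input."""
    Acl = np.block([[A, -m * B @ K],
                    [L @ C, A - B @ K - L @ C]])
    return np.max(np.linalg.eigvals(Acl).real) < 0

for m in [0.95, 1.0, 1.05]:
    print(f"m = {m}: stable = {stable(m)}")
```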

Read the paper here or from IEEE Xplore.

See Stephen Tu’s nice walkthrough of this paper.

Adaptive Kalman Filter for Detectable Linear Time-Invariant Systems

Work by Rahul Moghe, Renato Zanetti, and Maruthi R. Akella, AIAA JGCD 2019

Discussion by Ben Gravell and Venkatraman Renganathan

Keywords: Adaptive, Kalman filter, linear system, learning, optimal, estimation

Summary

Unlike the classic Kalman filter, which requires knowledge of the measurement and process noise covariance matrices, the adaptive Kalman filter proposed by the authors does not. It is shown that, under mild assumptions, the online data-driven estimates of the noise covariances asymptotically converge to the true values, rendering the proposed adaptive Kalman filter asymptotically optimal.
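To give a feel for the mechanics (not the authors' actual estimator or convergence proof), here is a crude covariance-matching adaptive Kalman filter sketch: the filter runs with its current noise covariance estimates and updates them from running averages of innovation statistics. The system, update rules, and projection are all illustrative heuristics.

```python
import numpy as np

# Illustrative system: x+ = F x + w, y = H x + v, with unknown cov(w) = Q
# and cov(v) = R that the filter must learn online.
rng = np.random.default_rng(1)
F = np.array([[1.0, 0.1], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q_true, R_true = 0.01 * np.eye(2), 0.25 * np.eye(1)
n, p = 2, 1

x_true, x_hat, P = np.zeros(n), np.zeros(n), np.eye(n)
Q_hat, R_hat = np.eye(n), np.eye(p)  # deliberately wrong initial guesses
for k in range(1, 5001):
    # Simulate the true system
    x_true = F @ x_true + rng.multivariate_normal(np.zeros(n), Q_true)
    y = H @ x_true + rng.multivariate_normal(np.zeros(p), R_true)

    # Kalman predict/update using the current covariance estimates
    x_pred = F @ x_hat
    P_pred = F @ P @ F.T + Q_hat
    Sk = H @ P_pred @ H.T + R_hat
    K = P_pred @ H.T @ np.linalg.inv(Sk)
    innov = y - H @ x_pred
    x_hat = x_pred + K @ innov
    P = (np.eye(n) - K @ H) @ P_pred

    # Covariance matching: E[innov innov^T] = H P_pred H^T + R, and the
    # state correction K innov gives a (crude) proxy for Q; running means.
    R_hat += (np.outer(innov, innov) - H @ P_pred @ H.T - R_hat) / k
    Q_hat += (K @ np.outer(innov, innov) @ K.T - Q_hat) / k
    R_hat = np.maximum(R_hat, 1e-4 * np.eye(p))  # scalar case: keep R_hat > 0

print("R_hat:", R_hat.ravel(), "vs R_true:", R_true.ravel())
```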

Read the paper on AIAA here.

Play with our demo code on Binder, view the Jupyter Notebook, or clone the GitHub repo.

Presentation + Code

We discuss and walk through the paper in a presentation with accompanying Python code.