Finite-Time Performance Bounds and Adaptive Learning Rate Selection for TD Learning

Monday, November 25, 2019 - 4:00pm to Tuesday, November 26, 2019 - 4:55pm

Event Calendar Category

LIDS Seminar Series

Speaker Name

Rayadurgam Srikant

Affiliation

University of Illinois at Urbana-Champaign

Building and Room Number

32-155

Temporal difference (TD) learning is a widely used algorithm for estimating the value function of a Markov decision process (MDP) under a given policy. Here, we consider TD learning with linear function approximation and a constant learning rate, and obtain bounds on its finite-time performance. Motivated by these bounds, we will present a heuristic that adapts the learning rate to achieve fast convergence. Joint work with Lei Ying and Harsh Gupta.
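
For readers unfamiliar with the setting, the following is a minimal sketch of standard TD(0) with linear function approximation and a constant learning rate on a small Markov chain induced by a fixed policy. It is not the speaker's analysis or the adaptive heuristic from the talk, neither of which is specified in this abstract. The function names (`td0_linear`, `sample_next`, `dot`), the two-state chain, the rewards, and the one-hot features are all illustrative assumptions.

```python
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sample_next(row, rng):
    """Sample a next-state index from one row of a transition matrix."""
    u = rng.random()
    cum = 0.0
    for j, p in enumerate(row):
        cum += p
        if u < cum:
            return j
    return len(row) - 1  # guard against floating-point round-off


def td0_linear(P, r, phi, gamma, alpha, steps, seed=0):
    """TD(0) with linear value estimate V(s) ~ theta . phi[s].

    P     : transition matrix under the fixed policy (list of rows)
    r     : reward received on leaving each state
    phi   : feature vector for each state
    gamma : discount factor
    alpha : constant learning rate (step size)
    """
    rng = random.Random(seed)
    theta = [0.0] * len(phi[0])
    s = 0  # assumed start state
    for _ in range(steps):
        s_next = sample_next(P[s], rng)
        # Temporal-difference error for the observed transition.
        delta = r[s] + gamma * dot(theta, phi[s_next]) - dot(theta, phi[s])
        # Semi-gradient update along the features of the current state.
        theta = [t + alpha * delta * f for t, f in zip(theta, phi[s])]
        s = s_next
    return theta


# Hypothetical two-state example with one-hot (tabular) features.
P = [[0.5, 0.5], [0.5, 0.5]]
r = [1.0, 0.0]
phi = [[1.0, 0.0], [0.0, 1.0]]
print(td0_linear(P, r, phi, gamma=0.5, alpha=0.05, steps=20000))
```

With a constant learning rate and random transitions, the iterates do not settle at a single point but keep fluctuating around the TD fixed point. A larger `alpha` moves toward that neighborhood faster but fluctuates more, which is the trade-off that motivates studying finite-time bounds and adapting the learning rate.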

R. Srikant is the Fredric G. and Elizabeth H. Nearing Endowed Professor of Electrical and Computer Engineering and the Coordinated Science Lab at the University of Illinois at Urbana-Champaign. His research interests are in applied probability, stochastic networks, and control theory, with applications to machine learning, cloud computing, and communication networks. He is the recipient of the 2019 IEEE Koji Kobayashi Computers and Communications Award and the 2015 IEEE INFOCOM Achievement Award. He has also received several best paper awards, including the 2017 Applied Probability Society Best Publication Award, the 2015 IEEE INFOCOM Best Paper Award, and the 2015 WiOpt Best Paper Award. He was the Editor-in-Chief of the IEEE/ACM Transactions on Networking from 2013 to 2017.