Sampled Lookahead Approaches in Dynamic Programming | Department of Industrial and Systems Engineering

Dr. Daniel Jiang
Research Scientist – Facebook
Assistant Professor, IE, University of Pittsburgh
Friday, February 28, 2020, 2:30-3:30 pm
Tickle Building 410

Abstract: In this talk, we discuss how sampled versions of the information relaxation bound can be incorporated into approximate dynamic programming techniques. We start in a finite-horizon setting and discuss an improvement to Monte Carlo tree search, where information relaxation bounds are used to guide the tree expansion process to produce decision trees that are deeper rather than wider, in effect concentrating computation toward more useful parts of the state space. We then discuss follow-up work, where sample-based information relaxation is used in an infinite-horizon setting to provide stochastic upper and lower bounds that can increase the effectiveness of Q-learning. The empirical performance of both techniques is illustrated on problems related to ride- and car-sharing services. Separately, we will also briefly introduce and discuss the software package BoTorch, which enables differentiable, Monte Carlo-based Bayesian optimization.

Bio: Daniel Jiang is a research scientist at Facebook and is also an Assistant Professor in the Department of Industrial Engineering at the University of Pittsburgh, where he is currently on leave. Daniel’s research interests are in the areas of approximate dynamic programming, reinforcement learning, and Bayesian optimization, with applications in energy, the sharing economy, and public health. He received his PhD in Operations Research and Financial Engineering from Princeton University.