Title
Advances in online convex optimization, games, and problems with bandit feedback

Author(s)
Rivera Cardoso, Adrian
Advisor(s)
Wang, He
Xu, Huan
Abstract
In this thesis we study sequential decision making through the lens of Online Learning, a powerful and general framework for multi-period decision making. Thanks to its simple formulation and effectiveness, it has become a tool of daily use in multibillion-dollar companies, and its elegant theory and tight connections with other fields have attracted academics all over the world and driven first-class research.

In the first chapter, joint work with Huan Xu, we study the Risk-Averse Convex Bandit problem. Risk aversion refers to the fact that humans prefer consistent sequences of good rewards over highly variable sequences with slightly better rewards. The Risk-Averse Convex Bandit addresses the gap that, while human decision makers are risk-averse, most Online Learning algorithms are not. We provide the first efficient algorithms with strong theoretical guarantees for this problem.

In the second chapter, joint work with Rachel Cummings, we study how to preserve privacy in online submodular minimization. Submodular functions have many applications in machine learning and economics, which often involve sensitive data from individuals. Using tools from Online Convex Optimization, we provide the first $\epsilon$-differentially private algorithms for this problem, with guarantees almost matching those of the best non-private algorithms.

In the third chapter, joint work with Jacob Abernethy, He Wang, and Huan Xu, we study a dynamic version of two-player zero-sum games. Zero-sum games are ubiquitous in economics and central to understanding Linear Programming duality, Convex and Robust Optimization, and Statistics. For many decades it was believed that one could solve such games using sublinear-regret algorithms for Online Convex Optimization. We show that while this holds when the game does not change over time, a naive application of these algorithms can fail badly when the game changes and the players aim to compete with the Nash Equilibrium of the sum of the games in hindsight.

In the fourth chapter, joint work with He Wang and Huan Xu, we revisit the decades-old problem of Markov Decision Processes (MDPs) with adversarial rewards. MDPs provide a general mathematical framework for sequential decision making under uncertainty when there is a notion of `state', and they are the backbone of Reinforcement Learning. Using tools from Online Convex Optimization, we give an elegant algorithm for this problem whose performance is comparable with the current state of the art. We also consider the large state-space regime and provide the first algorithm with strong theoretical guarantees in that setting.
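The starting point of the third chapter is the classical fact that self-play between two sublinear-regret learners solves a fixed zero-sum game: if each player's regret, $\sum_{t=1}^T \ell_t(x_t) - \min_{x} \sum_{t=1}^T \ell_t(x)$, grows as $o(T)$, then the time-averaged strategies approach a Nash equilibrium. Below is a minimal sketch of that fact, not taken from the thesis; the Hedge (multiplicative weights) updates, step size, and rock-paper-scissors payoff matrix are illustrative assumptions. It is exactly this static-game guarantee that the thesis shows can break down once the game changes over time.

import numpy as np

def hedge_self_play(A, T=5000, eta=0.05):
    """Hedge vs. Hedge self-play on the zero-sum game with payoff matrix A.

    The row player maximizes x^T A y; the column player minimizes it.
    Both run multiplicative weights (a sublinear-regret algorithm), and the
    time-averaged strategies approximate a Nash equilibrium of the game.
    """
    n, m = A.shape
    x = np.ones(n) / n                    # row player's mixed strategy
    y = np.ones(m) / m                    # column player's mixed strategy
    x_avg, y_avg = np.zeros(n), np.zeros(m)
    for _ in range(T):
        x_avg += x / T
        y_avg += y / T
        payoff_x = A @ y                  # expected payoff of each row action
        loss_y = A.T @ x                  # expected loss of each column action
        x = x * np.exp(eta * payoff_x)    # reward update for the maximizer
        x /= x.sum()
        y = y * np.exp(-eta * loss_y)     # loss update for the minimizer
        y /= y.sum()
    return x_avg, y_avg

# Rock-paper-scissors: the unique Nash equilibrium is uniform play.
A = np.array([[0., -1., 1.],
              [1., 0., -1.],
              [-1., 1., 0.]])
x_bar, y_bar = hedge_self_play(A)
print(np.round(x_bar, 3), np.round(y_bar, 3))  # both approach [0.333 0.333 0.333]

Although the last iterates of such dynamics may cycle, the averaged strategies converge, with a duality gap shrinking at the rate of the players' average regret; this is the guarantee the dynamic-game results of the thesis revisit.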
Date Issued
2019-12-16
Resource Type
Text
Resource Subtype
Dissertation