Graduate Thesis or Dissertation
 

Domain-Independent Planning for Markov Decision Processes with Factored State and Action Spaces

https://ir.library.oregonstate.edu/concern/graduate_thesis_or_dissertations/dz010s88c

Descriptions

Abstract
Markov Decision Processes (MDPs) are the de facto formalism for studying sequential decision-making problems under uncertainty, ranging from classical problems such as inventory control and path planning to more complex problems such as reservoir control under rainfall uncertainty and emergency response optimization for fire and medical emergencies. Most prior research has focused on exact and approximate solutions to MDPs with factored states, assuming a small number of actions. In contrast, many applications are most naturally modeled as having factored actions described in terms of multiple action variables. In this thesis we study domain-independent algorithms that leverage the factored action structure in the MDP dynamics and reward, and that scale better than treating each of the exponentially many joint actions as atomic. Our contributions are three-fold, corresponding to three fundamental approaches to MDP planning: exact solution using symbolic dynamic programming (DP), anytime online planning using heuristic search, and online action selection using hindsight optimization.

The first part focuses on deriving optimal policies over all states for MDPs whose state and action spaces are described in terms of multiple discrete random variables. To capture the factored action structure, we introduce new symbolic operators for computing DP updates over all states efficiently by leveraging the abstract, symbolic representation of decision diagrams. Addressing the potential bottleneck of diagrammatic blowup in these operators, we present a novel and optimal policy iteration algorithm that emphasizes the diagrammatic compactness of the intermediate value functions and policies. The impact is seen in experiments on the well-studied problems of inventory control and system administration, where our algorithm exploits the increasing compactness of the optimal policy as the action space grows more complex.

Under the framework of anytime planning, the second part expands the scalability of our approach to factored actions by restricting attention to the reachable part of the state space. Our contribution is the introduction of new symbolic generalization operators that guarantee a more moderate use of space and time while providing non-trivial generalization. These operators yield anytime algorithms that converge to the optimal value and action for the current world state while guaranteeing bounded growth in the size of the symbolic representation. We show empirically that our online algorithm successfully combines forward search from an initial state with backward generalized DP updates on symbolic states.

The third part considers a general class of hybrid (mixed discrete and continuous) state and action (HSA) MDPs. While the insights from the above approaches remain valid for hybrid MDPs, there are significant limitations inherent to the DP approach. Existing solvers for hybrid state and action MDPs are either limited to very restricted transition distributions, require knowledge of domain-specific basis functions to achieve good approximations, or do not scale. We explore a domain-independent approach based on the framework of hindsight optimization (HOP) for HSA-MDPs, which uses an upper bound on the finite-horizon action values for action selection.
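For concreteness, the hindsight bound can be written as follows; this is the standard HOP formulation, with notation chosen for this summary rather than quoted from the thesis:

    Q_{\mathrm{HOP}}(s, a) \;=\; \mathbb{E}_{w}\!\left[\, \max_{a_1, \ldots, a_{H-1}} \; \sum_{t=0}^{H-1} R(s_t, a_t) \right] \;\ge\; Q^{*}_{H}(s, a)

where w is a sampled realization (a "future") of all exogenous randomness over horizon H, the first action is fixed to a_0 = a, and each inner maximization is a deterministic planning problem. The inequality follows from Jensen's inequality: moving the maximization inside the expectation can only increase the value.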
Our main contribution is a linear-time reduction to a Mixed Integer Linear Program (MILP) that encodes the HOP objective when the dynamics are specified as location-scale probability distributions parametrized by piecewise linear (PWL) functions of states and actions. In addition, we show how to use the same machinery to select actions based on a lower bound generated by straight-line plans. Our empirical results show that the HSA-HOP approach scales effectively to high-dimensional problems and outperforms baselines capable of handling such large hybrid MDPs. In a concluding case study, we cast the real-time dispatch optimization problem faced by the Corvallis Fire Department as an HSA-MDP with factored actions, and show that our domain-independent planner significantly improves upon the responsiveness of a baseline that dispatches the nearest responders.
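As a rough illustration of HOP-style action selection, the sketch below samples futures and averages determinized plan values per candidate action. This is a generic Monte Carlo sketch, not the thesis's MILP encoding; sample_future, solve_deterministic, and all other names are placeholders assumed for illustration.

    def hop_action(state, candidate_actions, sample_future, solve_deterministic,
                   num_futures=30):
        """Pick the action with the best average hindsight (clairvoyant) value.

        sample_future() -> one fixed realization of all exogenous randomness.
        solve_deterministic(state, first_action, future) -> total reward of the
            best straight-line plan starting with first_action under that future.
        """
        futures = [sample_future() for _ in range(num_futures)]
        best_action, best_value = None, float("-inf")
        for action in candidate_actions:
            # Average hindsight plan values over the sampled futures: a Monte
            # Carlo estimate of the HOP upper bound on Q(state, action).
            value = sum(solve_deterministic(state, action, future)
                        for future in futures) / num_futures
            if value > best_value:
                best_action, best_value = action, value
        return best_action

Enumerating candidate actions one by one, as above, is precisely what becomes infeasible when actions are factored; the MILP reduction described in the abstract instead encodes the optimization over joint actions and straight-line plans directly, avoiding that enumeration.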
