The options framework incorporates temporally extended actions (termed options)
into the reinforcement learning paradigm. A wide variety of prior work
experimentally illustrates the significance of options for the performance of a learning
algorithm in complex domains. However, the work by Fruit et al. on the
semi-Markov Decision Process (SMDP) version of the UCRL2 algorithm introduced
a formal understanding of the circumstances that make options conducive to the performance
of a learning algorithm. In this work we present our implementation of the
algorithm proposed by Fruit et al. We perform experimentation on a navigation
task in a grid world domain. Using empirical Bernstein peeling as the confidence
bound, we observe a sub-linear trend in accumulated regret and a linear trend in
accumulated reward in this domain.