Direct abstraction for partially observable reinforcement learning (original title: Kısmi gözlemlenebilir takviye öğrenme için dolaysız soyutlama)

2015
Çilden, Erkin
Polat, Faruk
Şahin, Coşkun
Reinforcement learning defines a prominent family of unsupervised machine learning methods from the autonomous agents perspective. The Markov decision process model provides a solid formal basis for reinforcement learning algorithms. Temporal abstraction mechanisms can be built on top of reinforcement learning, and significant performance gains can be achieved. If the full observability assumption of the Markov decision process model is relaxed, the resulting model is the partially observable Markov decision process, which constitutes a more realistic but more difficult problem setting. Reinforcement learning research for partial observability focuses on techniques that reduce the negative impact of perceptual aliasing and huge state spaces. In the broadest sense, these studies can be divided into two categories. Model-based approaches assume that the state transition model is available to the agent; in model-free approaches, states are completely hidden from the agent. In this project, we propose methods to generalize a known sequence-based automatic temporal abstraction technique, namely the extended sequence tree method, to partial observability. We attack the problem from both the model-based and the model-free perspectives, showing that our methods accelerate well-known representatives of each. The effectiveness of our methods is demonstrated through experiments on widely accepted benchmark problems.
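
For context, the relaxation from full to partial observability that the abstract describes can be stated in standard textbook notation; the definitions below are general background, not formulas reproduced from the report.

```latex
% Standard definitions (textbook notation, not taken from the report).
% An MDP is a tuple (S, A, T, R): states, actions, transition
% probabilities T(s' | s, a), and a reward function R(s, a).
% Relaxing full observability adds an observation model, giving a POMDP
%   (S, A, T, R, \Omega, O),
% where the agent receives an observation o \in \Omega with probability
% O(o | s', a) instead of seeing the successor state s' directly.
% The agent can then maintain a belief b(s), updated after taking
% action a and observing o by
\[
  b'(s') = \frac{O(o \mid s', a) \sum_{s \in S} T(s' \mid s, a)\, b(s)}
                {\sum_{\sigma \in S} O(o \mid \sigma, a) \sum_{s \in S} T(\sigma \mid s, a)\, b(s)} .
\]
```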

Suggestions

Abstraction in reinforcement learning in partially observable environments
Çilden, Erkin; Polat, Faruk; Department of Computer Engineering (2014)
Reinforcement learning defines a prominent family of unsupervised machine learning methods from the autonomous agents perspective. The Markov decision process model provides a solid formal basis for reinforcement learning algorithms. Temporal abstraction mechanisms can be built on top of reinforcement learning, and significant performance gains can be achieved. If the full observability assumption of the Markov decision process model is relaxed, the resulting model is the partially observable Markov decision process, which constitutes...
Automatic identification of transitional bottlenecks in reinforcement learning under partial observability
Aydın, Hüseyin; Polat, Faruk; Department of Computer Engineering (2017)
Instance-based methods are proven tools for solving reinforcement learning problems with hidden states. Nearest Sequence Memory (NSM) is a widely known instance-based approach, mainly built on the k-Nearest Neighbors algorithm. NSM keeps track of the raw history of action-observation-reward instances within a fixed-length (or, ideally, unlimited) memory. It calculates the neighborhood of the current state through a recursive comparison of matching action-observation-reward tuples with the previous ones. The ones with...
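To make the neighborhood computation concrete, here is a minimal Python sketch of NSM-style match-length scoring. The names, the flat memory layout, and the scoring loop are illustrative assumptions for exposition, not code from the thesis.

```python
from dataclasses import dataclass

@dataclass
class Instance:
    """One action-observation-reward experience tuple."""
    action: int
    observation: int
    reward: float

def match_length(history: list[Instance], i: int, j: int) -> int:
    """Length of the matching chain of experience tuples ending at
    positions i and j of the history (the recursive comparison NSM uses)."""
    n = 0
    while i - n >= 0 and j - n >= 0 and history[i - n] == history[j - n]:
        n += 1
    return n

def nsm_neighbors(history: list[Instance], k: int) -> list[int]:
    """Indices of the k stored instances whose experience chains best
    match the chain ending at the current (most recent) instance."""
    cur = len(history) - 1
    scored = sorted(
        ((match_length(history, i, cur), i) for i in range(cur)),
        reverse=True,
    )
    return [i for _, i in scored[:k]]
```

In full NSM the selected neighbors then vote on Q-values for each action; only the neighborhood selection is sketched here.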
Abstraction in Model Based Partially Observable Reinforcement Learning using Extended Sequence Trees
Cilden, Erkin; Polat, Faruk (2012-12-07)
The extended sequence tree is a direct method for the automatic generation of useful abstractions in reinforcement learning, designed for problems that can be modeled as a Markov decision process. This paper proposes a method to expand the extended sequence tree method over reinforcement learning to cover partial observability, formalized via the partially observable Markov decision process through the belief state formalism. This expansion requires a reasonable approximation of the information state. Inspired by statistical ran...
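The following hypothetical Python sketch shows one way a belief vector could be discretized into a compact, hashable information-state key and accumulated into a sequence tree; the paper's actual approximation (truncated above) is not reproduced here, and all names are illustrative.

```python
import numpy as np

def belief_key(belief: np.ndarray, bins: int = 10) -> tuple[int, ...]:
    """Crude information-state stand-in: discretize each belief
    component into one of `bins` buckets to obtain a hashable key."""
    return tuple(np.minimum((belief * bins).astype(int), bins - 1))

class SequenceTreeNode:
    """Minimal sequence-tree node: children are keyed by
    (information-state key, action) pairs, and visit counts indicate
    which action sequences recur often enough to abstract."""
    def __init__(self) -> None:
        self.children: dict[tuple, "SequenceTreeNode"] = {}
        self.visits: int = 0

    def add(self, trajectory: list[tuple]) -> None:
        """Insert one episode given as (information-state key, action) steps."""
        node = self
        for step in trajectory:
            node = node.children.setdefault(step, SequenceTreeNode())
            node.visits += 1
```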
Simple and complex behavior learning using behavior hidden Markov Model and CobART
Seyhan, Seyit Sabri; Alpaslan, Ferda Nur; Department of Computer Engineering (2013)
In this thesis, behavior learning and generation models are proposed for simple and complex robot behaviors using unsupervised learning methods. Simple behaviors are modeled by the simple-behavior learning model (SBLM), and complex behaviors are modeled by the complex-behavior learning model (CBLM), which uses previously learned simple or complex behaviors. Both models have common phases, named behavior categorization, behavior modeling, and behavior generation. Sensory data are categorized using correlation based...
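Since the title refers to behavior hidden Markov models, a generic forward-likelihood computation is sketched below; this is standard HMM machinery given as background, not the thesis's own CobART-based pipeline.

```python
import numpy as np

def hmm_log_likelihood(pi, A, B, obs) -> float:
    """Scaled HMM forward algorithm: log P(obs | model).
    pi: (N,) initial state distribution, A: (N, N) transition matrix,
    B: (N, M) emission probabilities, obs: sequence of symbol indices."""
    alpha = pi * B[:, obs[0]]          # joint prob. of state and first symbol
    log_prob = np.log(alpha.sum())
    alpha /= alpha.sum()               # rescale to avoid underflow
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate, then weight by emission
        s = alpha.sum()
        log_prob += np.log(s)
        alpha /= s
    return log_prob
```

A behavior model of this kind would typically be selected by scoring an observed sensory sequence against each candidate and picking the highest likelihood.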
Toward Generalization of Automated Temporal Abstraction to Partially Observable Reinforcement Learning
Cilden, Erkin; Polat, Faruk (2015-08-01)
Temporal abstraction for reinforcement learning (RL) aims to decrease learning time by exploiting repeated sub-policy patterns in the learning task. Automatic extraction of abstractions during the RL process is difficult and poses many challenges, such as dealing with the curse of dimensionality. Various studies have explored the subject under the assumption that the problem domain is fully observable by the learning agent. Learning abstractions for partially observable RL is a relatively less explored area. In...
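Temporal abstractions of this kind are commonly formalized as options; the definition below is general background from Sutton, Precup, and Singh's framework, not notation quoted from the paper.

```latex
% Standard options framework, given as general background.
% An option is a triple
\[
  o = \langle \mathcal{I}, \pi, \beta \rangle ,
\]
% where $\mathcal{I} \subseteq S$ is the initiation set from which the
% option may be invoked, $\pi$ is the intra-option (sub-)policy, and
% $\beta : S \to [0, 1]$ gives the probability of terminating in each
% state. Abstraction methods such as the extended sequence tree turn
% frequently repeated action sequences into such options.
```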
Citation Formats
E. Çilden, F. Polat, and C. Şahin, “Kısmi gözlemlenebilir takviye öğrenme için dolaysız soyutlama,” 2015. Accessed: 00, 2020. [Online]. Available: https://app.trdizin.gov.tr/publication/project/detail/TVRRMU5qYzM.