Generalisation in deep reinforcement learning with multiple tasks and domains
Date
30/11/2020
Author
Zhao, Chenyang
Abstract
A long-standing vision of robotics research is to build autonomous systems that can
adapt to unforeseen environmental perturbations and learn a set of tasks progressively.
Reinforcement learning (RL) has shown great success in a variety of robot control
tasks because of recent advances in hardware and learning techniques. To further this long-term goal, generalisation in RL has become a pressing research topic, as
it allows learning agents to extract knowledge from past experience and transfer it to
new situations. This covers generalisation against sampling noise to avoid overfitting,
generalisation against environmental changes to avoid domain shift, and generalisation
over different but related tasks to achieve lifelong knowledge transfer. This thesis investigates these challenges in the context of RL, with a main focus on cross-domain
and cross-task generalisation.
We first address the problem of generalisation across domains. With a focus on
continuous control tasks, we characterise the sources of uncertainty that may cause
generalisation challenges in Deep RL, and provide a new benchmark and thorough
empirical evaluation of generalisation challenges for state-of-the-art Deep RL methods.
In particular, we show that, if generalisation is the goal, then the common practice of
evaluating algorithms based on their training performance leads to the wrong conclusions about algorithm choice. Moreover, we evaluate several techniques for improving
generalisation and draw conclusions about the most robust techniques to date.
From the evaluation, we can see that learning from multiple domains improves
generalisation performance across domains. However, aggregating gradient information from different domains may make learning unstable. In the second work, we propose to update the policy to minimise the sum of distances to the new policies learned
in each domain in every iteration, measured by Kullback-Leibler (KL) divergence of
output (action) distributions. We show that our method improves both the asymptotic
training reward and the robustness of the tested policy against domain shifts in a variety of
control tasks.
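The KL-based objective described above can be illustrated with a minimal sketch. It is not the thesis implementation: it assumes discrete action distributions at a single state, and it assumes the direction KL(domain policy || central policy), which the abstract does not specify. The function names and the three example distributions are hypothetical. Under these assumptions, the distribution that minimises the sum of KL divergences is the arithmetic mean of the per-domain distributions.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) for two categorical distributions given as lists of probabilities."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def sum_kl_to_domain_policies(domain_policies, central):
    """Sum over domains of KL(domain policy || central policy) at one state."""
    return sum(kl_divergence(p, central) for p in domain_policies)

def average_policy(domain_policies):
    """Arithmetic mean of the distributions; it minimises the sum above over `central`."""
    n = len(domain_policies)
    num_actions = len(domain_policies[0])
    return [sum(p[a] for p in domain_policies) / n for a in range(num_actions)]

# Hypothetical action distributions produced by policies trained in three domains.
domain_policies = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.6, 0.1, 0.3],
]

central = average_policy(domain_policies)
objective_at_mean = sum_kl_to_domain_policies(domain_policies, central)
objective_at_uniform = sum_kl_to_domain_policies(domain_policies, [1 / 3, 1 / 3, 1 / 3])
```

In a continuous control setting with neural network policies, the minimisation would typically be carried out by gradient steps on the policy parameters rather than in closed form; the closed-form mean here only makes the objective concrete.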
We finally investigate generalisation across different classes of control tasks. In
particular, we introduce a class of neural network controllers that can realise four distinct tasks: reaching, object throwing, casting, and ball-in-cup. By factorising the
weights of the neural network, transferable latent skills are extracted, which accelerate learning in cross-task transfer. With a suitable curriculum, this allows
us to learn challenging dexterous control tasks like ball-in-cup from scratch with only
reinforcement learning.
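One way to picture weight factorisation is sketched below. The abstract does not give the exact form, so this is an assumption for illustration only: each task's layer weights are a linear combination of shared factor matrices (standing in for latent skills), with a small task-specific coefficient vector. All names, factor values, and coefficients are hypothetical.

```python
def compose_weights(shared_factors, task_coeffs):
    """Return W = sum_k task_coeffs[k] * shared_factors[k] as a list of rows."""
    rows = len(shared_factors[0])
    cols = len(shared_factors[0][0])
    weights = [[0.0] * cols for _ in range(rows)]
    for coeff, factor in zip(task_coeffs, shared_factors):
        for i in range(rows):
            for j in range(cols):
                weights[i][j] += coeff * factor[i][j]
    return weights

def linear_layer(weights, inputs):
    """Apply the composed weight matrix to an input vector (no bias, no activation)."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]

# Hypothetical shared factors, reused by every task.
shared_factors = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]

# Hypothetical task-specific coefficients; only these would differ between tasks.
task_coeffs = {"reaching": [1.0, 0.0], "throwing": [0.5, 0.5]}

reaching_weights = compose_weights(shared_factors, task_coeffs["reaching"])
throwing_weights = compose_weights(shared_factors, task_coeffs["throwing"])
throwing_output = linear_layer(throwing_weights, [2.0, 4.0])
```

Under this assumed form, transferring to a new task means learning only a new coefficient vector while the shared factors are reused, which is one plausible reading of how factorisation could speed up cross-task learning.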