Self-organisation of internal models in autonomous robots
Date: 27/06/2016
Author: Smith Bize, Simon Cristobal
Abstract
Internal Models (IMs) play a significant role in autonomous robotics. They are mechanisms that represent the input-output characteristics of the sensorimotor loop. In developmental robotics, open-ended learning of skills and knowledge serves to react to unexpected inputs, to explore the environment, and to acquire new behaviours. The robot's development includes self-exploration of the state-action space and learning of the environmental dynamics.
In this dissertation, we explore the properties and benefits of the self-organisation of robot behaviour based on the homeokinetic learning paradigm. A homeokinetic robot explores the environment in a coherent way without prior knowledge of its own configuration or of the environment itself. First, we propose a novel approach to self-organisation of behaviour driven by artificial curiosity in the sensorimotor loop. Second, we study how different forward model settings alter the behaviour of both exploratory and goal-oriented robots. Models of diverse complexity, size and learning rules are compared to assess their importance for the robot's exploratory behaviour. We define the performance of self-organised behaviour in terms of simultaneous environment coverage and best prediction of future sensory inputs. We find that models with a fast response that minimise the prediction error by local gradients achieve the best performance.
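To illustrate the kind of fast, locally trained forward model this comparison favours, the following sketch implements an online linear predictor of the next sensor values, updated by gradient descent on the squared prediction error. The linear form, the class name and all parameters are illustrative assumptions, not the dissertation's exact model:

```python
import numpy as np

class LinearForwardModel:
    """Online forward model: predicts the next sensor vector from the
    current sensors x and motor commands y, trained by local gradient
    descent on the squared prediction error (illustrative linear form)."""

    def __init__(self, n_sensors, n_motors, lr=0.05):
        self.A = np.zeros((n_sensors, n_motors))   # motor-to-sensor weights
        self.S = np.zeros((n_sensors, n_sensors))  # sensor-to-sensor weights
        self.b = np.zeros(n_sensors)               # bias
        self.lr = lr                               # learning rate

    def predict(self, x, y):
        return self.A @ y + self.S @ x + self.b

    def update(self, x, y, x_next):
        """One fast local-gradient step on E = ||x_next - prediction||^2."""
        err = x_next - self.predict(x, y)          # prediction error
        self.A += self.lr * np.outer(err, y)
        self.S += self.lr * np.outer(err, x)
        self.b += self.lr * err
        return float(err @ err)                    # squared error before the step
```

Calling `update` repeatedly along the robot's trajectory drives the prediction error down by plain stochastic gradient descent, i.e. a fast local rule with no global optimisation.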
Third, we study how self-organisation of behaviour can be exploited to learn IMs for goal-oriented tasks. An IM acquires coherent self-organised behaviours that are then used to achieve high-level goals by reinforcement learning (RL). Our results demonstrate that learning an inverse model in this context yields faster reward maximisation and a higher final reward. We show that an initial goal-less yet coherent exploration of the environment improves learning.
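A minimal sketch of this idea is to fit an inverse model on transitions gathered during goal-free exploration and later query it with goal states. A linear model is assumed purely for illustration; `fit_inverse_model` and `action_for_goal` are hypothetical names, not the dissertation's implementation:

```python
import numpy as np

def fit_inverse_model(X, Y, X_next):
    """Fit a linear inverse model y ~ W @ [x, x_next, 1] by least squares
    on (state, action, next-state) transitions collected during goal-free
    exploration. Rows of X, Y, X_next are one transition each."""
    Z = np.hstack([X, X_next, np.ones((len(X), 1))])  # features per transition
    W, *_ = np.linalg.lstsq(Z, Y, rcond=None)         # solve Z @ W ~ Y
    return W.T

def action_for_goal(W, x, x_goal):
    """Query the inverse model: which action should move x toward x_goal?"""
    z = np.concatenate([x, x_goal, [1.0]])
    return W @ z
```

The data come from exploration with no goal attached; only at query time is a desired next state substituted for the observed one.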
In the same context, we analyse the self-organisation of central pattern generators (CPGs) by reward maximisation. Our results show that CPGs can learn reward-favourable behaviour on high-dimensional robots by exploiting the self-organised interaction between degrees of freedom. Finally, we examine an on-line dual control architecture that combines an actor-critic RL algorithm with the homeokinetic controller. In this configuration, the probing signal is generated from the robot's embodied experience of the environment. This set-up avoids designing task-dependent probing signals, since intrinsically motivated, comprehensible behaviour emerges instead, and it achieves faster improvement of the reward signal than classic RL.
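The dual-control structure can be sketched as a linear actor-critic whose exploratory (probing) component is supplied externally rather than hand-designed. The updates below are a generic TD(0) critic and a policy-gradient-style actor, and every name is a placeholder; they are not the dissertation's equations, and the homeokinetic probing term is stubbed out:

```python
import numpy as np

class DualController:
    """Dual-control sketch: a linear actor-critic whose probing signal is
    supplied from outside (e.g. by a homeokinetic controller) instead of
    task-specific noise. Generic TD/policy-gradient updates; illustrative only."""

    def __init__(self, n_feat, n_action, lr_actor=0.05, lr_critic=0.05, gamma=0.95):
        self.theta = np.zeros((n_action, n_feat))  # actor weights (mean action)
        self.w = np.zeros(n_feat)                  # critic weights (linear value)
        self.lr_actor, self.lr_critic, self.gamma = lr_actor, lr_critic, gamma

    def act(self, phi, probe):
        """Mean action from the actor plus the external probing signal."""
        return self.theta @ phi + probe

    def update(self, phi, a, r, phi_next):
        """TD(0) critic step, then an actor step that reinforces the probed
        action direction in proportion to the TD error."""
        delta = r + self.gamma * (self.w @ phi_next) - self.w @ phi
        self.w += self.lr_critic * delta * phi
        self.theta += self.lr_actor * delta * np.outer(a - self.theta @ phi, phi)
        return delta
```

In this sketch `probe` would carry the homeokinetic controller's output, so exploration reflects the robot's embodied experience; with plain noise in its place the class reduces to a standard actor-critic learner.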