Sequence-Learning in a Self-Referential Closed-Loop Behavioural System

PhD Thesis

Bernd Porr

Download: PDF

This thesis focuses on the problem of "autonomous agents". It is assumed that such agents want to be in a desired state which can be assessed by the agent itself when it observes the consequences of its own actions. Therefore the feedback from the motor output via the environment to the sensor input is an essential component of such a system. As a consequence an agent is defined in this thesis as a self-referential system which operates within a closed sensor-motor-sensor feedback loop.

The generic situation is that the agent is always prone to unpredictable disturbances which arrive from the outside, i.e. from its environment. These disturbances cause a deviation from the desired state (for example the organism is attacked unexpectedly or the temperature in the environment changes, ...). The simplest mechanism for managing such disturbances in an organism is to employ a reflex loop which essentially establishes reactive behaviour. Reflex loops are directly related to closed loop feedback controllers. Thus, they are robust and they do not need a built-in model of the control situation.

However, reflexes have one main disadvantage, namely that they always occur "too late"; i.e., only after a (for example, unpleasant) reflex eliciting sensor event has occurred. This defines an objective problem for the organism. This thesis provides a solution to this problem which is called Isotropic Sequence Order (ISO-) learning. The problem is solved by correlating the primary \textsl{reflex} and a predictive sensor input: the result is that the system learns the temporal relation between the primary reflex and the earlier sensor input and creates a new predictive reflex. This (new) predictive reflex does not have the disadvantage of the primary reflex, namely of always being too late. As a consequence the agent is able to maintain its desired input-state all the time. In terms of engineering this means that ISO learning solves the inverse controller problem for the reflex, which is mathematically proven in this thesis. Summarising, this means that the organism starts as a reactive system and learning turns the system into a pro-active system.

It will be demonstrated by a real robot experiment that ISO learning can successfully learn to solve the classical obstacle avoidance task without external intervention (like rewards). In this experiment the robot has to correlate a reflex (retraction after collision) with signals of range finders (turn \textsl{before} the collision). After successful learning the robot generates a turning reaction before it bumps into an obstacle. Additionally it will be shown that the learning goal of "reflex avoidance" can also, paradoxically, be used to solve an attraction task.

Bernd Porr
Last modified: Tue Mar 9 22:27:33 GMT 2004