## Introduction

An important application area of neural networks is robotics. Usually, these networks are designed to direct a manipulator, the most important form of industrial robot, to grasp objects based on sensor data. Other applications include the steering and path-planning of autonomous robot vehicles.

In robotics, the major task is to make movements dependent on sensor data. There are four related problems to be distinguished:

**Forward kinematics.**

Kinematics is the science of motion which treats motion without regard

to the forces which cause it. Within this science one studies the position, velocity, acceleration,

and all higher order derivatives of the position variables. A very basic problem in the study of

mechanical manipulation is that of forward kinematics. This is the static geometrical problem of

computing the position and orientation of the end-effector (‘hand’) of the manipulator. Specifically, given a set of joint angles, the forward kinematic problem is to compute the position and orientation of the tool frame relative to the base frame.
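As a minimal illustration, consider a planar two-link arm; the link lengths l1 = 1.0 and l2 = 0.8 below are assumed values, not from the text:

```python
import math

def forward_kinematics(theta1, theta2, l1=1.0, l2=0.8):
    """Position and orientation of the tool frame relative to the
    base frame, given the two joint angles of a planar arm."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    phi = theta1 + theta2            # orientation of the end-effector
    return x, y, phi

# With both joints at zero the arm lies stretched along the x-axis:
print(forward_kinematics(0.0, 0.0))  # -> (1.8, 0.0, 0.0)
```

Note that the map from joint angles to position is unique: each set of joint angles gives exactly one end-effector pose.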

Figure: an example robot manipulator.

**Inverse kinematics.**

This problem is posed as follows: given the position and orientation of

the end-effector of the manipulator, calculate all possible sets of joint angles which could be used

to attain this given position and orientation. This is a fundamental problem in the practical use

of manipulators.

The inverse kinematic problem is not as simple as the forward one. Because the kinematic

equations are nonlinear, their solution is not always easy or even possible in a closed form. Also,

the questions of existence of a solution, and of multiple solutions, arise.

Solving this problem is a minimal requirement for most robot control systems.
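For the planar two-link arm the inverse problem does admit a closed-form solution, and it exhibits exactly the issues just mentioned: no solution outside the workspace, and two solutions (elbow-up and elbow-down) inside it. A sketch, again with assumed link lengths:

```python
import math

def inverse_kinematics(x, y, l1=1.0, l2=0.8):
    """All joint-angle pairs of a planar two-link arm whose
    end-effector reaches (x, y); empty list if unreachable."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1.0:                 # target outside the workspace
        return []
    solutions = []
    for sign in (+1.0, -1.0):         # elbow-down and elbow-up
        theta2 = sign * math.acos(c2)
        theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                               l1 + l2 * math.cos(theta2))
        solutions.append((theta1, theta2))
    return solutions

print(inverse_kinematics(1.0, 0.5))  # two solutions for a reachable point
print(inverse_kinematics(5.0, 0.0))  # -> [] (no solution exists)
```

For arms with more degrees of freedom, or with joint limits, such a closed form is generally not available.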

**Dynamics.**

Dynamics is the study of the forces required to cause motion. In order to accelerate a manipulator from rest, glide at a constant end-effector velocity,

and finally decelerate to a stop, a complex set of torque functions must be applied by the joint

actuators. In dynamics not only the geometrical properties (kinematics) are used, but also the

physical properties of the robot are taken into account. Take for instance the weight (inertia)

of the robot arm, which determines the force required to change the motion of the arm. The

dynamics introduces two extra problems to the kinematic problems.

1. The robot arm has a ‘memory’. Its response to a control signal depends also on its history

(e.g. previous positions, speed, acceleration).

2. If a robot grabs an object then the dynamics change but the kinematics don’t. This is

because the weight of the object has to be added to the weight of the arm (that’s why

robot arms are so heavy, making the relative weight change very small).

**Trajectory generation.**

To move a manipulator from here to there in a smooth, controlled

fashion, each joint must be moved via a smooth function of time. Exactly how to compute these

motion functions is the problem of trajectory generation.
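A common choice is a cubic polynomial per joint, fixed by the start and end positions together with zero start and end velocities. A minimal sketch:

```python
def cubic_trajectory(theta0, thetaf, tf):
    """Cubic joint trajectory theta(t) with theta(0) = theta0,
    theta(tf) = thetaf and zero velocity at both ends.
    The linear term vanishes because theta'(0) = 0."""
    a2 = 3.0 * (thetaf - theta0) / tf**2    # from theta(tf) = thetaf
    a3 = -2.0 * (thetaf - theta0) / tf**3   # from theta'(tf) = 0
    return lambda t: theta0 + a2 * t**2 + a3 * t**3

theta = cubic_trajectory(0.0, 1.0, 2.0)  # move a joint from 0 to 1 rad in 2 s
print(theta(0.0), theta(1.0), theta(2.0))  # -> 0.0 0.5 1.0
```

The four boundary conditions determine the four polynomial coefficients uniquely; higher-order polynomials are needed when accelerations must also be constrained.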

In the first section of this chapter we will discuss the problems associated with the positioning of the end-effector (in effect, representing the inverse kinematics in combination with sensory

transformation).

## End-effector positioning

The final goal in robot manipulator control is often the positioning of the hand or end-effector in

order to be able to, e.g., pick up an object. With the accurate robot arms that are manufactured,

this task is often relatively simple, involving the following steps:

- determine the target coordinates relative to the base of the robot. Typically, when this position is not always the same, this is done with a number of fixed cameras or other sensors which observe the work scene: from the image frame determine the position of the object in that frame, and perform a pre-determined coordinate transformation;
- with a precise model of the robot (supplied by the manufacturer), calculate the joint angles to reach the target (i.e., the inverse kinematics). This is a relatively simple problem;
- move the arm (dynamics control) and close the gripper.

**Involvement of neural networks.** So if these parts are relatively simple to solve with a

high accuracy, why involve neural networks? The reason is the applicability of robots. When

‘traditional’ methods are used to control a robot arm, accurate models of the sensors and manipulators

(in some cases with unknown parameters which have to be estimated from the system’s

behaviour; yet still with accurate models as starting point) are required and the system must

be calibrated. Also, systems which suffer from wear-and-tear (and which mechanical systems

don’t?) need frequent recalibration or parameter determination. Finally, the development of

more complex (adaptive!) control methods allows the design and use of more flexible (i.e., less rigid) robot systems, both on the sensory and motor side.

### Camera-robot coordination

The system we focus on in this section is a work floor observed by fixed cameras, and a robot

arm. The visual system must identify the target as well as determine the visual position of the

end-effector.

The target position Xtarget together with the visual position of the hand Xhand are input to

the neural controller *N*(.). This controller then generates a joint position θ for the robot:

θ = *N*(Xtarget, Xhand)

We can compare the neurally generated θ with the optimal θ0 generated by a fictitious perfect controller *R*(.):

θ0 = *R*(Xtarget, Xhand)

The task of learning is to make *N* generate an output ‘close enough’ to θ0.

There are two problems associated with teaching *N*(.):

- generating learning samples which are in accordance with θ0 = *R*(Xtarget, Xhand). This is not trivial, since in useful applications *R*(.) is an unknown function. Instead, a form of self-supervised or unsupervised learning is required. Some examples to solve this problem are given below;
- constructing the mapping *N*(.) from the available learning samples. When the (usually randomly drawn) learning samples are available, a neural network uses these samples to represent the whole input space over which the robot is active. This is evidently a form of interpolation, but has the problem that the input space is of a high dimensionality, and the samples are randomly distributed.
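To illustrate the second point, the sketch below constructs *N*(.) from randomly drawn samples of a toy controller *R*(.) (an invented function standing in for the unknown perfect controller), using a one-hidden-layer feed-forward network trained by gradient descent:

```python
import numpy as np

rng = np.random.default_rng(0)

def R(x):
    """Invented 'perfect controller', known to us only through samples."""
    return np.tanh(x @ np.array([[0.5, -0.3], [0.2, 0.8]]))

X = rng.uniform(-1, 1, size=(200, 2))   # randomly drawn learning samples
T = R(X)                                # target joint angles theta_0

# One-hidden-layer feed-forward network, trained by batch gradient descent.
W1 = rng.normal(0.0, 0.5, (2, 16)); b1 = np.zeros(16)
W2 = rng.normal(0.0, 0.5, (16, 2)); b2 = np.zeros(2)
lr = 0.1
for _ in range(2000):
    H = np.tanh(X @ W1 + b1)            # hidden layer
    Y = H @ W2 + b2                     # network output N(X)
    E = Y - T                           # error against the samples
    dH = (E @ W2.T) * (1.0 - H**2)      # backpropagated hidden error
    W2 -= lr * H.T @ E / len(X); b2 -= lr * E.mean(0)
    W1 -= lr * X.T @ dH / len(X); b1 -= lr * dH.mean(0)

# The network now interpolates at a point that was not in the sample set:
x_new = np.array([[0.3, -0.4]])
print(np.tanh(x_new @ W1 + b1) @ W2 + b2)   # close to R(x_new)
```

The network fits a smooth function through the scattered samples; in a real robot setting the samples would come from measurements, not from an explicit *R*(.).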

#### Approach 1: Feed-forward networks

When using a feed-forward system for controlling the manipulator, a self-supervised learning

system must be used.

One such system has been reported by Psaltis, Sideris and Yamamura (Psaltis, Sideris, &

Yamamura, 1988). Here, the network, which is constrained to two-dimensional positioning of

the robot arm, learns by experimentation. Three methods are proposed:

**Indirect learning.** In indirect learning, a Cartesian target point x in world coordinates is generated, e.g., by two cameras looking at an object. This target point is fed into the network, which

generates an angle vector θ. The manipulator moves to position θ, and the cameras

determine the new position x’ of the end-effector in world coordinates. This x’ again is

input to the network, resulting in θ’. The network is then trained on the error ε1 = θ − θ’.

Figure: the indirect learning system for robotics. In each cycle, the network is used in two different places: first in the forward step, then for feeding back the error.
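The cycle can be sketched as follows; the plant and the linear stand-in for the network are invented for illustration. Note that the all-zero weight matrix makes ε1 vanish while the overall error x − x’ does not, which is exactly the flaw noted in the text:

```python
import numpy as np

rng = np.random.default_rng(1)

def plant(theta):            # invented 'manipulator': joint angles -> position
    return np.sin(theta)

W = rng.normal(0.0, 0.1, (2, 2))   # linear stand-in for the network N

lr = 0.05
for _ in range(500):
    x = rng.uniform(-1, 1, 2)        # Cartesian target point from the cameras
    theta = W @ x                    # network proposes joint angles
    x_new = plant(theta)             # manipulator moves; cameras observe x'
    eps1 = theta - W @ x_new         # indirect error theta - theta'
    W += lr * np.outer(eps1, x_new)  # move N(x') toward theta

# The degenerate 'solution' W = 0 maps every input to a single theta:
W0 = np.zeros((2, 2))
x = np.array([0.7, -0.2])
theta = W0 @ x
print(theta - W0 @ plant(theta))   # eps1 is zero here ...
print(x - plant(theta))            # ... while eps = x - x' is not
```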

However, minimisation of ε1 does not guarantee minimisation of the overall error ε = x − x’. For example, the network often settles at a ‘solution’ that maps all x’s to a single θ (i.e., a constant mapping).

**General learning.**

This method is basically very much like supervised learning, but here the plant input θ must be provided by the user. Thus the network can directly minimise |θ − θ’|. The success of this method depends on the interpolation capabilities of the network. Correct choice of θ may pose a problem.

**Specialised learning.**

Keep in mind that the goal of the training of the network is to minimise the error at

the output of the plant: ε = x − x’. We can also train the neural network by ‘back-propagating’ this error through the plant (compare this with ordinary back-propagation of the error). This method requires knowledge of the Jacobian matrix of the plant. A Jacobian matrix of a multidimensional function F is a matrix of partial derivatives of F, i.e., the multidimensional form of the derivative. For example, if we have Y = F(X), i.e.,

y1 = f1(x1, x2, …, xn), …, ym = fm(x1, x2, …, xn),

then

Δy1 = (∂f1/∂x1) Δx1 + (∂f1/∂x2) Δx2 + … + (∂f1/∂xn) Δxn, …, Δym = (∂fm/∂x1) Δx1 + … + (∂fm/∂xn) Δxn,

or

ΔY = (∂F/∂X) ΔX,

also written as

ΔY = J(X) ΔX,

where J is the Jacobian matrix of F. So, the Jacobian matrix can be used to calculate the change in the function when its parameters change.

Now, in this case we have

Jij = ∂Pi(θ) / ∂θj,

where Pi(θ) is the ith element of the plant output for input θ. The learning rule applied here regards the plant as an additional and unmodifiable layer in the neural network. The total error ε = x − x’ is propagated back through the plant by calculating the δ’s for the units feeding the plant as

δj = Σi εi ∂Pi(θ) / ∂θj,

where i iterates over the outputs of the plant. When the plant is an unknown function, ∂Pi(θ)/∂θj can be approximated by

∂Pi(θ)/∂θj ≈ ( Pi(θ + h ej) − Pi(θ) ) / h,

where ej is the jth unit vector, used to change the scalar θj into a vector, and h is a small constant. This approximate derivative can be measured by slightly changing the input to the plant and measuring the changes in the output.
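The finite-difference approximation of the plant Jacobian, and the resulting δ’s, can be sketched as follows; the plant here is an invented two-link arm standing in for the unknown P(θ):

```python
import numpy as np

def plant(theta):
    """Invented plant P(theta): a two-link arm mapping two joint
    angles to a 2-D end-effector position."""
    return np.array([np.cos(theta[0]) + np.cos(theta[0] + theta[1]),
                     np.sin(theta[0]) + np.sin(theta[0] + theta[1])])

def plant_jacobian(theta, h=1e-5):
    """Approximate dP_i/dtheta_j by perturbing one joint at a time;
    e_j turns the scalar step h into a vector along joint j."""
    P0 = plant(theta)
    J = np.zeros((len(P0), len(theta)))
    for j in range(len(theta)):
        e_j = np.zeros(len(theta))
        e_j[j] = 1.0
        J[:, j] = (plant(theta + h * e_j) - P0) / h
    return J

theta = np.array([0.3, 0.8])
eps = np.array([0.05, -0.02])          # eps = x - x' at the plant output
delta = plant_jacobian(theta).T @ eps  # delta_j = sum_i eps_i dP_i/dtheta_j
print(delta)                           # deltas to backpropagate into the network
```

Each column of the Jacobian costs one extra plant evaluation, which is why this probing must be done sparingly on a real robot.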

Again a two-layer feed-forward network is trained with back-propagation. However,

instead of calculating a desired output vector, the input vector which should have invoked the

current output vector is reconstructed, and back-propagation is applied to this new input vector

and the existing output vector.

The configuration used consists of a monocular manipulator which has to grasp objects. Due

to the fact that the camera is situated in the hand of the robot, the task is to move the hand

such that the object is in the centre of the image and has some predetermined size (in a later

article, a biologically inspired system is proposed (Smagt, Krose, & Groen, 1992) in which the visual flow-field is used to account for the monocularity of the system, such that the dimensions of the object need not be known anymore to the system).

One step towards the target consists of the following operations:

- measure the distance from the current position to the target position in camera domain, x;
- use this distance, together with the current state θ of the robot, as input for the neural network. The network then generates a joint displacement vector Δθ;
- send Δθ to the manipulator;
- again measure the distance from the current position to the target position in camera domain, x’;
- calculate the move made by the manipulator in visual domain, x − R x’, where R is the rotation matrix of the second camera image with respect to the first camera image;
- teach the learning pair (x − R x’, θ; Δθ) to the neural network.
This system has been shown to learn correct behaviour in only tens of iterations, and to be very adaptive to changes in the sensor or manipulator.

By using a feed-forward network, the available learning samples are approximated by a single,

smooth function consisting of a summation of sigmoid functions. As mentioned in section 4, a

feed-forward network with one layer of sigmoid units is capable of representing practically any

function. But how are the optimal weights determined in finite time to obtain this optimal

representation? Experiments have shown that, although a reasonable representation can be

obtained in a short period of time, an accurate representation of the function that governs the

learning samples is often not feasible or extremely difficult. The reason

for this is the global character of the approximation obtained with a feed-forward network with

sigmoid units: every weight in the network has a global effect on the final approximation that

is obtained.

Building local representations is the obvious way out: every part of the network is responsible

for a small subspace of the total input space. Thus accuracy is obtained locally (Keep It Small

& Simple). This is typically obtained with a Kohonen neural network.

#### Approach 2: Topology conserving maps

The system described consists of a robot manipulator with three degrees of

freedom (orientation of the end-effector is not included) which has to grab objects in 3D-space.

The system is observed by two fixed cameras which output the (x, y) coordinates of the object and the end-effector.

Figure: a Kohonen network merging the output of two cameras.

Each run consists of two movements. In the gross move, the observed location of the object

x (a four-component vector) is input to the network. As with the Kohonen neural network, the neuron

k with highest activation value is selected as winner, because its weight vector Wk is nearest to

x. The neurons, which are arranged in a 3-dimensional lattice, correspond in a 1-1 fashion with

subregions of the 3D workspace of the robot, i.e., the neuronal lattice is a discrete representation

of the workspace. With each neuron a vector θ and Jacobian matrix A are associated. During

the gross move, θk is fed to the robot, which makes its move, resulting in retinal coordinates xg of the end-effector. To correct for the discretisation of the working space, an additional move is made which is dependent on the distance between the neuron and the object in space, x − wk; this small displacement in Cartesian space is translated to an angle change using the Jacobian Ak:

θfinal = θk + Ak (x − wk),

which is a first-order Taylor expansion of θfinal. The final retinal coordinates of the end-effector after this fine move are in xf.
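In code, the gross and fine move amount to a single affine correction; all neuron contents below are invented values for illustration:

```python
import numpy as np

# Invented contents of the winning neuron k of the lattice:
w_k     = np.array([0.40, 0.10, 0.30, 0.20])   # stored retinal position
theta_k = np.array([0.50, 1.00, -0.30])        # joint angles of the gross move
A_k     = np.array([[0.2, 0.0, 0.1, 0.0],      # 3x4 Jacobian estimate:
                    [0.0, 0.3, 0.0, 0.1],      # retinal displacement ->
                    [0.1, 0.0, 0.2, 0.0]])     # joint-angle change

x = np.array([0.45, 0.12, 0.28, 0.24])         # observed object position

# Gross move to theta_k, then the fine correction A_k (x - w_k):
theta_final = theta_k + A_k @ (x - w_k)
print(theta_final)
```

The finer the lattice, the smaller x − wk, and the better the first-order correction holds.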

Learning proceeds as follows: when an improved estimate (θ, A)* has been found, the following adaptations are made for all neurons j:

θj(t+1) = θj(t) + γ(t) gjk(t) ( θ* − θj(t) ),
Aj(t+1) = Aj(t) + γ’(t) g’jk(t) ( A* − Aj(t) ).

If gjk(t) = g’jk(t) = δjk, this is similar to perceptron learning. Here, as with the Kohonen learning rule, a distance function is used such that gjk(t) and g’jk(t) are Gaussians depending on the distance between neurons j and k with a maximum at j = k.
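The adaptation step can be sketched as follows, with a Gaussian neighbourhood function gjk and an assumed learning rate γ = 0.5 used for both θ and A:

```python
import numpy as np

def adapt(thetas, As, dists, theta_star, A_star, gamma=0.5, sigma=1.0):
    """One learning step: move every neuron j toward the improved
    estimate (theta*, A*), weighted by a Gaussian g_jk of its lattice
    distance to the winning neuron (dists[k] == 0, so g_kk == 1)."""
    for j in range(len(thetas)):
        g = np.exp(-dists[j]**2 / (2 * sigma**2))   # g_jk(t)
        thetas[j] = thetas[j] + gamma * g * (theta_star - thetas[j])
        As[j]     = As[j]     + gamma * g * (A_star  - As[j])
    return thetas, As

# Three neurons on a line; neuron 1 is the winner (lattice distance 0):
thetas = [np.zeros(3) for _ in range(3)]
As     = [np.zeros((3, 4)) for _ in range(3)]
thetas, As = adapt(thetas, As, dists=[1, 0, 1],
                   theta_star=np.ones(3), A_star=np.ones((3, 4)))
print(thetas[1])   # winner moved halfway: [0.5 0.5 0.5]
```

Neighbours of the winner are pulled toward the same estimate, but by a smaller amount, which smooths the map over the lattice.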

An improved estimate (θ,A)* is obtained as follows.