End-effector positioning
An important area of application of neural networks is in the field of robotics. Usually, these
networks are designed to direct a manipulator, which is the most important form of the industrial
robot, to grasp objects based on sensor data. Other applications include the steering and
path-planning of autonomous robot vehicles.
In robotics, the major task involves making movements dependent on sensor data. There
are four related problems to be distinguished:
Forward kinematics. Kinematics is the science of motion which treats motion without regard
to the forces which cause it. Within this science one studies the position, velocity, acceleration,
and all higher-order derivatives of the position variables. A very basic problem in the study of
mechanical manipulation is that of forward kinematics: the static geometrical problem of
computing the position and orientation of the end-effector ('hand') of the manipulator. Specifically,
given a set of joint angles, the forward kinematic problem is to compute the position and
orientation of the tool frame relative to the base frame.
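As a minimal illustration, take a two-link planar arm (a hypothetical example, not a manipulator described in this text); its forward kinematics maps joint angles directly to the hand position:

```python
import math

def forward_kinematics(theta1, theta2, l1=1.0, l2=0.8):
    """Forward kinematics of a 2-link planar arm: joint angles (radians)
    and link lengths in, end-effector position out."""
    x = l1 * math.cos(theta1) + l2 * math.cos(theta1 + theta2)
    y = l1 * math.sin(theta1) + l2 * math.sin(theta1 + theta2)
    return x, y
```

With both joints at zero the arm lies stretched along the x-axis, so the hand sits at (l1 + l2, 0); every joint configuration yields exactly one hand position, which is what makes the forward problem easy.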
(Figure: an exemplar robot manipulator.)
Inverse kinematics. This problem is posed as follows: given the position and orientation of
the end-effector of the manipulator, calculate all possible sets of joint angles which could be used
to attain this given position and orientation. This is a fundamental problem in the practical use
of a manipulator. The inverse kinematic problem is not as simple as the forward one: because
the kinematic equations are nonlinear, their solution is not always easy, or even possible, in
closed form. Also, the questions of the existence of a solution, and of multiple solutions, arise.
Solving this problem is a minimum requirement for most robot control systems.
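For the two-link planar arm (again an illustrative case, not taken from this text), a closed-form inverse solution does exist, and it exhibits exactly the existence and multiplicity issues raised above: no solution outside the reachable region, and two elbow configurations inside it:

```python
import math

def inverse_kinematics(x, y, l1=1.0, l2=0.8):
    """All joint-angle solutions of a 2-link planar arm reaching (x, y);
    returns an empty list when the point is out of reach."""
    c2 = (x * x + y * y - l1 * l1 - l2 * l2) / (2 * l1 * l2)
    if abs(c2) > 1 + 1e-9:             # outside the reachable annulus
        return []
    c2 = max(-1.0, min(1.0, c2))       # guard against round-off
    solutions = []
    for sign in (1.0, -1.0):           # elbow-down and elbow-up branches
        theta2 = sign * math.acos(c2)
        theta1 = math.atan2(y, x) - math.atan2(l2 * math.sin(theta2),
                                               l1 + l2 * math.cos(theta2))
        solutions.append((theta1, theta2))
    return solutions
```

The two branches coincide only when the arm is fully stretched; the reachability test falls out of the same computation (|c2| > 1 means no solution exists).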
Dynamics is a field of study devoted to the forces required to cause motion. In order to
accelerate a manipulator from rest, glide at a constant end-effector velocity, and finally
decelerate to a stop, a complex set of torque functions must be applied by the joint actuators.
In dynamics not only the geometrical properties (kinematics) are used, but also the physical
properties of the robot are taken into account. Take for instance the weight (inertia) of the
robot arm, which determines the force required to change the motion of the arm. The dynamics
introduce two extra problems beyond the kinematic ones.
1. The robot arm has a 'memory'. Its response to a control signal depends also on its history
(e.g., previous positions, speed, acceleration).
2. If a robot grasps an object, the dynamics change but the kinematics do not. This is
because the weight of the object has to be added to the weight of the arm (which is why
robot arms are made so heavy: it keeps the relative weight change small).
Trajectory generation. To move a manipulator from here to there in a smooth, controlled
fashion, each joint must be moved via a smooth function of time. Exactly how to compute these
motion functions is the problem of trajectory generation.
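One standard way to obtain such a smooth motion function is a cubic blend with zero velocity at both ends; this is a common textbook construction, shown here as an illustration rather than a scheme prescribed by this text:

```python
def cubic_trajectory(theta0, theta1, T):
    """Smooth joint motion from theta0 at t=0 to theta1 at t=T with zero
    velocity at both ends, via the cubic blend s(t) = 3(t/T)^2 - 2(t/T)^3."""
    def theta(t):
        s = t / T
        return theta0 + (3 * s ** 2 - 2 * s ** 3) * (theta1 - theta0)
    return theta
```

Because the blend's derivative 6s(1 - s)/T vanishes at s = 0 and s = 1, the joint starts and ends at rest, which is exactly the "smooth, controlled fashion" asked for above.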
In the first section of this chapter we will discuss the problems associated with the positioning
of the end-effector (in effect, representing the inverse kinematics in combination with sensory
transformation).
The final goal in robot manipulator control is often the positioning of the hand or end-effector in
order to be able to, e.g., pick up an object. With the accurate robot arms that are manufactured,
this task is often relatively simple, involving the following steps:
- determine the target coordinates relative to the base of the robot. Typically, when this
position is not always the same, this is done with a number of fixed cameras or other
sensors which observe the work scene: from the image frame determine the position of the
object in that frame, and perform a pre-determined coordinate transformation;
- with a precise model of the robot (supplied by the manufacturer), calculate the joint angles
to reach the target (i.e., the inverse kinematics). This is a relatively simple problem;
- move the arm (dynamics control) and close the gripper.
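The three steps can be strung together as a skeleton; every callable below is a hypothetical stand-in for the corresponding sensor, model, or actuator routine, not an interface defined in this text:

```python
def pick_up(camera_to_base, inverse_kinematics, move_arm, close_gripper,
            image_pos):
    """The three steps above, as a skeleton. All arguments are
    hypothetical stand-ins for real sensing/model/actuation code."""
    target = camera_to_base(image_pos)     # 1. image frame -> base frame
    angles = inverse_kinematics(target)    # 2. manufacturer's model
    move_arm(angles)                       # 3. dynamics control
    close_gripper()
    return angles
```

The point of the skeleton is the data flow: sensor coordinates are transformed once, the inverse kinematic model is consulted once, and only then is the dynamics controller invoked.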
Involvement of neural networks. So if these parts are relatively simple to solve with a
high accuracy, why involve neural networks? The reason is the applicability of robots. When
‘traditional’ methods are used to control a robot arm, accurate models of the sensors and manipulators
(in some cases with unknown parameters which have to be estimated from the system’s
behaviour; yet still with accurate models as starting point) are required and the system must
be calibrated. Also, systems which suffer from wear-and-tear (and which mechanical systems
don’t?) need frequent recalibration or parameter determination. Finally, the development of
more complex (adaptive!) control methods allows the design and use of more flexible
(i.e., less rigid) robot systems, both on the sensory and motor side.
The system we focus on in this section is a work floor observed by fixed cameras, and a robot
arm. The visual system must identify the target as well as determine the visual position of the
hand of the robot.
The target position Xtarget together with the visual position of the hand Xhand are input to
the neural controller N(.). This controller then generates a joint position θ for the robot:

θ = N(Xtarget, Xhand).

We can compare the neurally generated θ with the optimal θ0 generated by a fictitious perfect
controller R(.):

θ0 = R(Xtarget, Xhand).

The task of learning is to make N generate an output 'close enough' to θ0.
There are two problems associated with teaching N(.):
- generating learning samples which are in accordance with eq. θ0 = R(Xtarget, Xhand). This is not trivial,
since in useful applications R(.) is an unknown function. Instead, a form of self-supervised
or unsupervised learning is required. Some examples to solve this problem are given below;
- constructing the mapping N(.) from the available learning samples. When the (usually
randomly drawn) learning samples are available, a neural network uses these samples to
represent the whole input space over which the robot is active. This is evidently a form
of interpolation, but has the problem that the input space is of a high dimensionality, and
the samples are randomly distributed.
When using a feed-forward system for controlling the manipulator, a self-supervised learning
system must be used.
One such system has been reported by Psaltis, Sideris and Yamamura (Psaltis, Sideris, &
Yamamura, 1988). Here, the network, which is constrained to two-dimensional positioning of
the robot arm, learns by experimentation. Three methods are proposed:
- Indirect learning. In indirect learning, a Cartesian target point x in world coordinates is generated, e.g.,
by two cameras looking at an object. This target point is fed into the network, which
generates an angle vector θ. The manipulator moves to position θ, and the cameras
determine the new position x′ of the end-effector in world coordinates. This x′ again is
input to the network, resulting in θ′. The network is then trained on the error ε1 = θ − θ′.
(Figure: the indirect learning system for robotics. In each cycle, the network is used in two
different places: first in the forward step, then for feeding back the error.)
However, minimisation of ε1 does not guarantee minimisation of the overall error ε = x − x′.
For example, the network often settles at a 'solution' that maps all x's to a single θ (i.e., a
constant mapping).
- General learning.
The method is basically very much like supervised learning, but here the plant input
θ must be provided by the user. Thus the network can directly minimise |θ – θ’|. The
success of this method depends on the interpolation capabilities of the network. Correct
choice of θ may pose a problem.
- Specialised learning.
Keep in mind that the goal of training the network is to minimise the error at
the output of the plant: ε = x − x′. We can also train the neural network by 'back-propagating'
this error through the plant (compare this with the ordinary back-propagation of the error
through a network layer). This method requires knowledge of the Jacobian matrix of the plant.
A Jacobian matrix of a multidimensional function F is a matrix of partial derivatives of F, i.e.,
the multidimensional form of the derivative. For example, if we have Y = F(X), i.e.,

yi = Fi(x1, x2, ..., xn)  for every output i,

then a small change δX in the input gives a change in the output

δyi = Σj (∂Fi/∂xj) δxj,

also written as

δY = J(X) δX,

where J is the Jacobian matrix of F, with Jij = ∂Fi/∂xj. So, the Jacobian matrix can be used
to calculate the change in the function when its parameters change.
Now, in this case we have

Jij = ∂Pi/∂θj,

where Pi(θ) is the ith element of the plant output for input θ. The learning rule applied
here regards the plant as an additional and unmodifiable layer in the neural network. The
total error ε = x − x′ is propagated back through the plant by calculating the δ's as in
ordinary back-propagation:

δj = Σi δi ∂Pi/∂θj,

where i iterates over the outputs of the plant. When the plant is an unknown function,
∂Pi/∂θj can be approximated by

∂Pi(θ)/∂θj ≈ [Pi(θ + h ej) − Pi(θ)] / h,

where ej is the jth unit vector, used to change the scalar perturbation of θj into a vector.
This approximate derivative can be measured by slightly changing the input to the plant and
measuring the changes in the output.
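The finite-difference estimate of the plant Jacobian, and the error propagation through it, can be sketched as follows. The plant here is a hypothetical two-joint arm standing in for the real robot:

```python
import numpy as np

def plant(theta):
    """Stand-in plant P(theta): joint angles -> end-effector position
    (a hypothetical 2-link arm; the real plant is the physical robot)."""
    return np.array([np.cos(theta[0]) + 0.8 * np.cos(theta[0] + theta[1]),
                     np.sin(theta[0]) + 0.8 * np.sin(theta[0] + theta[1])])

def plant_jacobian(theta, h=1e-6):
    """Approximate dP_i/dtheta_j by perturbing one joint at a time,
    as in the finite-difference formula above."""
    P0 = plant(theta)
    J = np.zeros((len(P0), len(theta)))
    for j in range(len(theta)):
        e_j = np.zeros(len(theta))
        e_j[j] = 1.0                       # unit vector for joint j
        J[:, j] = (plant(theta + h * e_j) - P0) / h
    return J

theta = np.array([0.3, 0.7])
x_target = np.array([1.0, 1.0])
eps = x_target - plant(theta)              # error at the plant output
# propagate the error back 'through' the plant:
delta = plant_jacobian(theta).T @ eps      # delta_j = sum_i eps_i dP_i/dtheta_j
```

Each column of the estimated Jacobian costs one extra plant evaluation, i.e., one small experimental move of the robot per joint.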
Again a two-layer feed-forward network is trained with back-propagation. However, instead
of calculating a desired output vector, the input vector which should have invoked the current
output vector is reconstructed, and back-propagation is applied to this new input vector and
the existing output vector.
The configuration used consists of a monocular manipulator which has to grasp objects. Because
the camera is situated in the hand of the robot, the task is to move the hand such that the
object is in the centre of the image and has some predetermined size. (In a later article, a
biologically inspired system is proposed (Smagt, Krose, & Groen, 1992) in which the flow-field
is used to account for the monocularity of the system, such that the dimensions of the object
no longer need to be known to the system.)
One step towards the target consists of the following operations:
- Measure the distance from the current position to the target position in camera domain;
- Use this distance, together with the current state θ of the robot, as input for the neural
network. The network then generates a joint displacement vector Δθ;
- Send Δθ to the manipulator;
- Again measure the distance from the current position to the target position in camera
domain;
- Calculate the move made by the manipulator in visual domain, correcting with the rotation
matrix of the second camera image with respect to the first camera image;
- Teach the resulting learning pair (the measured input and the executed Δθ) to the neural
network.
This system has been shown to learn correct behaviour in only tens of iterations, and to be
very adaptive to changes in the sensor or manipulator.
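One such step can be sketched as a single function; the network, the camera measurement, and the robot actuator are all hypothetical stand-ins, and the visual-domain rotation correction is omitted for brevity:

```python
def control_step(net, measure_distance, robot_state, send_move):
    """One step of the scheme above. All four arguments are hypothetical
    stand-ins: a trained network, a camera measurement routine, the
    current joint state, and the robot actuator."""
    d = measure_distance()            # distance to target, camera domain
    dtheta = net(d, robot_state)      # network proposes joint displacement
    send_move(dtheta)                 # manipulator executes the move
    d_new = measure_distance()        # measure again after the move
    visual_move = d - d_new           # move made, in the visual domain
    return dtheta, visual_move        # pair used to teach the network
```

Note that the teaching signal is obtained from the robot's own move, which is why no externally supplied target angles are needed.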
By using a feed-forward network, the available learning samples are approximated by a single,
smooth function consisting of a summation of sigmoid functions. As mentioned in section 4, a
feed-forward network with one layer of sigmoid units is capable of representing practically any
function. But how are the optimal weights determined in finite time to obtain this optimal
representation? Experiments have shown that, although a reasonable representation can be
obtained in a short period of time, an accurate representation of the function that governs the
learning samples is often not feasible or extremely difficult. The reason
for this is the global character of the approximation obtained with a feed-forward network with
sigmoid units: every weight in the network has a global effect on the final approximation that
is obtained.
Building local representations is the obvious way out: every part of the network is responsible
for a small subspace of the total input space. Thus accuracy is obtained locally (Keep It Small
& Simple). This is typically obtained with a Kohonen neural network.
The system described consists of a robot manipulator with three degrees of
freedom (orientation of the end-effector is not included) which has to grab objects in 3D space.
The system is observed by two fixed cameras which each output the (x, y) coordinates of the
object and the end-effector.
(Figure: a Kohonen network merging the output of two cameras.)
Each run consists of two movements. In the gross move, the observed location of the object
x (a four-component vector) is input to the network. As with the Kohonen network, the neuron
k with the highest activation value is selected as winner, because its weight vector wk is nearest
to x. The neurons, which are arranged in a 3-dimensional lattice, correspond in a 1-1 fashion
with subregions of the 3D workspace of the robot, i.e., the neuronal lattice is a discrete
representation of the workspace. With each neuron a vector θ and Jacobian matrix A are
associated. During the gross move, θk is fed to the robot, which makes its move, resulting in
retinal coordinates xg of the end-effector.
the end-effector. To correct for the discretisation of the working space, an additional move is made which is dependent of the distance between the neuron and the object in space wk – x;
this small displacement in Cartesian space is translated to an angle change using the Jacobian Ak:
which is a first-order Taylor expansion of . The final retinal coordinates of the end-effector
after this fine move are in xf .
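Numerically, the fine move is a single matrix-vector correction, θfinal = θk + Ak(x − wk). All values below are made up for illustration (three joint angles, four camera components):

```python
import numpy as np

# Hypothetical stored values for one winning neuron k of the lattice:
theta_k = np.array([0.4, 1.1, -0.2])          # joint vector theta_k
A_k = np.array([[0.5, 0.0, 0.1, 0.0],         # Jacobian A_k (3 x 4)
                [0.0, 0.5, 0.0, 0.1],
                [0.2, 0.2, 0.3, 0.3]])
w_k = np.array([0.9, 0.4, 1.0, 0.5])          # weight vector, camera space
x = np.array([1.0, 0.5, 1.1, 0.45])           # observed object location

# fine move: first-order correction of the discretised gross move
theta_final = theta_k + A_k @ (x - w_k)
```

The closer the lattice weight wk lies to x, the smaller the residual x − wk, so a finer lattice needs smaller (and therefore more accurate) linear corrections.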
Learning proceeds as follows: when an improved estimate (θ, A)* has been found, the following
adaptations are made for all neurons j:

wj(t+1) = wj(t) + γ(t) gjk(t) (x − wj(t)),
(θ, A)j(t+1) = (θ, A)j(t) + γ′(t) g′jk(t) ((θ, A)* − (θ, A)j(t)).

If gjk(t) = g′jk(t) = δjk, this is similar to perceptron learning. Here, as with the Kohonen
learning rule, a distance function is used such that gjk(t) and g′jk(t) are Gaussians depending
on the distance between neurons j and k, with a maximum at j = k.
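A minimal sketch of such a neighbourhood-weighted update of the camera-space weight vectors; the learning rate, lattice shape and Gaussian width below are made-up values:

```python
import numpy as np

def neighbourhood(j, k, sigma=1.0):
    """Gaussian g_jk of the lattice distance between neurons j and k,
    maximal at j == k (sigma is an illustrative width)."""
    d2 = np.sum((np.asarray(j) - np.asarray(k)) ** 2)
    return np.exp(-d2 / (2 * sigma ** 2))

def update_weights(W, k, x, gamma=0.5, sigma=1.0):
    """Move every weight vector towards the input x, scaled by g_jk:
    the winner k moves the most, distant lattice neighbours barely."""
    W = W.copy()
    for j in np.ndindex(W.shape[:-1]):   # iterate over lattice positions
        W[j] += gamma * neighbourhood(j, k, sigma) * (x - W[j])
    return W
```

The same neighbourhood weighting applied to the (θ, A) estimates is what spreads an improvement found at one neuron to the surrounding region of the workspace.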
An improved estimate (θ,A)* is obtained as follows.