Perceptron & Adaline

This part describes single layer neural networks, including some of the classical approaches to the neural computing and learning problem. In the first part of this chapter we discuss the representational power of single layer networks and their learning algorithms, and give some examples of using the networks. In the second part we discuss the representational limitations of single layer networks. Two 'classical' models are described in the first part of the chapter: the Perceptron, proposed by Rosenblatt (Rosenblatt, 1959) in the late 1950s, and the Adaline, presented in the early 1960s by Widrow and Hoff (Widrow & Hoff, 1960).

Networks with threshold activation functions

A single layer feed-forward network consists of one or more output neurons $o$, each of which is connected with a weighting factor $w_{io}$ to all of the inputs $i$. In the simplest case the network has only two inputs and a single output, as sketched in the figure (we leave the output index $o$ out). The input of the neuron is the weighted sum of the inputs plus the bias term. The output of the network is formed by the activation of the output neuron, which is some function of the input:

$$y = F\left(\sum_{i=1}^{2} w_i x_i + \theta\right).$$

The activation function $F$ can be linear, so that we have a linear network, or nonlinear. In this section we consider the threshold function:

$$F(s) = \begin{cases} +1 & \text{if } s > 0, \\ -1 & \text{otherwise.} \end{cases}$$

The output of the network thus is either +1 or -1 depending on the input. The network can now be used for a classification task: it can decide whether an input pattern belongs to one of two classes. If the total input is positive, the pattern will be assigned to class +1; if the total input is negative, the sample will be assigned to class -1. The separation between the two classes in this case is a straight line, given by the equation $w_1 x_1 + w_2 x_2 + \theta = 0$.
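As a minimal sketch of the classification step described above, the following Python function computes the total input of a single threshold unit and returns the class label. The function name, the example weights, and the choice to assign a total input of exactly zero to class -1 are illustrative assumptions, not part of the original text.

```python
def threshold_unit(x, w, theta):
    """Return +1 if the total input w.x + theta is positive, otherwise -1.

    Assumption: a total input of exactly zero is assigned to class -1.
    """
    total_input = sum(wi * xi for wi, xi in zip(w, x)) + theta
    return 1 if total_input > 0 else -1


# Hypothetical weights: the separating line is x1 + x2 - 1 = 0.
w = [1.0, 1.0]
theta = -1.0

print(threshold_unit([2.0, 2.0], w, theta))  # total input is 3.0
print(threshold_unit([0.0, 0.0], w, theta))  # total input is -1.0
```

Points on opposite sides of the separating line receive opposite labels, which is all a single threshold unit can express.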
We will describe two learning methods for these types of networks: the 'perceptron' learning rule and the 'delta' or 'LMS' rule.

Perceptron learning rule and convergence theorem

Suppose we have a set of learning samples consisting of an input vector $x$ and a desired output $d$.
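The sketch below illustrates the perceptron learning rule in its commonly stated form: whenever the unit misclassifies a sample, each weight is changed by $d \, x_i$ and the bias by $d$; correctly classified samples leave the weights untouched. The function names, the zero initialisation, the epoch limit, and the logical-AND data set are assumptions chosen for illustration.

```python
def predict(x, w, theta):
    """Threshold unit: +1 if the total input is positive, otherwise -1."""
    total_input = sum(wi * xi for wi, xi in zip(w, x)) + theta
    return 1 if total_input > 0 else -1


def train_perceptron(samples, n_inputs, max_epochs=1000):
    """Perceptron rule: update the weights only on misclassified samples."""
    w = [0.0] * n_inputs  # assumption: start from zero weights
    theta = 0.0
    for _ in range(max_epochs):
        errors = 0
        for x, d in samples:
            if predict(x, w, theta) != d:
                # Incorrect response: move the weights towards the target.
                w = [wi + d * xi for wi, xi in zip(w, x)]
                theta += d
                errors += 1
        if errors == 0:  # every sample is classified correctly
            break
    return w, theta


# Hypothetical, linearly separable data: logical AND with +1/-1 targets.
samples = [([0, 0], -1), ([0, 1], -1), ([1, 0], -1), ([1, 1], 1)]
w, theta = train_perceptron(samples, n_inputs=2)
print([predict(x, w, theta) for x, _ in samples])
```

For linearly separable data such as this, the loop stops after a finite number of updates. For data that is not linearly separable, it simply runs until `max_epochs` is reached.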
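Note that the procedure is very similar to the Hebb rule; the only difference is that, when the network responds correctly, no connection weights are modified.

The adaptive linear element (Adaline)

An important generalisation of the perceptron training algorithm was presented by Widrow and Hoff as the 'least mean square' (LMS) learning procedure, also known as the delta rule.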
Although the adaptive process is exemplified here for a case with only one output, a system with several outputs can be built directly from multiple units of the same kind. The output of such a unit is the weighted sum of its inputs,

$$y = \sum_{i=1}^{n} w_i x_i + \theta,$$

where $\theta = w_0$. The purpose of this device is to yield a given value $y = d^p$ at its output when the set of values $x_i^p$, $i = 1, 2, \ldots, n$, is applied at the inputs. The problem is to determine the coefficients $w_i$, $i = 0, 1, \ldots, n$, in such a way that the input-output response is correct for a large number of arbitrarily chosen signal sets. If an exact mapping is not possible, the average error must be minimised, for instance, in the sense of least squares. An adaptive operation means that there exists a mechanism by which the $w_i$ can be adjusted, usually iteratively, to attain the correct values.

Networks with linear activation functions: the delta rule

For a single layer network with an output unit with a linear activation function, the output is simply given by

$$y = \sum_{i} w_i x_i + \theta.$$

Such a simple network is able to represent a linear relationship between the value of the
output unit and the value of the input units. By thresholding the output value, a classifier can
be constructed (such as Widrow's Adaline), but here we focus on the linear relationship and use
the network for a function approximation task. In high dimensional input spaces the network
represents a (hyper)plane, and multiple output units can be defined in the same way.
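Suppose we want to train the network such that a hyperplane is fitted as well as possible to a set of training samples consisting of input values $x^p$ and desired (or target) output values $d^p$. For every given input sample, the output of the network differs from the target value $d^p$ by $(d^p - y^p)$, where $y^p$ is the actual output for this pattern. The error function is the summed squared error:

$$E = \sum_p E^p = \frac{1}{2} \sum_p (d^p - y^p)^2,$$

where the index $p$ ranges over the set of input patterns and $E^p$ represents the error on pattern $p$. The LMS procedure finds the values of all the weights that minimise the error function by a method called gradient descent. The idea is to make a change in the weight proportional to the negative of the derivative of the error as measured on the current pattern with respect to each weight:

$$\Delta_p w_i = -\gamma \frac{\partial E^p}{\partial w_i},$$

where $\gamma$ is a constant of proportionality. The derivative is

$$\frac{\partial E^p}{\partial w_i} = \frac{\partial E^p}{\partial y^p} \, \frac{\partial y^p}{\partial w_i}.$$

Because of the linear units,

$$\frac{\partial y^p}{\partial w_i} = x_i^p \quad \text{and} \quad \frac{\partial E^p}{\partial y^p} = -(d^p - y^p),$$

so that

$$\Delta_p w_i = \gamma \, \delta^p x_i^p,$$

where $\delta^p = d^p - y^p$ is the difference between the target output and the actual output for pattern $p$.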
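The gradient descent procedure described above can be sketched in a few lines of Python. After each pattern, every weight is changed by $\gamma (d^p - y^p) x_i^p$, and the bias is treated as a weight on a constant input of 1. The function name, the step size, the number of epochs, and the noise-free data set are illustrative assumptions.

```python
def train_delta_rule(samples, n_inputs, gamma=0.05, epochs=2000):
    """Fit a linear unit y = sum_i w_i x_i + theta with the delta (LMS) rule."""
    w = [0.0] * n_inputs
    theta = 0.0
    for _ in range(epochs):
        for x, d in samples:
            y = sum(wi * xi for wi, xi in zip(w, x)) + theta
            delta = d - y  # difference between target and actual output
            # Pattern-wise gradient step on E^p = 0.5 * (d - y)^2
            w = [wi + gamma * delta * xi for wi, xi in zip(w, x)]
            theta += gamma * delta  # bias acts as a weight on input 1
    return w, theta


# Hypothetical noise-free targets generated by d = 2*x1 - x2 + 0.5.
samples = [
    ([0.0, 0.0], 0.5),
    ([1.0, 0.0], 2.5),
    ([0.0, 1.0], -0.5),
    ([1.0, 1.0], 1.5),
    ([2.0, 1.0], 3.5),
]
w, theta = train_delta_rule(samples, n_inputs=2)
print(w, theta)  # should approach the generating coefficients
```

Unlike the perceptron rule, the update is applied for every pattern and scales with the size of the error, so the weights settle on a least-squares fit even when no exact mapping exists. The step size `gamma` must be small enough for the iteration to remain stable.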
