Artificial neural network architectures such as backpropagation tend to have general applicability. We can use a single network type in many different applications by changing the network's size, parameters, and training sets. In contrast, the developers of the neocognitron set out to tailor an architecture for a specific application: recognition of handwritten characters. Such a system has a great deal of practical application, although, judging from the introductions to some of their papers, Fukushima and his coworkers appear to be more interested in developing a model of the brain. To that end, their design was based on the seminal work performed by Hubel and Wiesel elucidating some of the functional architecture of the visual cortex.
We could not begin to provide a complete accounting of what is known about the anatomy and physiology of the mammalian visual system. Nevertheless, we shall present a brief and highly simplified description of some of that system's features as an aid to understanding the basis of the neocognitron design.
The retinal ganglia and the cells of the lateral geniculate nucleus (LGN) appear to have circular receptive fields.
They respond most strongly to circular spots of light of a particular size on a particular part of the retina. The part of the retina responsible for stimulating a particular ganglion cell is called the receptive field of the ganglion. Some of these receptive fields give an excitatory response to a centrally located spot of light, and an inhibitory response to a larger, more diffuse spot of light. These fields have an on-center off-surround response characteristic. Other receptive fields have the opposite characteristic, with an inhibitory response to the centrally located spot—an off-center on-surround response characteristic.
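A standard computational model of this center-surround behavior, though not one the chapter itself introduces, is a difference-of-Gaussians kernel. The sketch below, with a kernel size and Gaussian widths chosen purely for illustration, shows how such a receptive field responds strongly to a centered spot of light but only weakly to diffuse illumination:

```python
import numpy as np

def dog_kernel(size=15, sigma_center=1.0, sigma_surround=2.5):
    """Difference-of-Gaussians model of an on-center off-surround
    receptive field: positive weights in the center, negative in the
    surround. Negating the result models the off-center on-surround case."""
    ax = np.arange(size) - size // 2
    xx, yy = np.meshgrid(ax, ax)
    r2 = xx**2 + yy**2
    center = np.exp(-r2 / (2 * sigma_center**2)) / (2 * np.pi * sigma_center**2)
    surround = np.exp(-r2 / (2 * sigma_surround**2)) / (2 * np.pi * sigma_surround**2)
    return center - surround

kernel = dog_kernel()
# A small centered spot excites the cell strongly; a broad, diffuse spot
# drives the response toward zero because center and surround nearly cancel.
spot = np.zeros((15, 15)); spot[7, 7] = 1.0
diffuse = np.ones((15, 15))
print(np.sum(kernel * spot) > np.sum(kernel * diffuse))  # True
```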
The visual cortex itself is composed of six layers of neurons. Most of the neurons from the LGN terminate on cells in layer IV. These cells have circularly symmetric receptive fields like the retinal ganglia and the cells of the LGN. Further along the pathway, the response characteristic of the cells begins to increase in complexity. Cells in layer IV project to a group of cells directly above called simple cells. Simple cells respond to line segments having a particular orientation. Simple cells project to cells called complex cells. Complex cells respond to lines having the same orientation as their corresponding simple cells, although complex cells appear to integrate their response over a wider receptive field. In other words, complex cells are less sensitive to the position of the line on the retina than are the simple cells. Some complex cells are sensitive to line segments of a particular orientation that are moving in a particular direction.
Cells in different layers of area 17 project to different locations of the brain. For example, cells in layers II and III project to cells in areas 18 and 19. These areas contain cells called hypercomplex cells. Hypercomplex cells respond to lines that form angles or corners and that move in various directions across the receptive field.
The picture that emerges from these studies is that of a hierarchy of cells with increasingly complex response characteristics. It is not difficult to extrapolate this idea of a hierarchy into one where further data abstraction takes place at higher and higher levels. The neocognitron design adopts this hierarchical structure in a layered architecture, as illustrated schematically in the accompanying figure.
We remind you that the description of the visual system that we have presented here is highly simplified. There is a great deal of detail that we have omitted. The visual system does not adhere to a strict hierarchical structure as presented here. Moreover, we do not subscribe to the notion that grandmother cells per se exist in the brain. We know from experience that strict adherence to biology often leads to a failed attempt to design a system to perform the same function as the biological prototype: Flight is probably the most significant example. Nevertheless, we do promote the use of neurobiological results if they prove to be appropriate. The neocognitron is an excellent example of how neurobiological results can be used to develop a new network architecture.
The neocognitron design evolved from an earlier model called the cognitron, and there are several versions of the neocognitron itself. The one that we shall describe has nine layers of processing elements (PEs), including the retina layer. The system was designed to recognize the numerals 0 through 9, regardless of where they are placed in the field of view of the retina. Moreover, the network has a high degree of tolerance to distortion of the character and is fairly insensitive to the size of the character. This first architecture contains only feedforward connections.
The PEs of the neocognitron are organized into modules that we shall refer to as levels. A single level is shown in Figure 10.3. Each level consists of two layers: a layer of simple cells, or S-cells, followed by a layer of complex cells, or C-cells. Each layer, in turn, is divided into a number of planes, each of which consists of a rectangular array of PEs. On a given level, the S-layer and the C-layer may or may not have the same number of planes. All planes on a given layer will have the same number of PEs; however, the number of PEs on the S-planes can be different from the number of PEs on the C-planes at the same level. Moreover, the number of PEs per plane can vary from level to level. There are also PEs called Vs-cells and Vc-cells that are not shown in the figure. These elements play an important role in the processing, but we can describe the functionality of the system without reference to them.
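As a concrete picture of this organization, the following sketch lays out one level as arrays. The plane counts and plane sizes are placeholder values of our own choosing, not the dimensions of the implementation described later:

```python
import numpy as np

# A minimal sketch of one neocognitron level: an S-layer followed by a
# C-layer, each a stack of rectangular planes of cells. All sizes below
# are illustrative placeholders.
n_s_planes, s_rows, s_cols = 12, 19, 19   # S-layer: 12 planes of 19x19 cells
n_c_planes, c_rows, c_cols = 8, 11, 11    # C-layer: fewer, smaller planes

s_layer = np.zeros((n_s_planes, s_rows, s_cols))  # one feature per S-plane
c_layer = np.zeros((n_c_planes, c_rows, c_cols))  # pooled S-cell responses
```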
We construct a complete network by combining an input layer, which we shall call the retina, with a number of levels in a hierarchical fashion, as shown in the accompanying figure. That figure shows the number of planes on each layer for the particular implementation that we shall describe here. We call attention to the fact that there is nothing, in principle, that dictates a limit to the size of the network in terms of the number of levels.

A further figure shows a schematic illustration of the way units are connected in the neocognitron.
As we look deeper into the network, the S-cells respond to features at higher levels of abstraction; for example, corners with intersecting lines at various angles and orientations. The C-cells integrate the responses of groups of S-cells. Because each S-cell is looking for the same feature in a different location, the C-cells' response is less sensitive to the exact location of the feature on the input layer. This behavior is what gives the neocognitron its ability to identify characters regardless of their exact position in the field of the retina. By the time we have reached the final layer of C-cells, the effective receptive field of each cell is the entire retina. Figure 10.6 illustrates this character-identification behavior.
Each cell in a plane on the first S-layer receives inputs from a single input layer, namely, the retina. On subsequent layers, each S-cell plane receives inputs from each of the C-cell planes immediately preceding it. The situation is slightly different for the C-cell planes. Typically, each cell on a C-cell plane examines a small region of S-cells on a single S-cell plane. For example, the first C-cell plane on layer 2 would have connections to only a region of S-cells on the first S-cell plane of the previous layer. Reference back to the connection schematic reveals that there is not necessarily a one-to-one correspondence between C-cell planes and S-cell planes at each layer in the system. This discrepancy occurs because the system designers found it advantageous to combine the inputs from some S-planes to a single C-plane if the features that the S-planes were detecting were similar. This tuning process is evident in several areas of the network architecture and processing equations.
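To illustrate how a C-cell's wider, position-tolerant response might look in code, here is a deliberately simplified sketch that pools a small window of one S-plane with a max operation. This illustrates the pooling idea only; it is not the neocognitron's actual C-cell equation, which the chapter treats separately:

```python
import numpy as np

def c_cell_response(s_plane, center, radius=2):
    """Simplified sketch of a C-cell: pool S-cell activity over a small
    window of one S-plane. The max-pool here is our own stand-in chosen
    to illustrate position tolerance."""
    r0, c0 = center
    window = s_plane[max(r0 - radius, 0):r0 + radius + 1,
                     max(c0 - radius, 0):c0 + radius + 1]
    return window.max()

# The same feature detected anywhere inside the window yields the same
# C-cell output, illustrating tolerance to small shifts.
s_plane = np.zeros((19, 19))
s_plane[9, 9] = 1.0
print(c_cell_response(s_plane, (9, 9)))   # 1.0
s_plane[9, 9], s_plane[10, 8] = 0.0, 1.0  # shift the feature slightly
print(c_cell_response(s_plane, (9, 9)))   # still 1.0
```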
The weights on connections to S-cells are determined by a training process. Unlike in many other network architectures (such as backpropagation), where each unit has a different weight vector, all S-cells on a single plane share the same weight vector. Sharing weights in this manner means that all S-cells on a given plane respond to the identical feature in their receptive fields, as we indicated. Moreover, we need to train only one S-cell on each plane, then distribute the resulting weights to the other cells on the plane.
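The practical consequence of weight sharing is that evaluating an entire S-plane amounts to sweeping one trained kernel across the input, much like a convolution. The following sketch, with illustrative sizes and a bare sum-of-products standing in for the full S-cell equation, makes that concrete:

```python
import numpy as np

def s_plane_sweep(input_plane, shared_weights):
    """Apply one shared weight kernel at every valid position; every
    S-cell on the plane performs the identical feature test on its own
    receptive field."""
    kh, kw = shared_weights.shape
    rows = input_plane.shape[0] - kh + 1
    cols = input_plane.shape[1] - kw + 1
    out = np.zeros((rows, cols))
    for r in range(rows):
        for c in range(cols):
            patch = input_plane[r:r + kh, c:c + kw]
            out[r, c] = np.sum(patch * shared_weights)
    return out

retina = np.random.rand(19, 19)
kernel = np.random.rand(3, 3)   # trained once, copied to every cell
responses = s_plane_sweep(retina, kernel)
print(responses.shape)          # (17, 17): one response per S-cell position
```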
NEOCOGNITRON DATA PROCESSING
In this section we shall discuss the various processing algorithms of the neocognitron cells. First we shall look at the S-cell data processing, including the method used to train the network. Then, we shall describe processing on the C-layer.
We shall first concentrate on the cells in a single plane of the first S-layer, as indicated in the accompanying figure. A single plane of Vc-cells is associated with the S-layer, as indicated there. The Vc-plane contains the same number of cells as does each S-plane. Vc-cells have the same receptive fields as the S-cells in corresponding locations in the plane. The output of a Vc-cell goes to a single S-cell in every plane in the layer. The S-cells that receive inputs from a particular Vc-cell are those that occupy a position in the plane corresponding to the position of the Vc-cell. The output of the Vc-cell has an inhibitory effect on the S-cells. A companion figure shows the details of a single S-cell along with its corresponding Vc-cell.
Up to now, we have been discussing the first S-layer, in which cells receive input connections from a single plane (in this case, the retina) in the previous layer. For what follows, we shall generalize our discussion to include the case of layers deeper in the network, where an S-cell will receive input connections from all the planes on the previous C-layer.
Let the index $k_l$ refer to the $k_l$th plane on level $l$, of which there are $K_l$ in all. We can label each cell on a plane with a two-dimensional vector, $\mathbf{n}$, indicating its position on the plane; then, we let the vector $\mathbf{v}$ refer to the relative position of a cell in the previous layer lying in the receptive field of unit $\mathbf{n}$. With these definitions, we can write the following equation for the output of any S-cell:

$$u_{S_l}(k_l, \mathbf{n}) = r_l \, \Phi\!\left[\frac{\displaystyle 1 + \sum_{k_{l-1}=1}^{K_{l-1}} \sum_{\mathbf{v} \in A_l} a_l(k_{l-1}, \mathbf{v}, k_l)\, u_{C_{l-1}}(k_{l-1}, \mathbf{n}+\mathbf{v})}{\displaystyle 1 + \frac{r_l}{1+r_l}\, b_l(k_l)\, v_{C_{l-1}}(\mathbf{n})} - 1\right]$$

where $\Phi[x] = x$ for $x \ge 0$ and $\Phi[x] = 0$ otherwise, and where the Vc-cell output that feeds the inhibitory connection is

$$v_{C_{l-1}}(\mathbf{n}) = \sqrt{\sum_{k_{l-1}=1}^{K_{l-1}} \sum_{\mathbf{v} \in A_l} c_{l-1}(\mathbf{v})\, u_{C_{l-1}}(k_{l-1}, \mathbf{n}+\mathbf{v})^2}$$

with $c_{l-1}(\mathbf{v})$ a fixed set of weights that decrease monotonically with the distance $|\mathbf{v}|$.
Let's dissect these equations in some detail. The inner summation of the first equation is the usual sum-of-products calculation of inputs, $u_{C_{l-1}}(k_{l-1}, \mathbf{n}+\mathbf{v})$, and weights, $a_l(k_{l-1}, \mathbf{v}, k_l)$. The sum extends over all units in the previous C-layer that lie within the receptive field of unit $\mathbf{n}$. Those units are designated by the vector $\mathbf{n}+\mathbf{v}$. Because we shall assume that all weights and cell output values are nonnegative, the sum-of-products calculation yields a measure of how closely the input pattern matches the weight vector on a unit. We have labeled the receptive field $A_l$, indicating that the geometry of the receptive field is the same for all units on a particular layer. The outer summation of the equation extends over all of the $K_{l-1}$ planes of the previous C-layer. In the case of $u_{S_1}$, there would be no need for this outer summation.
The product $b_l(k_l)\, v_{C_{l-1}}(\mathbf{n})$ in the denominator of the S-cell equation represents the inhibitory contribution of the Vc-cell. The parameter $r_l$, where $0 < r_l < \infty$, determines the cell's selectivity for a specific pattern. The factor $r_l/(1+r_l)$ goes from zero to one as $r_l$ goes from zero to infinity. Thus, for small values of $r_l$, the denominator of the equation could be relatively small compared to the numerator even if the input pattern did not exactly match the weight vector. This situation could result in a positive argument to the $\Phi$ function. If $r_l$ were large, then the match between the input pattern and the weights in the numerator of the equation would have to be more exact to overcome the inhibitory effects of the Vc-cell input. Notice also that this same $r_l$ parameter appears as a multiplicative factor of the $\Phi$ function. If $r_l$ is small, and cell selectivity is low, this factor ensures that the output from the cell itself cannot become very large.
We can view the function of $r_l$ in another way. Let $e$ denote the total excitatory input (the double sum in the numerator) and $h = b_l(k_l)\, v_{C_{l-1}}(\mathbf{n})$ the inhibitory input. Then we can rewrite the argument of the $\Phi$ function as

$$\frac{1+e}{1+\frac{r_l}{1+r_l}\,h} - 1 = \frac{e - \frac{r_l}{1+r_l}\,h}{1 + \frac{r_l}{1+r_l}\,h}$$

so the cell produces a nonzero output only when the excitation exceeds the fraction $r_l/(1+r_l)$ of the inhibition.
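To make the selectivity concrete, with illustrative numbers of our own choosing: for $r_l = 0.25$, the fraction $r_l/(1+r_l)$ is only $0.2$, so excitation exceeding one-fifth of the inhibition already produces output, a weakly selective cell; for $r_l = 4$, the fraction rises to $0.8$, and the input pattern must match the weight vector far more closely before the cell responds at all.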
Notice that neither of the weight expressions, $a_l(k_{l-1}, \mathbf{v}, k_l)$ nor $b_l(k_l)$, depends explicitly on the position, $\mathbf{n}$, of the cell. Remember that all cells on a plane share the same weights, even the $b_l(k_l)$ weights, which we did not discuss earlier.
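A direct transcription of the S-cell equation into code may help to fix the notation. The sketch below computes the output of a single S-cell; the array layout and variable names are our own, and the Vc-cell output is passed in as a precomputed value:

```python
import numpy as np

def phi(x):
    """Linear-threshold function: x for x >= 0, else 0."""
    return np.maximum(x, 0.0)

def s_cell_output(u_c, a, b, v_c, r):
    """Output of one S-cell, following the S-cell equation above.
    u_c : array (K, |A|) -- C-cell outputs in the receptive field,
          one row per input plane k_{l-1}
    a   : array (K, |A|) -- shared excitatory weights a_l(k_{l-1}, v, k_l)
    b   : float          -- shared inhibitory weight b_l(k_l)
    v_c : float          -- output of the corresponding Vc-cell
    r   : float          -- selectivity parameter r_l
    All weights and outputs are assumed nonnegative."""
    excite = 1.0 + np.sum(a * u_c)               # numerator: 1 + sum-of-products
    inhibit = 1.0 + (r / (1.0 + r)) * b * v_c    # denominator: inhibitory term
    return r * phi(excite / inhibit - 1.0)

# Illustrative call with made-up numbers: 3 input planes, 5x5 receptive field.
u = np.random.rand(3, 25)
a = np.random.rand(3, 25) * 0.1
print(s_cell_output(u, a, b=2.0, v_c=0.5, r=1.7))
```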
Training Weights on the S-Layers
In principle, training proceeds as it does for many networks. First, an input pattern is presented at the input layer and the data are propagated through the network. Then, weights are allowed to make incremental adjustments according to the specified algorithm. After weight updates have occurred, a new pattern is presented at the input layer, and the process is repeated with all patterns in the training set until the network is classifying the input patterns correctly.
Unsupervised training. With this model in mind, we now apply an input pattern and examine the response of the S-cells in each column. To ensure that each S-cell provides a distinct response, we can initialize the $a_l$ weights to small, positive random values. The $b_l$ weights on the inhibitory connections can be initialized to zero. We first note the plane and position of the S-cell whose response is the strongest in each column. Then we examine the individual planes so that, if one plane contains two or more of these S-cells, we disregard all but the cell responding the strongest. In this manner, we will locate the S-cell on each plane whose response is the strongest, subject to the condition that each of those cells is in a different column. Those S-cells become the prototypes, or representatives, of all the cells on their respective planes. Likewise, the strongest-responding Vc-cell is chosen as the representative for the other cells on its plane.
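The representative-selection step can be summarized in a short sketch. Here an S-layer's responses are stored as a (planes x positions) array, and a "column" is simplified to the set of cells sharing one position across all planes; both choices are ours, made for brevity:

```python
import numpy as np

def choose_representatives(s_response):
    """s_response has shape (n_planes, n_columns). Pick the strongest
    S-cell in each column, then, if a plane wins in several columns,
    keep only its strongest winner as that plane's representative."""
    reps = {}  # plane index -> column of its representative cell
    winning_plane = np.argmax(s_response, axis=0)  # one winner per column
    for col in range(s_response.shape[1]):
        k = winning_plane[col]
        if s_response[k, col] <= 0.0:
            continue  # a silent column contributes no candidate
        if k not in reps or s_response[k, col] > s_response[k, reps[k]]:
            reps[k] = col  # keep only the plane's strongest candidate
    return reps

responses = np.random.rand(4, 9)  # 4 planes, 9 columns, made-up values
print(choose_representatives(responses))
```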
Other Learning Methods. The designers of the original neocognitron knew
to what features they wanted each level, and each plane on a level, to respond.
Under these circumstances, a set of training vectors can be developed for each
layer, and the layers can be trained independently.