Writing the Backpropagation Algorithm into C++ Source Code


Understanding a complex algorithm such as backpropagation can be confusing. You have probably browsed many pages only to find lots of confusing math formulas. Unfortunately, that is how engineers and scientists describe these neural networks. However, there is always a way to port each formula to program source code.

Porting the Backpropagation Neural Network to C++

In this short article, I am going to show you how to port the backpropagation network to C++ source code. Please note that I am only posting the basics here. You will have to do the rest.

First part: Network Propagation

The neural network propagation function is given by net=f(\sum\limits_{i=1}^n x_i.w_i + \theta_i.\theta_w), where net is the output value of each neuron of the network and f(x) is the activation function. For this implementation, I'll be using the sigmoid function f(x)=1/(1+e^{-x}) as the activation function. Please note that the training algorithm shown in this article is designed for this activation function.
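
As a minimal sketch of the activation function described above, the sigmoid and its derivative can be written as two small helpers. The names sigmoid and sigmoid_derivative are hypothetical and are not part of the article's source. The derivative is written in terms of the neuron output y, since y * (1 - y) is the factor that appears in the training code.

```cpp
#include <cmath>

// Sigmoid activation: f(x) = 1 / (1 + e^-x)
// (hypothetical helper name, not part of the article's source)
inline float sigmoid(float x)
{
    return 1.f / (1.f + std::exp(-x));
}

// Derivative of the sigmoid expressed through its output y = f(x):
// f'(x) = y * (1 - y)
inline float sigmoid_derivative(float y)
{
    return y * (1.f - y);
}
```

For example, sigmoid(0) is 0.5, and the derivative at that output is 0.5 * (1 - 0.5) = 0.25, which is the largest value the derivative can take.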

Feed forward networks are composed of neurons and layers. To make porting to source code easier, let's use the power of C++ classes and structures to represent each portion of the neural network.

Neural Network Data Structures

A feed forward network, like many neural networks, is composed of layers. In this case the backpropagation network is a multi-layer network, so we must find a way to implement each layer as a separate unit, as well as each neuron. Let’s begin with the simplest structures and move on to the complex ones.

Neuron Structure

The neuron structure should contain everything a neuron represents:

  • An array of floating point numbers as the “synaptic connectors” or weights
  • The output value of the neuron
  • The gain value of the neuron, which is usually 1
  • The weight or synaptic connector of the gain value
  • Additionally, an array of floating point values containing the delta values, that is, the last delta value update from the previous iteration. Please note that these values are only used during training. See the delta rule at http://www.learnartificialneuralnetworks.com/backpropagation.html for more details.

struct neuron
{
    float *weights; // neuron input weights or synaptic connections
    float *deltavalues; //neuron delta values
    float output; //output value
    float gain;//Gain value
    float wgain;//Weight gain value

    void create(int inputcount);//Allocates memory and initializes values
};
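
The body of create() is not listed in this article. As a rough sketch of what it could look like, the version below allocates both arrays and fills the weights with random values between -0.5 and 0.5. The helper random_weight(), the constructor, the destructor and the deleted copy constructor are assumptions added to keep the example self-contained.

```cpp
#include <cstdlib>

struct neuron
{
    float *weights;      // neuron input weights
    float *deltavalues;  // previous weight updates (used by momentum)
    float output;
    float gain;
    float wgain;

    // assumed constructor/destructor so the sketch manages its own memory
    neuron() : weights(nullptr), deltavalues(nullptr), output(0.f), gain(1.f), wgain(0.f) {}
    ~neuron() { delete[] weights; delete[] deltavalues; }
    neuron(const neuron &) = delete; // copying would free the arrays twice

    void create(int inputcount);
};

// Hypothetical helper: pseudo-random value in the range [-0.5, 0.5]
static float random_weight()
{
    return static_cast<float>(std::rand()) / static_cast<float>(RAND_MAX) - 0.5f;
}

void neuron::create(int inputcount)
{
    // release buffers from an earlier call so create() can be called again
    delete[] weights;
    delete[] deltavalues;

    weights = new float[inputcount];
    deltavalues = new float[inputcount];
    output = 0.f;
    gain = 1.f;              // the gain is usually 1
    wgain = random_weight(); // the gain weight is trained like any other weight
    for (int i = 0; i < inputcount; i++)
    {
        weights[i] = random_weight();
        deltavalues[i] = 0.f; // no previous update yet
    }
}
```

Starting the delta values at zero means the momentum term contributes nothing on the first training step.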

Layer Structure

Our next structure is the “layer”. Basically, it contains an array of neurons along with the layer input. All neurons from the layer share the same input, so the layer input is represented by an array of floating point values.

struct layer
{
    neuron **neurons;//The array of neurons
    int neuroncount;//The total count of neurons
    float *layerinput;//The layer input
    int inputcount;//The total count of elements in layerinput

    layer();//Object constructor. Initializes all values as 0

    ~layer();//Destructor. Frees the memory used by the layer

    void create(int inputsize, int _neuroncount);//Creates the layer and allocates memory
    void calculate();//Calculates all neurons performing the network formula
};

The “layer” structure contains a block of neurons representing a layer of the network. It contains the pointer to the array of “neuron” structures, the array containing the input of the neurons, and their respective count descriptors. Moreover, it includes the constructor, destructor and creation functions.
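
The creation function and the destructor are only declared above. A possible sketch of how they pair up is shown below, assuming create() is called once per object. The neuron used here is a minimal stand-in that leaves its weights at zero, and the default member initializers replace the constructor. Neither is taken from the article's source.

```cpp
// Minimal stand-in for the neuron structure (weights stay at zero here)
struct neuron
{
    float *weights = nullptr;
    float *deltavalues = nullptr;
    float output = 0.f;
    float gain = 1.f;
    float wgain = 0.f;

    ~neuron() { delete[] weights; delete[] deltavalues; }
    void create(int inputcount)
    {
        weights = new float[inputcount]();     // () zero-initializes the array
        deltavalues = new float[inputcount]();
    }
};

struct layer
{
    neuron **neurons = nullptr;
    int neuroncount = 0;
    float *layerinput = nullptr;
    int inputcount = 0;

    ~layer();
    void create(int inputsize, int _neuroncount);
};

// Allocates every neuron plus the shared input array.
// Assumes create() is called only once per layer object.
void layer::create(int inputsize, int _neuroncount)
{
    neurons = new neuron*[_neuroncount];
    for (int i = 0; i < _neuroncount; i++)
    {
        neurons[i] = new neuron;
        neurons[i]->create(inputsize); // each neuron has one weight per layer input
    }
    layerinput = new float[inputsize]();
    neuroncount = _neuroncount;
    inputcount = inputsize;
}

// Frees everything create() allocated
layer::~layer()
{
    for (int i = 0; i < neuroncount; i++)
        delete neurons[i];
    delete[] neurons;
    delete[] layerinput;
}
```

Every neuron receives inputsize weights because all neurons of a layer read the same layerinput array.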

The Neural Network Structure

class bpnet
{
private:
    layer m_inputlayer;//input layer of the network
    layer m_outputlayer;//output layer..contains the result of applying the network
    layer **m_hiddenlayers;//Additional hidden layers
    int m_hiddenlayercount;//the count of additional hidden layers

public:
    //functions to create the network structure in memory
    bpnet();//Constructor..initializes all values to 0
    ~bpnet();//Destructor..releases memory
    //Creates the network structure in memory
    void create(int inputcount,int inputneurons,int outputcount,int *hiddenlayers,int hiddenlayercount);

    void propagate(const float *input);//Calculates the network values given an input pattern
    //Updates the weight values of the network given a desired output and applying the backpropagation
    float train(const float *desiredoutput,const float *input,float alpha, float momentum);

    //Updates the next layer input values
    void update(int layerindex);

    //Returns the output layer..this is useful to get the output values of the network
    inline layer &getOutput()
    {
        return m_outputlayer;
    }
};


The “bpnet” class represents the entire neural network. It contains its basic input layer, output layer and optional hidden layers.
Picturing the network structure isn’t that difficult. The trick comes when implementing the training algorithm. Let’s focus on the primary function bpnet::propagate(const float *input) and the member function layer::calculate(). These functions propagate and calculate the neural network output values. The function propagate is the one you should use in your final application.

Calculating the network values

Calculating a layer using the net=f(\sum\limits_{i=1}^n x_i.w_i + \theta_i.\theta_w) function

Our first goal is to calculate the neurons of each layer, and there is no better way than implementing a member function in the layer object to do this job. The function layer::calculate() shows how to implement the formula net=f(\sum\limits_{i=1}^n x_i.w_i + \theta_i.\theta_w) applied to the layer.

void layer::calculate()
{
    int i,j;
    float sum;
    //Apply the formula for each neuron
    for(i=0;i<neuroncount;i++)
    {
        sum=0;//store the sum of all values here
        //Performing function
        for(j=0;j<inputcount;j++)
        {
            sum+=neurons[i]->weights[j] * layerinput[j]; //apply input * weight
        }
        sum+=neurons[i]->wgain * neurons[i]->gain; //apply the gain or theta multiplied by the gain weight.
        //sigmoidal activation function (exp requires <cmath>)
        neurons[i]->output= 1.f/(1.f + exp(-sum));//calculate the sigmoid function
    }
}
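
To check the formula by hand, the calculation for a single neuron can be pulled out into a free function. This is only an illustrative sketch: neuron_output is a hypothetical name, and the function mirrors what layer::calculate() does for one neuron.

```cpp
#include <cmath>

// Output of a single neuron: f(sum(x_i * w_i) + gain * wgain)
// (hypothetical free function mirroring one pass of layer::calculate())
float neuron_output(const float *weights, const float *input, int inputcount,
                    float gain, float wgain)
{
    float sum = 0.f;
    for (int j = 0; j < inputcount; j++)
        sum += weights[j] * input[j]; // input * weight
    sum += wgain * gain;              // bias term
    return 1.f / (1.f + std::exp(-sum));
}
```

For example, with weights {0.5, -0.5}, inputs {1, 1}, a gain of 1 and a gain weight of 0, the weighted sum is 0.5 - 0.5 + 0 = 0, so the output is f(0) = 0.5.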

Calculating and propagating the network values

The function propagate calculates the network value given an input. It starts by calculating the input layer, then propagates the result to the next layer and calculates it, and so on until it reaches the output layer. This is the function you would use in your application. Once the network has been propagated and calculated, you only need to take care of the output value.

void bpnet::propagate(const float *input)
{
    //The propagation function should start from the input layer
    //first copy the input vector to the input layer. Always make sure the
    //"input" array has the same size as inputcount (memcpy requires <cstring>)
    memcpy(m_inputlayer.layerinput,input,m_inputlayer.inputcount * sizeof(float));
    //now calculate the inputlayer
    m_inputlayer.calculate();

    update(-1);//propagate the inputlayer out values to the next layer
    if(m_hiddenlayers)
    {
        //Calculating hidden layers if any
        for(int i=0;i<m_hiddenlayercount;i++)
        {
            m_hiddenlayers[i]->calculate();
            update(i);
        }
    }

    //calculating the final stage: the output layer
    m_outputlayer.calculate();
}
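
The function bpnet::update() is not listed in this article. Judging from its declaration and from how propagate() calls it, it presumably copies the output of every neuron in the layer just calculated into the input array of the next layer, with layerindex -1 standing for the input layer. A minimal sketch of that idea follows. The names simple_layer and feed_next_layer are hypothetical, and std::vector is used in place of the raw arrays.

```cpp
#include <vector>

// Simplified stand-in for a layer: only the parts update() would touch
struct simple_layer
{
    std::vector<float> input;   // plays the role of layerinput
    std::vector<float> output;  // plays the role of neurons[i]->output
};

// Copies the outputs of layer `from` into the inputs of layer `to`,
// so that `to` can be calculated next.
void feed_next_layer(const simple_layer &from, simple_layer &to)
{
    to.input.assign(from.output.begin(), from.output.end());
}
```

This also explains why the input count of each layer has to equal the neuron count of the layer before it.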

Training the network

Finally, training the network is what makes the neural network useful. A neural network without training does not really do anything. The training function is what applies the backpropagation algorithm. I'll do my best to help you understand how this is ported to a program.

The training process consists of the following:

  • First, calculate the network with the function propagate
  • We need a desired output for the given pattern, so we must include this data
  • Calculate the quadratic error and the layer error for the output layer. The quadratic error is determined by E=\frac{1}{2}\sum\limits_{o=1}^n (d_o^p - y_o^p)^2 where d_o^p,y_o^p are the desired and current output respectively
  • Calculate the error value of the current layer by \delta_o=(d_o^p - y_o^p).y_o^p.(1-y_o^p).
  • Update the weight values of each neuron applying the delta rule \Delta w(t+1)_i=\gamma.\delta.y_i + \alpha.\Delta w(t)_i where \gamma is the learning rate constant, \delta the layer error and y the layer input value. \alpha is the learning momentum and \Delta w(t) is the previous delta value.
    The next weight value would be w(t+1)_i=w(t)_i +\Delta w(t+1)_i
  • The same rule applies for the hidden and input layers. However, the layer error is calculated in a different way.
    lerror_c=nout_c . (1-nout_c).\sum\limits_{i=1}^n lerror_l . w_l where lerror_l and w_l are the error and weight values from the previously processed layer, and nout_c is the output of the neuron currently being processed

//Main training function. Run this function in a loop as many times as needed per pattern
float bpnet::train(const float *desiredoutput, const float *input, float alpha, float momentum)
{
    //function train, teaches the network to recognize a pattern given a desired output
    //alpha is the learning rate (\gamma in the formulas), momentum is \alpha in the formulas
    float errorg=0; //general quadratic error
    float errorc; //local error;
    float sum=0,csum=0;
    float delta,udelta;
    float output;
    //first we begin by propagating the input
    propagate(input);
    int i,j,k;
    //the backpropagation algorithm starts from the output layer propagating the error from the output
    //layer to the input layer
    for(i=0;i<m_outputlayer.neuroncount;i++)
    {
        //calculate the error value for the output layer
        output=m_outputlayer.neurons[i]->output; //copy this value to facilitate calculations
        //from the algorithm we can take the error value as
        errorc=(desiredoutput[i] - output) * output * (1 - output);
        //and the general error as the sum of delta values. Where delta is the squared difference
        //of the desired value with the output value
        //quadratic error
        errorg+=(desiredoutput[i] - output) * (desiredoutput[i] - output) ;

        //now we proceed to update the weights of the neuron
        for(j=0;j<m_outputlayer.inputcount;j++)
        {
            //get the current delta value
            delta=m_outputlayer.neurons[i]->deltavalues[j];
            //update the delta value
            udelta=alpha * errorc * m_outputlayer.layerinput[j] + delta * momentum;
            //update the weight values
            m_outputlayer.neurons[i]->weights[j]+=udelta;
            m_outputlayer.neurons[i]->deltavalues[j]=udelta;

            //we need this to propagate to the next layer
            sum+=m_outputlayer.neurons[i]->weights[j] * errorc;
        }

        //calculate the weight gain
        m_outputlayer.neurons[i]->wgain+= alpha * errorc * m_outputlayer.neurons[i]->gain;
    }

    for(i=(m_hiddenlayercount - 1);i>=0;i--)
    {
        for(j=0;j<m_hiddenlayers[i]->neuroncount;j++)
        {
            output=m_hiddenlayers[i]->neurons[j]->output;
            //calculate the error for this layer
            errorc= output * (1-output) * sum;
            //update neuron weights
            for(k=0;k<m_hiddenlayers[i]->inputcount;k++)
            {
                delta=m_hiddenlayers[i]->neurons[j]->deltavalues[k];
                udelta= alpha * errorc * m_hiddenlayers[i]->layerinput[k] + delta * momentum;
                m_hiddenlayers[i]->neurons[j]->weights[k]+=udelta;
                m_hiddenlayers[i]->neurons[j]->deltavalues[k]=udelta;
                csum+=m_hiddenlayers[i]->neurons[j]->weights[k] * errorc;//needed for next layer
            }

            m_hiddenlayers[i]->neurons[j]->wgain+=alpha * errorc * m_hiddenlayers[i]->neurons[j]->gain;
        }
        //the sum collected in this layer feeds the next layer to process
        sum=csum;
        csum=0;
    }

    //and finally process the input layer
    for(i=0;i<m_inputlayer.neuroncount;i++)
    {
        output=m_inputlayer.neurons[i]->output;
        errorc=output * (1 - output) * sum;

        for(j=0;j<m_inputlayer.inputcount;j++)
        {
            delta=m_inputlayer.neurons[i]->deltavalues[j];
            udelta=alpha * errorc * m_inputlayer.layerinput[j] + delta * momentum;
            //update weights
            m_inputlayer.neurons[i]->weights[j]+=udelta;
            m_inputlayer.neurons[i]->deltavalues[j]=udelta;
        }
        //and update the gain weight
        m_inputlayer.neurons[i]->wgain+=alpha * errorc * m_inputlayer.neurons[i]->gain;
    }

    //return the general error divided by 2
    return errorg / 2;
}


Sample Application

The complete source code can be found at the end of this article. I have also included a sample application that shows how to use the class "bpnet" in an application. The sample shows how to teach the neural network to learn the XOR (exclusive or) gate.
Creating an application with it isn't complex.

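#include <iostream>
#include "bpnet.h"
using namespace std;
#define PATTERN_COUNT 4 //the four XOR input combinations
#define PATTERN_SIZE 2
#define NETWORK_INPUTNEURONS 3 //assumed value, adjust as needed
#define NETWORK_OUTPUT 1
#define EPOCHS 20000

int main()
{
    //Create some patterns
    //playing with xor
    //XOR input values
    float pattern[PATTERN_COUNT][PATTERN_SIZE]=
    {
        {0,0},
        {0,1},
        {1,0},
        {1,1}
    };

    //XOR desired output values
    float desiredout[PATTERN_COUNT][NETWORK_OUTPUT]=
    {
        {0},
        {1},
        {1},
        {0}
    };

    bpnet net;//Our neural network object
    int i,j;
    float error;
    //We create the network (no hidden layers here)
    net.create(PATTERN_SIZE,NETWORK_INPUTNEURONS,NETWORK_OUTPUT,nullptr,0);

    //Start the neural network training
    for(i=0;i<EPOCHS;i++)
    {
        error=0;
        for(j=0;j<PATTERN_COUNT;j++)
        {
            //learning rate and momentum are assumed values, adjust as needed
            error+=net.train(desiredout[j],pattern[j],0.2f,0.1f);
        }
        error/=PATTERN_COUNT;
        //display error
        cout << "ERROR:" << error << "\r";
    }

    //once trained test all patterns
    for(i=0;i<PATTERN_COUNT;i++)
    {
        net.propagate(pattern[i]);

        //display result
        cout << "TESTED PATTERN " << i << " DESIRED OUTPUT: " << *desiredout[i] << " NET RESULT: "<< net.getOutput().neurons[0]->output << endl;
    }

    return 0;
}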
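
The sample prints only neurons[0] because the XOR network has a single output. If the network had several output neurons, the display line could loop over them instead. A small sketch of that idea follows, using a hypothetical helper format_outputs that works on a plain array of values.

```cpp
#include <sstream>
#include <string>

// Formats `count` output values as a space-separated string.
// (hypothetical helper, not part of the article's source)
std::string format_outputs(const float *values, int count)
{
    std::ostringstream text;
    for (int k = 0; k < count; k++)
    {
        if (k > 0)
            text << ' ';
        text << values[k];
    }
    return text.str();
}
```

With the classes from this article, the values would be read from net.getOutput().neurons[k]->output for k from 0 to NETWORK_OUTPUT - 1 after each call to propagate.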

Download the source as a ZIP file here. Please note that this code is for educational purposes only and may not be used for commercial purposes.
UPDATE: The source code is also available on GitHub: https://github.com/danielrioss/bpnet_wpage

44 thoughts on “Writing the Backpropagation Algorithm into C++ Source Code”

  1. I replaced the binary sigmoid function with the bipolar one: neurons[i]->output = (1.f – exp(-sum)) / (1.f + exp(-sum)); so the input data could be in the interval [-1, 1] rather than [0, 1]. I’ve also updated the radom() for the initial weights. Still when I run the program, I get a lot of “nan” (infinity) instead of real numbers for outputs. What am I doing wrong? Is the error calculation going to be different?

    • Hello Justin, thank you for stopping by. Are those errors after the modification or before?
      Unfortunately, the algorithm I am showing up there only works for the sigmoid function as it is closely related to the training algorithm.

  2. An addition to my prev comment…

    Under the “training” loop, the output is as following:
    ERROR: 0.713038
    ERROR: 1.17499
    ERROR: 1.24992
    ERROR: 1.25
    ERROR: 1.25
    ERROR: 1.25
    ERROR: 1.25
    ERROR: 1.25
    ERROR: 1.25
    ERROR: nan
    ERROR: -nan
    ERROR: -nan
    ERROR: -nan

  3. Also, how would you make your algorithm multi-threaded?
    On a different forum I read this: “Basically I create a number of threads and divide up the training data so each thread has a near equal amount. Then at regular intervals I merge the weights back together from the independent threads.”
    How would you merge the weights back together?

    • Actually, I have developed a library that works with multithreading.

      I posted here the basics to show in the most simplistic way the algorithm.

      I hope I can post it here soon.
      But basically if you want to do it multithreading this is the way to do it:

      Take the layer in process and assign a group of neurons of the same layer to each thread:
      For example, if you have 30 neurons and you are using 4 threads for processing, you would assign 7 neurons per thread and give the two remaining neurons to the first thread that finishes. The threads that have finished then wait until the last one is done.

      Once you have processed the current layer you move to the next one:
      Update the next layer inputs and make the same process as with the last layer: Assign neurons to process to each thread..
      ..until you reach the output layer.

      This actually takes all the power of your CPU and increases speed.

  4. Hello
    I'm Mohsen from Iran.
    I cannot download the source code for the Backpropagation Algorithm in C++.
    Can you send it to me?
    Do you have the source code for learning a sine function with a neural network?
    thanks very much.

  5. Hi, could you explain to me what the wgain value depends on? Should it be a positive or a negative number?

    Thank you for your great tutorial 🙂

    • Hello Andre, my apologies for my very delayed answer. The value wgain is the weight value for the bias according to the net formula net=f(\sum\limits_{i=1}^n x_i.w_i + \theta_i.\theta_w). It is pretty much yet another weight that modifies the bias value \theta_i.

      It is initialized randomly. In this source code all weights are initialized in the range from -0.5 to 0.5, so the sign does not matter.

  6. Hello.

    thank you for the information sir, i have a question.

    I want to put more Network inputs and Network outputs(like 5 or 10 more)

    #define NETWORK_OUTPUT 10

    but I don't know how I can display the results of the outputs when there is more than 1

    //display result
    cout << "TESTED PATTERN " << i << " DESIRED OUTPUT: " << *desiredout[i] << " NET RESULT: "<< net.getOutput().neurons[0]->output << endl;

    Do i have to modify just the previous line of code or there is more to modify than just that ?

    Thanks in advance.

  7. When i try to use 1 hidden layer i get this error:

    error C2664: ‘void bpnet::create(int,int,int,int *,int)’ : cannot convert argument 4 from ‘int’ to ‘int *’

    can you tell me how to fix it ?


    • You are getting this error because parameter 4 must be a pointer to an array of integer values, which in this case holds the neuron count per hidden layer, and parameter 5 is the total count of hidden layers.
      In this case you want 1 hidden layer,
      so you should do something like this:

      #define HIDDEN_LAYER_COUNT 1
      int hiddenlayerNeuronCount[HIDDEN_LAYER_COUNT]={layer_neuron_count};
      int hiddenlayercount=HIDDEN_LAYER_COUNT;

      if you want more layers, increase HIDDEN_LAYER_COUNT and specify the count of neurons per hiddenlayer. Each element of the array would be the neuron count per hiddenlayer.

      • Thank you sir.
        Can you tell me how I can know the neuron count per layer in this part?

        int hiddenlayerNeuronCount[HIDDEN_LAYER_COUNT]={layer_neuron_count};

        I don't know what number or variable to use instead of “layer_neuron_count”.

  8. Sir, I am basically a control engineer. I want to develop expertise in the area of neurofuzzy control. I have a background in fuzzy control, but I am new to neural networks. I have just learned the gradient descent rule and how it can adjust weights to reach the minimum, and a bit of the backpropagation algorithm. The problem is that I have read a lot of material, and all of it uses mathematical language. I appreciate the effort, but now I want to program, for example, a neural network that I can train with gradient descent for anything, let's say to find the coefficients of a reference linear function. Please recommend a book that can take me step by step through the basics of the different networks, their implementation in Matlab, their applications and so on. I don't have words for the contribution you are making in terms of imparting knowledge and helping students.

    • The best book I can recommend is Neural Networks: Algorithms, Applications, and Programming Techniques. It is quite old but shows the basics of neural networks with some code examples. This is the Amazon link in case you want to check it out.

      • Thank you. I will look into the link. Also can you advise me on the matlab version of this code. At the moment I am focusing on Gradient descent and back propagation. Finally I want to develop a Neurofuzzy controller for speed control of motor

        • Unfortunately, I don’t have a Matlab version of this code. This is not the first time people have asked me for it. Porting this C++ code to Matlab isn’t difficult.

          • Thank you, I just jumped into matlab and started to write my own code.At the moment I am just following my rough understanding what i read in the books in the following way for two inputs and single hidden layer.
            1) two inputs x1 and x2
            2) weights like w1,……..,w4 between inputs and layer 1.
            3) output of two neuron in the hidden layer
            o2=sigmoid(x2w4 + x1w2)

            4) Similarly, weights between the hidden layer and the output, and finally the output neurons.

            5) Finally i will calculate the error by the difference of desired and actual output.

            6) finally i will try to implement the equation of the update rule for weights.

            7) The only way i know is gradient descent. I will try that

            8) But what is the difference between LMS, NLMS and gradient descent? I think all of them are doing the same thing. Please correct me.

            9) In case I find some problems, I will come back to you.


  9. Hello sir,
    I’m doing character recognition using backpropagation in Java for my PG degree project. The letters are not recognized properly and I’m unable to find the mistake. Can you help me with some sample code for character recognition? Otherwise, I would like a clear step-by-step procedure for character recognition using backpropagation. I’m using binary data as input and output.

  10. Sir,
    Why do you calculate the input layer? The input layer is calculated by multiplying with random weights and then applying the sigmoid. Isn’t this wrong, as only the inputs should be propagated to the hidden layer?

    • That’s what I do there: multiply the input by the neuron weights and then pass the result to the sigmoid function. If you set random weights each time you calculate the input layer, the training on that layer would be in vain, because you would be overwriting the weights of the input layer that have already been trained.

      Please check the code and you’ll see weights are initialized as random when the network is created. Later those weights are adjusted by the training function.


      The function layer::calculate() does exactly what you said. It multiplies the weights of the layer by its input and passes the result to the sigmoid function.

  11. Hi there, thanks a lot for sharing your code and so detailed explanation! It is so clear! Really appreciate it!

    I have a question. With the same XOR example, I changed the desired out to be
    float desiredout[PATTERN_COUNT][NETWORK_OUTPUT]=

    And the result is not right anymore.

    Could you help me? Thanks!!

    • Hi there, when you change the output or input size and you don’t get the result you want, you have to change some parameters of the network.
      I tested the network with the default parameters and yes, it wasn’t converging. So I tested different neuron counts on the input layer and even added a hidden layer and came up with the solution.
      I basically added more neurons to the input layer to make a total of 6 and increased training iterations.

      EPOCHS 100000

      I tried with different input neuron counts and 6 was the perfect solution. Even 5 was not converging. Moreover I had to increase training epochs.

      Unfortunately, there is no formula to find out the perfect network configuration for a given problem. You have to test different configurations until you find the solution.

  12. Hi there, I have another question on the back-prop. It is about calculating gradient from output layer to hidden layer.

    Would you refer this link? It is about equation (18).

    According to the equation (18), for each hidden neuron, the sum sign is summing all output neurons that are connected to one hidden neuron. According to your code, in the “train()” method, the variable “sum” is summing all connections between any two neurons between hidden layer and output layer.

    Did I understand right? I would really appreciate if you could answer my questions. Thanks!

    • Yes that’s right, the variable “sum” is used to calculate the gradient of the hidden layer.

      errorc= output * (1-output) * sum;
  13. Hello,
    I tried your application and I get weird results when I add hidden layers for non-linearly separable data. For example, I have 150 patterns, each with 2 values in the range 0..1, and 3 outputs (0 or 1). I add one hidden layer that contains 3 neurons and almost all tests fail.

    I tried write backpropagation algorithm myself based on this article and I got the same result.

    I noticed that the problem is with the weights. In my case, in the first epoch their values are around 0.9. After 500 epochs the weight values are around 190! This makes the sum of weight*input really high, and the sigmoid function returns a value close to 1.

    Could you give me some advice what can I do?

    • Hello, in most cases it is best to have only the input and output layers. If your application does not converge, you can adjust the number of neurons in the input layer, starting with the lowest value and raising it until you get some results. Can you give me more details about your application so I can reproduce it myself? Feel free to send me an email with details at daniel dot rios at learnartificialneuralnetworks dot com at any time.

  14. #include
            float wa1=.1,wa2=.4,wb1=.8,wb2=.6,wd1=.3,wd2=.9;
            float inputA=.35,inputB=.9;
            float t=.5;
            int c=0;
            float inputUpperNode=(wa1*inputA)+(wb1*inputB);
            printf("Input for Upper Node = %f", inputUpperNode);
            float inputLowerNode=(wa2*inputA)+(wb2*inputB);
            printf("Input for Lower Node = %f", inputLowerNode);
            float outputUpperNode=1/(1+(exp(-((wa1*inputA)+(wb1*inputB)))));
            printf("Output for Upper Node = %f", outputUpperNode);
            float outputLowerNode=1/(1+(exp(-((wa2*inputA)+(wb2*inputB)))));
            printf("Output for Lower Node = %f", outputLowerNode);
            float inputFinalNode=(outputUpperNode*wd1)+(outputLowerNode*wd2);
            printf("Input For Final Node = %f", inputFinalNode);
            float outputFinalNode=1/(1+(exp(-((outputUpperNode*wd1)+(outputLowerNode*wd2)))));
            printf("Output For Final Node = %f",outputFinalNode);
            float outputError=(t-outputFinalNode)*(1-outputFinalNode)*outputFinalNode;
            printf("Output Error = %.3f",outputError);
                printf("new Weight of wd1 = %f ",wd1);
                printf("new Weight of wd2 = %f",wd2);
                float outputErrorUpperNode=(outputError*wd1)*(1-outputUpperNode)*outputUpperNode;
                printf("Output Error for Upper Node = %f",outputErrorUpperNode);
                float outputErrorLowerNode=(outputError*wd2)*(1-outputLowerNode)*outputLowerNode;
                printf("Output Error for Lower Node = %f",outputErrorLowerNode);
                printf("New Weight of A for Upper Node = %f",wa1);
                printf("New Weight of A for Lower Node = %f",wa2);
                printf("New Weight of B for Upper Node = %f",wb1);
                printf("New Weight of B for Lower Node = %f",wb2);
                printf("Input for Upper Node = %f",inputUpperNode);
                printf("Input for Lower Node = %f",inputLowerNode);
                printf("Output For Upper Node = %f",outputUpperNode);
                printf("Output For Lower Node = %f",outputLowerNode);
                printf("Input For Final Node = %f",inputFinalNode);
                printf("Output For Final Node = %f",outputFinalNode);
                printf("Output Error = %f",outputError);
            printf("c = %d",c);
            return 0;

    Why does this run for an infinite time?
