Emulated Neurons

Neural networks are built on units called neurons, and for this exercise a special neuron called a perceptron is used. Perceptrons are special in that they can represent fundamental logic functions: AND, OR, NAND, NOR. Though a perceptron can’t represent XAND or XOR, layered perceptrons can, thus all logic functions can potentially be built using a layered network structure.

images showing perceptrons' logic structure

Perceptrons work by multiplying a vector of inputs by a weight vector and passing the sum of that input-weight vectors through an activation function. For this exercise I used the sigmoid function, but there are many others. Weights are [nxm] matrices, where n is the dimension of the input and m is the dimension of the output.

image showing perceptron model

Here is a sketch algorithm to implement a perceptron node:

\(\Sigma (x_iw_i) = x_1w_1 + x_2w_2 + ... + x_nw_n\) \(\sigma = \frac{1}{1+e^{\Sigma (x_iw_i )}}\)

  • x is the sample input
  • w is the the associated weight for the input sample

For the perceptron to work properly, the weights need to be adjusted according to the desired output. To calculate and adjust the error we first subtract the predicted output from the actual output.


  • ϵ is the error
  • y is the acutal output
  • σ is defined above

Using gradient descent, we find the adjustment needed for the weights by computing the derivative of the sigmoid function and multiplying that by the error to give us the final adjustment for the weights:

\(\sigma' = \sigma (1- \sigma)\)

  • σ’ is the sigmoid derivative when given σ as above

\(adjustment = \epsilon*\sigma'\)

\[w_i=w_i+ \hat{x}^T \cdot adjustments\]

Networked together, perceptrons can be immensely powerful and are the foundations by which many neural nets are built. These new weights wouldn’t have changed much, but over many iterations they converge to their proper values of minimizing error. This method of adjusting the weights is called backpropagation.

To test the algorithm a small, simple sample set was used to provide easy-to-interpret results. The table below shows the following dataset such that the output is 1 if first or second columns contained a 1, disregrading the third column:

  Variable 1 Variable 2 Variable 3 Output
Input 1 0 0 1 0
Input 2 1 1 1 1
Input 3 1 0 1 1
Input 4 0 1 1 1 demonstrates the algorithm and predicted output. Given the input array and initial weights adjusted 200​ times, the predicted results are as follows:

  Variable 1 Variable 2 Variable 3 Output
Input 1 0 0 1 0.135
Input 2 1 1 1 0.999
Input 3 1 0 1 0.917
Input 4 0 1 1 0.917

Given an infinite number of iterations the algorithm would converge to either 0 or 1, but in 200 iterations our results are close enough to see a clear distinction.

Applied to a larger dataset,, we can create a linear classifier.

Left: Initial 2D dataset, Right: Perceptron classifier results

Left: Initial validation dataset, Right: Perceptron validation classifier results

The graph below shows the network error over 500 iterations. As expected the initial error is very high due to the weights being initially random. The error quicky drops after ~30 iterations, but never quite reaches zero. In this case error is ~4%, reflected in the misclassifed samples in both images on the right.

Network error percentage drops after each epoch, indicating a model is being learned