At its core, the perceptron is one of the simplest **supervised learning** algorithms for binary classification. It is a type of linear classifier, i.e. a classification algorithm that makes its predictions based on a linear predictor function combining a set of weights with the feature vector.
A more intuitive way to think about it is as a **neural network with only one neuron**.

The way it works is very simple. It takes a vector of input values **x**, where each element is a feature of our data set.

Say that we want to classify whether an object is a bicycle or a car. For the sake of this example let's say that we select 2 features: the height and the width of the object. In that case **x = [x1, x2]**, where x1 is the height and x2 is the width.
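As a tiny illustration, the feature vector for one object could look like this in Python (the numbers are made up):

```python
# One object's feature vector in the bicycle-vs-car example.
# The values are invented purely for illustration.
x = [1.5, 2.0]  # x[0] = height, x[1] = width

print(len(x))  # 2 features
```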

Then, once we have our input vector **x**, we multiply each element in that vector by a weight. Usually, the higher the value of a weight, the more important that feature is. If, for example, we used a larger weight for the height than for the width, the height would have a greater influence on the prediction.

Alright, so we have multiplied the 2 vectors **x** and **w** (the weights) element-wise. We then sum up the products (a dot product) and pass the result through an activation function, which in the classic perceptron is a unit step function: the model outputs 1 if the sum is at or above the threshold and 0 otherwise. That output is our prediction.
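The prediction step can be sketched like this (the function name, the unit step activation with a threshold at 0, and treating `w[0]` as the bias weight are illustrative assumptions, not a fixed API):

```python
# Minimal sketch of a perceptron prediction with a unit step activation.
# w[0] is the bias weight (multiplied by a constant input of 1);
# w[1:] line up with the features in x.

def predict(x, w):
    """Return 1 if the weighted sum plus bias is >= 0, else 0."""
    total = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
    return 1 if total >= 0 else 0

# Example with made-up weights: -1 + 1.0*2.0 + 0.5*1.0 = 1.5 -> class 1
print(predict([2.0, 1.0], [-1.0, 1.0, 0.5]))  # -> 1
```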

But how do we get the right weights so that we make correct predictions? In other words, how do we **train** our perceptron model?

Well, in the case of the perceptron we do not need fancy math equations to **train** our model. The weights are adjusted by the following update rule:

*Δw(i) = eta * (y - prediction) * x(i)*

where **x(i)** is the corresponding feature (x1 for w1, x2 for w2, and so on), **y** is the true label, and **prediction** is the model's current output.

Also notice that there is a variable called **eta**: the learning rate. You can think of the learning rate as how big we want each change to the weights to be. A good learning rate results in a fast learning algorithm. A value that is too high makes the updates overshoot, so the weights keep jumping around instead of settling, while a value that is too low makes learning unnecessarily slow.
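A quick sketch of how eta scales a single update (the helper name `delta_w` is made up for illustration, and the numbers are arbitrary):

```python
# How the learning rate scales one weight update, following
# delta_w = eta * (y - prediction) * x_i from the text.

def delta_w(eta, y, prediction, x_i):
    return eta * (y - prediction) * x_i

print(delta_w(0.1, 1, 0, 2.0))  # small step: 0.2
print(delta_w(1.0, 1, 0, 2.0))  # ten times larger step: 2.0
print(delta_w(0.1, 1, 1, 2.0))  # correct prediction: no change, 0.0
```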

Finally, some of you might have noticed that the first input is a constant (1) multiplied by w0. So what exactly is that? In order to make good predictions we need to add a bias, and that is exactly what this constant provides.

To update the weight of the bias term we use the same equation as for the other weights, but in this case we do not multiply by the input (because the input is the constant 1):

*Δw0 = eta * (y - prediction)*
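Both update equations can be combined into one sketch (the name `update_weights`, the default `eta=0.1`, and the in-place update are illustrative choices):

```python
# One training update for a single example, applying
# delta_w = eta * (y - prediction) * x_i to each feature weight
# and delta_w0 = eta * (y - prediction) to the bias weight.

def update_weights(w, x, y, prediction, eta=0.1):
    """Adjust w in place and return it. w[0] is the bias weight."""
    error = y - prediction
    w[0] += eta * error               # bias: input is the constant 1
    for i, xi in enumerate(x):
        w[i + 1] += eta * error * xi  # feature weights
    return w

# The model predicted 0 but the label is 1, so all weights move up:
print(update_weights([0.0, 0.0, 0.0], [2.0, 1.0], y=1, prediction=0))
# -> [0.1, 0.2, 0.1]
```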

So that is basically how a simple perceptron model works! Once the weights are trained, we can feed the model new data and get predictions.

The perceptron model has an important disadvantage: it will never converge (i.e. find the perfect weights) if the data isn't linearly separable, meaning the 2 classes cannot be separated in feature space by a straight line. To avoid this, it is good practice to train for a fixed number of iterations, so that the model doesn't get stuck adjusting weights that can never be perfectly tuned.
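Putting it all together, here is a minimal sketch of the full training loop with a fixed iteration cap. The unit step activation, the epoch count, and the toy "bicycle vs car" numbers are all illustrative assumptions:

```python
# Full perceptron training sketch: predict, compute the error,
# apply the update rule, and stop after a fixed number of epochs
# so training ends even if the data is not linearly separable.

def train_perceptron(X, Y, eta=0.1, n_epochs=20):
    w = [0.0] * (len(X[0]) + 1)  # w[0] is the bias weight
    for _ in range(n_epochs):    # fixed iteration cap
        for x, y in zip(X, Y):
            total = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            prediction = 1 if total >= 0 else 0
            error = y - prediction
            w[0] += eta * error
            for i, xi in enumerate(x):
                w[i + 1] += eta * error * xi
    return w

# Toy data: [height, width]; bicycles (label 0) are narrow, cars (1) wide.
X = [[1.0, 0.5], [1.2, 0.4], [1.5, 2.0], [1.4, 1.8]]
Y = [0, 0, 1, 1]
w = train_perceptron(X, Y)
predictions = [
    1 if w[0] + sum(wi * xi for wi, xi in zip(w[1:], x)) >= 0 else 0
    for x in X
]
print(predictions)  # -> [0, 0, 1, 1], matching Y
```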