Perceptrons: Introduction

In this and the next few posts, I’ll be trying something a little different (and a little more aligned with the original intent of this site): writing about perceptrons and the perceptron learning algorithm. The perceptron learning algorithm was developed by Frank Rosenblatt in the late 1950s, making it one of the oldest machine learning algorithms.

I’m tentatively planning four posts on this topic. The first will provide some background and context, explaining what the “perceptron” is and what motivated its development. The second will explore Rosenblatt’s original papers on the topic, with their focus on learning machines, automata, and artificial intelligence. The third will address the criticisms made by Marvin Minsky and Seymour Papert in their 1969 book Perceptrons: An Introduction to Computational Geometry. The fourth will discuss a few contemporary uses of perceptrons. Illustrative R code will be provided throughout.

Why perceptrons?

There are all kinds of interesting classification methods out there, many of them more sophisticated and more widely used than the perceptron learning algorithm. Why, then, would anyone want to read about – or write about – the perceptron? I have a few different reasons:

  • The perceptron helps to explain some of the biological (“neural networks”) and artificial intelligence (“learning”) metaphors and analogies so often used in discussions of classification algorithms.
  • It is historically interesting and important. The perceptron, at first, seemed to promise a future of brilliant learning machines with unlimited potential. Later criticisms of the perceptron, however, were at least partially responsible for the “AI Winter,” a period of sharply reduced funding and research interest in AI.
  • The perceptron is far from irrelevant in contemporary practice. Even if the simple perceptron learning algorithm is generally no longer part of one’s classification toolkit, some of the general ideas behind it are foundational to many classification techniques. It provides useful background for topics such as linear separability, linear classifiers, and feedforward neural networks.