Computational models of neural networks have been around for more than half a century, beginning with the simple model that McCulloch and Pitts developed in 1943 [1]. Hebb subsequently contributed a learning algorithm to train such models [2], summed up by the familiar refrain: 'Neuron that fire together, wire together'. Hebb's rule, and a popular variant known as the Delta rule, were crucial for early models of cognition, but they quickly ran into trouble with respect to their computational power. In their extremely influencial book, Perceptrons, Minsky and Papert proved that these networks couldn't even learn the boolean XOR function, due to due only being able to learn a single layer of weights[3].Luckily, a more complex learning algorithm, backpropagation, eventually emerged, which could learn across arbitrarily many layers, and was later proven to be capable of approximating any computable function.