With the release of backprop, I’ve been exploring the space of parameterized models of all sorts, from linear and logistic regression and other statistical models to artificial neural networks, both feed-forward and recurrent (stateful). I wanted to see to what extent we can really apply automatic differentiation and iterative gradient descent-based training to all of these different models. Basically, I wanted to see how far we can take differentiable programming (à la Yann LeCun) as a paradigm for writing trainable models.
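To make “parameterized model” concrete, here is a rough sketch (the `Model` type and `linReg` function are just illustrative names, not from any library): a model is a function from parameters and an input to an output, and training means tuning those parameters.

```haskell
-- A parameterized model: given parameters p and an input a, produce a b.
-- (Purely illustrative; not a type from any library used in this series.)
type Model p a b = p -> a -> b

-- Linear regression as a parameterized model: the parameters are a
-- slope and an intercept, applied to the input.
linReg :: Model (Double, Double) Double Double
linReg (slope, intercept) x = slope * x + intercept
```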
Building on the work of other writers, I’m starting to see a picture unifying all of these models, painted in the language of purely functional typed programming. I’m already applying these ideas to models I use in real life and in my research, and I thought I’d take some time to put my thoughts into writing, in case anyone else finds them illuminating or useful.
As a big picture, I really believe that a purely functional typed approach to differentiable programming is the way forward for models like artificial neural networks. In this light, the drawbacks of object-oriented and imperative approaches become very apparent.
I’m not the first person to attempt to build a conceptual framework for these types of models in a purely functional typed sense – Christopher Olah wrote a famous piece in 2015 that this post builds heavily on, and it is definitely worth a read! We’ll be taking some of his ideas and seeing how they work in real code!
This will be a three-part series, and the intended audience is people who have a passing familiarity with statistical modeling or machine learning/deep learning. The code in these posts is written in Haskell, using the backprop and hmatrix (with hmatrix-backprop) libraries, but the main themes and messages won’t be about Haskell so much as about differentiable programming in a purely functional typed setting in general. This isn’t a Haskell post as much as it is an exploration that uses Haskell syntax and libraries to illustrate the points. The backprop library is roughly equivalent to autograd in Python, so all of the ideas apply there as well.
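To give a taste of what the backprop library looks like in practice, here is a minimal sketch (the toy `loss`, `step`, and `trained` definitions are purely illustrative): you write a function on `BVar`s using ordinary numeric operations, ask for its gradient with `gradBP`, and plug that gradient into a plain gradient descent loop.

```haskell
import Numeric.Backprop

-- A toy loss function written on BVars so it can be automatically
-- differentiated: loss(p) = (p - 3)^2, minimized at p = 3.
loss :: Reifies s W => BVar s Double -> BVar s Double
loss p = (p - 3) ^ (2 :: Int)

-- One gradient descent step: nudge the parameter against the gradient.
step :: Double -> Double -> Double
step rate p = p - rate * gradBP loss p

-- A hundred steps with learning rate 0.1, starting from 0,
-- converge toward the minimum at p = 3.
trained :: Double
trained = iterate (step 0.1) 0 !! 100
```

The point is just that differentiation is handled by the library, while the training loop itself is ordinary functional code over plain values.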
The source code for this post is available on GitHub, if you want to follow along!