Automatic Propagation of Uncertainty with AD
This post, and the series it begins, is a walk-through of the implementation of my uncertain library, now on Hackage!
Some of my favorite Haskell “tricks” involve working with exotic numeric types with custom “overloaded” numeric functions and literals that let us work with data in surprisingly elegant and expressive ways.
Here is one example from my work in experimental physics and statistics: we often deal with experimental or sampled values that carry inherent uncertainty. If you measure something to be \(12.3\,\mathrm{cm}\), that doesn’t mean it’s \(12.300000\,\mathrm{cm}\); it means that it’s somewhere between \(12.2\,\mathrm{cm}\) and \(12.4\,\mathrm{cm}\), and we don’t know exactly where. We can write it as \(12.3 \pm 0.1\,\mathrm{cm}\). The interesting thing happens when we try to add, multiply, or divide numbers with uncertainty. What happens when you “add” \(12 \pm 3\) and \(19 \pm 6\)?
The initial guess might be \(31 \pm 9\), because one is \(\pm 3\) and the other is \(\pm 6\). But if you actually run an experiment like this many times, you’ll see that this isn’t the case: simulate several hundred trials and the answer comes out closer to \(31 \pm 7\). (We’ll explain why later, but feel free to stop reading this article now and try this out yourself! [1])
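If you would rather let the computer run the trials, here is a minimal sketch of such a simulation. It is not part of the uncertain library: the names `nextUnit`, `units`, `noisy`, `mean`, `stdDev` and `simulatedSums` are made up for this illustration, the noise is assumed to be uniform and independent, and a small hand-rolled linear congruential generator stands in for a proper random-number library so that the program needs nothing beyond base.

```haskell
import Data.Bits (shiftR)
import Data.List (unfoldr)
import Data.Word (Word64)

-- | One step of a 64-bit linear congruential generator (hypothetical helper,
-- using Knuth's MMIX constants). Returns a sample in [0, 1) and the next state.
nextUnit :: Word64 -> (Double, Word64)
nextUnit s = (fromIntegral (s' `shiftR` 11) / 9007199254740992, s')
  where
    s' = s * 6364136223846793005 + 1442695040888963407

-- | Infinite stream of uniform samples in [0, 1) from a seed.
units :: Word64 -> [Double]
units = unfoldr (Just . nextUnit)

-- | Rescale a unit sample to a uniform distribution with the given mean and
-- standard deviation (a uniform distribution of half-width w has std dev w / sqrt 3).
noisy :: Double -> Double -> Double -> Double
noisy mu sigma u = mu + sigma * sqrt 3 * (2 * u - 1)

mean :: [Double] -> Double
mean xs = sum xs / fromIntegral (length xs)

-- | Population standard deviation.
stdDev :: [Double] -> Double
stdDev xs = sqrt (mean [(x - m) * (x - m) | x <- xs])
  where
    m = mean xs

-- | n simulated sums of one "12 +/- 3" and one "19 +/- 6" measurement.
simulatedSums :: Int -> [Double]
simulatedSums n = take n (pairUp (units 42))
  where
    pairUp (u : v : rest) = (noisy 12 3 u + noisy 19 6 v) : pairUp rest
    pairUp _ = []

main :: IO ()
main = do
  let sums = simulatedSums 100000
  putStrLn ("mean of sums:     " ++ show (mean sums))
  putStrLn ("std dev of sums:  " ++ show (stdDev sums))
  -- Printed only for comparison with the simulated spread.
  putStrLn ("sqrt (3^2 + 6^2): " ++ show (sqrt (3 * 3 + 6 * 6 :: Double)))
```

With this many trials the mean should land near 31 and the spread noticeably below 9, close to the last printed value. Swapping `noisy` for a Gaussian sampler should give a similar spread, since only the mean and standard deviation of each input matter here.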
Let’s write ourselves a Haskell data type that lets us work with “numbers with inherent uncertainty”:
ghci> let x = 14.6 +/- 0.8
ghci> let y = 31 +/- 2
ghci> x + y
46 +/- 2
ghci> x * y
450 +/- 40
ghci> sqrt (x + y)
6.8 +/- 0.2
ghci> logBase y x
0.78 +/- 0.02
ghci> log (x**y)
85.9 +/- 0.3
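To make the first two results in that session less mysterious, here is a rough sketch of how a value-plus-spread pair could be added and multiplied by hand. This is a hypothetical stand-in and not the uncertain library's actual type or implementation: the names `Approx`, `value`, `spread`, `addA` and `mulA` are invented for illustration, the two inputs are assumed to be independent, and the product uses only a first-order approximation.

```haskell
-- | A hypothetical "number with uncertainty": a best estimate and one
-- standard deviation. Not the library's real data type.
data Approx = Approx
  { value  :: Double
  , spread :: Double
  }
  deriving (Show)

-- The fixity is a choice made for this sketch.
infixl 6 +/-

-- | Build an uncertain value from an estimate and its spread.
(+/-) :: Double -> Double -> Approx
(+/-) = Approx

-- | Sum of two independent quantities: the spreads combine in quadrature.
addA :: Approx -> Approx -> Approx
addA (Approx x dx) (Approx y dy) =
  Approx (x + y) (sqrt (dx * dx + dy * dy))

-- | Product of two independent quantities, to first order: each spread is
-- scaled by the other factor before combining in quadrature.
mulA :: Approx -> Approx -> Approx
mulA (Approx x dx) (Approx y dy) =
  Approx (x * y) (sqrt ((y * dx) * (y * dx) + (x * dy) * (x * dy)))

main :: IO ()
main = do
  let x = 14.6 +/- 0.8
      y = 31 +/- 2
  print (addA x y)
  print (mulA x y)
```

The unrounded spreads come out at roughly 2.2 and 38, which agree with the `46 +/- 2` and `450 +/- 40` shown above once rounded to the significant digits the uncertainty justifies. Writing one such rule per operation by hand gets tedious quickly, which is part of the motivation for deriving them automatically.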
[1] You can simulate noisy data using uniform noise, Gaussian noise, or any other distribution you like that has a given expected value (mean) and “spread”. Verify by checking the standard deviation of the sums!