Streaming Huffman Compression in Haskell (Part 1: Trees and State)
by Justin Le ♦
Source ♦ Markdown ♦ LaTeX ♦ Posted in Haskell, Tutorials ♦ Comments
So you’re learning Haskell and are looking for some projects that aren’t super trivial, are familiar enough that you can use what you already know, and are difficult enough to maybe help you learn new things. Hey, maybe this is for you :)
Let’s take a go at Huffman encoding in Haskell. We will look at two types of binary trees, which we use to implement immutable/persistent priority queues and prefix trees. We’ll play around with the State monad a bit, explore some useful typeclasses, learn how to serialize, marshal, and unmarshal data structures using the binary library, and also look at how to load data from a file and write to another in a pure way, avoiding lazy IO using the ever-more-popular pipes library. And hopefully we learn some neat Haskell idioms!
We’re going to be assuming some basic Haskell knowledge, like algebraic data types, higher order functions, basic monad usage, and some basic familiarity with the functions in Prelude/base, the standard library. If you have any questions, feel free to leave a comment, drop by on #haskell on freenode, throw me a tweet, or give the great Learn You A Haskell a quick read!
Prefix trees & Huffman coding
You might have encountered this in the past, but Huffman encoding solves the problem of finding the optimal binary prefix code to encode a string.
I’ll leave you to read the wikipedia article, which explains it much better than I could. Basically, binary prefix codes are nice because you don’t have to encode any “stop” symbol — as soon as you reach a leaf of the tree, you know that you have found a letter, and can move on.
Huffman trees are built from the bottom-up using priority queues. The two lowest-frequency nodes are continually “popped” from the queue, combined into a new node, and placed back in the queue.
Our first challenge — representing a Huffman tree as a data structure!
The Tree
(All the code in this section on is available for download for you to try it out yourself!)
So some properties about prefix trees that might be useful to us — all data is stored in the leaves, and all internal nodes have exactly two children. This sounds like the perfect candidate for an Algebraic Data Structure.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/PreTree.hs#L19-L21
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
data PreTree a = PTLeaf a
| PTNode (PreTree a) (PreTree a)
deriving (Show, Eq, Generic)
We leave the type parameterized on a
(which is like a template/generic in C++/Java) so we can decide what to put into it later.
PreTree
operations
So, what sort of things are we going to want to do with our PreTree
?
Well..first of all, we might want a way to put something into an empty tree — create a leaf containing just that data.
That function is sort of embarrassingly easy:
makePT' :: a -> PreTree a
= PTLeaf x makePT' x
Remember, that’s PTLeaf
is a data constructor that “creates” a PreTree
when you use PTLeaf x
.
However, something like this is just begging to be eta-reduced, and we can simplify it as:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/PreTree.hs#L46-L47
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
makePT :: a -> PreTree a
= PTLeaf makePT
Which does the same thing. Basically, PTLeaf
is already a function a -> PreTree a
…so makePT
is literally just PTLeaf
.
> let pt = makePT 'c'
ghci> :t pt
ghciPreTree Char
> pt
ghciPTLeaf 'c'
Now, we might also want a way to “merge” two PreTree a
’s. This is at the heart of building the tree in the first place…successively merge two trees until everything is in one giant tree.
This isn’t too bad either:
mergePT' :: PreTree a -> PreTree a -> PreTree a
= PTNode t1 t2 mergePT' t1 t2
Which, from what we saw before, can just be written as:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/PreTree.hs#L50-L51
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
mergePT :: PreTree a -> PreTree a -> PreTree a
= PTNode mergePT
> let pt1 = makePT 'c'
ghci> let pt2 = makePT 't'
ghci> let pt3 = pt1 `mergePT` pt2
ghci> :t pt3
ghciPreTree Char
> pt3
ghciPTNode (PTLeaf 'c') (PTLeaf 't')
Hm. Maybe that’s a bit too easy. Feels a little unsettling, isn’t it?
Welcome to Haskell!
Weighting things
We’re going to need some way of comparing the weights/priorities of two PreTree
s when we are assembling the tree. Let’s introduce a data type that includes both a PreTree
and an (integer) weight.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/Weighted.hs#L13-L15
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
data Weighted a = WPair { _wWeight :: Int
_wItem :: a
,deriving (Show, Functor) }
(Code for the Weighted module is available for download)
We will say that a Weighted a
is some a
associated with an integer weight.
We can create, say, a PreTree
containing the character ‘a’, weighted with integer 1:
> WPair 1 (makePTLeaf 'a')
ghciWPair 1 (makePTLeaf 'a') :: Weighted (PreTree Char)
This weighted PreTree
is pretty useful, let’s give it an alias/typedef:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/PreTree.hs#L54-L54
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
type WeightedPT a = Weighted (PreTree a)
Let’s make the same functions for WeightedPT
as we did for PreTree
:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/PreTree.hs#L58-L59
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
makeWPT :: Int -> a -> WeightedPT a
= WPair w . makePT makeWPT w
The above basically says “to make a WeightedPT
with weight w
, first makePT
it, and then add that result it to a WPair w
.
> let pt = makeWPT 1 'w'
ghci> :t pt
ghciWeightedPT Char
> pt
ghciWPair 1 (PTLeaf 'w')
We will also want to merge two WeightedPT
s:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/PreTree.hs#L62-L64
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
mergeWPT :: WeightedPT a -> WeightedPT a -> WeightedPT a
WPair w1 pt1) (WPair w2 pt2)
mergeWPT (= WPair (w1 + w2) (mergePT pt1 pt2)
so that the total weight is the sum of the weights of the two subtrees.
Finally, the entire point of having weighted things is so that we can compare them and impose some total ordering. Haskell has a typeclass that abstracts these comparing operations, Ord
:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/Weighted.hs#L17-L21
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
instance Eq (Weighted a) where
WPair w1 _ == WPair w2 _ = w1 == w2
instance Ord (Weighted a) where
compare (WPair w1 _) (WPair w2 _) = compare w1 w2
Which says that Weighted a
is an Ord
(is orderable/comparable), and to compare two WPair w x
’s, you compare the w
’s.
> makeWPT 2 'a' > makeWPT 3 'b'
ghciFalse
> makeWPT 4 't' == makeWPT 4 'k'
ghciTrue
Priority Queues
There are some great priority queue libraries on Hackage, like PSQueue. However, for fun, we’re going to be making our own! Yay!
Our Priority Queue code module is available for download to try out!
Skew heaps
A traditional approach to making efficient priority queues is to use a heap, a tree with insertion algorithms that make sure the root of the tree is the most prioritized element and that the tree stays balanced. Heaps make heavy use of stateful mutation to do this, and while it’s not so hard to do this in Haskell, we might consider a ‘pure’ version of a heap: a skew heap.
A skew heap is a heap that doesn’t explicitly maintain its balance, but maintains “heap ordering” (parents are always higher priority than their children).
I’ll leave it to the wikipedia article to do most of the explaining because they have pretty pictures, but here is the gist of it — skew heaps have only three operations: making new (singleton) one, merging two skew heaps, and popping off the root. Traditional “insert” is done by making a new skew heap with one element, and merging it with the main heap.
Merging is simple enough: The higher-priority root becomes the new root, and the lower-priority root is merged with the child tree of that new root. (Some left-right flipping of branches is done to make sure things tend to stay balanced. Pictures provided in the wikipedia article.)
Popping the root is simple too; just take the root, and merge its two sub-trees to make the new tree.
This is a new type of binary tree, so let’s define a new data type:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/PQueue.hs#L20-L22
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
data SkewHeap a = SEmpty
| SNode a (SkewHeap a) (SkewHeap a)
deriving (Show, Eq, Foldable)
Creating a new SkewHeap
with one item:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/PQueue.hs#L25-L26
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
makeSH :: a -> SkewHeap a
= SNode x SEmpty SEmpty makeSH x
Popping the root off of a skew tree:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/PQueue.hs#L31-L33
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
popSH :: Ord a => SkewHeap a -> (Maybe a, SkewHeap a)
SEmpty = (Nothing, SEmpty)
popSH SNode r h1 h2) = (Just r , mergeSH h1 h2) popSH (
We make it return a potential result (Maybe a
), and the resulting new popped tree. The result is Maybe a
because we might potentially not be able to pop anything! We also require an Ord
constraint because in order to merge two skew heaps, the data must be comparable.
Finally, the hardest piece of code so far: merging two skew heaps:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/PQueue.hs#L37-L42
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
mergeSH :: Ord a => SkewHeap a -> SkewHeap a -> SkewHeap a
SEmpty h = h
mergeSH SEmpty = h
mergeSH h @(SNode xA lA rA) hB@(SNode xB lB rB)
mergeSH hA| xA < xB = SNode xA (mergeSH rA hB) lA
| otherwise = SNode xB (mergeSH rB hA) lB
Hopefully this is very pleasing to read — it reads a lot like a specification, or a math formula:
- Merging any skew heap with an empty heap is that same skew heap.
- When merging two heaps, the new heap is an
SNode
with the smaller root, whose children are the merge of the smaller tree and the original children. (Admittedly, the math/code is a bit more expressive than English in this case)
(Remember that in our case, the lower value/weight is the higher priority.)
We require an Ord
constraint because we compare the node element on the third case.
Priority Queue interface
Ok, neat!
Let’s wrap this up in a tidy interface/API for a PQueue
type:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/PQueue.hs#L48-L71
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
newtype PQueue a = PQ (SkewHeap a) deriving Show
emptyPQ :: PQueue a
= PQ SEmpty
emptyPQ
insertPQ :: Ord a => a -> PQueue a -> PQueue a
PQ h) = PQ (mergeSH h (makeSH x))
insertPQ x (
popPQ :: Ord a => PQueue a -> (Maybe a, PQueue a)
PQ h) = (res, PQ h')
popPQ (where
= popSH h
(res, h')
sizePQ :: PQueue a -> Int
PQ h) = length (toList h) sizePQ (
(Notice toList
, from the Foldable module; we derived Foldable
so that we can use toList
on our SkewHeap
s. If your Haskell implementation cannot derive foldable (if you are not using GHC, for example) — and even if your implementation can — it might be fun to think about how to implement sizePQ
without it!)
We do this so that we hide our low-level skew heap implementation over a “high-level” priority queue interface. We do not export the PQ
constructor, so users cannot ever directly access the underlying skew heap. In this case, the high level isn’t much higher of a level, but it’s good practice to hide away the implementation details when you can in Haskell, a language whose power lies so much in abstraction.
Building our Huffman encoding tree
Now that we have what we need in place, let’s get to doing building our tree. (Again, all available for download.)
Frequency Tables
First, we need to have some sort of frequency table. We will use Data.Map.Strict
’s Map
type:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/Huffman.hs#L19-L19
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
type FreqTable a = Map a Int
and we’ll import the operations from Data.Map.Strict
qualified:
import qualified Data.Map.Strict as M
Just to work with things now, let’s make a way to generate a FreqTable
from an arbitrary string:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/Huffman.hs#L22-L25
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
listFreq :: Ord a => [a] -> FreqTable a
= foldr f M.empty
listFreq where
= M.insertWith (+) x 1 m f x m
This says that listFreq
is a fold, where you start with M.empty
(an empty FreqTable
) and for every element, you insert it into the map as a key with value 1. If the key already exists, add one to its current value instead.
> listFreq "hello world"
ghci' ',1),('d',1),('e',1),('h',1),('l',3),('o',2),('r',1),('w',1)] fromList [(
Building the queue
Next, we would like to create Huffman leaves out of all of these elements, with associated weights, and insert them all into a PQueue
. We can do this by using M.foldrWithKey
, which is a foldr
over the map, giving the folding function both the key and the value.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/Huffman.hs#L43-L46
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
listQueue :: Ord a => [a] -> PQueue (Weighted a)
= M.foldrWithKey f emptyPQ . listFreq
listQueue where
= insertPQ (WPair v k) pq f k v pq
> let pq = listQueue "hello world"
ghci> :t pq
ghcipq :: PQueue (WPair Int (PreTree Char))
> sizePQ pq
ghci8
> let (popped1, pq') = popPQ pq
ghci> popped1
ghciJust (WPair 1 ' ')
> let (popped2, pq'') = popPQ pq'
ghci> popped2
ghciJust (WPair 1 'd')
> sizePQ pq''
ghci6
Building the tree
Building the tree is going to be a bit harder than a simple fold over the queue, because we have to “branch” based on the state of the queue. Depending on the state of the queue, we make decisions on “control flow”.
The experienced Haskelleur will recognize that this language is very evocative of the Monad design pattern.
The State monad
In particular, we will be using the State monad, which is basically a plain ol’ newtype wrapper around functions s -> (a, s)
. Basically, functions that act on a state and return a value with a modified state.
These functions are actually surprisingly useful, and as it turns out, all stateful computations can be described as “compositions” of these functions.
What do I mean by “compositions”?
Let’s say I have two functions:
f1 :: s -> (a, s)
f2 :: s -> (b, s)
And I wanted to sequence them:
`andThen` f2 f1
What would that even “look like”?
Well, I expect that sequencing two state functions will return a new, “giant” state function that does both functions “one after the other”. That is:
`andThen` f2 :: s -> (b, s) f1
This new function will first run the input state on f1
, and take that resulting state and pass it into f2
, and then return the result of f2
and the resulting modified state of f2
.
So we have something like
andThen :: (s -> (a,s)) -> (s -> (b,s)) -> (s -> (b,s))
= \st -> let (_,st') = f1 s
andThen f1 f2 in f2 st'
Think of andThen
like a semicolon, of sorts.
Notice that we “lose” the result of f1
with andThen
. What if we wanted to use it? We might write a combinator:
andThenWith :: (s -> (a, s))
-> (a -> (s -> (b, s)))
-> (s -> (b, s))
Which you would use like
`andThenWith` (\x -> f2 x) f1
where f2
is a function that takes an a
and returns a s -> (a,s)
.
Basically, it would be exactly the same as andThen
, except the second argument gets access to the result of the first. Writing it is almost as simple, actually —
andThenWith :: (s -> (a,s)) -> (a -> (s -> (b, s))) -> (s -> (b, s))
= \st -> let (x,st') = f1 s
andThenWith f1 f2 in (f2 x) st'
As it turns out…if you squint hard enough, the type signature andThenWith
looks a lot like the type signature for (>>=)
:
(>>=) :: Monad m => m a -> (a -> m b) -> m b
Hm. Let’s create a type synonym for our s -> (a, s)
, to make things more clear.
type State s a = s -> (a, s)
So now our andThenWith
looks like:
andThenWith :: State s a -> (a -> State s b) -> State s b
If we let m ~ State s
:
andThenWith :: m a -> (a -> m b) -> m b
Neat!
As it turns out, we can turn our state functions into a Monad, which encapsulates “sequencing” state functions one after another.
We just need return
:
returnState :: a -> State s a
= \st -> (x, st) returnState x
And return
is returnState
, (>>)
is andThen
, and (>>=)
is andThenWith
.
In real life, we can’t define typeclass instances on type synonyms, so we actually use a newtype
. The standard implementation comes from the transformers library. Because State s
is a member of the Monad
typeclass, we can use normal monad combinators, operators, and do notation. The transformers implementation comes with a few useful primitives:
-- wrap a normal state function into the State wrapper
state :: (s -> (a, s)) -> State s a
-- get grabs the state as the result.
get :: State s s
= state (\st -> (st, st))
get
-- put sets the state to the input
put :: s -> State s ()
= state (\_ -> ((), st))
put s
-- modifies the state with the given function
modify :: (s -> s) -> State s ()
= state (\st -> ((), f st))
modify f
-- alternative implementation of `modify`
f :: (s -> s) -> State s ()
modify'= do
modify' f <- get
st put (f st)
If you’re still lost, check out Brandon Simmon’s state monad tutorial, which was the article that eventually cleared it all up for myself. And feel free to ask questions!
The big usefulness for this “composing stateful functions” business, instead of manually unwrapping and re-wrapping the state, is that now State
actions are first-class, and you can freely compose them and pass them around as objects, and you can write individual “sub-routines”, are little packets of commands that modify state, and then “call them” and compose them from other stateful computations.
Why monads?
One might pause to wonder why we would want to instance our s -> (a, s)
functions as a Monad. Why can’t we just always sequence our state functions using andThen
and andThenWith
?
Using monads, we can now use
do
notation, which is pretty nice sugar.We now have access to the wide library of useful Haskell monad combinators. And boy are there a lot —
sequence
,mapM
,when
,filterM
, etc.We also get an Applicative instance for free, so we can do arbitrary-arity lifting with things like
f <$> x <*> y
, wheref
is a pure function like(+)
andx
andy
are stateful functions. We also get a free Functor instance as well, so we canfmap
.We can now reason with our stateful functions with all of the powerful equational reasoning tools that the monad laws offer.
As you can see, monads are not just a curiosity — they are a powerful and expressive tool!
A quick look back
This is a bit of an unrelated aside…but notice that we could have actually done our previous fold
s as state monad operations; like listFreq
:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/Huffman.hs#L29-L36
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
runListFreq :: forall a. Ord a => [a] -> FreqTable a
= execState listFreqState M.empty
runListFreq xs where
listFreqState :: State (FreqTable a) ()
= mapM_ addFreq xs
listFreqState
addFreq :: a -> State (FreqTable a) ()
= modify (M.insertWith (+) x 1) addFreq x
execState
runs the given State
computation with the given initial state, and returns the final state s
at the end of it all. Basically, it takes an s -> (a, s)
(the State s a
), an s
, applies the function to it, and returns just the s
in the tuple.
Remember that the best way to read State s a
is just “a type synonym for s -> (a,s)
”. So when we say listFreqState :: State (FreqTable a) ()
, we mean that listFreqState
is a function from a FreqTable a
to ((), FreqTable a)
.
How about listQueue
? We could do it with the state monad too, if we wanted to.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/Huffman.hs#L50-L59
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
listQueueState :: Ord a => [a] -> State (PQueue (WeightedPT a)) ()
= M.traverseWithKey addNode (listFreq xs) >> return ()
listQueueState xs where
addNode :: a -> Int -> State (PQueue (WeightedPT a)) ()
= modify (insertPQ (WPair i (makePT x)))
addNode x i
runListQueue :: Ord a => [a] -> PQueue (WeightedPT a)
= execState (listQueueState xs) emptyPQ runListQueue xs
In these cases, the monadic usage isn’t quite necessary or useful on its own. A fold would have probably been more expressive and easier to read. The above examples were just for demonstrations/exercises.
But when do we “need” the state monad? (Or rather, when is a fold not powerful enough or much messier?)
It’s when we want to make decisions or “branch” based on the current state, or the results of our state actions. “Fold for three items; if the next list item is even then do this fold afterwards, otherwise do that fold”. This is when the state monad shines as a monad.
Another case where we might want to use a state monad over a fold is if we forsee us wanting to “compose” our folds into bigger stateful computations. For example, in listQueueState
, we “process” a state, and leave it modified for another state monad action to use.
For example:
prepareQueue :: State (PQueue (WeightedPT a)) ()
useQueue :: State (PQueue (WeightedPT a)) a
doAllTogether :: Ord a => [a] -> State (PQueue (WeightedPT a)) a
= prepareQueue >> listQueueState xs >> useQueue
doAllTogether xs
-- alternatively, the same thing but in do notation
doAllTogether' :: Ord a => [a] -> State (PQueue (WeightedPT a)) a
= do
doAllTogether' xs
prepareQueue
listQueueState xs
useQueue
runDoAllTogether :: Ord a => [a] -> a
= evalState (doAllTogether xs) emptyPQ runDoAllTogether xs
(Remember that (>>)
is just our andThen
, and when we sequence using (>>)
we mean “combine these two actions into one big action that feeds the resulting state of the left side into the beginning state of the right side.”)
Anyways, see that we can just plop a call to listQueueState
inside a sequence of stateful actions, and it’ll just process the queue and leave it for the next action to use.
If we had used listQueue
as a “pure” fold…this is a bit harder to do. You’d have to rewrite listQueue
to take in any arbitrary “starting queue”…extract the starting queue using get
after prepareQueue
, use a let
to bind it as a pure function, then use put
to pop the result back into the state for useQueue
to use. Or use modify
in a just-as-convoluted way.
Moving on, we actually won’t be using runListFreq
in the future (it was mostly for fun), but (spoilers) we might want to hold onto listQueueState
:)
Building with State
So let’s remember how the building process works:
- Pop an item from the queue.
- Pop another item. If the queue was actually empty, and nothing was poppable, you are done; return the result of step 1.
- Merge the two popped items, and push them back into the queue. Go back to step 1.
Sounds simple enough. We should take into account that we would fail to build a tree if the queue was empty to begin with, by returning a Maybe (PreTree a)
instead of a PreTree a
.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/huffman/Huffman.hs#L75-L98
-- interactive: https://www.fpcomplete.com/user/jle/huffman-encoding
buildTree :: State (PQueue (WeightedPT a)) (Maybe (PreTree a))
= do
buildTree <- state popPQ
t1' case t1' of
Nothing ->
-- queue was empty to begin with, so this fails.
return Nothing
Just t1 -> do
<- state popPQ
t2' case t2' of
Nothing ->
-- We're done, there was only one item! Return a `Just` to
-- indicate success.
return (Just (_wItem t1)) -- break out of the loop
Just t2 -> do
-- merge and push
let combined = mergeWPT t1 t2
modify (insertPQ combined)-- recursive call
buildTree
runBuildTree :: Ord a => [a] -> (Maybe (PreTree a))
= evalState (listQueueState xs >> buildTree) emptyPQ runBuildTree xs
Note that due to our uncanny foresight, popPQ :: PQueue a -> (Maybe a, PQueue a)
is already a state function s -> (a, s)
, where the state is PSQueue a
and the return value is Maybe a
. So all we need to do is say state popPQ
to wrap it in the State s a
newtype wrapper/container, and it becomes an “official” State (PQueue a) (Maybe a)
.
Remember that State s a
is just a thin wrapper/container over a function s -> (a, s)
, anyway, so the two should be somewhat equivalent in your mind; the requirement to wrap it in State
using state
is only because of Haskell’s own language limitations (namely, that you can’t define a Monad instance for s -> (a, s)
in a clean way). When you read State s a
, you should really read s -> (a, s)
, because they are for the most part completely equivalent.
Again, (>>)
is Monad-speak for our andThen
function we defined earlier, so for buildTree
, we do “listQueueState xs
and then buildTree
”. (>>)
joins two s -> (a, s)
functions into one giant s -> (a,s)
, by feeding the resulting state of the first action into the next one. listQueueState
takes an empty priority queue and ‘fills’ it with nodes generated from xs
, leaving a filled priority queue. buildTree
then takes that filled queue and performs our building operations on it, modifying it as it goes along, and ends up with an empty queue as a state and returning the finished tree as a result.
evalState
is like partner of execState
— it runs the state operation on the given starting state, and outputs the final result (instead of the final state). It takes an s -> (a, s)
, an s
, and applies the function to it and gives the resulting a
of the tuple.
Putting it all together
Let’s try it out, shall we?
> fromJust $ runBuildTree "hello world"
ghciPTNode (PTNode (PTNode (PTLeaf 'h')
PTLeaf 'e')
(
)PTNode (PTLeaf 'w')
(PTLeaf 'r')
(
)
)PTNode (PTLeaf 'l')
(PTNode (PTNode (PTLeaf 'd')
(PTLeaf ' ')
(
)PTLeaf 'o')
(
) )
Congrats, we built a Huffman encoding tree! Notice that the most commonly used letter ('l'
, occurring 3 times) is only at depth 2 (and is most accessible), while the others are at depths 3 and 4.
Next steps
That’s it for this post, it’s already long enough!
In the next posts we will look at how we would use this Huffman tree to encode and decode text, and general bytes (Word8
s), and then hook it all up to make a “streaming” compressor and uncompressor that reads a file byte-by-byte and outputs a compressed file as it goes. We’ll then figure out how to store this huffman tree in a compact, serialized binary way, and load it cleanly.
In the mean time, try downloading the source, or playing with it online on fpcomplete!