Fixed-Length Vector Types in Haskell, 2015
by Justin Le ♦
Source ♦ Markdown ♦ LaTeX ♦ Posted in Haskell, Reference, Tutorials ♦ Comments
Update: This post was written by me when I was just starting to learn about type-level things in Haskell, and reflects my own inexperience at the time of writing it. I have released an update, which presents what I hope to be an introduction that is more grounded in modern Haskell and dependent type idioms.
Original Article (written in 2015)
Fixed-length vector types (vector types that indicate the length of the vector in the type itself) are one of the more straightforward applications of the “super-Haskell” GHC type extensions. There’s a lot of magic you can do with GHC’s advanced type mechanisms, but I think fixed length vectors are a good first step to beginning to understand several extensions, including (potentially):
- ConstraintKinds
- DataKinds
- GADTs
- KindSignatures
- TypeFamilies
- TypeOperators
- OverloadedLists
And using type system plugins. (And of course the usual UndecidableInstances
etc.) We’ll be discussing two different ways to implement this — using type-level nats, and using the GHC.TypeLits model to actually be able to use numeric literals in your types. These things are seen in the wild like with the popular linear package’s V
type.
There are a few great tutorials/writeups on this topic, but many of them are from the time before we had some of these extensions, or only discuss a few. I hope to provide a nice comprehensive look about the tools available today to really approach this topic. That being said, I am no expert myself, so I would appreciate any tips/edits/suggestions for things that I’ve missed or done not-the-best :) This post has a lot of open questions that I’m sure people who know more about this than me can answer.
Most of the code in this article can be downloaded and tried out, so follow along if you want!
The Idea
The basic idea is we’ll have a type:
Vec n a
Which is a vector with items of type a
, whose length is somehow encoded in the n
. We’ll then discuss ways to do useful operations on this, as if it were a list.
n
can really only be a certain “kind” of thing — a type that encodes a length. We can represent this by giving it a “kind signature”:
data Vec :: Nat -> * -> *
Which says that our Vec
type constructor takes two arguments: something of kind Nat
(so it can’t be any type…it has to be a type of kind Nat
), something of kind *
(the “normal” kind, of things that have values, like Int
, Maybe Bool
, etc.), and returns something of kind *
(our vector itself).
Using DataKinds for Type-Level Nats
(The code in this section for this type is available online, if you wanted to play along!)
There are a couple of ways to find something for that n
Nat
kind, and one way is to use the simple inductive Nat
:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L26-L27
data Nat = Z | S Nat
deriving Show
You might have seen this type before…it gives us value-level natural numbers, where Z
is zero, S Z
is one, S (S Z)
is two, S (S (S Z))
is three, etc. So if we had something of type Nat
, it could represent any natural number. This declaration gives you:
- A type
Nat
- A value constructor
Z :: Nat
- A value constructor
S :: Nat -> Nat
However, with the DataKinds extension, when you define this, you also define some extra fancy things. You also define a kind Nat
! More specifically, you get:
- A kind
Nat
- A type
Z :: Nat
(Z
, of kindNat
) - A type constructor
S :: Nat -> Nat
(S
, which takes something of kindNat
, and returns a new thing of kindNat
)
(Note that, to be principled, GHC would prefer us to use 'Z
and 'S
when we are referring to the types, and this is how it’ll print them out in error messages. But we’re going to run with this for now…mostly for aesthetic reasons)
We can check this out in GHCi:
> :set -XDataKinds
ghci> data Nat = Z | S Nat
ghci> :k Z
ghciNat
> :k S Z
ghciNat
> :k S (S Z)
ghciNat
So now we have a type that can encode numbers. Something of type Z
represents zero…something of type S Z
represents 1…something of type S (S Z)
represents two.
Note that you can’t ever have anything like S Bool
…that doesn’t work, because Bool
is of kind *
, but S
expects only Nat
s.
Now we can make our Vec
data type, with the GADTs extension, or “generalized algebraic data types”:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L37-L44
data Vec :: Nat -> * -> * where
Nil :: Vec Z a
(:#) :: a -> Vec n a -> Vec (S n) a
infixr 5 :#
deriving instance Show a => Show (Vec n a)
deriving instance Eq a => Eq (Vec n a)
If you’ve never seen GADTs before, think of it as a way of declaring a type by giving the type of your constructors instead of just the normal boring form. It’s nothing too crazy…it’s basically like defining Maybe
as:
data Maybe :: * -> * where
Nothing :: Maybe a
Just :: a -> Maybe a
instead of
data Maybe a = Nothing | Just a
In both cases, they create constructors of type Nothing :: Maybe a
and Just :: a -> Maybe a
anyway…so the GADT form just gives us a way to state it explicitly.
Oh, we also used the KindSignatures extension to be able to give a kind signature to Vec
…this is important because we want to make sure the first argument has to be a Nat
. That is, we can’t have anything silly like Vec Bool Int
. We also have to put a separate StandaloneDeriving-extension standalone deriving clause instead of just having deriving Show
because Vec
isn’t a type that can be expressed in “normal Haskell”.
Note that our type is basically like a list:
data [] :: * -> * where
:: [a]
[] (:) :: a -> [a] -> [a]
Except now our type constructor actually has a new Nat
This means that, because of type erasure, everything “runtime” on our new type is basically going to be identical to []
(not considering compiler tricks). In-memory, this new type is essentially exactly []
, but its type has an extra tag that is erased at compile-time.
Okay, let’s define some useful methods:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L93-L97
headV :: Vec (S n) a -> a
:# _) = x
headV (x
tailV :: Vec (S n) a -> Vec n a
:# xs) = xs tailV (_
Ah, the classic head
/tail
duo from the days pre-dating Haskell. head
and tail
are somewhat of a sore spot or wart in Haskell’s list API1, because they’re partial functions. You tell people all about how Haskell is great because it can prevent run-time errors by ensuring completeness and having the type system enforce null-pointer checks…but then you go ahead and put unsafe functions that throw errors for empty lists anyways in Prelude.
But here…this will never happen! We can only use headV
and tailV
on non-empty lists…it won’t typecheck for empty lists. Do you see why?
It’s because all empty lists are of type Vec Z a
. But headV
and tailV
only take things of type Vec (S n) a
, for any Nat
n
. So, if you ever try to use it on an empty list, it won’t even compile! No more pesky runtime bugs. headV
and tailV
are safe and will never crash at runtime!
Note that the return type of tailV
is a vector of a length one less than the given vector. tailV :: Vec (S Z) a -> Vec Z a
, for instance…or tailV :: Vec (S (S Z)) a -> Vec (S Z) a
. Just like we want!
If you tried implementing this yourself, you might notice that you actually get an error from GHC if you even try to handle the Nil
case for tailV
or headV
. GHC will know when you’ve handled all possible cases, and get mad at you if you try to handle a case that doesn’t even make sense!
Type families and appending
We can also “append” vectors. But we need a way to add Nat
s together first. For that, we can use a type family, using the TypeFamilies extension (with TypeOperators
):
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L29-L31
type family (x :: Nat) + (y :: Nat) where
'Z + y = y
'S x + y = 'S (x + y)
A “type family” is like a type level function. Compare this to defining (+)
on the value level to the Nat
data type:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L33-L35
(+#) :: Nat -> Nat -> Nat -- types!
Z +# y = y
S x +# y = S (x +# y)
Basically, we’re defining a new type-level function (+)
on two types x
and y
, both of kind Nat
…and the result is their “sum”. Convince yourself that this “addition” is actually addition. Now, let’s use it for appendV
:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L99-L101
appendV :: Vec n a -> Vec m a -> Vec (n + m) a
Nil ys = ys
appendV :# xs) ys = x :# appendV xs ys appendV (x
> let v1 = 1 :# 2 :# 3 :# Nil
ghci> let v2 = 0 :# 1 :# Nil
ghci> v1 `appendV` v2
ghci1 :# 2 :# 3 :# 0 :# 1 :# Nil
> :t v1 `appendV` v2
ghci`appendV` v2 :: Vec (S (S (S (S (S Z))) Int v1
Generating
It’d be nice to have type-safe methods of generating these things, too…functions like iterate
, or enumFrom
. One of the ways to do this is by using a typeclass. (Available in a separate file to try out).
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/Unfoldable.hs#L7-L8
class Unfoldable v where
unfold :: (b -> (a, b)) -> b -> v a
We’re going to call v
an Unfoldable
if you can build a v
from an “unfolding function” and an “initial state”. Run the function on the initial value and get the first item and a new state. Run the function on the new state and get the second item and the next state.
The list instance should make it more clear:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/Unfoldable.hs#L11-L13
instance Unfoldable [] where
= let (y, x1) = f x0
unfold f x0 in y : unfold f x1
> take 5 $ unfold (\x -> (x `mod` 3 == 2, x^2 - 1)) 2
ghciTrue, False, True, False, True] [
Note that we can have an instance for any fixed-length vector type…where the thing “cuts off” after it’s filled the entire vector:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L46-L51
instance Unfoldable (Vec Z) where
= Nil
unfold _ _
instance Unfoldable (Vec n) => Unfoldable (Vec (S n)) where
= let (y, x1) = f x0
unfold f x0 in y :# unfold f x1
Take a moment to think about what these instances are doing.
You can create a Vec Z a
from an unfolding function pretty easily, because the only thing with type Vec Z a
is Nil
. So just ignore the function/initial state and return Nil
.
The instance for Vec (S n)
is slightly more involved. To make a Vec (S n) a
, you need an a
and a Vec n a
. You can get the a
from the unfolding function…but where will you get the Vec n a
from? Well, you can use unfold
to make a Vec n a
! But that only makes sense if Vec n
is an Unfoldable
.
So, that’s why in the instance for Vec (S n)
, we constrain that Vec n
must also be an Unfoldable
. We make our result by using our function to create an a
and unfold
to create a Vec n a
(provided Vec n
is an Unfoldable
).
Note that this style of declaration looks a lot like induction. We define our instance for zero…and then we say, “if n
is an instance, then so is S n
”. Induction!
Let’s see this in action.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/Unfoldable.hs#L15-L24
replicateU :: Unfoldable v => a -> v a
= unfold (\x -> (x, x))
replicateU
iterateU :: Unfoldable v => (a -> a) -> a -> v a
= unfold (\x -> (x, f x))
iterateU f
fromListMaybes :: Unfoldable v => [a] -> v (Maybe a)
= unfold $ \l -> case l of
fromListMaybes -> (Nothing, [])
[] :xs -> (Just x , xs) x
> replicateU 'a' :: Vec (S (S (S Z))) Char
ghci'a' :# 'a' :# 'a' :# Nil
> replicateU 'a' :: Vec Z Char
ghciNil
> iterateU succ 1 :: Vec (S (S (S (S Z)))) Int
ghci1 :# 2 :# 3 :# 4 :# Nil
> fromListMaybes [1,2] :: Vec (S (S (S Z))) (Maybe Int)
ghciJust 1 :# Just 2 :# Nothing :# Nil
> tailV (iterateU succ 1 :: Vec (S Z) Int)
ghciNil
Note that replicateU
doesn’t need to take in an Int
parameter, like the on in Prelude, to say how many items to have. It just replicates enough to fill the entire vector we want!
Common Typeclasses
We can go in and implement common typeclasses, too. All the ones you’d expect.
We can actually use the DeriveFunctor extension to write a Functor
instance, but let’s write one on our own just for learning purposes:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L53-L55
instance Functor (Vec n) where
fmap _ Nil = Nil
fmap f (x :# xs) = f x :# fmap f xs
For Applicative
, it isn’t so simple. The Applicative instance is going to be the “ZipList” instance…so we have to be able to make a pure
that depends on the type, and a (<*>)
that depends on the type, too.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L57-L63
instance Applicative (Vec Z) where
pure _ = Nil
Nil <*> _ = Nil
instance Applicative (Vec n) => Applicative (Vec (S n)) where
pure x = x :# pure x
:# fs) <*> (x :# xs) = f x :# (fs <*> xs) (f
For Vec Z
, it’s just Nil
. For Vec (S n)
…for pure, you need x :#
something…and that something has to be a Vec n a
. That’s just pure
for Vec n
! Remember, we can’t assume that Vec n
is an Applicative
just because Vec (S n)
is. So we need to add a constraint, that Vec n
an Applicative. Induction, again!
For (<*>)
, we can get the first item easily, it’s just f x
. But for the next item, we need a Vec n a
. Luckily…we have exactly that with the (<*>)
for Vec n
!
Remember, at the end, we’re saying “We have an Applicative
instance for any type Vec n
”. The instance for Vec Z
has pure _ = Nil
. The instance for Vec (S Z)
has pure x = x :# Nil
. The instance for Vec (S (S Z))
has pure x = x :# x :# Nil
, etc. etc.
> fmap (*2) (1 :# 2 :# 3 :# Nil)
ghci2 :# 4 :# 6 :# Nil
> pure 10 :: Vec (S (S Z)) Int
ghci10 :# 10 :# Nil -- like replicateV!
> liftA2 (+) (1 :# 2 :# 3 :# Nil) (100 :# 201 :# 302 :# Nil)
ghci101 :# 203 :# 305 :# Nil
I’ll leave the Monad
instance as an exercise, but it’s in the source files for this post. join
for this instance should be a “diagonal” — the first item of the first vector, the second item of the second vector, the third item of the third vector, etc.
We can define Foldable
and Traversable
the same way. Like for Functor
, GHC can derive these with DeriveFoldable and DeriveTraversable…but we’ll do it again here just to demonstrate.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L65-L75
instance Foldable (Vec Z) where
foldMap _ Nil = mempty
instance Foldable (Vec n) => Foldable (Vec (S n)) where
foldMap f (x :# xs) = f x <> foldMap f xs
instance Traversable (Vec Z) where
traverse _ Nil = pure Nil
instance Traversable (Vec n) => Traversable (Vec (S n)) where
traverse f (x :# xs) = liftA2 (:#) (f x) (traverse f xs)
Note that we can only use foldMap f xs
on xs :: Vec n a
, if Vec n
is a Foldable
. So that’s why we add that constraint.
Again, liftA2 (:#) :: Applicative f => f a -> f (Vec n a) -> f (Vec (S n) a)
…so this only makes sense if traverse f s
gives us a Vec n a
. So we have to add that as a constraint.
> toList $ 1 :# 2 :# 3 :# Nil
ghci1,2,3]
[> traverse Identity $ 1 :# 2 :# 3 :# Nil
ghciIdentity (1 :# 2 :# 3 :# Nil)
> sequence_ $ putStrLn "hello" :# putStrLn "world" :# Nil
ghci"hello"
"world"
> sequence $ Just 1 :# Just 2 :# Nil
ghciJust (1 :# 2 :# Nil)
> sequence $ Just 1 :# Nothing :# Nil
ghciNothing
Traversable
of course opens a whole lot of doors. For example, we can write a “safe fromList
”:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/Unfoldable.hs#L26-L27
fromListU :: (Unfoldable v, Traversable v) => [a] -> Maybe (v a)
= sequence . fromListMaybes fromListU
> fromListU [1,2,3] :: Maybe (Vec (S Z) Int)
ghciJust (1 :# Nil)
> fromListU [1,2,3] :: Maybe (Vec (S (S (S Z))) Int)
ghciJust (1 :# 2 :# 3 :# Nil)
> fromListU [1,2,3] :: Maybe (Vec (S (S (S (S Z)))) Int)
ghciNothing
And, if you’re on GHC 7.8+, you have access to the OverloadedLists language extension, where you can interpret list literals as if they were other structures.
We’ve already already implemented both fromList
and toList
, in a way, already, so this should be a breeze. The only trick you might see is that the IsList
typeclass asks for a type family to return the type of the element in the container from the container type.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L86-L91
instance (Unfoldable (Vec n), Traversable (Vec n)) => L.IsList (Vec n a) where
type Item (Vec n a) = a
= case fromListU xs of
fromList xs Nothing -> error "Demanded vector from a list that was too short."
Just ys -> ys
= Data.Foldable.toList toList
> :set -XOverloadedLists
ghci> [1,2,3] :: Vec (S (S Z)) Int
ghci1 :# 2 :# Nil
> [1,2,3] :: Vec (S (S (S (S Z)))) Int
ghci*** Exception: Demanded vector from a list that was too short.
> [1,3..] :: Vec (S (S (S (S Z)))) Int
ghci1 :# 3 :# 5 :# 7 :# Nil
Neat! All of the benefits of list literals that OverloadedLists offers is now available to us.2 Unfortunately, you now open yourself up to runtime errors, so…it’s actually a really bad idea for safety purposes unless you stick to only using it with infinite lists or are very disciplined. (Unless you really want to use list syntax, fromListU
is probably a safer choice for finite lists!)
Indexing
It’d be nice to be able to index into these, of course. For type-safe indexing, we can take advantage of a trick using the Proxy
type.
Many might remember having to get a TypeRep
for a Typeable
instance by doing something like typeOf (undefined :: IO Double)
. That’s because typeOf :: Typeable a => a -> TypeRep
. If you wanted to get the typeRep
for an IO Double
using typeOf
, you have to pass in an IO Double
. But if you don’t have one at hand, you can just use undefined
with a type annotation. It’s a bit of a dirty hack, but it works because typeOf
doesn’t care about the first argument’s value…just its type.
These days, we like to be a bit less embarrassing and use something called Proxy
:
data Proxy a = Proxy
Proxy a
is a bit like ()
. It only has one constructor, and doesn’t take any arguments. But we can use the type signature to “pass in types” to functions, as “arguments”.
We have a couple of options here. One is to make a typeclass for type level nats to turn them into an Integer
or a value-level Nat
, and then do an “unsafe indexing” after verifying, through types, that the index is smaller than the length.
However, this is a little bit silly because we’re just doing an unsafe indexing in the end anyway, so the compiler can’t help us at all. Wouldn’t it be nice if we could get the compiler on our side and write a real safe index?
There are many ways to approach this problem, but one way is to make a specific Index
typeclass: (or make another typeclass like Take
, and write index
in terms of it)
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L77-L78
class Index (n :: Nat) v where
index :: Proxy n -> v a -> a
Here, we can say that n
and v
are instances of Index n v
if and only if you can safely (totally) index into v a
at index n
. That is, if every value of type v a
ever has an index at n
, a Nat
. (By the way, we need MultiParamTypeClasses to be able to make a type class with two parameters)
So, n ~ S Z
and v ~ Vec (S (S Z)) a
has an instance, because you can get the \(n = 1\) element (the second element) from any value of type Vec (S (S Z)) a
(a length-two vector).
But n ~ S Z
and v ~ Vec (S Z) a
does not. There are actually no length-1 vectors that have a \(1\) index (second element).
Note that we use the Proxy
trick we discussed, so that we can indicate somehow what index we really want. It is a trick that basically allows us to pass a type (S Z
, S (S Z)
, etc.) as a “value”.
Let’s write our instances — but only the instances that make sense.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L80-L84
instance Index Z (Vec (S n)) where
index _ (x :# _) = x
instance forall n m. Index n (Vec m) => Index (S n) (Vec (S m)) where
index _ (_ :# xs) = index (Proxy :: Proxy n) xs
The first case instance makes sense. We can definitely index at index Z
(zero) of any Vec (S n) a
— the only thing we can’t index Z
into is Vec Z a
. So, if our vector is of length 1 or higher, we can index at position 0.
The second case says that, if we can index into n
of a Vec m a
, then of course we can index into an S n
of a Vec (S m) a
. To index into S n
of a Vec (S m) a
, all we need to do is index into n
of the Vec m a
tail!
We have to use the ScopedTypeVariables extension to enable us to use, with the forall
statement, the n
in our instance when we are writing our type for Proxy
. If we didn’t, the n
in Proxy n
in our index
definition would be considered unrelated by GHC to the n
in the instance statement, Index (S n) (Vec (S m))
. (This is the only reason we need the forall
)
In any case, note the similarity of this algorithm to the actual indexing function on lists:
0 !! (x:_ ) = x
!! (_:xs) = (n - 1) !! xs n
trying it out…
> index (Proxy :: Proxy (S (S Z))) (1 :# 2 :# 3 :# Nil)
ghci3
> index (Proxy :: Proxy (S (S Z))) (1 :# 2 :# Nil)
ghci*** Compile error!
It’s an error, but remember, it’s a compiler error, that happens before any code is ever even run! No more indexing errors at runtime! Kiss your days of hunting segfault errors in C goodbye!
Aside
This is something I haven’t really been able to find a good answer too. But notice that we actually could have written a “bad” instance of the second instance of Index
:
instance Index (S n) (Vec (S m)) where
index _ (x :# _) = x
And this compiles fine…but gives the wrong behavior, or at least the behavior we don’t want!
Does anybody know a way to state the type of Index
or index
in a way that implementations like this are impossible?
There’s a “fundamental” problem here, it seems, because we can’t really demand or specify anything by the return type, like we could in the other examples. In the other examples, we sort of restricted the implementation by choosing our return type carefully…but for here, it’s just a
. I’d love to hear if anyone has any thoughts on this.
You might notice that it’s a bit of a plain to write S (S (S (S Z)))
, etc., especially for large numbers. And I wouldn’t even think about writing it for the hundreds.
We’ll “fix” this in the next section. However, even before this, you actually can generate these “automatically” with template haskell, using techniques from Functional Pearls: Implicit Configurations, and the linear package does just this. (This path slipped my mind before I posted because I didn’t really consider template Haskell, and I think I’ll edit in a section here soon). With this in mind, I still don’t really consider Template Haskell an optimal or clean approach :)
Using TypeLits and Type Checker Plugins
(This next section uses code that is also available online, as well!)
Using a custom Nat
kind and DataKinds is nice and all, but it’s a bit of a hassle to express large numbers like 100, 1000, etc. However, as of GHC 7.8, we’ve had the ability to actually use numeric (integer) literals in our types. Instead of writing S (S Z)
, we can write 2
.
GHC can’t yet quite work with that well by default. It has trouble proving statements about variables, like (n + 1) ~ (1 + n)
(that n + 1
is “the same as” 1 + n
). Fortunately for us, since GHC 7.10, we have a way to “extend” the type checker with custom plugins that can prove things like this for us. (Note that this +
is the one from GHC.TypeLits
…not the one we defined earlier.)
The ghc-typelits-natnormalise package is a package providing such a plugin. We can have GHC use it to extend its type checking by passing in -fplugin GHC.TypeLits.Normalise
when we execute our code, or by adding a pragma:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L14-L14
{-# OPTIONS_GHC -fplugin GHC.TypeLits.Normalise #-}
to the top of our file, along with our LANGUAGE
pragmas. (Assuming, of course, a GHC 7.10+)
> :set -XDataKinds -XTypeOperators -XTypeFamilies
ghci> import GHC.TypeLits
ghci> Proxy :: ((n + 1) ~ (1 + n)) => Proxy n
ghci*** Compile error: Cannot match `1 + n` with `n + 1`
> :set -fplugin GHC.TypeLits.Normalise
ghci> Proxy :: ((n + 1) ~ (1 + n)) => Proxy n
ghciProxy -- success!
GHC now uses the plugin to prove that the two are really equivalent.
If you wanted to play along or try out the code samples, I recommend you use a sandbox:
# in directory of your choice
$ cabal sandbox init
$ cabal install ghc-typelits-natnormalise
$ cabal exec bash
# now the package is in scope, when you use ghci or runghc
With that in mind, let’s start restating everything in terms of TypeLits and see what it gains us.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L33-L40
data Vec :: Nat -> * -> * where
Nil :: Vec 0 a
(:#) :: a -> Vec (n - 1) a -> Vec n a
infixr 5 :#
deriving instance Show a => Show (Vec n a)
deriving instance Eq a => Eq (Vec n a)
A little nicer, right? Nil
is a Vec 0 a
, and x :# xs
is an element with a Vec (n - 1) a
, which overall is a Vec n a
. Let’s go over everything again to see how it’d look in the new regime. (Note that the kind of the type number literals is also called Nat
…unrelated to our Nat
we used before.)
A new look
First of all, we’re going to have to define TypeLit comparison operators, as they aren’t built in in a useful way.
We have the type family (remember those?) CmpNat x y
, which returns an Ordering
(LT
, EQ
, or GT
) type (of kind Ordering
, using DataKinds…lifting a type and its value constructors to a kind and its types), which is provided and defined for us by GHC in GHC.TypeLits
.
So defining a x > y
constraint is pretty straightforward:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L31-L31
type x > y = CmpNat x y ~ 'GT
Note that we need the ConstraintKinds extension for this to work, as 1 > 2
is now a constraint, of kind Constraint
.
Given this, let’s do our favorite list functions, headV
and tailV
:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L89-L93
headV :: (n > 0) => Vec n a -> a
:# _) = x
headV (x
tailV :: (n > 0) => Vec n a -> Vec (n - 1) a
:# xs) = xs tailV (_
Magnificent!
> headV (Nil :: Vec 0 ())
ghci-- Error! Cannot unite 'EQ with 'GT
Neat! The error, remember, is at compile time, and not at runtime. If we ever tried to do an unsafe head, our code wouldn’t even compile! The error message comes from the fact that we need \(n > 0\), but we have \(n = 0\) instead. We have EQ
, but we need GT
.
There is one problem here, though — GHC gives us a warning for not pattern matching on Nil
. But, if we do try to pattern match on Nil
, we get a type error, like the same one we got when using our custom type nats. I think this is probably something that a plugin or sufficiently smart CmpNat
might be able to handle…but I’m not totally sure. Right now, the best thing I can think of is just to do a wildcard match, headV _ = error "What?"
, knowing that that case will never be reached if your program compiles successfully.
Moving on, we see that we don’t even have to do any extra work to define our own type family x + y
…because GHC.TypeLits
already defines it for us! So, we can instantly write….
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L95-L97
appendV :: Vec n a -> Vec m a -> Vec (n + m) a
Nil ys = ys
appendV :# xs) ys = x :# appendV xs ys appendV (x
> let v1 = 1 :# 2 :# 3 :# Nil
ghci> let v2 = iterateU succ 0 :: Vec 2 Int
ghci> v1 `appendV` v2
ghci1 :# 2 :# 3 :# 0 :# 1 :# Nil
> :t v1 `appendV` v2 :: Vec 5 Int
ghci`appendV` v2 :: Vec 5 Int v1
And our list generating typeclasses —
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L42-L47
instance Unfoldable (Vec 0) where
= Nil
unfold _ _
instance (Unfoldable (Vec (n - 1)), n > 0) => Unfoldable (Vec n) where
= let (y, x1) = f x0
unfold f x0 in y :# unfold f x1
The translation is pretty mechanical, but I think that this new formulation looks…really nice, and really powerful. “If you can build a list from \(n - 1\) and \(n > 0\), then you can build a list for \(n\)!
Note that because our definitions of replicateU
, iterateU
, and fromListMaybes
was polymorphic over all Unfoldable
, we can actually re-use them from before:
> iterateU succ 1 :: Vec 3 int
ghci1 :# 2 :# 3 :# Nil
> iterateU succ 1 :: Vec 10 Int
ghci1 :# 2 :# 3 :# 4 :# 5 :# 6 :# 7 :# 8 :# 9 :# 10 :# Nil
> replicateU 'a' :: Vec 4 Char
ghci'a' :# 'a' :# 'a' :# 'a' :# Nil
The actual types are much nicer, too — we can write Vec 10 Int
instead of Vec (S (S (S (S (S (S (S (S (S (S Z)))))))))) Int
without resorting to template haskell.
Going through all of our other typeclasses/functions and making the adjustments… (remembering that we can also derive Functor
, Traversable
, and Foldable
using GHC)
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L49-L87
instance Functor (Vec n) where
fmap _ Nil = Nil
fmap f (x :# xs) = f x :# fmap f xs
instance Applicative (Vec 0) where
pure _ = Nil
Nil <*> _ = Nil
instance (Applicative (Vec (n - 1)), n > 0) => Applicative (Vec n) where
pure x = x :# pure x
:# fs) <*> (x :# xs) = f x :# (fs <*> xs)
(f
instance Foldable (Vec 0) where
foldMap _ Nil = mempty
instance (Foldable (Vec (n - 1)), n > 0) => Foldable (Vec n) where
foldMap f (x :# xs) = f x <> foldMap f xs
instance Traversable (Vec 0) where
traverse _ Nil = pure Nil
instance (Traversable (Vec (n - 1)), n > 0) => Traversable (Vec n) where
traverse f (x :# xs) = liftA2 (:#) (f x) (traverse f xs)
class Index (n :: Nat) v where
index :: Proxy n -> v a -> a
instance (m > 0) => Index 0 (Vec m) where
index _ (x :# _) = x
instance forall n m. (Index (n - 1) (Vec (m - 1)), n > 0, m > 0) => Index n (Vec m) where
index _ (_ :# xs) = index (Proxy :: Proxy (n - 1)) xs
instance (Unfoldable (Vec n), Traversable (Vec n)) => L.IsList (Vec n a) where
type Item (Vec n a) = a
= case fromListU xs of
fromList xs Nothing -> error "Demanded vector from a list that was too short."
Just ys -> ys
= Data.Foldable.toList toList
(Remember, we use the forall
here with ScopedTypeVariables to be able to say that the n
in the type signature is the same n
that is in the type of Proxy
)
> fromListU [1,2,3,4] :: Vec 10 Int
ghciNothing
> fromListU [1,2,3,4] :: Vec 3 Int
ghciJust (1 :# 2 :# 3 :# Nil)
> index (Proxy :: Proxy 2) (1 :# 2 :# 3 :# Nil)
ghci3
> index (Proxy :: Proxy 2) (1 :# 2 :# Nil)
ghci*** Type Error: Couldn't match 'EQ with 'GT
> :set -XOverloadedLists
ghci> [1,2,3] :: Vec 2 Int
ghci1 :# 2 :# Nil
> [1,2,3] :: Vec 4 Int
ghci*** Exception: Demanded vector from a list that was too short.
> [1,3..] :: Vec 5 Int
ghci1 :# 3 :# 5 :# 7 :# 9 :# Nil
I think, overall, this formulation gives a much nicer interface. Being able to just write \(10\) is pretty powerful. The usage with OverloadedLists is pretty clean, too, especially when you can do things like [1,3..] :: Vec 10 Int
and take full advantage of list syntax and succinct vector types. (Minding your runtime errors, of course)
However, you do again get the problem that GHC is not able to do real completeness checking and asks for the Nil
cases still of everything…but adding a Nil
case will cause a type error. The only solution is to add a _
wildcard chase, but…again, this isn’t quite satisfactory.3 If anybody has a way to get around this, I’d love to know :)
Alternative Underlying Representations
Recall that our Vec
was basically identically the normal list type, with an extra field in the type. Due to type erasure, the two are represented exactly the same in memory. So we have \(O(n)\) appends, \(O(n)\) indexing, etc. Our type is essentially equal to
newtype Vec :: Nat -> * -> * where
VecList :: [a] -> Vec n a
For this type, though, we’d need to use “smart constructors” and extractors instead of 1 :# 2 :# Nil
etc.
We could, however, chose a more efficient type, like Vector
from the vector package:
newtype Vec :: Nat -> * -> * where
VecVector :: Vector a -> Vec n a
And, if you made sure to wrap everything with smart constructors, you now have type safe \(O(1)\) random indexing!
(This is representation is similar to the one used by the linear package.)
More Operations
One really weird quirk with this is that many functions you’d normally write using pattern matching you’d now might start writing using typeclasses. One example would be our implementation of indexing, using an IndexV
typeclass.
A bunch of one-shot typeclasses is sort of unideal, as typeclasses are sort of ugly and non-first-class. Ideally you’d only have a few typeclasses for as generic an interface as possible, and then be able to do everything from those. Sometimes this just isn’t practical. I did mention one way around it, which was to make a typeclass to “reify” or turn your type into actual data, and then manipulate your data in an “unsafe” way knowing that the type checker checked that the data matched.
We’ll demonstrate with SomeNat
from GHC.TypeLits
, but you can also make our own for our inductive Nat
type we used in the first half, too.
If we use our “wrapped Vector
approach”, we can just do:
newtype Vec :: Nat -> * -> * where
Vec :: Vector a -> Vec n a
index :: (KnownNat n, m > n) => Proxy n -> Vec m a -> a
index p (Vec v) = v ! fromInteger (natVal p)
That is, index
internally uses (!)
, an unsafe operator…but only after we assure properly that it’s safe to use by stating m > n
in the constraint. We can be sure that GHC will catch any instance where someone tries to index into a Vec m a
whose m
is not greater than the index desired.
The rest is up to you, though — to prove that indexing into a number smaller than m
will always provide an answer. We have to make sure our smart constructors are okay and that (!)
behaves like we think it does.
Singletons
Another answer to these sort of ad-hoc typeclasses is to use techniques involving singletons. Going all into how to use singletons to work with these is an article on its own…luckily, this article has already been written as Part 1: Dependent Types in Haskell by Hiromi ISHII. A major advantage is that you replace typeclasses with type families and more parameterized types. You’ll have to work with an understanding of how singletons work, and accept using some template haskell to generate singleton types for your data types (or write them yourself!). But it’s a powerful way to bring something like dependent types into Haskell, and there’s already a lot of infrastructure of support on it on hackage and in the haskell dev ecosystem in general. I recommend looking at the linked article!
Conclusion
Hopefully you’ll see that we are able to apply the full type-safety of the Haskell compiler to our programs regarding lists by encoding the length of the list in its type and limiting its operations by specifically typed functions and choice of instances. I also hope that you’ve been able to become familiar with seeing a lot of GHC’s basic type extensions in real applications :)
Feel free to download and run any of the samples
Please let me know if I got anything wrong, or if there are any techniques that I should mention here that are out and in the wild today :)
Can we get them out of Prelude? Please? :)↩︎
By the way, the GHC wiki seems to claim that using OverloadedLists this way is impossible. Anyone know what’s going on here? Did we move fast and break everything?↩︎
Interestingly enough, I think this is something where you could have the best of both situations with the Template Haskell method. But I’d hope for something that works on the beautiful TypeLits :’(↩︎