Fixed-Length Vector Types in Haskell, 2015

SourceMarkdownLaTeXPosted in Haskell, Reference, TutorialsComments


Update: This post was written by me when I was just starting to learn about type-level things in Haskell, and reflects my own inexperience at the time of writing it. I have released an update, which presents what I hope to be an introduction that is more grounded in modern Haskell and dependent type idioms.

Original Article (written in 2015)

Fixed-length vector types (vector types that indicate the length of the vector in the type itself) are one of the more straightforward applications of the “super-Haskell” GHC type extensions. There’s a lot of magic you can do with GHC’s advanced type mechanisms, but I think fixed length vectors are a good first step to beginning to understand several extensions, including (potentially):

  • ConstraintKinds
  • DataKinds
  • GADTs
  • KindSignatures
  • TypeFamilies
  • TypeOperators
  • OverloadedLists

And using type system plugins. (And of course the usual UndecidableInstances etc.) We’ll be discussing two different ways to implement this — using type-level nats, and using the GHC.TypeLits model to actually be able to use numeric literals in your types. These things are seen in the wild like with the popular linear package’s V type.

There are a few great tutorials/writeups on this topic, but many of them are from the time before we had some of these extensions, or only discuss a few. I hope to provide a nice comprehensive look about the tools available today to really approach this topic. That being said, I am no expert myself, so I would appreciate any tips/edits/suggestions for things that I’ve missed or done not-the-best :) This post has a lot of open questions that I’m sure people who know more about this than me can answer.

Most of the code in this article can be downloaded and tried out, so follow along if you want!

The Idea

The basic idea is we’ll have a type:

Vec n a

Which is a vector with items of type a, whose length is somehow encoded in the n. We’ll then discuss ways to do useful operations on this, as if it were a list.

n can really only be a certain “kind” of thing — a type that encodes a length. We can represent this by giving it a “kind signature”:

data Vec :: Nat -> * -> *

Which says that our Vec type constructor takes two arguments: something of kind Nat (so it can’t be any type…it has to be a type of kind Nat), something of kind * (the “normal” kind, of things that have values, like Int, Maybe Bool, etc.), and returns something of kind * (our vector itself).

Using DataKinds for Type-Level Nats

(The code in this section for this type is available online, if you wanted to play along!)

There are a couple of ways to find something for that n Nat kind, and one way is to use the simple inductive Nat:

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L26-L27

data Nat = Z | S Nat
         deriving Show

You might have seen this type before…it gives us value-level natural numbers, where Z is zero, S Z is one, S (S Z) is two, S (S (S Z)) is three, etc. So if we had something of type Nat, it could represent any natural number. This declaration gives you:

  • A type Nat
  • A value constructor Z :: Nat
  • A value constructor S :: Nat -> Nat

However, with the DataKinds extension, when you define this, you also define some extra fancy things. You also define a kind Nat! More specifically, you get:

  • A kind Nat
  • A type Z :: Nat (Z, of kind Nat)
  • A type constructor S :: Nat -> Nat (S, which takes something of kind Nat, and returns a new thing of kind Nat)

(Note that, to be principled, GHC would prefer us to use 'Z and 'S when we are referring to the types, and this is how it’ll print them out in error messages. But we’re going to run with this for now…mostly for aesthetic reasons)

We can check this out in GHCi:

ghci> :set -XDataKinds
ghci> data Nat = Z | S Nat
ghci> :k Z
Nat
ghci> :k S Z
Nat
ghci> :k S (S Z)
Nat

So now we have a type that can encode numbers. Something of type Z represents zero…something of type S Z represents 1…something of type S (S Z) represents two.

Note that you can’t ever have anything like S Bool…that doesn’t work, because Bool is of kind *, but S expects only Nats.

Now we can make our Vec data type, with the GADTs extension, or “generalized algebraic data types”:

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L37-L44

data Vec :: Nat -> * -> * where
    Nil  :: Vec Z a
    (:#) :: a -> Vec n a -> Vec (S n) a

infixr 5 :#

deriving instance Show a => Show (Vec n a)
deriving instance Eq a => Eq (Vec n a)

If you’ve never seen GADTs before, think of it as a way of declaring a type by giving the type of your constructors instead of just the normal boring form. It’s nothing too crazy…it’s basically like defining Maybe as:

data Maybe :: * -> * where
    Nothing :: Maybe a
    Just    :: a -> Maybe a

instead of

data Maybe a = Nothing | Just a

In both cases, they create constructors of type Nothing :: Maybe a and Just :: a -> Maybe a anyway…so the GADT form just gives us a way to state it explicitly.

Oh, we also used the KindSignatures extension to be able to give a kind signature to Vec…this is important because we want to make sure the first argument has to be a Nat. That is, we can’t have anything silly like Vec Bool Int. We also have to put a separate StandaloneDeriving-extension standalone deriving clause instead of just having deriving Show because Vec isn’t a type that can be expressed in “normal Haskell”.

Note that our type is basically like a list:

data [] :: * -> * where
    []  :: [a]
    (:) :: a -> [a] -> [a]

Except now our type constructor actually has a new Nat

This means that, because of type erasure, everything “runtime” on our new type is basically going to be identical to [] (not considering compiler tricks). In-memory, this new type is essentially exactly [], but its type has an extra tag that is erased at compile-time.

Okay, let’s define some useful methods:

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L93-L97

headV :: Vec (S n) a -> a
headV (x :# _)  = x

tailV :: Vec (S n) a -> Vec n a
tailV (_ :# xs) = xs

Ah, the classic head/tail duo from the days pre-dating Haskell. head and tail are somewhat of a sore spot or wart in Haskell’s list API1, because they’re partial functions. You tell people all about how Haskell is great because it can prevent run-time errors by ensuring completeness and having the type system enforce null-pointer checks…but then you go ahead and put unsafe functions that throw errors for empty lists anyways in Prelude.

But here…this will never happen! We can only use headV and tailV on non-empty lists…it won’t typecheck for empty lists. Do you see why?

It’s because all empty lists are of type Vec Z a. But headV and tailV only take things of type Vec (S n) a, for any Nat n. So, if you ever try to use it on an empty list, it won’t even compile! No more pesky runtime bugs. headV and tailV are safe and will never crash at runtime!

Note that the return type of tailV is a vector of a length one less than the given vector. tailV :: Vec (S Z) a -> Vec Z a, for instance…or tailV :: Vec (S (S Z)) a -> Vec (S Z) a. Just like we want!

If you tried implementing this yourself, you might notice that you actually get an error from GHC if you even try to handle the Nil case for tailV or headV. GHC will know when you’ve handled all possible cases, and get mad at you if you try to handle a case that doesn’t even make sense!

Type families and appending

We can also “append” vectors. But we need a way to add Nats together first. For that, we can use a type family, using the TypeFamilies extension (with TypeOperators):

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L29-L31

type family (x :: Nat) + (y :: Nat) where
    'Z   + y = y
    'S x + y = 'S (x + y)

A “type family” is like a type level function. Compare this to defining (+) on the value level to the Nat data type:

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L33-L35

(+#) :: Nat -> Nat -> Nat       -- types!
Z   +# y = y
S x +# y = S (x +# y)

Basically, we’re defining a new type-level function (+) on two types x and y, both of kind Nat…and the result is their “sum”. Convince yourself that this “addition” is actually addition. Now, let’s use it for appendV:

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L99-L101

appendV :: Vec n a -> Vec m a -> Vec (n + m) a
appendV Nil       ys = ys
appendV (x :# xs) ys = x :# appendV xs ys
ghci> let v1 = 1 :# 2 :# 3 :# Nil
ghci> let v2 = 0 :# 1 :# Nil
ghci> v1 `appendV` v2
1 :# 2 :# 3 :# 0 :# 1 :# Nil
ghci> :t v1 `appendV` v2
v1 `appendV` v2 :: Vec (S (S (S (S (S Z))) Int

Generating

It’d be nice to have type-safe methods of generating these things, too…functions like iterate, or enumFrom. One of the ways to do this is by using a typeclass. (Available in a separate file to try out).

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/Unfoldable.hs#L7-L8

class Unfoldable v where
    unfold :: (b -> (a, b)) -> b -> v a

We’re going to call v an Unfoldable if you can build a v from an “unfolding function” and an “initial state”. Run the function on the initial value and get the first item and a new state. Run the function on the new state and get the second item and the next state.

The list instance should make it more clear:

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/Unfoldable.hs#L11-L13

instance Unfoldable [] where
    unfold f x0 = let (y, x1) = f x0
                  in  y : unfold f x1
ghci> take 5 $ unfold (\x -> (x `mod` 3 == 2, x^2 - 1)) 2
[True, False, True, False, True]

Note that we can have an instance for any fixed-length vector type…where the thing “cuts off” after it’s filled the entire vector:

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L46-L51

instance Unfoldable (Vec Z) where
    unfold _ _ = Nil

instance Unfoldable (Vec n) => Unfoldable (Vec (S n)) where
    unfold f x0 = let (y, x1) = f x0
                  in  y :# unfold f x1

Take a moment to think about what these instances are doing.

You can create a Vec Z a from an unfolding function pretty easily, because the only thing with type Vec Z a is Nil. So just ignore the function/initial state and return Nil.

The instance for Vec (S n) is slightly more involved. To make a Vec (S n) a, you need an a and a Vec n a. You can get the a from the unfolding function…but where will you get the Vec n a from? Well, you can use unfold to make a Vec n a! But that only makes sense if Vec n is an Unfoldable.

So, that’s why in the instance for Vec (S n), we constrain that Vec n must also be an Unfoldable. We make our result by using our function to create an a and unfold to create a Vec n a (provided Vec n is an Unfoldable).

Note that this style of declaration looks a lot like induction. We define our instance for zero…and then we say, “if n is an instance, then so is S n”. Induction!

Let’s see this in action.

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/Unfoldable.hs#L15-L24

replicateU :: Unfoldable v => a -> v a
replicateU = unfold (\x -> (x, x))

iterateU :: Unfoldable v => (a -> a) -> a -> v a
iterateU f = unfold (\x -> (x, f x))

fromListMaybes :: Unfoldable v => [a] -> v (Maybe a)
fromListMaybes = unfold $ \l -> case l of
                                  []   -> (Nothing, [])
                                  x:xs -> (Just x , xs)
ghci> replicateU 'a'       :: Vec (S (S (S Z))) Char
'a' :# 'a' :# 'a' :# Nil
ghci> replicateU 'a'       :: Vec Z Char
Nil
ghci> iterateU succ 1      :: Vec (S (S (S (S Z)))) Int
1 :# 2 :# 3 :# 4 :# Nil
ghci> fromListMaybes [1,2] :: Vec (S (S (S Z))) (Maybe Int)
Just 1 :# Just 2 :# Nothing :# Nil
ghci> tailV (iterateU succ 1 :: Vec (S Z) Int)
Nil

Note that replicateU doesn’t need to take in an Int parameter, like the on in Prelude, to say how many items to have. It just replicates enough to fill the entire vector we want!

Common Typeclasses

We can go in and implement common typeclasses, too. All the ones you’d expect.

We can actually use the DeriveFunctor extension to write a Functor instance, but let’s write one on our own just for learning purposes:

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L53-L55

instance Functor (Vec n) where
    fmap _ Nil       = Nil
    fmap f (x :# xs) = f x :# fmap f xs

For Applicative, it isn’t so simple. The Applicative instance is going to be the “ZipList” instance…so we have to be able to make a pure that depends on the type, and a (<*>) that depends on the type, too.

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L57-L63

instance Applicative (Vec Z) where
    pure _    = Nil
    Nil <*> _ = Nil

instance Applicative (Vec n) => Applicative (Vec (S n)) where
    pure x = x :# pure x
    (f :# fs) <*> (x :# xs) = f x :# (fs <*> xs)

For Vec Z, it’s just Nil. For Vec (S n)…for pure, you need x :# something…and that something has to be a Vec n a. That’s just pure for Vec n! Remember, we can’t assume that Vec n is an Applicative just because Vec (S n) is. So we need to add a constraint, that Vec n an Applicative. Induction, again!

For (<*>), we can get the first item easily, it’s just f x. But for the next item, we need a Vec n a. Luckily…we have exactly that with the (<*>) for Vec n!

Remember, at the end, we’re saying “We have an Applicative instance for any type Vec n”. The instance for Vec Z has pure _ = Nil. The instance for Vec (S Z) has pure x = x :# Nil. The instance for Vec (S (S Z)) has pure x = x :# x :# Nil, etc. etc.

ghci> fmap (*2) (1 :# 2 :# 3 :# Nil)
2 :# 4 :# 6 :# Nil
ghci> pure 10 :: Vec (S (S Z)) Int
10 :# 10 :# Nil         -- like replicateV!
ghci> liftA2 (+) (1 :# 2 :# 3 :# Nil) (100 :# 201 :# 302 :# Nil)
101 :# 203 :# 305 :# Nil

I’ll leave the Monad instance as an exercise, but it’s in the source files for this post. join for this instance should be a “diagonal” — the first item of the first vector, the second item of the second vector, the third item of the third vector, etc.

We can define Foldable and Traversable the same way. Like for Functor, GHC can derive these with DeriveFoldable and DeriveTraversable…but we’ll do it again here just to demonstrate.

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L65-L75

instance Foldable (Vec Z) where
    foldMap _ Nil = mempty

instance Foldable (Vec n) => Foldable (Vec (S n)) where
    foldMap f (x :# xs) = f x <> foldMap f xs

instance Traversable (Vec Z) where
    traverse _ Nil = pure Nil

instance Traversable (Vec n) => Traversable (Vec (S n)) where
    traverse f (x :# xs) = liftA2 (:#) (f x) (traverse f xs)

Note that we can only use foldMap f xs on xs :: Vec n a, if Vec n is a Foldable. So that’s why we add that constraint.

Again, liftA2 (:#) :: Applicative f => f a -> f (Vec n a) -> f (Vec (S n) a)…so this only makes sense if traverse f s gives us a Vec n a. So we have to add that as a constraint.

ghci> toList $ 1 :# 2 :# 3 :# Nil
[1,2,3]
ghci> traverse Identity $ 1 :# 2 :# 3 :# Nil
Identity (1 :# 2 :# 3 :# Nil)
ghci> sequence_ $ putStrLn "hello" :# putStrLn "world" :# Nil
"hello"
"world"
ghci> sequence $ Just 1 :# Just 2 :# Nil
Just (1 :# 2 :# Nil)
ghci> sequence $ Just 1 :# Nothing :# Nil
Nothing

Traversable of course opens a whole lot of doors. For example, we can write a “safe fromList”:

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/Unfoldable.hs#L26-L27

fromListU :: (Unfoldable v, Traversable v) => [a] -> Maybe (v a)
fromListU = sequence . fromListMaybes
ghci> fromListU [1,2,3] :: Maybe (Vec (S Z) Int)
Just (1 :# Nil)
ghci> fromListU [1,2,3] :: Maybe (Vec (S (S (S Z))) Int)
Just (1 :# 2 :# 3 :# Nil)
ghci> fromListU [1,2,3] :: Maybe (Vec (S (S (S (S Z)))) Int)
Nothing

And, if you’re on GHC 7.8+, you have access to the OverloadedLists language extension, where you can interpret list literals as if they were other structures.

We’ve already already implemented both fromList and toList, in a way, already, so this should be a breeze. The only trick you might see is that the IsList typeclass asks for a type family to return the type of the element in the container from the container type.

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L86-L91

instance (Unfoldable (Vec n), Traversable (Vec n)) => L.IsList (Vec n a) where
    type Item (Vec n a) = a
    fromList xs = case fromListU xs of
                    Nothing -> error "Demanded vector from a list that was too short."
                    Just ys -> ys
    toList      = Data.Foldable.toList
ghci> :set -XOverloadedLists
ghci> [1,2,3] :: Vec (S (S Z)) Int
1 :# 2 :# Nil
ghci> [1,2,3] :: Vec (S (S (S (S Z)))) Int
*** Exception: Demanded vector from a list that was too short.
ghci> [1,3..] :: Vec (S (S (S (S Z)))) Int
1 :# 3 :# 5 :# 7 :# Nil

Neat! All of the benefits of list literals that OverloadedLists offers is now available to us.2 Unfortunately, you now open yourself up to runtime errors, so…it’s actually a really bad idea for safety purposes unless you stick to only using it with infinite lists or are very disciplined. (Unless you really want to use list syntax, fromListU is probably a safer choice for finite lists!)

Indexing

It’d be nice to be able to index into these, of course. For type-safe indexing, we can take advantage of a trick using the Proxy type.

Many might remember having to get a TypeRep for a Typeable instance by doing something like typeOf (undefined :: IO Double). That’s because typeOf :: Typeable a => a -> TypeRep. If you wanted to get the typeRep for an IO Double using typeOf, you have to pass in an IO Double. But if you don’t have one at hand, you can just use undefined with a type annotation. It’s a bit of a dirty hack, but it works because typeOf doesn’t care about the first argument’s value…just its type.

These days, we like to be a bit less embarrassing and use something called Proxy:

data Proxy a = Proxy

Proxy a is a bit like (). It only has one constructor, and doesn’t take any arguments. But we can use the type signature to “pass in types” to functions, as “arguments”.

We have a couple of options here. One is to make a typeclass for type level nats to turn them into an Integer or a value-level Nat, and then do an “unsafe indexing” after verifying, through types, that the index is smaller than the length.

However, this is a little bit silly because we’re just doing an unsafe indexing in the end anyway, so the compiler can’t help us at all. Wouldn’t it be nice if we could get the compiler on our side and write a real safe index?

There are many ways to approach this problem, but one way is to make a specific Index typeclass: (or make another typeclass like Take, and write index in terms of it)

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L77-L78

class Index (n :: Nat) v where
    index :: Proxy n -> v a -> a

Here, we can say that n and v are instances of Index n v if and only if you can safely (totally) index into v a at index n. That is, if every value of type v a ever has an index at n, a Nat. (By the way, we need MultiParamTypeClasses to be able to make a type class with two parameters)

So, n ~ S Z and v ~ Vec (S (S Z)) a has an instance, because you can get the \(n = 1\) element (the second element) from any value of type Vec (S (S Z)) a (a length-two vector).

But n ~ S Z and v ~ Vec (S Z) a does not. There are actually no length-1 vectors that have a \(1\) index (second element).

Note that we use the Proxy trick we discussed, so that we can indicate somehow what index we really want. It is a trick that basically allows us to pass a type (S Z, S (S Z), etc.) as a “value”.

Let’s write our instances — but only the instances that make sense.

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeNats.hs#L80-L84

instance Index Z (Vec (S n)) where
    index _ (x :# _) = x

instance forall n m. Index n (Vec m) => Index (S n) (Vec (S m)) where
    index _ (_ :# xs) = index (Proxy :: Proxy n) xs

The first case instance makes sense. We can definitely index at index Z (zero) of any Vec (S n) a — the only thing we can’t index Z into is Vec Z a. So, if our vector is of length 1 or higher, we can index at position 0.

The second case says that, if we can index into n of a Vec m a, then of course we can index into an S n of a Vec (S m) a. To index into S n of a Vec (S m) a, all we need to do is index into n of the Vec m a tail!

We have to use the ScopedTypeVariables extension to enable us to use, with the forall statement, the n in our instance when we are writing our type for Proxy. If we didn’t, the n in Proxy n in our index definition would be considered unrelated by GHC to the n in the instance statement, Index (S n) (Vec (S m)). (This is the only reason we need the forall)

In any case, note the similarity of this algorithm to the actual indexing function on lists:

0 !! (x:_ ) = x
n !! (_:xs) = (n - 1) !! xs

trying it out…

ghci> index (Proxy :: Proxy (S (S Z))) (1 :# 2 :# 3 :# Nil)
3
ghci> index (Proxy :: Proxy (S (S Z))) (1 :# 2 :# Nil)
*** Compile error!

It’s an error, but remember, it’s a compiler error, that happens before any code is ever even run! No more indexing errors at runtime! Kiss your days of hunting segfault errors in C goodbye!

Aside

This is something I haven’t really been able to find a good answer too. But notice that we actually could have written a “bad” instance of the second instance of Index:

instance Index (S n) (Vec (S m)) where
    index _ (x :# _) = x

And this compiles fine…but gives the wrong behavior, or at least the behavior we don’t want!

Does anybody know a way to state the type of Index or index in a way that implementations like this are impossible?

There’s a “fundamental” problem here, it seems, because we can’t really demand or specify anything by the return type, like we could in the other examples. In the other examples, we sort of restricted the implementation by choosing our return type carefully…but for here, it’s just a. I’d love to hear if anyone has any thoughts on this.

You might notice that it’s a bit of a plain to write S (S (S (S Z))), etc., especially for large numbers. And I wouldn’t even think about writing it for the hundreds.

We’ll “fix” this in the next section. However, even before this, you actually can generate these “automatically” with template haskell, using techniques from Functional Pearls: Implicit Configurations, and the linear package does just this. (This path slipped my mind before I posted because I didn’t really consider template Haskell, and I think I’ll edit in a section here soon). With this in mind, I still don’t really consider Template Haskell an optimal or clean approach :)

Using TypeLits and Type Checker Plugins

(This next section uses code that is also available online, as well!)

Using a custom Nat kind and DataKinds is nice and all, but it’s a bit of a hassle to express large numbers like 100, 1000, etc. However, as of GHC 7.8, we’ve had the ability to actually use numeric (integer) literals in our types. Instead of writing S (S Z), we can write 2.

GHC can’t yet quite work with that well by default. It has trouble proving statements about variables, like (n + 1) ~ (1 + n) (that n + 1 is “the same as” 1 + n). Fortunately for us, since GHC 7.10, we have a way to “extend” the type checker with custom plugins that can prove things like this for us. (Note that this + is the one from GHC.TypeLits…not the one we defined earlier.)

The ghc-typelits-natnormalise package is a package providing such a plugin. We can have GHC use it to extend its type checking by passing in -fplugin GHC.TypeLits.Normalise when we execute our code, or by adding a pragma:

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L14-L14

{-# OPTIONS_GHC -fplugin GHC.TypeLits.Normalise #-}

to the top of our file, along with our LANGUAGE pragmas. (Assuming, of course, a GHC 7.10+)

ghci> :set -XDataKinds -XTypeOperators -XTypeFamilies
ghci> import GHC.TypeLits
ghci> Proxy :: ((n + 1) ~ (1 + n)) => Proxy n
*** Compile error: Cannot match `1 + n` with `n + 1`
ghci> :set -fplugin GHC.TypeLits.Normalise
ghci> Proxy :: ((n + 1) ~ (1 + n)) => Proxy n
Proxy   -- success!

GHC now uses the plugin to prove that the two are really equivalent.

If you wanted to play along or try out the code samples, I recommend you use a sandbox:

# in directory of your choice
$ cabal sandbox init
$ cabal install ghc-typelits-natnormalise
$ cabal exec bash
# now the package is in scope, when you use ghci or runghc

With that in mind, let’s start restating everything in terms of TypeLits and see what it gains us.

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L33-L40

data Vec :: Nat -> * -> * where
    Nil  :: Vec 0 a
    (:#) :: a -> Vec (n - 1) a -> Vec n a

infixr 5 :#

deriving instance Show a => Show (Vec n a)
deriving instance Eq a => Eq (Vec n a)

A little nicer, right? Nil is a Vec 0 a, and x :# xs is an element with a Vec (n - 1) a, which overall is a Vec n a. Let’s go over everything again to see how it’d look in the new regime. (Note that the kind of the type number literals is also called Nat…unrelated to our Nat we used before.)

A new look

First of all, we’re going to have to define TypeLit comparison operators, as they aren’t built in in a useful way.

We have the type family (remember those?) CmpNat x y, which returns an Ordering (LT, EQ, or GT) type (of kind Ordering, using DataKinds…lifting a type and its value constructors to a kind and its types), which is provided and defined for us by GHC in GHC.TypeLits.

So defining a x > y constraint is pretty straightforward:

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L31-L31

type x > y = CmpNat x y ~ 'GT

Note that we need the ConstraintKinds extension for this to work, as 1 > 2 is now a constraint, of kind Constraint.

Given this, let’s do our favorite list functions, headV and tailV:

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L89-L93

headV :: (n > 0) => Vec n a -> a
headV (x :# _)  = x

tailV :: (n > 0) => Vec n a -> Vec (n - 1) a
tailV (_ :# xs) = xs

Magnificent!

ghci> headV (Nil :: Vec 0 ())
-- Error!  Cannot unite 'EQ with 'GT

Neat! The error, remember, is at compile time, and not at runtime. If we ever tried to do an unsafe head, our code wouldn’t even compile! The error message comes from the fact that we need \(n > 0\), but we have \(n = 0\) instead. We have EQ, but we need GT.

There is one problem here, though — GHC gives us a warning for not pattern matching on Nil. But, if we do try to pattern match on Nil, we get a type error, like the same one we got when using our custom type nats. I think this is probably something that a plugin or sufficiently smart CmpNat might be able to handle…but I’m not totally sure. Right now, the best thing I can think of is just to do a wildcard match, headV _ = error "What?", knowing that that case will never be reached if your program compiles successfully.

Moving on, we see that we don’t even have to do any extra work to define our own type family x + y…because GHC.TypeLits already defines it for us! So, we can instantly write….

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L95-L97

appendV :: Vec n a -> Vec m a -> Vec (n + m) a
appendV Nil       ys = ys
appendV (x :# xs) ys = x :# appendV xs ys
ghci> let v1 = 1 :# 2 :# 3 :# Nil
ghci> let v2 = iterateU succ 0 :: Vec 2 Int
ghci> v1 `appendV` v2
1 :# 2 :# 3 :# 0 :# 1 :# Nil
ghci> :t v1 `appendV` v2 :: Vec 5 Int
v1 `appendV` v2 :: Vec 5 Int

And our list generating typeclasses —

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L42-L47

instance Unfoldable (Vec 0) where
    unfold _ _ = Nil

instance (Unfoldable (Vec (n - 1)), n > 0) => Unfoldable (Vec n) where
    unfold f x0 = let (y, x1) = f x0
                  in  y :# unfold f x1

The translation is pretty mechanical, but I think that this new formulation looks…really nice, and really powerful. “If you can build a list from \(n - 1\) and \(n > 0\), then you can build a list for \(n\)!

Note that because our definitions of replicateU, iterateU, and fromListMaybes was polymorphic over all Unfoldable, we can actually re-use them from before:

ghci> iterateU succ 1 :: Vec 3 int
1 :# 2 :# 3 :# Nil
ghci> iterateU succ 1 :: Vec 10 Int
1 :# 2 :# 3 :# 4 :# 5 :# 6 :# 7 :# 8 :# 9 :# 10 :# Nil
ghci> replicateU 'a' :: Vec 4 Char
'a' :# 'a' :# 'a' :# 'a' :# Nil

The actual types are much nicer, too — we can write Vec 10 Int instead of Vec (S (S (S (S (S (S (S (S (S (S Z)))))))))) Int without resorting to template haskell.

Going through all of our other typeclasses/functions and making the adjustments… (remembering that we can also derive Functor, Traversable, and Foldable using GHC)

-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec/FVTypeLits.hs#L49-L87

instance Functor (Vec n) where
    fmap _ Nil       = Nil
    fmap f (x :# xs) = f x :# fmap f xs

instance Applicative (Vec 0) where
    pure _    = Nil
    Nil <*> _ = Nil

instance (Applicative (Vec (n - 1)), n > 0) => Applicative (Vec n) where
    pure x = x :# pure x
    (f :# fs) <*> (x :# xs) = f x :# (fs <*> xs)

instance Foldable (Vec 0) where
    foldMap _ Nil = mempty

instance (Foldable (Vec (n - 1)), n > 0) => Foldable (Vec n) where
    foldMap f (x :# xs) = f x <> foldMap f xs

instance Traversable (Vec 0) where
    traverse _ Nil = pure Nil

instance (Traversable (Vec (n - 1)), n > 0) => Traversable (Vec n) where
    traverse f (x :# xs) = liftA2 (:#) (f x) (traverse f xs)

class Index (n :: Nat) v where
    index :: Proxy n -> v a -> a

instance (m > 0) => Index 0 (Vec m) where
    index _ (x :# _) = x

instance forall n m. (Index (n - 1) (Vec (m - 1)), n > 0, m > 0) => Index n (Vec m) where
    index _ (_ :# xs) = index (Proxy :: Proxy (n - 1)) xs

instance (Unfoldable (Vec n), Traversable (Vec n)) => L.IsList (Vec n a) where
    type Item (Vec n a) = a
    fromList xs = case fromListU xs of
                    Nothing -> error "Demanded vector from a list that was too short."
                    Just ys -> ys
    toList      = Data.Foldable.toList

(Remember, we use the forall here with ScopedTypeVariables to be able to say that the n in the type signature is the same n that is in the type of Proxy)

ghci> fromListU [1,2,3,4] :: Vec 10 Int
Nothing
ghci> fromListU [1,2,3,4] :: Vec 3 Int
Just (1 :# 2 :# 3 :# Nil)
ghci> index (Proxy :: Proxy 2) (1 :# 2 :# 3 :# Nil)
3
ghci> index (Proxy :: Proxy 2) (1 :# 2 :# Nil)
*** Type Error: Couldn't match 'EQ with 'GT
ghci> :set -XOverloadedLists
ghci> [1,2,3] :: Vec 2 Int
1 :# 2 :# Nil
ghci> [1,2,3] :: Vec 4 Int
*** Exception: Demanded vector from a list that was too short.
ghci> [1,3..] :: Vec 5 Int
1 :# 3 :# 5 :# 7 :# 9 :# Nil

I think, overall, this formulation gives a much nicer interface. Being able to just write \(10\) is pretty powerful. The usage with OverloadedLists is pretty clean, too, especially when you can do things like [1,3..] :: Vec 10 Int and take full advantage of list syntax and succinct vector types. (Minding your runtime errors, of course)

However, you do again get the problem that GHC is not able to do real completeness checking and asks for the Nil cases still of everything…but adding a Nil case will cause a type error. The only solution is to add a _ wildcard chase, but…again, this isn’t quite satisfactory.3 If anybody has a way to get around this, I’d love to know :)

Alternative Underlying Representations

Recall that our Vec was basically identically the normal list type, with an extra field in the type. Due to type erasure, the two are represented exactly the same in memory. So we have \(O(n)\) appends, \(O(n)\) indexing, etc. Our type is essentially equal to

newtype Vec :: Nat -> * -> * where
    VecList :: [a] -> Vec n a

For this type, though, we’d need to use “smart constructors” and extractors instead of 1 :# 2 :# Nil etc.

We could, however, chose a more efficient type, like Vector from the vector package:

newtype Vec :: Nat -> * -> * where
    VecVector :: Vector a -> Vec n a

And, if you made sure to wrap everything with smart constructors, you now have type safe \(O(1)\) random indexing!

(This is representation is similar to the one used by the linear package.)

More Operations

One really weird quirk with this is that many functions you’d normally write using pattern matching you’d now might start writing using typeclasses. One example would be our implementation of indexing, using an IndexV typeclass.

A bunch of one-shot typeclasses is sort of unideal, as typeclasses are sort of ugly and non-first-class. Ideally you’d only have a few typeclasses for as generic an interface as possible, and then be able to do everything from those. Sometimes this just isn’t practical. I did mention one way around it, which was to make a typeclass to “reify” or turn your type into actual data, and then manipulate your data in an “unsafe” way knowing that the type checker checked that the data matched.

We’ll demonstrate with SomeNat from GHC.TypeLits, but you can also make our own for our inductive Nat type we used in the first half, too.

If we use our “wrapped Vector approach”, we can just do:

newtype Vec :: Nat -> * -> * where
    Vec :: Vector a -> Vec n a

index :: (KnownNat n, m > n) => Proxy n -> Vec m a -> a
index p (Vec v) = v ! fromInteger (natVal p)

That is, index internally uses (!), an unsafe operator…but only after we assure properly that it’s safe to use by stating m > n in the constraint. We can be sure that GHC will catch any instance where someone tries to index into a Vec m a whose m is not greater than the index desired.

The rest is up to you, though — to prove that indexing into a number smaller than m will always provide an answer. We have to make sure our smart constructors are okay and that (!) behaves like we think it does.

Singletons

Another answer to these sort of ad-hoc typeclasses is to use techniques involving singletons. Going all into how to use singletons to work with these is an article on its own…luckily, this article has already been written as Part 1: Dependent Types in Haskell by Hiromi ISHII. A major advantage is that you replace typeclasses with type families and more parameterized types. You’ll have to work with an understanding of how singletons work, and accept using some template haskell to generate singleton types for your data types (or write them yourself!). But it’s a powerful way to bring something like dependent types into Haskell, and there’s already a lot of infrastructure of support on it on hackage and in the haskell dev ecosystem in general. I recommend looking at the linked article!

Conclusion

Hopefully you’ll see that we are able to apply the full type-safety of the Haskell compiler to our programs regarding lists by encoding the length of the list in its type and limiting its operations by specifically typed functions and choice of instances. I also hope that you’ve been able to become familiar with seeing a lot of GHC’s basic type extensions in real applications :)

Feel free to download and run any of the samples

Please let me know if I got anything wrong, or if there are any techniques that I should mention here that are out and in the wild today :)


  1. Can we get them out of Prelude? Please? :)↩︎

  2. By the way, the GHC wiki seems to claim that using OverloadedLists this way is impossible. Anyone know what’s going on here? Did we move fast and break everything?↩︎

  3. Interestingly enough, I think this is something where you could have the best of both situations with the Template Haskell method. But I’d hope for something that works on the beautiful TypeLits :’(↩︎

Comments powered by Disqus