This post is a follow-up to my fixed-length vectors in haskell in 2015 post! When I was writing the post originally, I was new to the whole type-level game in Haskell; I didn’t know what I was talking about, and that post was a way for me to push myself to learn more. Immediately after it was posted, of course, people taught me where I went wrong in the idioms I explained, and better and more idiomatic ways to do things. And that’s great! Learning is awesome!
Unfortunately, however, to my horror, I began noticing people referring to the post in a canonical/authoritative way…so the post became an immediate source of guilt to me. I tried correcting things with my practical dependent types in haskell series the next year. But I still saw my 2015 post being used as a reference even after that post, so I figured that writing a direct replacement/follow-up as the only way I would ever fix this!
So here we are in 2017. GHC 8.2 is here, and base is in version 4.10. What’s the “right” way to do fixed-length vectors in Haskell?
This post doesn’t attempt to present anything groundbreaking or new, but is meant to be a sort of reference/introduction to fixed-length vectors in Haskell, as of GHC 8.2 and the 2017 Haskell ecosystem.
We’ll be looking at two methods here: The first one we will be looking at is a performant fixed-length vector that you will probably be using for any code that requires a fixed-length container — especially for tight numeric code and situations where performance matters. We’ll see how to implement them using the universal native
KnownNat mechanisms, and also how we can implement them using singletons to help us make things a bit smoother and more well-integrated. For most people, this is all they actually need. (I claim the canonical haskell ecosystem source to be the vector-sized library)
The second method is a structural fixed-length inductive vector. It’s…actually more like a fixed-length (lazily linked) list than a vector. The length of the list is enforced by the very structure of the data type. This type is more useful as a streaming data type, and also in situations where you want take advantage of the structural characteristics of lengths in the context of a dependently typed program. (I claim the canonical haskell ecosystem source to be the type-combinators library)
The Non-Structural Way
(Code for this section (as well as exercise solutions) are available here)
In most situations, if you use vectors, you want some sort of constant-time indexed data structure. The best way to do this in Haskell is to wrap the heavily optimized vector library with a newtype wrapper that contains its length as a phantom type parameter.
Vec n a will represent an
n-element vector of
as. So, a
Vec 5 Int will be a vector of five
Vec 10 String is a vector of 10
For our numeric types, we’re using the fancy “type literals” that GHC offers us with the
DataKinds extension. Basically, alongside the normal kinds
* -> *, etc., we also have the
Nat kind; type literals
100, etc. are all types with the kind
Nat, from the GHC.TypeNats module1.
You can “reflect” the type-level numeral as a value using the
KnownNat typeclass, provided by GHC, which lets you gain back the number as a run-time value using
natVal: (This process is called “reflection”)
Natural, from Numeric.Natural, is a non-negative
The Smart Constructor
We can use
natVal with the
KnownNat typeclass to write a “smart constructor” for our type – make a
Vec from a
Vector, but only if the length is the correct type:
Here, we use
ScopedTypeVariables so we can refer to the
n in the type signature in the function body (for
natVal (Proxy @n)), and we need to use an explicit forall, then, to bring the
n into scope.
We also use
TypeApplications syntax (
Proxy @n) so we can say “We want a
Proxy :: Proxy n” and tell
natVal that we want the
Utilizing type-level guarantees
Another operation we might want to do with vectors is do things with them that might change their length in a predetermined way. We might want the type of our vectors to describe the nature of the operations they are undergoing. For example, if you saw a function:
You can see that it takes a function and a vector of length
n, and returns another vector of length
n. Clearly, this function might be a “map” function, which applies the function to all of the values in the
Vec! We know that it must have the same length, so it can’t drop or add items. (However, it could still be shuffling or duplicating or permuting the items, as long as the resulting length is the same)
In this situation, we can write such a mapping function in an “unsafe” way, and then give it our type signature as a form of documentation:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrapped.hs#L28-L29 mapVec :: (a -> b) -> Vec n a -> Vec n b mapVec f v = UnsafeMkVec $ V.map f (getVector v) -- just for fun -- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrapped.hs#L31-L32 instance Functor (Vec n) where fmap = mapVec
The compiler didn’t help us write this function, and we have to be pretty careful that the guarantees we specify in our types are reflected in the actual unsafe operations. This is because our types don’t structurally enforce their type-level lengths.
So, why bother? For us, here, our fixed-length vector types basically act as “active documentation”, in a way. Compare:
Which is the type signature we’d have to write for a map of an “unsized” vector (like from the Data.Vector module)
We have to rely on the documentation to tell us what the length of the final resulting vector is, even though it can be known statically if you know the length of the input vectors. The vectors have a static relationship in their length, but this isn’t specified in a way that the compiler can take advantage of.
By having our
mapVec :: (a -> b) -> Vec n a -> Vec n b, the relationship between the input lengths and output length is right there in the types, when you use
mapVec, GHC is aware of the relationships and can give you help in the form of typed hole suggestions and informative type errors. You can even catch errors in logic at compile-time instead of runtime!
-- the resulting vector's length is the sum of the input vectors' lengths -- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrapped.hs#L34-L35 (++) :: Vec n a -> Vec m a -> Vec (n + m) a UnsafeMkVec xs ++ UnsafeMkVec ys = UnsafeMkVec (xs V.++ ys) -- you must zip two vectors of the same length -- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrapped.hs#L37-L38 zipVec :: Vec n a -> Vec n b -> Vec n (a, b) zipVec (UnsafeMkVec xs) (UnsafeMkVec ys) = UnsafeMkVec (V.zip xs ys) -- type-level arithmetic to let us 'take' -- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrapped.hs#L40-L43 takeVec :: forall n m a. KnownNat n => Vec (n + m) a -> Vec n a takeVec (UnsafeMkVec xs) = UnsafeMkVec (V.take l xs) where l = fromIntegral (natVal (Proxy @n)) -- splitAt, as well -- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrapped.hs#L45-L49 splitVec :: forall n m a. KnownNat n => Vec (n + m) a -> (Vec n a, Vec m a) splitVec (UnsafeMkVec xs) = (UnsafeMkVec ys, UnsafeMkVec zs) where l = fromIntegral (natVal (Proxy @n)) (ys, zs) = V.splitAt l xs
(+) comes from GHC.TypeNats, which provides it as a type family (type-level function) we can use, with proper meaning and semantics.
Note the usage of
splitVec to let the function ensure that the input vector has “enough” (at least
n) elements to do the taking.
Notes on the typechecker
GHC’s typechecker works very well with concrete, monomorphic
Nats. For example,
5 + 3 will always typecheck as
8, so you don’t have to worry at all about
splitVec’s usage of
(+) if you work with monomorphic, specific
However, GHC treats
(+) “opaquely” when using using it with polymorphic type variables. That means that
n + (m + o) is seen as a completely different type to GHC than
(n + m) + o – GHC doesn’t reduce
+, and to it, they both just look like different trees. Remember that one is
(+) n ((+) m o), and the other is
(+) ((+) n m) o. Completely different structure!
This comes up as an issue when you start doing non-trivial things, so it sometimes helps to augment GHC’s typechecker.
The ghc-typelits-natnormalise package provides such a plugin. If we pass it as a flag to GHC (as
-fplugin GHC.TypeLits.NatNormalise) or as a pragma:
Then GHC will be able to recognize the fact that
n + (m + o) and
(n + m) + o are the same, and will unify them during typechecking. It also provides normalization/unification for many other situations that we “expect” to work when using
+ on type variables.
We need an appropriate type for indexing our vectors, but we’d like a type where indexing is “safe” – that is, that you can’t compile a program that will result in an index error.
For this, we can use the finite-typelits package, which provides the
Finite n type.
Finite n type is a type with exactly
n distinct inhabitants/values. For example,
Finite 4 contains four “anonymous” inhabitants. For convenience, sometimes we like to name them 0, 1, 2, and 3. In general, we sometimes refer to the values of type
Finite n as 0 … (n - 1).
So, we can imagine that
Finite 6 has inhabitants corresponding to 0, 1, 2, 3, 4, and 5. We can convert back and forth between a
Finite n and its
Integer representation using
We can use a
Finite n to “index” a
Vector n a. A
Vector n a has exactly
n slots, and a
Finite n has
n possible values. Clearly,
Finite n only contains valid indices into our vector!
index will never fail at runtime due to a bad index — do you see why? Valid indices of a
Vector 5 a are the integers 0 to 4, and that is precisely the exact things that
Finite 5 can store!
We can directly generate these vectors in interesting ways. Using return-type polymorphism, we can have the user directly request a vector length, just by using type inference or a type annotation. (kind of like with
For example, we can write a version of
Note that normally,
replicate takes an
Int argument so that the user can give how long the resulting vector needs to be. However, with our new
replicate, we don’t need that
Int argument — the size of the vector we want can more often than not be inferred auto-magically using type inference!
With this new cleaner type signature, we can actually see that
replicate’s type is something very familiar. Look at it carefully:
You might recognize it as very similar to haskellism
replicate is actually
pure for the Applicative instance of
Vec n! As an extra challenge, what would
<*> be? See the solution if you want to check your answer!
Generating with indices
We can be a little more fancy with
replicate, to get what we normally call
Type-Safety and positives and negatives
I think it’s an interesting point that we’re using
Finite n in a different sense here than in
index, and for different reasons. In
Finite is in the “negative” position — it’s something that the function “takes”. In
Finite is in the “positive” position — it’s something that the function “gives” (to the
In the negative position,
Finite n and type-safety is useful because:
- It tells the user what sort of values that the function expects. The user knows, just from the type, that indexing a
Vec 5 arequires a
Finite 5, or a number between 0 and 4.
- It guarantees that whatever
Finite nindex you give to
indexis a valid one. It’s impossible to give
indexan “invalid index”, so
indexis allowed to use “unsafe indexing” in its implementation, knowing that nothing bad can be given.
- It lets you develop code in “typed-hole” style: if a function requires a
Finite 4, put an underscore there, and GHC will tell you about all the
Finite 4s you have in scope. It can help you write your code for you!
In the positive position,
Finite n and the type-safety have different uses and advantages: it tells the user what sort of values the function can return, and also also the type of values that the user has to be expected to handle. For example, in
generate, the fact that the user has to provide a
Finite n -> a tells the user that they have to handle every number between 0 and n-1, and nothing else.
Moving between Sized and Unsized
One key part of our API is missing: how to convert between “sized” and “unsized” vectors.
Converting from sized to unsized is easy, and we already have it:
Converting from unsized to sized is harder. We already saw a “shoe-horning” method, if we know the size we want at compile-time:
But what if we don’t know what size
n we want? What if we want
n to be whatever the actual size of the input vector is?
In general we can’t predict the size of our input vector at compile-time, so we can’t just directly put in a size we want. What we want is a method to return a
Vec n, where
n is the length of the input vector, determined at runtime.
I’m going to try to convince you that a plausible API is:
(Note: this does require
People familiar with dependent types might recognize that
withVec is a function that takes an unsized vector and returns an existentially quantified sized vector, in CPS-style. Basically, give the function a vector, and a way to “handle” a
Vec n of any possible size. The function will then give your handler a
Vec n of the proper type/size. The function gets to chose the
n that you must handle.
Within your continuation/handler, you can take advantage of the size type, and do take advantage of all of the type-level guarantees and benefits of a length-indexed vector. In a way, it is its own “world” where your vector has a fixed size. However, the caveat is that you have to treat the size universally — you have to be able to handle any possible size given to you, in a parametrically polymorphic way.
ghci> myVector = V.fromList [10,5,8] :: V.Vector Int ghci> withVec myVector $ \(v :: Vec n Int) -> -- in this function body, `v :: Vec 3 Int`, and `n ~ 3` -- whatever I return here will be the return value of the entire line case packFinite 1 :: Maybe (Finite n) of -- Finite 3 Nothing -> 0 Just i -> v `index` i 5
We could write, say, a function to always safely get the third item:
And we can run it:
We can even do something silly like convert an unsized vector to a sized vector and then back again:
Now that I’ve (hopefully) convinced you that this function really does convert an unsized vector into a sized vector that you can use, let’s see how we can implement it!
To do this, we can take advantage of the
someNatVal function (from GHC.TypeNats):
SomeNat contains what we call an existentially quantified type,
n. Basically, a value of
SomeNat contains a
Proxy n with some specific
n, that is hidden “inside” the constructor. The only way to figure it out is to pattern match on the constructor and use it in a generic and parametrically polymorphic way. Sound familiar?
Natural (a non-negative Integer type) into a
SomeNat — it “picks” the right
n (the one that corresponds to that
Natural) and stuffs/hides it into
SomeNat. We can leverage this to write our
@m is used with
UnsafeMkVec to specify that we want a
Vec m a.)
This process is actually called “reification” – we take a value-level runtime property (the length) and “reify” it, bringing it up to the type-level.
And now, we have both of our conversion functions! We can convert from sized to unsized using
getVector, and from unsized to sized using
The final useful API aspect we will be looking at is how to verify properties of our vector lengths at the type level, and let us use those properties.
One common thing we might want to do is ensure that two vectors have the same length. This might happen when we use
withVec from two different vectors, and we get a
Vec n a and
Vec m a of two (potentially) different lengths.
We can do this using
sameNat from GHC.TypeNats:
The only way we can have a non-bottom value of type
n :~: m is with the
Refl constructor, which can only be used in the case that
m are equal.
sameNat gives us that
Refl, if possible — that is, if
m are equal. If not, it gives us
Now, we can write:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrapped.hs#L78-L81 exactLength :: forall n m a. (KnownNat n, KnownNat m) => Vec n a -> Maybe (Vec m a) exactLength v = case sameNat (Proxy @n) (Proxy @m) of Just Refl -> Just v -- here, n ~ m, so a `Vec n a` is a `Vec m a`, too Nothing -> Nothing
(We could also write this by using
mkVec, which wraps and unwraps, but let’s pretend it is expensive to construct and re-construct).
Now we can do:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrapped.hs#L37-L37 zipVec :: Vec n a -> Vec n b -> Vec n (a, b) -- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrapped.hs#L83-L89 zipSame :: forall a b. V.Vector a -> V.Vector b -> Maybe (V.Vector (a, b)) zipSame v1 v2 = withVec v1 $ \(v1' :: Vec n a) -> withVec v2 $ \(v2' :: Vec m b) -> case exactLength v1' of Just v1Same -> Just $ getVector (zipVec v1Same v2') -- v1' has the same length as v2' Nothing -> Nothing
Which will zip two unsized vectors, but only if their lengths are the same.
Now, “checking that the length is a certain length” is literally the least interesting property we can test about our vectors. There are definitely more interesting properties we can test, like whether or not our lengths are even or odd, if they are greater than a certain number, etc.; for these, the process is essentially the same: find some way, at runtime, to get some sort of witness for the property you want. In our case, our witness was
n :~: m, which witnessed the fact that
n ~ m. Different libraries might provide different witnesses that might be useful. But the general process is
- Find a way to get your witness, using some runtime function (that will probably return
- Pattern match on your witness, and see that property realized and usable by GHC/the type checker!
Help from singletons
You have probably heard that
TypeNats provides a very bare-bones and primitive interface. This is true. Its interface also sometimes doesn’t play well with other type-level mechanisms you might want to try. To prepare you for the real world, let’s re-implement these things using the singletons library, which provides a unified interface for type-level programming in general.
someNatVal, we can use the singletons equivalents,
-- TypeNats style natVal :: KnownNat n => p n -> Natural -- Singletons style sing :: KnownNat n => Sing n fromSing :: Sing n -> Natural -- (for n :: Nat) -- TypeNats style data SomeNat = forall n. KnownNat n => SomeNat (Proxy n) someNatVal :: Natural -> SomeNat -- Singletons style data SomeSing Nat = forall n. SomeSing (Sing n) toSing :: Natural -> SomeSing Nat withSomeSing :: Natural -> (forall n. Sing n -> r) -> r -- TypeNats style sameNat :: (KnownNat n, KnownNat m) => Proxy n -> Proxy m -> Maybe (n :~: m) -- Singletons style -- from Data.Singletons.Decide -- for our purposes, Decision is basically a fancy Maybe data Decision a = Proved a | Disproved (a -> Void) (%~) :: Sing n -> Sing m -> Decision (n :~: m)
Hopefully the above should give you a nice “key” for translating between the two styles. But here are some practical translations:
-- "explicit Sing" style -- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrappedSingletons.hs#L20-L24 mkVec_ :: Sing n -> V.Vector a -> Maybe (Vec n a) mkVec_ s v | V.length v == l = Just (UnsafeMkVec v) | otherwise = Nothing where l = fromIntegral (fromSing s) -- "implicit" style -- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrappedSingletons.hs#L26-L30 mkVec :: forall n a. KnownNat n => V.Vector a -> Maybe (Vec n a) mkVec v | V.length v == l = Just (UnsafeMkVec v) | otherwise = Nothing where l = fromIntegral (fromSing (sing :: Sing n)) -- alternatively, re-using `mkVec_` mkVec :: KnownNat n => V.Vector a -> Maybe (Vec n a) mkVec = mkVec_ sing
As you can see, in singletons, we have the luxury of defining our functions in “explicit” style (where the user passes in a
Sing token which reveals what length they want) or “implicit” style (where the length is inferred from the return type, requiring a
KnownNat n => constraint), like we have been writing up to this point.
Sing n -> and
KnownNat n => really have the same power. You can think of
Sing n as a token that carries around
KnownNat n =>, in a way.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrappedSingletons.hs#L32-L42 replicate_ :: Sing n -> a -> Vec n a replicate_ s x = UnsafeMkVec $ V.replicate l x where l = fromIntegral (fromSing s) replicate :: KnownNat n => a -> Vec n a replicate = replicate_ sing withVec :: V.Vector a -> (forall n. Sing n -> Vec n a -> r) -> r withVec v f = case toSing (fromIntegral (V.length v)) of SomeSing s -> f s (UnsafeMkVec v) -- alternatively, skipping `SomeSing` altogether: -- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrappedSingletons.hs#L44-L54 withVec' :: V.Vector a -> (forall n. Sing n -> Vec n a -> r) -> r withVec' v0 f = withSomeSing (fromIntegral (V.length v0)) $ \s -> f s (UnsafeMkVec v0) exactLength_ :: Sing m -> Sing n -> Vec n a -> Maybe (Vec m a) exactLength_ sM sN v = case sM %~ sN of Proved Refl -> Just v Disproved _ -> Nothing exactLength :: (KnownNat m, KnownNat n) => Vec n a -> Maybe (Vec m a) exactLength = exactLength_ sing sing
Note that you aren’t required to implement both a
replicate — I’m just including them here to show that both API’s (implicit and explicit) are possible. (You can always just directly use
sing right away before getting started to get the
Sing n that those functions use, and so skip
replicate_ and other explicit variants)
One slight bit of friction comes when using libraries that work with
KnownNat, like finite-typelits and the
Finite type. But we can convert between the two using
-- SNat can be used to construct a `Sing` if we have a `KnownNat` constraint -- It can also be pattern matched on to reveal a `KnownNat constraint` SNat :: KnownNat n => Sing n -- we can give a `Sing n` and be able to execute something in the context where -- that `n` has a `KnownNat` constraint withKnownNat :: Sing n -> (KnownNat n => r) -> r
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrappedSingletons.hs#L56-L60 generate_ :: Sing n -> (Finite n -> a) -> Vec n a generate_ s f = withKnownNat s $ UnsafeMkVec $ V.generate l (f . fromIntegral) where l = fromIntegral (fromSing s) -- alternatively, via pattern matching: -- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrappedSingletons.hs#L63-L66 generate'_ :: Sing n -> (Finite n -> a) -> Vec n a generate'_ s@SNat f = UnsafeMkVec $ V.generate l (f . fromIntegral) where l = fromIntegral (fromSing s) -- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecWrappedSingletons.hs#L68-L69 generate :: KnownNat n => (Finite n -> a) -> Vec n a generate = generate_ sing
You can see most of our original code (with pure
KnownNat) rewritten to work with singletons in this file.
As you can see, singletons-style programming completely subsumes programming with
KnownNat. What we don’t see here is that singletons style integrates very well with the rest of the singletons ecosystem…so you might just have to take my word for it :)
What we have just witnessed is the bridge between the singletons ecosystem and the rest of the Haskell ecosystem’s usage of
KnownNat, because it is provided by GHC itself, is universal. However, I recommend any new projects or libraries you write that do anything more than the most trivial of usages of
KnownNat should take a look at doing things singletons-style.
Working with just
KnownNat, you run into limitations very quickly unless you stick to very basic things. And, if you ever work with any other type-level stuff, singletons integrates very well and very smoothly with everything else type-level you do. If you plan on doing other type-level things besides just the most basic, you will not regret starting singletons-style from the beginning.
This exact pattern is used in many real-world libraries. The canonical fixed-length vector library implemented in this style is vector-sized, which more or less re-exports the entire vector library, but with a statically-sized interface. This is the library I use for all my my modern sized-vector needs.
It’s also provided in the linear library, which was one of the first major libraries to adopt this style. However, it offers an incomplete API, and requires lens — its main purpose is for integration with the rest of the linear library, which it does very well.
Anyway, if all you really wanted was performant fixed-size containers, feel free to stop reading now (or jump to the conclusion). But if you want to explore a bit deeper into the world of inductive dependent types … continue on :)
The Structural Way
So, the (a?) problem with
TypeNats from GHC is that it has no internal structure. It’s basically the same as the
Natural type — every single value (constructor) is completely structurally unrelated to the next.
Just like we can imagine
We can also think of
Nat as just
0 | 1 | 2 | 3 | 4 .... Each constructor is completely distinct.
This is useful for most practical applications. However, when we want to use our fixed-length types in a more subtle and nuanced way, it might help to work with a length type that is more…structurally aware.
We’ve also noticed that the structure of our
Vec and the structure of our
Nat have nothing in common, so we can’t take advantage of any shared structure to help us with type-safety in our implementation…everything we wrote was pretty much implemented using “unsafe” functions.
So, enough of this non-structural blasphemy. We are proper dependent type programmers, dangit! We want structural verification! Compiler verification from the very bottom!
For this, we’ll dig into inductive type-level nats.
We’re using the
DataKinds extension, so not only does that define the type
Nat with the values
S :: Nat -> Nat, it also defines the kind
Nat with the types
'S :: Nat -> Nat! (note the backticks)
'Z represents 0, and
'S represents the “successor” function: one plus whatever number it contains.
'S 'Z represents 1,
'S ('S 'Z) represents 2, etc.
And now we can define a fixed-length list, which is basically a normal haskell list “zipped” with
Here, we’re using
GADT syntax to define our type using its constructors: the
VNil constructor (which creates a
Vec 'Z a, or the empty vector, like
) and the
(:+) constructor (like cons, or
(:)), which conses an item to a
Vec n a to get a
Vec ('S n) a, or a vector with one more element.
Basically, all usage of nil and cons (
:+) keeps track of the current “length” of the vectors in its type. Observe that the only way to construct a
Vec ('S ('S 'Z)) a is by using two
:+s and a
Type-level Guarantees are Structurally Free
One nice thing about this is that there is no “unsafe” way to construct a
Vec is inherently of the correct size. The very act of constructing it enforces its length.
Remember our “unsafe”
mapVec? We had to implement it unsafely, and trust that our implementation is correct. Even worse — our users have to trust that our implementation is correct!
But writing such a
mapVec function using
Vec is guaranteed to preserve the lengths:
Our implementation is guaranteed to have the correct length. Neat! We get all of the documentation benefits described in our previous discussion of
mapVec, plus more.
We can write
Isn’t it neat how the code reads exactly like the code for map/zip for lists? Because their structure is identical, their only real difference is the type-level tag. All of the functions we write are the same.
GHC provided our
+ before, so we have to write it ourselves if we want to be able to use it for our
Nats. We can write it as a type family:
This works! However, we have to be careful that GHC can verify that the final vector really does have the length
n + m. GHC can do this automatically only in very simple situations. In our situation, it is possible because
++ have the exact same structure.
Take a moment to stare at the definition of
++ very closely, and then squint really hard. You can see that
++ really describe the “same function”, using the exact same structure. First, if the first item is a Z-y thing, return the second item as-is. If the first item is a consy thing, return the second item consed with the rest of the first item. Roughly speaking, of course.
This is a part of what we mean when we say that we can take advantage of the structure of the length type. Here, the structure of
Nat aligns so well with the structure of
Vec what we can prove structural properties about
Nat and the
Vec together by exploiting their shared inductive structure.
Unfortunately, for examples where the function we write doesn’t exactly match the structure as the type family we write, this won’t work. And sometimes, the structural properties might get in the way of what we are trying to prove/produce. An example here would be a
snoc function (cons to the end of a list). If you try writing it, you’ll see that the structure of
Vec fight back against you pretty hard. So, exploiting structure isn’t universally useful, but it definitely helps in many situations! Handling tricky cases like this is a subject for a whole other blog post.
To index our previous type, we used some abstract
Finite type, where
Finite n conveniently represented the type of all possible indices to a
Vec n a. We can do something similar, inductively, as well:
I always thought of this inductive definition of
Fin as a cute trick, because I don’t think there was any way I could have thought of it on my own. But if you play around it enough, you might be able to convince yourself that there are exactly
n inhabitants of
For example, for
Fin ('S 'Z) (indices for a one-item vector), there should be only one inhabitant. And there is! It’s
FS FZ is not a valid inhabitant, because it has type
Fin ('S ('S m)) for some
m, so cannot possibly have the type
Fin ('S 'Z).
Let’s see the inhabitants of
Fin ('S ('S ('S 'Z))) (indices for three-item vectors):
As GHC informs us,
FS (FS (FS FZ)) is not an inhabitant of
Fin ('S ('S ('S 'Z))), which is exactly the behavior we wanted. This is because
FS (FS (FS FZ)) has type
Fin ('S ('S ('S ('S m)))) for some
m, and this can’t fit
Fin ('S ('S ('S 'Z))).
Also, note that there are no inhabitants of
Fin 'Z. There is no constructor or combination of constructors that can yield a value of that type.
Armed with this handy
Fin type, we can do structural type-safe indexing:
Note that our
Fin type structurally precludes us from being able to index into a
Vec 'Z a (an empty vector), because to do that, we would have to pass in a
Fin 'Z…but there is no such value with that type!
Now, generating these requires some more thought. Naively writing a
replicate :: a -> Vec n a is not possible; ideally, we’d want to “pattern match” on our length
n, and use
VNil if it’s
However, we can’t pattern match on types in Haskell, because types are erased at runtime. They’re just used by the compiler to verify your code, but they don’t exist at runtime. So, you can’t just say “do this if
'Z, otherwise do this”.
Recall that, in our previous vector type, we needed to use a
KnownNat n constraint to be able to reflect a
n type down to the value level. We can do something similar using the singletons machinery!
First, we need to get singletons for our
Sing n is a singleton for our
Nat, in that there is only one
Sing n for every
n. So, if we receive a value of type
Sing n, we can pattern match on it to figure out what
n is. Essentially, we can pattern match on
We can now branch depending on what
Basically, we can use a singleton if we ever want to “pattern match” or branch our program’s output based on the type. This is a general rule you will observe as we continue on this article.
Note that because of the inductive nature of our original
Nat type, the singletons are also inductive, as well. This is handy, because then our whole ecosystem remains inductive.
Now, to write
And we can recover our original “implicit” style, with type-inference-driven lengths, using
sing :: SingI n => Sing n:
You can think of
SingI as the “generic singletons” equivalent of
KnownNat lets us reflect out a
GHC.TypeNats.Nat to a
SingI lets us reflect any type that has singletons defined to its corresponding
Sing. Since our new
Nat type has singletons, we basically get a free “
See how useful the whole singletons ecosystem is? :)
Generating with indices
generate using the inductive
Nat is an interesting challenge. It’s actually a fairly standard pattern that comes up when working with inductive types like these. I’m going to leave it as an exercise to the reader – click the link at the top corner of the text box to see the solution, and see how it compares to your own :)
The one thing I will point out is that it is very useful that GHC verifies our code for us, and that we have typed holes to help us develop our code. If we ever don’t know something, we can just use a typed hole
_, and GHC will tell us what type it expects, and what values in scope have that type. It is infinitely useful for situations like this, especially when you are new to this sort of dependently typed inductive programming!
If you ever get stuck, try throwing in a
_ and seeing what types GHC expects…these clues will help you get your bearings!
Between Sized and Unsized
Converting from sized to unsized vectors (to lists) is something that is pretty straightforward, and can be done by just pattern matching on the vector and recursing on the tail. I’ve left it as an excercise to write
Vec n a -> [a].
More interesting is the other way around; our the API of converting unsized to sized vectors will be the same:
But implementing it inductively is also an interesting challenge. See my tip above about typed holes (
_). I recommend taking a break here to try to solve it yourself.
Welcome back! Hope you had a fun time :) Here’s the solution!
To handle the empty list, we just return immediately, giving
f the proper singleton and vector (
VNil). For the non-empty list, first we convert the tail
xs into a vector (
ys) and its corresponding length-singleton (
l), and then we give
f the “correct” length singleton of our complete vector (
SS l) and the correct complete vector (
x :+ ys)
One nice property where (in contrast with our previous non-structural
withVec) is that GHC ensures that the length of the vector we give to
f is actually what we claim it is.
We can create some corresponding example of
exactLength using the exact same process we did before
First, it’d be nice to get a witness for the length of a given vector just from the vector itself:
The type of
vecLength :: Vec n a -> Sing n says that it is possible, from the structure of the vector given alone, to get a witness to its length. And, because the structure of the vector and the structure of the length type are so similar, this is possible! (Note that this is not possible for our non-structural “wrapped”
Vec, without some unsafe operations)
Now, our code will be identical to the code for our wrapped/non-structural vectors, using
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecInductive.hs#L117-L123 exactLength_ :: Sing m -> Vec n a -> Maybe (Vec m a) exactLength_ sM v = case sM %~ vecLength v of Proved Refl -> Just v Disproved _ -> Nothing exactLength :: SingI m => Vec n a -> Maybe (Vec m a) exactLength = exactLength_ sing
It’s nice that this is exactly the same as before, and that’s a testament to how useful the singletons library is at unifying all of these distinct type-level stuffs.
We could also write
exactLength in a cute way by inducting on the length we want and the vector, so it might be fun to look at this version instead –
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecInductive.hs#L125-L135 exactLengthInductive_ :: Sing m -> Vec n a -> Maybe (Vec m a) exactLengthInductive_ = \case SZ -> \case VNil -> Just VNil _ :+ _ -> Nothing SS l -> \case VNil -> Nothing x :+ xs -> (x :+) <$> exactLengthInductive_ l xs exactLengthInductive :: SingI m => Vec n a -> Maybe (Vec m a) exactLengthInductive = exactLengthInductive_ sing
This is another way you can take advantage of the structure of the length type. Here, we explicitly take advantage of the inductive structure of the
Nat type and how it matches with the structure of the
Vec type, and do bold things with it!3
But I digress. Like in the last section, checking for a given length is literally the least interesting property you can check for. But, again, the same process is usable here: find a way to get your witness, and then pattern match on that witness.
For example, we can make a witness that
n is less than or equal to
m, as well as a way to construct such a witness:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecInductive.hs#L137-L149 data LTE :: Nat -> Nat -> Type where LEZ :: LTE 'Z n LES :: LTE n m -> LTE ('S n) ('S m) isLTE :: Sing n -> Sing m -> Decision (LTE n m) isLTE = \case SZ -> \_ -> Proved LEZ SS n -> \case SZ -> Disproved $ \case -- EmptyCase SS m -> case isLTE n m of Proved l -> Proved $ LES l Disproved p -> Disproved $ \case LES l -> p l
So, it is impossible to construct an
LTE n m if
n is not less than or equal to
m. I dare you to try!
We can write code to check for this property in our vectors:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecInductive.hs#L151-L157 atLeast_ :: Sing n -> Vec m a -> Maybe (LTE n m, Vec m a) atLeast_ sN v = case isLTE sN (vecLength v) of Proved l -> Just (l, v) Disproved _ -> Nothing atLeast :: SingI n => Vec m a -> Maybe (LTE n m, Vec m a) atLeast = atLeast_ sing
atLeast_ sN will only return our vector if its length is at least the length of the length indicated by
sN. Basically, we check if our vector is “at least” a certain length.
We can write a function that can “take” an arbitrary amount from a vector, given (via proof) that the vector has at least that many elements:
And, we can combine that with our
atLeast function, to be able to take (maybe)4 from any vector:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/fixvec-2/VecInductive.hs#L165-L169 takeVecMaybe_ :: Sing n -> Vec m a -> Maybe (Vec n a) takeVecMaybe_ sN v = uncurry takeVec <$> atLeast_ sN v takeVecMaybe :: SingI n => Vec m a -> Maybe (Vec n a) takeVecMaybe v = uncurry takeVec <$> atLeast v
In the Real World
This type is more like a list than a vector, so it’s in a bit of an awkward position, utility-wise. You usually chose a list over a vector in Haskell when you want some sort of lazy streaming, but the cases where you want to lazily stream something and you know exactly how many items you want to stream are admittedly a bit rare. GHC can’t handle infinite
Vecs, so there’s that, too. For “containers”, vector is great, so the non-structural
Vec is seen a lot more.
However, if you are working with a lot of other inductive types,
Vec works very naturally alongside them. It makes sense, then, that a “canonical” package offering
Vec is type-combinators, an actively maintained library with loads of useful inductive types for type-level programming, exporting its own
Sing-equivalents. If I am doing the sort of type-level programming that
Vec is useful for, chances are I already have type-combinators imported. This is the library that I personally suggest if you want to use this
Vec in the real world.
There’s obviously more to look at, and much more we can do with fixed-length vectors and inductive types. And, there will definitely be more issues that come up when you start working with these in the real world, with real applications.
If you plan on moving into learning about dependent types, I hope that guide would be a good launching point. But if all you wanted to do was learn how to use fixed-length vectors effectively in Haskell…hopefully after reading this, you have confidence to work with these things directly, and to know what to google if anything else comes up :)
Feel free as always to leave a comment or a tweet, or find me the freenode
#haskell channel, as jle`. I always welcome feedback, suggestions, or questions!
Users who are used to GHC 8.0 and below might remember
Natcoming from GHC.TypeLits. Well, GHC 8.2 is here,
TypeNatsis in. The difference is that, in
Natkind reifies/reflects with
Natkind reifies/reflects with
For singletons > 2.3
toSinggive and take
Naturalwhen going to
Nat. However, for 2.3.1 and below, they give/take
Note, however, that if you unroll the definition of
Nat, you pretty much get the exact same thing.↩
Remember the whole point of this exercise — that the
Maybeis required only in the completely polymorphic case, where we get our lengths at runtime and don’t know them at compile-time. If we knew
mat compile-time, and knew that
nwas less than or equal to
m, we could construct an
LTE n mand call
takeVecdirectly, and not return a