Enhancing Functor Structures Step-By-Step (Part 1)
by Justin Le ♦
A style of Haskell programming that I’ve been pretty excited about with over the past two years or so is something that I can maybe call a “functor structure” design pattern. In this post we’re going to be exploring the idea of enhancing normal data types with different types of functor structures step-by-step, by starting with a simple useful structure and enhancing it piece by piece in order to reap incremental benefits. This process reflects a lot of the way I personally work through these things — I normally don’t get the whole powerful structure all the way; instead I incrementally add things as I see how things fit together.
We’re going build the tools to describe a data type schema, which can represent algebraic data types — sums and products. We’ll start off just building things we can use to describe the schema (by printing out documentation), and by the end of the journey we’ll also be able to use our schema to generate parsers and serializers through json.
This interest in functor structures culminated in my Functor Combinatorpedia post last year and the functor-combinators library. But personally I had never really explored the less commonly used lowercase-f functor abstractions in Hask — contravariant functors and invariant functors until recently.
This series is designed for an intermediate Haskeller with familiarity in things like product/sum types, using Applicative
/Alternative
, and monadic parser combinators, and is written in sync with functor-combinators-0.3.6.0.
The Schema
Let’s start with the simplest level of describing our schema: a plain ol’ AST describing the possibilities our schema can take.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/doc.hs#L9-L31
data Schema =
RecordType [Field]
| SumType [Choice]
| SchemaLeaf Primitive
deriving Show
data Field = Field
fieldName :: String
{ fieldValue :: Schema
,
}deriving Show
data Choice = Choice
choiceName :: String
{ choiceValue :: Schema
,
}deriving Show
data Primitive =
PString
| PNumber
| PBool
deriving Show
Our schema will either represent a record of many different fields, a sum of many different options, or a primitive value. If it’s a sum type, it’ll be described by a list of Choice
, which describes each branch. If it’s a record type, it’ll be described by a list of Field
, which describes each field. If it’s a primitive type, it’ll a Primitive
, which is either a string, number, or boolean.
Our end goal is to be able to write a schema for a type like
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/doc.hs#L33-L36
data Customer =
CPerson { cpName :: String, cpAge :: Int }
| CBusiness { cbEmployees :: Int }
deriving Show
and be able to represent documenting, parsing, and printing it all within Schema
. For our basic Schema
above, this looks like:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/doc.hs#L38-L52
customerSchema :: Schema
= SumType
customerSchema Choice
[ = "Person"
{ choiceName = RecordType
, choiceValue Field { fieldName = "Name", fieldValue = SchemaLeaf PString }
[ Field { fieldName = "Age" , fieldValue = SchemaLeaf PNumber }
,
]
}Choice
, = "Business"
{ choiceName = RecordType
, choiceValue Field { fieldName = "Employees", fieldValue = SchemaLeaf PNumber } ]
[
} ]
And a value like
PCustomer { cpName = "Sam", cpAge = 40 }
might be represented by a json value using our schema like
{ "tag": "Customer",
"contents":
{ "Name": "Sam"
, "Age": 40.0
}
}
Documentation
Using our schema type, let’s make a documentation generator. It’ll take a Schema
and nicely formatted documentation describing the schema itself.
To make our lives easier, we’ll be using the prettyprinter library, which will handle indentation, horizontal and vertical concatenation, and other printing concerns for us.
Let’s build things up by defining documentation generators for our individual types, so they’ll be easier to assemble.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/doc.hs#L54-L79
schemaDoc :: String -- ^ name
-> Schema -- ^ schema
-> PP.Doc x
fieldDoc :: Field -> PP.Doc x
choiceDoc :: Choice -> PP.Doc x
primDoc :: Primitive -> PP.Doc x
So schemaDoc
will take the name of our type and a schema, and generate a PP.Doc x
, the type of a text document in the prettyprinter library. And fieldDoc
, choiceDoc
, and primDoc
just generate the documentation for each individual field or constructor.
(I’m using x
as the name of the type variable (instead of something more traditional like a
) to indicate that it isn’t meant to be referenced or used anywhere in any consistent way. Just remember it doesn’t mean anything special syntactically!)
The \case
syntax is known as LambdaCase syntax, and \case blah -> blah
is just sugar for \x -> case x of blah -> blah
; we use it extensively here to save us from having to think of a throwaway variable name.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/doc.hs#L73-L83
fieldDoc :: Field -> PP.Doc x
Field name val) = schemaDoc name val
fieldDoc (
choiceDoc :: Choice -> PP.Doc x
Choice name val) = schemaDoc name val
choiceDoc (
primDoc :: Primitive -> PP.Doc x
= \case
primDoc PString -> "string"
PNumber -> "number"
PBool -> "bool"
Nothing too fancy here — since Field
and Choice
just have a name and a sub-schema, we can have them call schemaDoc
. primDoc
requires making our leaf documentation, so we can just print what type they have.
We tie it all together with schemaDoc
:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/doc.hs#L54-L71
schemaDoc :: String -- ^ name
-> Schema -- ^ schema
-> PP.Doc x
= \case
schemaDoc title RecordType fs -> PP.vsep [
"{" <> title <> "}")
PP.pretty (2 . PP.vsep $
, PP.indent map (\fld -> "*" PP.<+> PP.indent 2 (fieldDoc fld)) fs
]SumType cs -> PP.vsep [
"(" <> title <> ")")
PP.pretty ("Choice of:"
, 2 . PP.vsep $
, PP.indent map choiceDoc cs
]SchemaLeaf p -> PP.pretty (title <> ":")
PP.<+> primDoc p
Here we use PP.vsep
, which takes a list of docs and concatenates them vertically, PP.<+>
which concatenates two docs horizontally, and PP.indent
which indents things before going down a level. We appropriately call fieldDoc
, choiceDoc
, and primDoc
when we actually need to print one of them.
Hopefully that wasn’t too bad! There were a lot of moving parts because we have a recursive data type, but in the end hopefully each specific branch was self-contained enough to understand on their own. In the end the important thing to take away isn’t the mechanics of document generation, but rather how the data flows. Make sure you at least understand how the functions call each other, and how — this pattern is going to be very consistent across all the schema processors we write!
We can test out our function on customerSchema
, taking advantage of the fact that PP.Doc
’s Show
instance will render the document:
ghci> schemaDoc "Customer" customerSchema
(Customer)
Choice of:
{Person}
* Name: string
* Age: number
{Business}
* Employees: number
It works!
Parsing with Covariance
Now, let’s talk about using our Schema
type to generate a json parser. We want to be able to use our schema type and use it to parse information from a json value, of a given json format we are expecting.
To do this, we’re going to rewrite Schema
to take a type parameter to represent the type we want to parse into. A Schema a
will be a schema that can be used to generate documentation and describe a parser of a
s. In the end, we want customerSchema :: Schema Customer
, and a function like
schemaParser :: Schema a -> Parse ErrType a
to generate a json parser of a
s. We’ll be using the json parser type Parse err a
from aeson-better-errors (not because of the better errors, but just because it’s closer to an actual incremental/stateful parser than other alternatives out there), which can be run with parse :: Parse err a -> ByteString -> Either (ParseError err) a
. So our final interface will look like:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L154-L155
parseSchema :: Schema a -> ByteString -> Either (A.ParseError ErrType) a
= A.parse (schemaParser sc) parseSchema sc
To do this, we now have to include information on “how to parse an a
” in our schema. “How to parse a record” and “how to parse a sum” are going to be handled universally for all schemas…so the only thing that really will vary from type to type (aside from the structure) is how to parse those primitive leaf types when we encounter them in the json. And so, the main thing we need to modify is just Primitive
:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L43-L47
data Primitive a =
PString (String -> Maybe a)
| PNumber (Scientific -> Maybe a)
| PBool (Bool -> Maybe a)
deriving Functor
A Primitive a
now encodes a way to parse an a
if given the appropriate json primitive. It can be PString
, PNumber
, or PBool
. To create a “String Parser”, you need to use PString
with a function on “what to do with the string you get”. To create a “Bool parser”, you need PBool
with a function on what to do with the bool you get. Note that the PNumber
parser takes a Scientific
, which is the type aeson (the underlying json library) uses to represent valid JSON numbers (it’s basically Either Integer Double
).
We can write some helper primitives:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L49-L56
pString :: Primitive String
= PString Just
pString
pInt :: Primitive Int
= PNumber toBoundedInteger
pInt
pBool :: Primitive Bool
= PBool Just pBool
pString :: Primitive String
is the most basic way to parse a primitive json string: just return the String
itself. pInt
needs to reject any non-integer numbers, so toBoundedInteger :: Scientific -> Maybe Int
works well.
We can now start writing our parsers for each branch of Schema
. The SchemaLeaf
branch should be the simplest. We can use aeson-better-error’s primitive value parsers:
-- | Parse successfuly only if the current value is a String, running the
-- validation function
withString :: (String -> Either ErrType a) -> Parse ErrType a
withScientific :: (Scientific -> Either ErrType a) -> Parse ErrType a
withBool :: (Bool -> Either ErrType a) -> Parse ErrType a
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L131-L138
primParser :: Primitive a -> A.Parse String a
= \case
primParser PString f -> A.withString $
maybe (Left "error validating string") Right . f
PNumber f -> A.withScientific $
maybe (Left "error validating number") Right . f
PBool f -> A.withBool $
maybe (Left "error validating bool") Right . f
Nothing too fancy, mostly plumbing.
Deducing Ap
However, this small change (and adding the type parameter) leaves in a predicament. What should Schema
look like?
At first glance, we might think we could just write
data Schema a =
RecordType [Field a]
| SumType [Choice a]
| SchemaLeaf (Primitive a)
deriving Functor
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L31-L37
data Field a = Field
data Choice a = Choice
But there’s a problem here: RecordType
is a combination of Field
s, but…each Field
is of a different type! For example, in our Customer
example, the Person
branch has two fields: Name
and Age
. Our name schema would look like nameField :: Field String
, and our age schema would look like ageField :: Field Int
…and so you can’t really put that into a list like [Field a]
since they each have different types. And further more, we want a final Customer
(in our Schema Customer
), a type which is different from both String
and Int
.
What we need is a way to express heterogeneous collection/sequence of Field a
, coupled with a way of “combining” all of them to create an aggregate value of a final type. A type that says “use a bunch of Field
of x
s of different types to generate a final a
”.
There are a couple of ways to arrive at this mystery type. One way is to recognize “combine a bunch of f x
s of different types to create an f b
” is essentially the M.O. of the Applicative abstraction, and so essentially we want to give Field
some sort of Applicative
structure. And so we can reach for “the type that gives something an Applicative
structure”, the free applicative. (This is the strategy I talk about in my Applicative Regular Expressions post: if you know exactly the interface you want, you can just use that interface’s free structure)
Another way is to think about it as an enhancement along a functor combinator described in the functor combinatorpedia. Here we know we want to enhance Field
in a specific way, so we can scan the list of functor combinators until there is one that we need. And scrolling down, we see:
Origin: Control.Applicative.Free / Data.Functor.Apply.Free
Enhancement: The ability to provide multiple
f
s that the interpreter must consume all of. (…)While
ListF
may be considered “multiple options offered”,Ap
can be considered “multiple actions all required”. The interpreter must consume/interpret all of the multiplef
s in order to interpret anAp
.Note that ordering is not enforced: while the consumer must handle each
f
eventually, they are free to handle it in whatever order they desire. In fact, they could even all be handled in parallel. SeeFree
for a version where ordering is enforced.…
Because this has an
Applicative
instance, you can use(<*>) :: Ap f (a -> b) -> Ap f a -> Ap f b
to sequence multipleAp f
s together, andpure :: a -> Ap f a
to produce a “no-op”Ap
without anyf
s.
That sounds like it matches to me! In order to parse a RecordType
, we need to parse every Field
. It doesn’t make any sense to skip one field or the other: they all need to be processed and parsed. This sounds like just the thing we need.
The description here also gives a clue for what we might want to use for SumType
(ListF
sounds like a good companion for the behavior we want sum type parsers to have)
Building Ap
With this, we can write our final Schema
type.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L25-L47
data Schema a =
RecordType (Ap Field a)
| SumType (ListF Choice a)
| SchemaLeaf (Primitive a)
deriving Functor
data Field a = Field
fieldName :: String
{ fieldValue :: Schema a
,
}deriving Functor
data Choice a = Choice
choiceName :: String
{ choiceValue :: Schema a
,
}deriving Functor
data Primitive a =
PString (String -> Maybe a)
| PNumber (Scientific -> Maybe a)
| PBool (Bool -> Maybe a)
deriving Functor
Note that I switched from [Choice a]
to ListF Choice a
as hinted earlier — the two are the same, but the latter has the Functor
instance we want (fmap :: (a -> b) -> ListF Choice a -> ListF Choice b
), and is an instance of useful functor combinator typeclasses. Furthermore, it illustrates the symmetry between sum types and record, since Ap
and ListF
are contrasting types: Ap
can be used to a represent the “product” between many required fields, and ListF
can be used to the option between many possible choices. It’s more clear how product types and sum types are “opposites” in a nice clean way.
We can now make our Customer
schema:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L63-L77
customerSchema :: Schema Customer
= SumType $
customerSchema Choice
inject = "Person"
{ choiceName = RecordType $
, choiceValue CPerson
<$> inject Field { fieldName = "Name", fieldValue = SchemaLeaf pString }
<*> inject Field { fieldName = "Age" , fieldValue = SchemaLeaf pInt }
}<!> inject Choice
= "Business"
{ choiceName = RecordType $
, choiceValue CBusiness
<$> inject Field { fieldName = "Employees", fieldValue = SchemaLeaf pInt }
}
The main new thing is using inject :: Choice a -> ListF Choice a
and inject :: Field a -> Ap Field a
to lift our base types into their appropriate combinators. Then after that, we just use Ap
’s Applicative
instance and ListF
’s Plus
instance to combine them together. Overall it should look very similar to the schema we wrote for the documentation section.
Interpreting Ap
Now, the typical way to “run” an applied functor combinator is with interpreting functions, like:
interpret :: Applicative g => (forall x. f x -> g x) -> Ap f a -> g a
interpret :: Plus g => (forall x. f x -> g x) -> ListF f a -> g a
You can interpret an Ap f a
into any Applicative g
, and you can interpret a ListF f a
into any Plus g
(Plus
is basically Alternative
without an Applicative
requirement, supporting (<!>) :: f a -> f a -> f a
).
Basically, the strategy for using interpret
is that you write a function to interpret any individual f
you might find in the structure, and interpret
will accumulate them all together for you.
In our case, if we decided to use interpret
, we could write:
interpret :: (forall x. Field x -> Parse ErrType x) -> Ap Field a -> Parse ErrType x
interpret :: (forall x. Choice x -> Parse ErrType x) -> ListF Choice a -> Parse ErrType x
Basically, if we have a way to parse each Field
, then we have a way to parse an Ap Field a
. If we have a way to parse each Choice
, then we have a way to parse a ListF Choice a
.
Let’s write those individual parsers for each smaller type:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L128-L129
fieldParser :: Field a -> A.Parse String a
Field name val) = A.key (T.pack name) (schemaParser val) fieldParser (
Here we use aeson-better-errors’s key :: Text -> Parser a -> Parser a
, which takes a key and a parser, and runs that parser on whatever is under that key. For fieldParser
, we run the schema parser for our sub-schema under that key.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L121-L126
choiceParser :: Choice a -> A.Parse String a
Choice name val) = do
choiceParser (<- A.key "tag" A.asString
tag == name) $
unless (tag "Tag does not match"
A.throwCustomError "contents" $ schemaParser val A.key
Our sum type encoding has to be a bit more involved, because json doesn’t have any native sum type construct. The one we’re going to use for this post is the {"tag": <tag>, "contents": <contents>}"
form. We’re going to parse whatever is in the key "tag"
, and if that tag matches our current choice’s constructor, we parse the schema parser for our sub-schema under that key. Otherwise, this choice isn’t what is currently in our json value.
Finally, to bring it all together, we use the interpret
functions we talked about:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L115-L119
schemaParser :: Schema a -> A.Parse ErrType a
= \case
schemaParser SumType cs -> interpret choiceParser cs
RecordType fs -> interpret fieldParser fs
SchemaLeaf p -> primParser p
And that’s it!
Ah well, not exactly so fast. Even though they could support it, aeson-better-errors doesn’t provide Plus
a for Parse
. We can write them as orphans here just because this is a fun learning experience (but we usually do like to avoid defining instances for types or typeclasses that aren’t ours).
Alt
and Plus
represent fallback choices: x <!> y
will try x
first, then if x
fails, try y
.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L108-L111
instance Monad f => Alt (A.ParseT e f) where
<!>) = (A.<|>)
(instance Monad f => Plus (A.ParseT String f) where
= A.throwCustomError "No options were validated" zero
And…that should work!
> :set -XOverloadedStrings
ghci> parseSchema customerSchema "{ \"tag\": \"Person\", \"contents\": { \"Name\": \"Same\", \"Age\": 40 } }"
ghciRight (CPerson {cpName = "Same", cpAge = 30})
> parseSchema customerSchema "{ \"tag\": \"Business\", \"contents\": { \"Employees\": 3 } }"
ghciRight (CBusiness {cbEmployees = 3})
We were able to generate a fully functional parser from our schema, by only providing parsers for the smaller, more specific types we had (Field
and Choice
), and having them all fit together in a way directed by their Applicative
and Plus
typeclass instances.
Direct Structural Inspection
However, sometimes the typeclass instances aren’t really the best way to handle things. It gives us a nice principled shortcut — for example, to interpret out of an Ap
, GHC needs a way to know “how to sequence Parse
s”, and so interpret
uses the Applicative
instance for that. But we know there are usually different ways to sequence or combine actions — famously in IO, we have the option to “sequence” IO actions in series or in parallel, with the default Applicative
instance being series sequencing. So, offloading our logic to a typeclass can be a convenient route, but it’s not necessarily the behavior we want.
In our case, the Plus
instance actually combines failed fallback behavior in an undesirable way: our errors become not too useful, because <!>
always picks the right side’s errors, and we eventually run into A.throwCustomError "No options were validated"
.
> parseSchema customerSchema "{ \"tag\": \"Business\", \"contents\": { \"Employees\": \"Mustard\" } }"
ghciLeft (BadSchema [] (CustomError "No options were validated"))
> parseSchema customerSchema "{ \"tag\": \"Grape\", \"contents\": { \"Color\": \"purple\" } }"
ghciLeft (BadSchema [] (CustomError "No options were validated"))
Since the definition of zero
(which was our fault because we wrote it here as an orphan instance — oops!) always falls back to the same error, this is not very useful!
As we see, interpret
for ListF
, while convenient, isn’t necessarily the best way to tear down a ListF
. Luckily, most functor combinators are just ADTs that we can pattern match and break down and access the structures manually. In the case of ListF
, the structure is pretty simple:
data ListF f a = ListF { runListF :: [f a] }
Our ListF Choice a
is just [Choice a]
. This is something we can work with! Let’s write a better ListF Choice a
processor by working with the list itself.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L140-L152
schemaParser2 :: Schema a -> A.Parse ErrType a
= \case
schemaParser2 RecordType fs -> interpret fieldParser fs
SumType cs -> do
let schemaMap = M.fromList
| Choice nm vl <- runListF cs ]
[ (nm, vl) <- A.key "tag" A.asString
tag case M.lookup tag schemaMap of
Nothing -> A.throwCustomError $
"tag " <> tag <> " not recognized: Expected one of "
<> intercalate ", " (M.keys schemaMap)
Just sc -> A.key "contents" (schemaParser2 sc)
SchemaLeaf p -> primParser p
We can use the structure of ListF
to generate a Map
associating any tags with the schemas they are meant to encode. We then parse the tag, look up what schema it represents (if any) and then use that schema under the contents key.
: parseSchema2 customerSchema "{ \"tag\": \"Business\", \"contents\": { \"Employees\": \"Mustard\" } }"
λLeft (BadSchema [ObjectKey "contents",ObjectKey "Employees"] (WrongType TyNumber (String "Mustard")))
: parseSchema2 customerSchema "{ \"tag\": \"Grape\", \"contents\": { \"Color\": \"purple\" } }"
λLeft (BadSchema [] (CustomError "tag Grape not recognized: Expected one of Business, Person"))
Much better messages!
Backporting documentation
Remember that the whole point of this exercise was to add functionality to our schema. That means we also have to upgrade our documentation function as well.
Hopefully it is clear from the structure of our data type that we haven’t lost any information. Updating our documentation generator should be just a matter of changing how to we get the items from our ListF
and Ap
.
Following what we just learned, one way to do this would be to use interpret
or manually pattern match and take advantage of the structure. However, if we just want to get a list of monomorphic items from a functor combinator, there’s an abstraction in functor-combinators that gives you a “higher-order” version of toList
called htoList
:
htoList :: (forall x. f x -> b) -> ListF f a -> [b]
htoList :: (forall x. f x -> b) -> Ap f a -> [b]
Give it a function to “get” a b
out of every f
, it collects the b
from every f
inside the structure and puts it in a list for us. Note that this type is very similar to the map
we used earlier:
-- what we used before
map :: ( Field -> b) -> [Field] -> [b]
-- what we can use now
htoList :: (forall x. Field x -> b) -> Ap Field a -> [b]
So it looks like htoList
should work as a drop-in replacement for map
…
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/parse.hs#L79-L106
schemaDoc :: String -- ^ name
-> Schema x -- ^ schema
-> PP.Doc a
= \case
schemaDoc title RecordType fs -> PP.vsep [
"{" <> title <> "}")
PP.pretty (2 . PP.vsep $
, PP.indent -> "*" PP.<+> PP.indent 2 (fieldDoc fld)) fs
htoList (\fld
]SumType cs -> PP.vsep [
"(" <> title <> ")")
PP.pretty ("Choice of:"
, 2 . PP.vsep $
, PP.indent
htoList choiceDoc cs
]SchemaLeaf p -> PP.pretty (title <> ":")
PP.<+> primDoc p
where
fieldDoc :: Field x -> PP.Doc a
Field name val) = schemaDoc name val
fieldDoc ( choiceDoc :: Choice x -> PP.Doc a
Choice name val) = schemaDoc name val
choiceDoc ( primDoc :: Primitive x -> PP.Doc a
= \case
primDoc PString _ -> "string"
PNumber _ -> "number"
PBool _ -> "bool"
Neat, we just had to replace map (\fld -> ..) fs
with htoList (\fld -> ...) fs
, and map choiceDoc cs
with htoList choiceDoc cs
. We were able to re-use the exact same logic — we lose no power and upgrading was a straightforward mechanical transformation.
Contravariant Consumption
Now, let’s consider instead the situation where we would want to serialize an a
with a schema. We’ll make a type Schema a
that represents something that can encode an a
as a json value. The overall interface of using that type would be:
schemaToValue :: Schema a -> a -> Aeson.Value
(Aeson.Value
being the json representation from the aeson library)
To keep things simple, let’s forget all the parsing stuff for now; we’ll add it back in later. Let’s just create a type that can only serialize by enhancing our documentation schema.
Again, for the same reasons as before, we can get away with the only fundamental change being at the leaves/primitives. Our structure itself is defined by the ADT, and all of the variations outside of the structure itself comes from how each leaf is serialized.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/serialize.hs#L36-L39
data Primitive a =
PString (a -> String)
| PNumber (a -> Scientific)
| PBool (a -> Bool)
A Primitive a
will be a way to serialize a json primitive — it can be PString
, PNumber
, or PBool
. To create a “String Serializer”, you need to use PString
with a function on “how to turn it into a String
”. To create a “Bool parser”, you need PBool
with a function on what how to turn the value into a String
.
Again, it can be useful to add some helper primitives:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/serialize.hs#L58-L65
pString :: Primitive String
= PString id
pString
pInt :: Primitive Int
= PNumber fromIntegral
pInt
pBool :: Primitive Bool
= PBool id pBool
pString :: Primitive String
is the most basic way to serialize a primitive json string: just return the String
itself. pInt
needs to serialize the Int
into a Scientific
(the numeric type of the aeson library).
We can start off by writing the serializer for Primitive
just go get a feel for how our serializer will work:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/serialize.hs#L138-L142
primToValue :: Primitive a -> a -> Aeson.Value
= \case
primToValue PString f -> \x -> Aeson.String (T.pack (f x))
PNumber f -> \x -> Aeson.Number (f x)
PBool f -> \x -> Aeson.Bool (f x)
Again, nothing too fancy — mostly plumbing along the aeson library’s primitive constructors.
Covariance vs Contravariance
Before we go further, let’s take a moment to pause and discuss the difference between covariant and contravariant functors, and the usefulness of those concepts. “Covariant” functors (or capital-F Functor
s in Haskell) are functors f
where you can consider f a
as a “producer” of a
— for example, Schema a
from our parsing section is a thing you can use to parse/produce an a
out of a bytestring. These are things where it makes sense to fmap :: (a -> b) -> f a -> f b
: if you have a producer of a
s, you can always “post-filter” the result with an a -> b
to get a producer of b
s.
“Contravariant” functors (Contravariant
in Haskell) are functors f
where you can consider f a
as a “consumer” of a
. For example, Primtive a
(and the Schema a
we want to make) from our serializing section is something that consums a
s and produces json values. These are things where it makes sense to contramap
:
class Contravariant f where
contramap :: (a -> b) -> f b -> f a
which says: if you have a consumer of b
s, you can always “pre-filter” the input with an a -> b
to get a consumer of a
s.
Deducing Dec
Now, back on to building our Schema
type. Again, we might want to write something like
data Schema a =
RecordType [Field a]
| SumType [Choice a]
| SchemaLeaf (Primitive a)
deriving Functor
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/serialize.hs#L26-L31
data Field a = Field
data Choice a = Choice
However, we have a problem here (incidentally, it’s the opposite of the problem we had in the previous case). Choice a
doesn’t quite make sense as the sum type consumer for Schema a
, because each Choice
is only meant to handle the types in a specific branch. For example, in our Customer
example, for the CPerson
branch we need a Choice (String, Int)
to consume its contents, and in the CBusiness
branch we need a Choice Int
to consume its contents.
What we need is a way to express a hetereogenous collection/sequence of Choice a
, coupled with a way of “choosing” exactly one of them to handle one form that our input a
can take. A type that says “use exactly one of a bunch of Choice
s of different x
s, and choose one to dispatch depending on what a
we get”. So how do we find the tool we need?
If you are already familiar with contravariant abstractions (but who is?) you might recognize this as the essence of the Decidable typeclass, from the contravariant library…or more accurately, “Decidable without a Divisible constraint”, which is Conclude. A Conclude f
allows you to combine two f
values, and one will be picked to use based on inspection of the input value. Upon recognizing this, we look for find a way to give Choice
some Conclude
interface and search up “the type that gives us a free Conclude
structure”. Following that search, we arrive at Dec
, and so we use Dec Choice a
for our sum type consumer.
But let’s say you’re like the vast majority of Haskell users and have never had any reason to look at the contravariant abstraction hierarchy. How would you think of this?
Like before, we could also look through the functor combinatorpedia (in specific, the contravariant section) and find:
Enhancement: The ability to provide multiple
f
s, one of which will be chosen to consume the overall input.If
f x
is a consumer ofx
s, thenDec f a
is a consumer ofa
s that does its job by choosing a single one of thosef
s to handle that consumption, based on whata
is received.Contrast this with
Div
, where the multiplef
actions are all used to consume the input.Dec
only uses one singlef
action to consume the input, chosen at consumption time.For example, let’s say you had a type
Socket a
which represents some IO channel or socket that is expecting to receivea
s. ADec Socket b
would be a collection of sockets that expects a singleb
overall, and will pick exactly one of thoseSocket
s to handle thatb
.
Sounds like exactly what we need! It also gives us a nice hint of what we might want to use for RecordType
.
Building Dec
With this, we can write our final Schema
type.
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/serialize.hs#L21-L39
data Schema a =
RecordType (Div Field a)
| SumType (Dec Choice a)
| SchemaLeaf (Primitive a)
data Field a = Field
fieldName :: String
{ fieldValue :: Schema a
,
}
data Choice a = Choice
choiceName :: String
{ choiceValue :: Schema a
,
}
data Primitive a =
PString (a -> String)
| PNumber (a -> Scientific)
| PBool (a -> Bool)
Note that I switched from [Field a]
to Div Field a
— the two are the same (Div Field a
is essentially a newtype wrapper over [Field a]
), but the latter has useful functor combinator typeclass instance methods like interpret
(like ListF
before)1. And, again, I feel like it illustrates the symmetry between sum types and record types; Div
and Dec
are opposite types, as Dec
represents a contravariant choice between different choices, and Div
represents a contravariant merger between different consumers. It makes more clear the duality between product types and sum types.
We can assemble our Customer
schema, in a way that looks a lot like our parser schema:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/serialize.hs#L72-L87
customerSchema :: Schema Customer
= SumType $
customerSchema case CPerson x y -> Left (x, y); CBusiness x -> Right x)
decide (\Choice
(inject = "Person"
{ choiceName = RecordType $ divided
, choiceValue Field { fieldName = "Name", fieldValue = SchemaLeaf pString })
(inject Field { fieldName = "Age" , fieldValue = SchemaLeaf pInt })
(inject
}
)Choice
(inject = "Business"
{ choiceName = RecordType $
, choiceValue Field { fieldName = "Age" , fieldValue = SchemaLeaf pInt }
inject
} )
Here we use a few contravariant combinators to combine and merge contravariant functors values (like Div Field a
and Dec Choice a
):
decide
works like:
decide :: Conclude f
=> (a -> Either b c) -- ^ break into branches
-> f b -- ^ handle first branch
-> f c -- ^ handle second branch
-> f a -- ^ overall handler
decide :: (Customer -> Either (String, Int) Int) -- ^ break into branches
-> Dec Choice (String, Int) -- ^ handle CPerson branch
-> Dec Choice Int -- ^ handle CBusiness branch
-> Dec Choice Customer
And divided
works like:
divided :: Divisible f
=> f a -- ^ first handler
-> f b -- ^ second handler
-> f (a, b) -- ^ merged handler
divided :: Div Field String -- ^ handle the cpName field
-> Div Field Int -- ^ handle the cpAge field
-> Div Field (String, Int) -- ^ handle both together
Interpreting Dec
To write our schema serializers, we can use interpret
again:
interpret :: Divisible g => (forall x. Field x -> g x) -> Div Field a -> g a
interpret :: Conclude g => (forall x. Choice x -> g x) -> Dec Choice a -> g a
But, what should we choose as our choice of g
?
choiceToValue :: Choice a -> g a
Well, how do we want to “use” a Choice a
? Remember that Schema a
encodes a way to serialize an a
to an json value. A Choice a
would encode a way to serialize an a
into a json value. We want to turn a Choice a
into an a -> Aeson.Value
:
choiceToValue :: Choice a -> (a -> Aeson.Value)
-- is supposed to match up with
choiceToValue :: Choice a -> g a
So, we need to pick some g
where g a
is a -> Aeson.Value
. This is exactly Op
from Data.Functor.Contravariant, in base:
data Op r a = Op { getOp :: a -> r }
So, if we write
choiceToValue :: Choice a -> Op Aeson.Value a
-- a newtype wrapper away from
choiceToValue :: Choice a -> a -> Aeson.Value
then we have
choiceToValue :: Dec Choice a -> Op Aeson.Value a
interpret
-- a newtype wrapper away from
choiceToValue :: Dec Choice a -> a -> Aeson.Value interpret
Let’s write it!
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/serialize.hs#L128-L132
choiceToValue :: Choice a -> Op Aeson.Value a
Choice name val) = Op $ \x -> Aeson.object
choiceToValue ("tag" Aeson..= T.pack name
[ "contents" Aeson..= schemaToValue val x
, ]
Now onto RecordType
’s Div Field
. Here, we want to build an object using Aeson.object :: [Aeson.Pair] -> Aeson.Value
(one way that the aeson library allows us to build objects). Therefore, our type for fieldToValue
should be:
fieldToValue :: Field a -> a -> [Aeson.Pair]
This looks familiar; it’s the same thing as before, but with Op [Aeson.Pair]
instead of Op Aeson.Value
.
fieldToValue :: Field a -> Op [Aeson.Pair] a
fieldToValue :: Div Field a -> Op [Aeson.Pair] a
interpret
-- a newtype wrapper away from
fieldToValue :: Div Field a -> a -> [Aeson.Pair] interpret
We can go ahead and write it out, actually:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/serialize.hs#L134-L136
fieldToValue :: Field a -> Op [Aeson.Pair] a
Field name val) = Op $ \x ->
fieldToValue (Aeson..= schemaToValue val x] [T.pack name
(Note that this behavior relies on the fact that the interpret
instance for Div
— using the Divise
instance for Op r
— will combine the [Aeson.Pair]
list monoidally, concatenating the results of calling fieldToValue
on every Field
in the Div Field a
.)
We should now have enough to write our entire serializer:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/serialize.hs#L119-L126
schemaToValue :: Schema a
-> a
-> Aeson.Value
= \case
schemaToValue SumType cs -> getOp (interpret choiceToValue cs)
RecordType fs -> Aeson.object . getOp (interpret fieldToValue fs)
SchemaLeaf p -> primToValue p
Running our schemaToValue
on a sample Person
gives the json value we expect:
ghci> Aeson.encode (schemaToValue customerSchema (CPerson "Sam" 40))
{"tag":"Person","contents":{"Age":40,"Name":"Sam"}}
Some Convenience
Note that this contravariant interpretation pattern (wrapping in Op
and then unwrapping it again to run it) is so common that functor-combinators has a helper function to make things a bit neater:
iapply :: (forall x. f x -> x -> b) -> Dec f a -> a -> b
ifanout :: (forall x. f x -> x -> b) -> Div f a -> a -> [b]
With these we could write
choiceToValue :: Choice a -> a -> Aeson.Value
fieldToValue :: Field a -> a -> Aeson.Pair
And then:
choiceToValue :: Dec Choice a -> a -> Aeson.Value
iapply fieldToValue :: Div Field a -> a -> [Aeson.Pair] ifanout
Backporting documentation
Because our new structure is pretty much the same as before (data types wrapped by functor combinators), and Div
/Dec
support htoList
just like Ap
/ListF
did before, the implementation of schemaDoc
is pretty much word-for-word identical as it was for our parser schema:
-- source: https://github.com/mstksg/inCode/tree/master/code-samples/functor-structures/serialize.hs#L89-L116
schemaDoc :: String -- ^ name
-> Schema x -- ^ schema
-> PP.Doc a
= \case
schemaDoc title RecordType fs -> PP.vsep [
"{" <> title <> "}")
PP.pretty (2 . PP.vsep $
, PP.indent -> "*" PP.<+> PP.indent 2 (fieldDoc fld)) fs
htoList (\fld
]SumType cs -> PP.vsep [
"(" <> title <> ")")
PP.pretty ("Choice of:"
, 2 . PP.vsep $
, PP.indent
htoList choiceDoc cs
]SchemaLeaf p -> PP.pretty (title <> ":")
PP.<+> primDoc p
where
fieldDoc :: Field x -> PP.Doc a
Field name val) = schemaDoc name val
fieldDoc ( choiceDoc :: Choice x -> PP.Doc a
Choice name val) = schemaDoc name val
choiceDoc ( primDoc :: Primitive x -> PP.Doc a
= \case
primDoc PString _ -> "string"
PNumber _ -> "number"
PBool _ -> "bool"
Neat!
Looking Forward
We first started with a simple structure to represent our schema. We then added covariant capabilities to get us parser generation. Then we added contravariant capabilities to get us serializers.
The next step might be to add both enhancements to the same structure! The benefits for this seem pretty significant: we can write our structure once (less code, less bugs), and we also write our serializer, parser, and documenting functions in a way that are automatically kept in-sync, and can never be incompatible with each other. Solving the documentation rot and mismatched parser/serializer problem in one stroke!
For this, we’ll wait until the next post, where we explore not one, but two ways to combine our two capabilities into something known as an invariant functor!
Proceed to the next post here!
Special Thanks
I am very humbled to be supported by an amazing community, who make it possible for me to devote time to researching and writing these posts. Very special thanks to my supporter at the “Amazing” level on patreon, Josh Vera! :)
And, if we want it, it has the more useful
Contravariant
instance:contramap :: (a -> b) -> Div Field b -> Div Field a
.↩︎