Shake: Task Automation and Scripting in Haskell

by Justin Le ♦ Tuesday September 17, 2013


As someone who comes from a background in ruby and rake, I’m used to powerful task management systems with expressive dependency declarations. Make is a favorite tool of mine when I’m working on projects with people who don’t use ruby, and when I’m working on ruby projects I never go far without starting a good Rakefile. The two tools provide a perfect DSL for setting up systems of tasks that have complicated file and task dependencies.

As I was starting to learn Haskell and build larger-scale Haskell projects, I began to look for alternatives in Haskell. Was there a Haskell counterpart to Ruby’s rake or Node’s jake (not to mention grunt and ant, tools with a slightly different philosophy)?

It turns out that by far the most established answer is a library known as Shake (maintained by the prolific Neil Mitchell of hoogle fame and much more). So far it’s served me pretty well. Its documentation is written from the perspective of chiefly using it as a build tool (more “make” than “rake”), so if you’re looking to use it as a task management system, you might have to do some digging. Hopefully this post can help you get started.

I also go over the core concepts of a task management system, so I assume no knowledge of make; this post therefore should also be a good introduction to starting with any sort of task management system.

Our Sample Project

Our sample project is going to be a report build system that builds reports written in markdown with pandoc into html, pdf, and doc formats. This is honestly one of my most common use cases for make, so porting it all to shake will be something useful for me.

The final directory structure will look like this:

img
    img1.jpg
    img2.jpg
out
    report.doc
    report.html
    report.pdf
src
    report.md
css
    report.css
Shakefile

When we run shake, we want to build report.doc and report.pdf if report.md or any of the images have changed, and report.html if report.md, report.css, or any of the images have changed.

Furthermore, img2.jpg actually comes from online, and requires us to re-download it every time we compile to make sure it is up to date.

Setup

Installing Shake

Installing shake is as simple as installing any other cabal package:

$ cabal update
$ cabal install shake

I’ll be using shake-0.10.6 for this post.

Setting up the Shakefile

We set up our Shakefile with a simple scaffold:

-- Shakefile

import Development.Shake

opts = shakeOptions { shakeFiles    = ".shake/" }        -- 1

(~>) = phony                                             -- 2
                                                         -- (obsolete)

main :: IO ()
main = shakeArgs opts $ do
    want []

    "clean" ~> removeFilesAfter ".shake" ["//*"]

On my machine I’ve set this up to be generated by a bash script called “shakeup”, so I can start a project up on a Shakefile by simply typing shakeup at the project root.

Some notes:

1. Store shake’s metadata files to the folder .shake/. This differs from the default behavior, where all files would be saved to the root directory with .shake as a filename prefix.
2. I’ve aliased the operator ~> for phony to allow for a more expressive infix notation; more on this later. I’ve submitted a patch to the project and it should be included in the next cabal release.

Edit: As of the 0.10.7 release of Shake, this is no longer needed, as ~> is included in the library.

What is a Rule?

If you haven’t used make before, it is important that you understand the key concepts before moving on.

A task management system/build system is a system that works to ensure that all files in the project are “up to date”. In our case, our system will ensure that the files in the out directory are up to date.

In order to do this, files are given “rules”. Rules specify:

What other files/rules this file “depends” on
Instructions to execute to make this file up to date (or to create the file), if it is not already up to date or created.

A file or rule is out of date if any of its dependencies are out of date or if the file it indicates is either not created or has been updated since the last time the task management system has run. When this happens, the guilty dependencies are updated using their own rules. Afterwards, the file’s own instructions are executed.

If a file has no rule, “out of date” simply means that it has been updated or changed since the last time the task management system has run, or it does not exist. If it has, then all files or rules that depend on it are also out of date.
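To make that definition concrete, here is a toy model of it. This is only a sketch of the idea, not how Shake tracks changes internally; Node, sample, and outOfDate are names I made up for illustration.

```haskell
-- A toy model of a build graph. Every name here is illustrative only.
data Node = Node
    { exists              :: Bool      -- is the file present on disk?
    , changedSinceLastRun :: Bool      -- was it modified after the last run?
    , dependsOn           :: [String]  -- names of the files/rules it needs
    }

type Graph = [(String, Node)]

-- A node is out of date if it is missing, has changed, or if any of its
-- dependencies is out of date. Unknown names count as missing files.
-- (Assumes the graph has no cycles.)
outOfDate :: Graph -> String -> Bool
outOfDate g name = case lookup name g of
    Nothing -> True
    Just n  -> not (exists n)
            || changedSinceLastRun n
            || any (outOfDate g) (dependsOn n)

sample :: Graph
sample =
    [ ("src/report.md",  Node True True  [])
    , ("img/img1.jpg",   Node True False [])
    , ("out/report.doc", Node True False ["src/report.md", "img/img1.jpg"])
    ]

main :: IO ()
main = do
    print (outOfDate sample "img/img1.jpg")    -- False: present and unchanged
    print (outOfDate sample "out/report.doc")  -- True: src/report.md changed
```

Here out/report.doc itself has not been touched, but it is still out of date because one of its dependencies has changed.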

A good task management system is smart enough to keep track of what is up to date and what isn’t. If multiple rules share one dependency, a naive system might check and update that dependency once for every rule that needs it. For example, all of our builds in this sample project require img2.jpg to be downloaded afresh from online. A naive build system might re-download img2.jpg for every single build target, instead of once for all three.
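The “build a shared dependency only once” idea can be sketched in a few lines of plain Haskell. This is a simplified model under my own assumptions (a dependency table as an association list, no cycles, no timestamps), not a description of Shake’s actual scheduler; DepTable, build, and buildOrder are hypothetical names.

```haskell
import Data.List (foldl')
import Data.Maybe (fromMaybe)

-- target -> the things it depends on (illustrative representation)
type DepTable = [(String, [String])]

-- Build one target depth-first. The accumulator holds everything built so
-- far in this run (newest first); anything already in it is skipped.
build :: DepTable -> [String] -> String -> [String]
build table done target
    | target `elem` done = done
    | otherwise          = target : foldl' (build table) done deps
  where
    deps = fromMaybe [] (lookup target table)

-- Build several targets in one run and report the order of work.
buildOrder :: DepTable -> [String] -> [String]
buildOrder table = reverse . foldl' (build table) []

sampleTable :: DepTable
sampleTable =
    [ ("out/report.doc",  ["src/report.md", "img/img2.jpg"])
    , ("out/report.pdf",  ["src/report.md", "img/img2.jpg"])
    , ("out/report.html", ["src/report.md", "img/img2.jpg", "css/report.css"])
    ]

main :: IO ()
main = mapM_ putStrLn $
    buildOrder sampleTable ["out/report.doc", "out/report.pdf", "out/report.html"]
```

Even though three targets depend on img/img2.jpg, it shows up exactly once in the resulting work list.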

File Rules

Let’s set up src/report.md with a simple markdown document on our new project:

<!-- src/report.md -->

Report
======

This is a report.  Render me!

![first image](img/img1.jpg)
![second image](img/img2.jpg)

Our project tree should look like this at this point:

img
    img1.jpg
    img2.jpg
out
src
    report.md
template
Shakefile

Let’s set up our first rule – rendering out/report.doc if report.md has changed.

"out/report.doc" *> \f -> do
    need ["src/report.md","img/img1.jpg","img/img2.jpg"]
    cmd "pandoc" [ "src/report.md", "-o", f ]

This is equivalent to the Makefile rule:

out/report.doc: src/report.md img/img1.jpg img/img2.jpg
	pandoc src/report.md -o out/report.doc

The operator *> attaches an Action (with a parameter) to a FilePattern (a string) – that is, when shake decides that it needs that specified file on the left hand side to be up to date, it runs the action on the right hand side with that filename as a parameter.

To be clear, the right hand side is of type:

rightHandSide :: FilePath -> Action ()

where the FilePath is the filename of the file that is being “needed”.

The need function specifies all of the dependencies of that action. If shake decides it needs out/report.doc to be up to date, need tells it that it first needs src/report.md and the images to be up to date – or rather, that out/report.doc is only out of date if src/report.md or the images are out of date, or have changed since the last build.
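Since the right hand side is just a function, you can also give it a name and a type signature. Here is a minimal sketch of that, written against the shake-0.10.x API used in this post (where the file-rule operator is *>); buildDoc is a name I made up, and the example assumes pandoc is on your PATH.

```haskell
import Development.Shake

-- The action attached to a file rule is an ordinary function from the
-- requested output path to an Action.
buildDoc :: FilePath -> Action ()
buildDoc out = do
    need ["src/report.md"]                  -- the markdown source
    need ["img/img1.jpg", "img/img2.jpg"]   -- need can be called more than once
    cmd "pandoc" ["src/report.md", "-o", out]

main :: IO ()
main = shakeArgs shakeOptions $ do
    want ["out/report.doc"]
    "out/report.doc" *> buildDoc
```

Naming the action this way makes the FilePath parameter explicit and lets several rules share one body.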

With this in mind, let us write the rest of our file rules:

-- Shakefile

"out/report.doc" *> \f -> do
    need ["src/report.md","img/img1.jpg","img/img2.jpg"]
    cmd "pandoc" [ "src/report.md", "-o", f ]

"out/report.pdf" *> \f -> do
    need ["src/report.md","img/img1.jpg","img/img2.jpg"]
    cmd "pandoc" [ "src/report.md", "-o", f, "-V", "links-as-notes" ]

"out/report.html" *> \f -> do
    need [ "src/report.md"
         , "img/img1.jpg"
         , "img/img2.jpg"
         , "css/report.css" ]
    cmd "pandoc" [ "src/report.md", "-o", f, "-c", "css/report.css", "-S" ]

"img/img2.jpg" *> \f -> do
    cmd "wget" [ "http://example.com/img2.jpg", "-O", f ]

And that is it!
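One caveat about the img/img2.jpg rule: it declares no dependencies, so once the file exists Shake has no reason to consider it out of date, and it will not be re-downloaded on later runs. A possible fix is to call alwaysRerun inside the rule. This is a sketch that assumes your version of Development.Shake exports alwaysRerun (current releases do; I have not checked 0.10.6 specifically) and that wget is installed.

```haskell
import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions $ do
    want ["img/img2.jpg"]

    "img/img2.jpg" *> \f -> do
        alwaysRerun    -- treat this rule as out of date on every run
        cmd "wget" ["http://example.com/img2.jpg", "-O", f]
```

Because wget rewrites the file each time, rules that need img/img2.jpg will typically be rebuilt on every run as well.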

Running Shake

How do we tell shake what file it is that we want to be up to date? We specify this by modifying the line want []:

want ["out/report.doc","out/report.pdf","out/report.html"]

That tells shake that when we run main with no arguments, we want those three files to be checked to be up to date.

Now, to wrap it all together, we run:

$ runhaskell Shakefile

And let the magic happen!

I run this enough times that I like to alias this:

# in ~/.bashrc
alias shake='runhaskell Shakefile'

Note that want specifies the default “wants”. You can specify your own targets by passing them as arguments:

$ runhaskell Shakefile out/report.doc

Wildcards

You may have noticed that even though we had multiple images in the img folder, we required them all explicitly. This could cause problems. What if in the future, our documents used more images?

We can define wildcards using shake’s getDirectoryFiles, which returns the results of a wildcard search in the Action monad. getDirectoryFiles takes a base directory and a list of wildcard patterns.

-- Shakefile

srcFiles :: Action [FilePath]
srcFiles = getDirectoryFiles ""
    [ "src/report.md"
    , "img/*.jpg" ]

main :: IO ()
main = shakeArgs opts $ do
    want ["out/report.doc","out/report.pdf","out/report.html"]

    "out/report.doc" *> \f -> do
        deps <- srcFiles
        need deps
        cmd "pandoc" [ "src/report.md", "-o", f ]

    "out/report.pdf" *> \f -> do
        deps <- srcFiles
        need deps
        cmd "pandoc" [ "src/report.md", "-o", f, "-V", "links-as-notes" ]

    "out/report.html" *> \f -> do
        deps <- srcFiles
        need $ "css/report.css" : deps
        cmd "pandoc" [ "src/report.md", "-o", f, "-c", "css/report.css", "-S" ]

    "img/img2.jpg" *> \f -> do
        cmd "wget" [ "http://example.com/img2.jpg", "-O", f ]

    "clean" ~> removeFilesAfter ".shake" ["//*"]

If you are comfortable with monadic operators, you can make it all happen on one line:

"out/report.doc" *> \f -> do
    need =<< srcFiles
    cmd "pandoc" [ "src/report.md", "-o", f ]

Phony Rules

Now, you might sometimes want rules that are “just tasks” that don’t relate to creating a specific file. That is, they still depend on other files or rules and are triggered to update when their dependencies are out of date, but they just aren’t about building files.

For example, what if you wanted a task build-some, which builds only report.pdf and report.doc, and outputs a proverb to the command line?

One thing you can do is to simply use a rule with a name that does not correspond to any file:

-- Bad
"build-some" *> \_ -> do
    need ["out/report.pdf","out/report.doc"]
    cmd "fortune"

However, this is an inelegant solution: there is no actual file called build-some. Also, if someone ever decides to create a file called build-some, you’ll find that this rule never gets run.

The best way is to create a “phony” rule, which is a rule that is not tied to a file. This is the reason for the alias I specified at the beginning of the post:

-- Good
"build-some" ~> do
    need ["out/report.pdf","out/report.doc"]
    cmd "fortune"

And voilà!

Cleanup

You might have noticed the phony rule in the scaffold Shakefile:

"clean" ~> removeFilesAfter ".shake" ["//*"]

If you run shake clean, it will remove all files in the .shake/ directory after the rule has completed its execution. removeFilesAfter removes the files in the given base directory (.shake) matching the given wildcards (["//*"]) after all rules have completed their course.

This is useful for cleaning up shake’s metadata files after you are done with your build, or if you want to run the task management system on a clean start.
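If you also want clean to throw away the generated reports, the same function can be pointed at the out directory. This is a small sketch of that variation, not part of the scaffold above; it uses phony directly so that it does not rely on the ~> alias.

```haskell
import Development.Shake

main :: IO ()
main = shakeArgs shakeOptions { shakeFiles = ".shake/" } $ do
    want []

    phony "clean" $ do
        removeFilesAfter ".shake" ["//*"]   -- shake's own metadata
        removeFilesAfter "out"    ["//*"]   -- everything we generated
```

Both removals are deferred until the build has finished, so it is safe to list them in either order.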

Completed File

-- Shakefile
import Development.Shake

opts :: ShakeOptions
opts = shakeOptions { shakeFiles    = ".shake/" }

main :: IO ()
main = shakeArgs opts $ do
    want ["out/report.doc","out/report.pdf","out/report.html"]

    "build-some" ~> do
        need ["out/report.pdf","out/report.doc"]
        cmd "fortune"

    "out/report.doc" *> \f -> do
        -- (=<<) runs srcFiles and passes its result to need;
        -- fmap (<$>) would build the need action without running it
        need =<< srcFiles
        cmd "pandoc" [ "src/report.md", "-o", f ]

    "out/report.pdf" *> \f -> do
        need =<< srcFiles
        cmd "pandoc" [ "src/report.md", "-o", f, "-V", "links-as-notes" ]

    "out/report.html" *> \f -> do
        deps <- srcFiles
        need $ "css/report.css" : deps
        cmd "pandoc" [ "src/report.md", "-o", f, "-c", "css/report.css", "-S" ]

    "img/img2.jpg" *> \f -> do
        cmd "wget" [ "http://example.com/img2.jpg", "-O", f ]

    "clean" ~> removeFilesAfter ".shake" ["//*"]

srcFiles :: Action [FilePath]
srcFiles = getDirectoryFiles ""
    [ "src/report.md"
    , "img/*.jpg" ]

Wrapping Up

If you look at the Shake Documentation, you will find a lot of ways you can build complex networks of dependencies.

Hopefully there are enough use cases here to be useful in general applications.

Monadic Tricks

Because everything is Haskell, you can easily generate rules using Haskell’s extensive standard library of monadic functions. For example, if you want to generate multiple reports, you can use forM_:

-- at the top of the Shakefile
import Control.Applicative ((<$>), (<*>))
import Control.Monad (forM_)

-- inside the Rules block of main
let reports = ["report1", "report2", "report3"]

-- every combination of report name and output format
want $ (\s f -> "out/" ++ s ++ "." ++ f) <$>
    reports <*> ["doc","pdf","html"]

forM_ reports $ \reportName -> do
    let
        outBase = "out/" ++ reportName
        srcName = "src/" ++ reportName ++ ".md"

    -- each report depends on its own source plus the shared files
    outBase ++ ".doc" *> \f -> do
        deps <- srcFiles
        need $ srcName : deps
        cmd "pandoc" [ srcName, "-o", f ]

    outBase ++ ".pdf" *> \f -> do
        deps <- srcFiles
        need $ srcName : deps
        cmd "pandoc" [ srcName, "-o", f, "-V", "links-as-notes" ]

    outBase ++ ".html" *> \f -> do
        deps <- srcFiles
        need $ srcName : "css/report.css" : deps
        cmd "pandoc" [ srcName, "-o", f, "-c", "css/report.css", "-S" ]
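The want expression above leans on the list Applicative to produce every combination of report name and format. Here is that piece in isolation as a standalone sketch; targets is a helper name I made up.

```haskell
import Control.Applicative ((<$>), (<*>))

-- Every combination of report name and extension, as paths under out/.
targets :: [String] -> [String] -> [FilePath]
targets names exts = (\s f -> "out/" ++ s ++ "." ++ f) <$> names <*> exts

main :: IO ()
main = mapM_ putStrLn (targets ["report1", "report2"] ["doc", "pdf", "html"])
-- prints:
--   out/report1.doc
--   out/report1.pdf
--   out/report1.html
--   out/report2.doc
--   out/report2.pdf
--   out/report2.html
```

The names vary in the outer loop and the extensions in the inner one, which is why all of report1’s outputs come first.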

Note, however, that you can get the same thing by just using wildcards (with takeFileName). But this is just an example; feel free to let your imagination roam!
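For comparison, here is roughly what the wildcard version could look like for the .doc reports. Treat it as a sketch under a few assumptions: it targets the shake-0.10.x API used in this post, it uses takeBaseName (rather than takeFileName) from Development.Shake.FilePath to drop both the directory and the extension, and the report names in want are hypothetical.

```haskell
import Development.Shake
import Development.Shake.FilePath (takeBaseName, (</>), (<.>))

main :: IO ()
main = shakeArgs shakeOptions $ do
    want ["out/report1.doc", "out/report2.doc"]

    -- One pattern rule covers every .doc report; the source path is
    -- recovered from the output path that was requested.
    "out/*.doc" *> \f -> do
        let src = "src" </> takeBaseName f <.> "md"
        imgs <- getDirectoryFiles "" ["img/*.jpg"]
        need (src : imgs)
        cmd "pandoc" [src, "-o", f]
```

With this, asking for out/report1.doc makes the rule need src/report1.md, with no forM_ required.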

Looking Forward

We’ve seen how Shake is good at setting up systems for managing and executing dependencies. This is good for running simple system commands. However, there is a lot more to scripting and task automation than managing dependencies.

For example, almost everything we’ve done can be done with a simple Makefile. What does Haskell offer to the scripting scene?

Strong Typing

As you’ll know, one of the magical things about Haskell is that because of its expressive strong typing system, you leave the debugging to the compiler. If it compiles, it works exactly the way you want!

This is pretty lacking in the bare-bones system we have in place now. Right now we are just firing off arbitrary system commands that are specified as plain strings, with no typing at all. Anything will compile, whether there are bugs in it or not.

Luckily Shake is very good at integrating seamlessly with any kind of framework. We can leave this up to other frameworks.
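Even before reaching for another framework, a small amount of typing goes a long way. The sketch below moves the pandoc flags from this post out of scattered string lists and into one function over a Format type. Format, extension, and pandocArgs are hypothetical names of mine; none of this comes from Shake or Shelly.

```haskell
-- The output formats this post builds. Html carries its stylesheet.
data Format = Doc | Pdf | Html FilePath
    deriving (Show, Eq)

-- File extension for each format.
extension :: Format -> String
extension Doc      = "doc"
extension Pdf      = "pdf"
extension (Html _) = "html"

-- The pandoc argument list for a source file, an output file, and a format.
-- The flags are the same ones used in the rules earlier in this post.
pandocArgs :: FilePath -> FilePath -> Format -> [String]
pandocArgs src out fmt = [src, "-o", out] ++ extra fmt
  where
    extra Doc        = []
    extra Pdf        = ["-V", "links-as-notes"]
    extra (Html css) = ["-c", css, "-S"]

main :: IO ()
main = print (pandocArgs "src/report.md" "out/report.pdf" Pdf)
```

Inside a rule this would be used as cmd "pandoc" (pandocArgs "src/report.md" f Pdf). A misspelled format is now a compile error rather than a failed build, though the paths themselves are still plain strings.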

One popular framework for this that is gaining in maturity is Shelly (a fork of an older project that is an ongoing Yesod Project experiment), but you are welcome to use your own. At present, Haskell is still developing and growing in this area. I hope to eventually write an article about Shelly integration with Shake.

Other

These are just some ways to think about using Shake in new, more creative ways. Let me know in the comments if you think of any clever integrations!