Applicative Functors for Fun and Parsing

PSA: This post has a bunch of Haskell code, but I’m going to try to make it more broadly accessible. Let’s see how that goes.

I’ve been proceeding apace with my 3rd year in Abhinav’s Haskell classes at Nilenso, and we just got done with the section on Applicative Functors. I’m at that point when I finally “get” it, so I thought I’d document the process, and maybe capture my a-ha moment of Applicatives.

I should point out that the ideas and approach in this post are all based on Abhinav’s class material (and I’ve found them really effective in understanding the underlying concepts). Many thanks are due to him, and any lack of clarity you find ahead is in my own understanding.

Functors and Applicatives

Functors represent a type or a context on which we can meaningfully apply (map) a function. The Functor typeclass is pretty straightforward:

class Functor f where
  fmap :: (a -> b) -> f a -> f b

Easy enough. fmap takes a function that transforms something of type a to type b and a value of type a in a context f. It produces a value of type b in the same context.
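
As a small sketch of what this looks like in practice, here is fmap used with two Functor instances that ship with the Prelude (Maybe and lists). The names incremented, stillNothing and lengths are made up for this illustration.

```haskell
-- fmap applies a plain function to the value(s) inside a context,
-- leaving the shape of the context untouched.

incremented :: Maybe Int
incremented = fmap (+ 1) (Just 41)

-- There is no value to transform, so the context stays as it is.
stillNothing :: Maybe Int
stillNothing = fmap (+ 1) Nothing

-- For lists, the "context" is the list itself: every element is mapped.
lengths :: [Int]
lengths = fmap length ["applicative", "functor"]
```

Here incremented is Just 42, stillNothing is Nothing, and lengths is [11, 7].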

The Applicative typeclass adds two things to Functor. First, it gives us a means of putting things inside a context (also called lifting). Second, it lets us apply a function that is itself within a context.

class Functor f => Applicative f where
  pure :: a -> f a
  (<*>) :: f (a -> b) -> f a -> f b

We can see pure lifts a given value into a context. The apply function (<*>) intuitively looks like fmap, with the difference that the function is within a context. This becomes key when we remember that Haskell functions are curried (and can thus be partially applied). This would then allow us to write something like:

maybeAdd :: Maybe Int -> Maybe Int -> Maybe Int
maybeAdd ma mb = pure (+) <*> ma <*> mb

This function takes two numbers in the Maybe context (that is, they either exist, or are Nothing), and adds them. The result will be the sum if both numbers exist, or Nothing if either or both do not.

Go ahead and convince yourself that it is painful to express this generically with just fmap.
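
To make the pain concrete, here is a rough sketch of the same function written with only fmap and pattern matching, next to the Applicative version from above. The names partial and maybeAddFmap are hypothetical, chosen just for this comparison.

```haskell
-- With only fmap, partially applying (+) leaves us holding a function
-- that is itself stuck inside Maybe ...
partial :: Maybe Int -> Maybe (Int -> Int)
partial ma = fmap (+) ma

-- ... and fmap cannot apply a wrapped function, so we unwrap it by hand.
-- This works for Maybe, but would have to be rewritten for every context.
maybeAddFmap :: Maybe Int -> Maybe Int -> Maybe Int
maybeAddFmap ma mb =
  case partial ma of
    Nothing -> Nothing
    Just f  -> fmap f mb

-- The Applicative version from the post, for comparison.
maybeAdd :: Maybe Int -> Maybe Int -> Maybe Int
maybeAdd ma mb = pure (+) <*> ma <*> mb
```

Both versions agree on every input, but only maybeAdd would keep working unchanged if Maybe were swapped for another Applicative.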

Parsers

There are many ways of looking at what a parser is. Let’s work with one definition: A parser,

Takes some input
Converts some or all of it into something else if it can
Returns whatever input was not used in the conversion

How do we represent something that converts something to something else? It’s a function, of course. Let’s write that down as a type:

newtype Parser i o = Parser (i -> (Maybe o, i))

This more or less directly maps to what we just said. A Parser is a data type which has two type parameters — an input type and an output type. It contains a function that takes one argument of the input type, and produces a tuple of Maybe the output type (signifying if parsing succeeded) and the rest of the input.

We can name the field runParser, so it becomes easier to get a hold of the function inside our Parser type:

newtype Parser i o = Parser { runParser :: i -> (Maybe o, i) }

Parser combinators

The “rest” part is important for the reason that we would like to be able to chain small parsers together to make bigger parsers. We do this using “parser combinators” — functions that take one or more parsers and return a more complex parser formed by combining them in some way. We’ll see some of those ways as we go along.
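
To see why the leftover input matters, here is a minimal sketch of chaining two parsers entirely by hand, before any typeclass machinery is involved. The helpers exactly and twoChars are hypothetical names used only for this illustration.

```haskell
newtype Parser i o = Parser { runParser :: i -> (Maybe o, i) }

-- A hand-written parser that accepts exactly one given character.
exactly :: Char -> Parser String Char
exactly x = Parser $ \input ->
  case input of
    (c:cs) | c == x -> (Just c, cs)
    _               -> (Nothing, input)

-- Chaining by hand: run the first parser, then feed whatever input
-- it left over to the second one. On any failure, hand back the
-- original input untouched.
twoChars :: Char -> Char -> Parser String (Char, Char)
twoChars a b = Parser $ \input ->
  case runParser (exactly a) input of
    (Nothing, _)   -> (Nothing, input)
    (Just x, rest) ->
      case runParser (exactly b) rest of
        (Nothing, _)    -> (Nothing, input)
        (Just y, rest') -> (Just (x, y), rest')
```

Running runParser (twoChars 'a' 'b') "abc" gives (Just ('a','b'),"c"). The nested case expressions are the boilerplate that a combinator should hide.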

Parser instances

Before we proceed, let’s define Functor and Applicative instances for our Parser type.

instance Functor (Parser i) where
  fmap f p = Parser $ \input ->
    let (mo, i) = runParser p input
    in (f <$> mo, i)

The intuition here is clear — if I have a parser that takes some input and provides some output, fmaping a function on that parser translates to applying that function on the output of the parser.

instance Applicative (Parser i) where
  pure x = Parser $ \input -> (Just x, input)

  pf <*> po = Parser $ \input ->
    case runParser pf input of
         (Just f, rest) -> case runParser po rest of
                                (Just o, rest') -> (Just (f o), rest')
                                (Nothing, _)    -> (Nothing, input)
         (Nothing, _)   -> (Nothing, input)

The Applicative instance is a bit more involved than Functor. What we’re doing first is “running” the first parser which gives us the function we want to apply (remember that this is a curried function, so rather than parsing out a function, we are most likely parsing out a value and creating a function with that). If we succeed, then we run the second parser to get a value to apply the function to. If this is also successful, we apply the function to the value, and return the result within the parser context (i.e. the result, and the rest of the input).
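
A small sketch of the two instances at work. The instances are repeated from above so the snippet stands alone; anyChar and pair are hypothetical helpers introduced only for this example.

```haskell
newtype Parser i o = Parser { runParser :: i -> (Maybe o, i) }

instance Functor (Parser i) where
  fmap f p = Parser $ \input ->
    let (mo, i) = runParser p input
    in (f <$> mo, i)

instance Applicative (Parser i) where
  pure x = Parser $ \input -> (Just x, input)
  pf <*> po = Parser $ \input ->
    case runParser pf input of
      (Just f, rest) -> case runParser po rest of
        (Just o, rest') -> (Just (f o), rest')
        (Nothing, _)    -> (Nothing, input)
      (Nothing, _) -> (Nothing, input)

-- Hypothetical helper: consume any single character.
anyChar :: Parser String Char
anyChar = Parser $ \input ->
  case input of
    (c:cs) -> (Just c, cs)
    []     -> (Nothing, input)

-- fmap turns the first result into a partially applied (,),
-- and <*> feeds it the result of the second parser.
pair :: Parser String (Char, Char)
pair = (,) <$> anyChar <*> anyChar
```

runParser pair "abc" yields (Just ('a','b'),"c"), while runParser pair "a" yields (Nothing,"a"): the second parser fails, and the whole input is handed back rather than just the unconsumed part.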

Implementing some parsers

Now let’s take our new data type and instances for a spin. Before we write a real parser, let’s write a helper function. A common theme while parsing a string is to match a single character on a predicate, for example “is this character a letter” or “is this character a semicolon”. We write a function that takes a predicate and returns the corresponding parser:

satisfy :: (Char -> Bool) -> Parser String Char
satisfy p = Parser $ \input ->
  case input of
       (c:cs) | p c -> (Just c, cs)
       _            -> (Nothing, input)

Now let’s try to make a parser that takes a string and, if it finds an ASCII digit character, provides the corresponding integer value. The Data.Char module has a function to match ASCII digit characters, isDigit, and a function to take a digit character and give us an integer, digitToInt. Putting these together with satisfy above:

import Data.Char (digitToInt, isDigit)

digit :: Parser String Int
digit = digitToInt <$> satisfy isDigit

And that’s it! Note how we used our higher-order satisfy function to match an ASCII digit character, and the Functor instance to apply digitToInt to the result of that parser (reminder: <$> is just the infix form of fmap, so this is the same as fmap digitToInt (satisfy isDigit)).

Another example — a character parser, which succeeds if the next character in the input is a specific character we choose.

char :: Char -> Parser String Char
char x = satisfy (x ==)

Once again, the satisfy function makes this a breeze. I must say I’m pleased with the conciseness of this.

Finally, let’s combine character parsers to create a word parser — a parser that succeeds if the input is a given word.

word :: String -> Parser String String
word ""     = Parser $ \input -> (Just "", input)
word (c:cs) = (:) <$> char c <*> word cs

A match on an empty word always succeeds. For anything else, we just break down the parser to a character parser of the first character and a recursive call to the word parser for the rest. Again, note the use of the Functor and Applicative instance. Let’s look at the type signature of the (:) (list cons) function, which prepends an element to a list:

(:) :: a -> [a] -> [a]

The function takes two arguments — a single element of type a, and a list of elements of type a. If we expand the types some more, we’ll see that the first argument we give it is a Parser String Char and the second is a Parser String [Char] (String is just an alias for [Char]).

In this way we are able to take the basic list prepend function and use it to construct a list of characters within the Parser context. (a-ha!?)
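
To watch this happen, the sketch below gathers the definitions so far into one self-contained snippet and runs the word parser on a matching and a non-matching input. Only matched and unmatched are new names, made up for this example.

```haskell
newtype Parser i o = Parser { runParser :: i -> (Maybe o, i) }

instance Functor (Parser i) where
  fmap f p = Parser $ \input ->
    let (mo, i) = runParser p input
    in (f <$> mo, i)

instance Applicative (Parser i) where
  pure x = Parser $ \input -> (Just x, input)
  pf <*> po = Parser $ \input ->
    case runParser pf input of
      (Just f, rest) -> case runParser po rest of
        (Just o, rest') -> (Just (f o), rest')
        (Nothing, _)    -> (Nothing, input)
      (Nothing, _) -> (Nothing, input)

satisfy :: (Char -> Bool) -> Parser String Char
satisfy p = Parser $ \input ->
  case input of
    (c:cs) | p c -> (Just c, cs)
    _            -> (Nothing, input)

char :: Char -> Parser String Char
char x = satisfy (x ==)

word :: String -> Parser String String
word ""     = Parser $ \input -> (Just "", input)
word (c:cs) = (:) <$> char c <*> word cs

-- word "hi" unfolds to:
--   (:) <$> char 'h' <*> ((:) <$> char 'i' <*> word "")
matched, unmatched :: (Maybe String, String)
matched   = runParser (word "null") "nullable"
unmatched = runParser (word "null") "nil"
```

matched is (Just "null","able") and unmatched is (Nothing,"nil"). Note that the failing case returns the original input, even though the first character 'n' did match.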

JSON

JSON is a relatively simple format to parse, and makes for a good example for building a parser. The JSON website has a couple of good depictions of the JSON language grammar front and center.

So that defines our parser problem then — we want to read a string input, and convert it into some sort of in-memory representation of the JSON value. Let’s see what that would look like in Haskell.

data JsonValue = JsonString String
               | JsonNumber JsonNum
               | JsonObject [(String, JsonValue)]
               | JsonArray [JsonValue]
               | JsonBool Bool
               | JsonNull

-- We represent a number as an arbitrary precision
-- floating point number with a base 10 exponent
data JsonNum = JsonNum { negative :: Bool
                       , signif   :: Integer
                       , expo     :: Integer
                       }

The JSON specification does not really tell us what type to use for numbers. We could just use a Double, but to make things interesting, we represent it as an arbitrary precision floating point number.

Note that the JsonArray and JsonObject constructors are recursive, as they should be — a JSON array is an array of JSON values, and a JSON object is a mapping from string keys to JSON values.

Parsing JSON

We now have the pieces we need to start parsing JSON. Let’s start with the easy bits.

null

To parse a null we literally just look for the word “null”.

import Data.Functor (($>))

jsonNull :: Parser String JsonValue
jsonNull = word "null" $> JsonNull

The $> operator (from Data.Functor) is the flipped form of <$, which is itself a shortcut for fmap . const. It runs the parser on the left and, if that succeeds, replaces its output with the value on the right. If the word "null" parser is successful (Just "null"), the string "null" is replaced by the JsonValue representing null (i.e. we’ll get (Just JsonNull, <rest of the input>)).
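
The operator is not specific to parsers. As a quick sketch, here it is on Maybe, where the behaviour is easy to see. The names replaced, notReplaced and sameThing are made up for this illustration.

```haskell
import Data.Functor (($>))

-- A successful value has its contents swapped for the right-hand side.
replaced :: Maybe Int
replaced = Just "null" $> 0

-- A failure stays a failure; there is nothing to replace.
notReplaced :: Maybe Int
notReplaced = (Nothing :: Maybe String) $> 0

-- The three spellings agree: ($>), (<$) and fmap . const.
sameThing :: Bool
sameThing = all (== Just (0 :: Int))
  [ Just "null" $> 0
  , 0 <$ Just "null"
  , fmap (const 0) (Just "null")
  ]
```

Here replaced is Just 0, notReplaced is Nothing, and sameThing is True.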

true and false

First a quick detour:

import Control.Applicative (Alternative (..))

instance Alternative (Parser i) where
  empty = Parser $ \input -> (Nothing, input)
  p1 <|> p2 = Parser $ \input ->
      case runParser p1 input of
           (Nothing, _) -> case runParser p2 input of
                                (Nothing, _) -> (Nothing, input)
                                justValue    -> justValue
           justValue    -> justValue

The Alternative instance is easy to follow once you understand Applicative. We define an empty parser that matches nothing. Then we define the alternative operator (<|>) as we might intuitively imagine.

We run the parser given as the first argument first; if it succeeds, we are done. If it fails, we run the second parser on the whole input again, and if that succeeds, we return its value. If both fail, we return Nothing along with the original input.
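
The "whole input again" part is worth seeing in action. In the sketch below, both alternatives share a prefix, so the first one gets partway through before failing. All definitions are repeated from the post so the snippet stands alone; trueOrTree is a hypothetical parser invented for this example.

```haskell
import Control.Applicative (Alternative (..))

newtype Parser i o = Parser { runParser :: i -> (Maybe o, i) }

instance Functor (Parser i) where
  fmap f p = Parser $ \input ->
    let (mo, i) = runParser p input
    in (f <$> mo, i)

instance Applicative (Parser i) where
  pure x = Parser $ \input -> (Just x, input)
  pf <*> po = Parser $ \input ->
    case runParser pf input of
      (Just f, rest) -> case runParser po rest of
        (Just o, rest') -> (Just (f o), rest')
        (Nothing, _)    -> (Nothing, input)
      (Nothing, _) -> (Nothing, input)

instance Alternative (Parser i) where
  empty = Parser $ \input -> (Nothing, input)
  p1 <|> p2 = Parser $ \input ->
    case runParser p1 input of
      (Nothing, _) -> case runParser p2 input of
        (Nothing, _) -> (Nothing, input)
        justValue    -> justValue
      justValue -> justValue

satisfy :: (Char -> Bool) -> Parser String Char
satisfy p = Parser $ \input ->
  case input of
    (c:cs) | p c -> (Just c, cs)
    _            -> (Nothing, input)

char :: Char -> Parser String Char
char x = satisfy (x ==)

word :: String -> Parser String String
word ""     = Parser $ \input -> (Just "", input)
word (c:cs) = (:) <$> char c <*> word cs

-- On "tree!", word "true" matches 't' and 'r' before failing on 'e'.
-- The second alternative then starts over from the full input.
trueOrTree :: Parser String String
trueOrTree = word "true" <|> word "tree"
```

runParser trueOrTree "tree!" gives (Just "tree","!"), and runParser trueOrTree "trap" gives (Nothing,"trap").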

Parsing true and false with this in our belt looks like:

jsonBool :: Parser String JsonValue
jsonBool =  (word "true" $> JsonBool True)
        <|> (word "false" $> JsonBool False)

We can easily express the idea of trying to parse the string “true” and, if that fails, trying again for the string “false”. If either matches, we have a boolean value; if not, Nothing. Again, nice and concise.

String

This is only slightly more complex. We need a couple of helper functions first:

import Data.Char (isHexDigit)

hexDigit :: Parser String Int
hexDigit = digitToInt <$> satisfy isHexDigit

digitsToNumber :: Int -> [Int] -> Integer
digitsToNumber base digits = foldl (\num d -> num * fromIntegral base + fromIntegral d) 0 digits

hexDigit is easy to follow. It just matches anything from 0-9 and a-f or A-F.

digitsToNumber is a pure function that takes a list of digits, and interprets it as a number in the given base. We do some jumping through hoops with fromIntegral to take Int digits (mapping to a normal word-sized integer) and produce an Integer (arbitrary sized integer).
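
A short worked example of the fold, with digitsToNumber repeated so the snippet runs on its own. The names decimal and hex are made up for this illustration.

```haskell
digitsToNumber :: Int -> [Int] -> Integer
digitsToNumber base digits =
  foldl (\num d -> num * fromIntegral base + fromIntegral d) 0 digits

-- Base 10: ((0 * 10 + 1) * 10 + 2) * 10 + 3
decimal :: Integer
decimal = digitsToNumber 10 [1, 2, 3]     -- 123

-- Base 16: the digits of 0x00e9, i.e. 14 * 16 + 9
hex :: Integer
hex = digitsToNumber 16 [0, 0, 14, 9]     -- 233
```

The most significant digit comes first in the list, which is why a left fold is the natural fit: each step shifts the running total up by one place and adds the next digit.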

Now follow along one line at a time:

import Control.Monad (replicateM)
import Data.Char (chr, isControl)

jsonString :: Parser String String
jsonString = (char '"' *> many jsonChar <* char '"')
  where
    jsonChar =  satisfy (\c -> not (c == '\"' || c == '\\' || isControl c))
            <|> word "\\\"" $> '"'
            <|> word "\\\\" $> '\\'
            <|> word "\\/"  $> '/'
            <|> word "\\b"  $> '\b'
            <|> word "\\f"  $> '\f'
            <|> word "\\n"  $> '\n'
            <|> word "\\r"  $> '\r'
            <|> word "\\t"  $> '\t'
            <|> chr . fromIntegral . digitsToNumber 16 <$> (word "\\u" *> replicateM 4 hexDigit)

A string is a sequence of zero or more valid JSON characters, surrounded by double quotes. The *> and <* operators allow us to chain parsers whose output we wish to discard (since the quotes are not part of the actual string itself). The many function comes from the Alternative typeclass. It runs a given computation zero or more times and collects the results in a list. In our case, it applies the jsonChar parser repeatedly until it fails.

So what does jsonChar do? Following the definition of a character in the JSON spec, first we try to match something that is not a quote ("), a backslash (\) or a control character. If that doesn’t match, we try to match the various escape characters that the specification mentions.

Finally, if we get a \u followed by 4 hexadecimal characters, we put them in a list (replicateM 4 hexDigit chains 4 hexDigit parsers and provides the output as a list), convert that list into a base 16 integer (digitsToNumber), and then convert that to a Unicode character (chr).
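
The conversion pipeline can be tried out without any parser at all. The sketch below uses a hypothetical pure helper, decodeEscape, that mirrors the last alternative of jsonChar; it assumes it is handed exactly the four hex characters that follow \u (digitToInt raises an error on anything that is not a hex digit).

```haskell
import Data.Char (chr, digitToInt)

digitsToNumber :: Int -> [Int] -> Integer
digitsToNumber base digits =
  foldl (\num d -> num * fromIntegral base + fromIntegral d) 0 digits

-- Hypothetical helper, not part of the original code:
-- hex characters -> digit values -> base 16 number -> Unicode character.
decodeEscape :: String -> Char
decodeEscape = chr . fromIntegral . digitsToNumber 16 . map digitToInt
```

For example, decodeEscape "0041" is 'A' (code point 65) and decodeEscape "00e9" is 'é' (code point 233).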

The order of chaining these parsers does matter for performance. The first parser in our <|> chain is the one that is most likely (most characters are not escaped). This follows from our definition of the Alternative instance. We run the first parser, then the second, and so on. We want this to succeed as early as possible so we don’t run more parsers than necessary.

Arrays

Arrays and objects have something in common — they have items which are separated by some value (commas for array values, commas for each key-value pair in an object, and colons separating keys and values). Let’s just factor this commonality out:

sepBy :: Parser i v -> Parser i s -> Parser i [v]
sepBy v s = (:) <$> v <*> many (s *> v) 
         <|> pure []

We take a parser for our values (v), and a parser for our separator (s). We try to parse one or more v separated by s, or just return an empty list in the parser context if there are none.
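
Since json has not been defined yet, here is a sketch of sepBy on something simpler: a comma-separated list of single digits. The definitions are repeated from the post so the snippet is self-contained; digitList is a hypothetical name used only here.

```haskell
import Control.Applicative (Alternative (..))
import Data.Char (digitToInt, isDigit)

newtype Parser i o = Parser { runParser :: i -> (Maybe o, i) }

instance Functor (Parser i) where
  fmap f p = Parser $ \input ->
    let (mo, i) = runParser p input
    in (f <$> mo, i)

instance Applicative (Parser i) where
  pure x = Parser $ \input -> (Just x, input)
  pf <*> po = Parser $ \input ->
    case runParser pf input of
      (Just f, rest) -> case runParser po rest of
        (Just o, rest') -> (Just (f o), rest')
        (Nothing, _)    -> (Nothing, input)
      (Nothing, _) -> (Nothing, input)

instance Alternative (Parser i) where
  empty = Parser $ \input -> (Nothing, input)
  p1 <|> p2 = Parser $ \input ->
    case runParser p1 input of
      (Nothing, _) -> case runParser p2 input of
        (Nothing, _) -> (Nothing, input)
        justValue    -> justValue
      justValue -> justValue

satisfy :: (Char -> Bool) -> Parser String Char
satisfy p = Parser $ \input ->
  case input of
    (c:cs) | p c -> (Just c, cs)
    _            -> (Nothing, input)

char :: Char -> Parser String Char
char x = satisfy (x ==)

digit :: Parser String Int
digit = digitToInt <$> satisfy isDigit

sepBy :: Parser i v -> Parser i s -> Parser i [v]
sepBy v s = (:) <$> v <*> many (s *> v)
         <|> pure []

-- Single digits separated by commas.
digitList :: Parser String [Int]
digitList = digit `sepBy` char ','
```

runParser digitList "1,2,3]" gives (Just [1,2,3],"]"), and runParser digitList "]" gives (Just [],"]"). A trailing separator is left unconsumed: runParser digitList "1,2,]" gives (Just [1,2],",]").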

Now we write our JSON array parser as:

jsonArray :: Parser String JsonValue
jsonArray = JsonArray <$> (char '[' *> (json `sepBy` char ',') <* char ']')

Nice, that’s really succinct. But wait! What is json?

Putting it all together

We know that arrays contain JSON values. And we know how to parse some JSON values. Let’s try to put those together for our recursive definition:

json :: Parser String JsonValue
json =  jsonNull
    <|> jsonBool
    <|> jsonString
    <|> jsonArray
--  <|> jsonNumber
--  <|> jsonObject

And that’s it!

The JSON object and number parsers follow the same pattern. So far we’ve ignored spaces in the input, but those can be consumed and ignored easily enough based on what we’ve learned.
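
One possible way to deal with whitespace, sketched with the pieces built so far. The names spaces and lexeme are made up for this illustration and are not taken from the original code; the set of whitespace characters (space, tab, newline, carriage return) is the one the JSON grammar allows.

```haskell
import Control.Applicative (Alternative (..))

newtype Parser i o = Parser { runParser :: i -> (Maybe o, i) }

instance Functor (Parser i) where
  fmap f p = Parser $ \input ->
    let (mo, i) = runParser p input
    in (f <$> mo, i)

instance Applicative (Parser i) where
  pure x = Parser $ \input -> (Just x, input)
  pf <*> po = Parser $ \input ->
    case runParser pf input of
      (Just f, rest) -> case runParser po rest of
        (Just o, rest') -> (Just (f o), rest')
        (Nothing, _)    -> (Nothing, input)
      (Nothing, _) -> (Nothing, input)

instance Alternative (Parser i) where
  empty = Parser $ \input -> (Nothing, input)
  p1 <|> p2 = Parser $ \input ->
    case runParser p1 input of
      (Nothing, _) -> case runParser p2 input of
        (Nothing, _) -> (Nothing, input)
        justValue    -> justValue
      justValue -> justValue

satisfy :: (Char -> Bool) -> Parser String Char
satisfy p = Parser $ \input ->
  case input of
    (c:cs) | p c -> (Just c, cs)
    _            -> (Nothing, input)

char :: Char -> Parser String Char
char x = satisfy (x ==)

-- Hypothetical: zero or more JSON whitespace characters. Never fails.
spaces :: Parser String String
spaces = many (satisfy (`elem` " \t\n\r"))

-- Hypothetical: run a parser, then discard any whitespace after it.
lexeme :: Parser String a -> Parser String a
lexeme p = p <* spaces
```

With these, runParser (lexeme (char '[') *> lexeme (char ']')) "[  ]  x" gives (Just ']',"x"). Wrapping each token parser in lexeme, and consuming leading whitespace once at the very start, would be one way to let the JSON parser accept spaced-out input.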

You can find the complete code for this exercise on GitHub.

Some examples of what this looks like in the REPL:

*Json> runParser json "null"
(Just null,"")

*Json> runParser json "true"
(Just true,"")

*Json> runParser json "[null,true,\"hello!\"]"
(Just [null, true, "hello!" ],"")

Concluding thoughts

If you’ve made it this far, thank you! I realise this is long and somewhat dense, but I am very excited by how elegantly Haskell allows us to express these ideas, using fundamental aspects of its type(class) system.

A nice real world example of how you might use this is the optparse-applicative package which uses these ideas to greatly simplify the otherwise dreary task of parsing command line arguments.

I hope this post generates at least some of the excitement in you that it has in me. Feel free to leave your comments and thoughts below.

3 Comments

Ashesh Ambasta

February 23, 2018 — 1:18 pm

This is pretty great. A similar example is in RWH if I remember correctly.

Evgeniy

May 19, 2019 — 4:26 pm

This post has a bunch of Haskell code,

Do you have a code-only version? Running a particular script locally makes it a lot easier to understand the writing.

- Arun
  
  May 20, 2019 — 10:23 am
  
  Here you go: https://github.com/ford-prefect/haskell-classes/blob/master/year3/json.hs