UPDATE: Right after I finished this post, I started working on the next one and promptly discovered some bugs in the code. I’ve updated this post with the corrected code.

In the next couple of posts, I’m going to show how to build a parser library based on the Free Applicative Functor and what you can do with it. To follow along, clone (or update) this repo. Then run `lein repl` and you should be able to paste in the code to see what it does.

This post runs long (again). The code itself is not that long, and it’s mostly very small functions. But the amount of detail hidden by the abstractions takes a lot of prose to explain. Which is actually one of the very real benefits of using these abstractions: they let you implement an enormous amount of functionality in very few lines of code, with fewer bugs.

If you want to see what the point of all this prose is, skip to the “Other Interpretations” section at the bottom and then come back to see how it was done.

For a couple of years now, I’ve wanted a Clojure environment that compiles to native code. One of the first tasks is being able to parse a source file of text into an intermediate form. While I was looking at that, I came across references to Applicative Parsers. So my side project took a long detour that has finally wound its way back to actually implementing an applicative parser.

First, the preamble:

```
(require '[effects :refer :all]
         '[effects.free :refer :all]
         '[effects.maybe :refer :all]
         '[effects.state :refer :all :exclude [set-val update-state]])
```

Now, the core of any parser combinator library is the set of combinators used to compose simpler parsers into more complicated ones. And there are really only a few that are needed. We’ll define record types for two of them:

```
(deftype Opt [expr])

(defn optional [expr]
  (free (Opt. expr)))

(deftype NoneOrMore [expr])

(defn none-or-more [parser]
  (free (NoneOrMore. parser)))
```

`Opt` makes the parser `expr` optional, and `NoneOrMore` makes a parser match zero or more instances of `expr` in the source text.
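To get an operational feel for these two combinators before the Free machinery arrives, here is a dependency-free sketch (plain Clojure; the starred names are hypothetical stand-ins, not the post’s combinators) that models a parser as a function from an input string to either nil on failure or [results remaining-text] on success:

```clojure
;; A parser here is modeled as: string -> nil | [results remaining-string].
(defn term* [c]
  (fn [text]
    (when (and (seq text) (= (first text) c))
      [[(str c)] (subs text 1)])))

(defn optional* [parser]
  (fn [text]
    (or (parser text)   ; try the parser...
        [[] text])))    ; ...or succeed with no results, consuming nothing

(defn none-or-more* [parser]
  (fn [text]
    (loop [results [] remaining text]
      (if-let [[rs more] (parser remaining)]
        (recur (into results rs) more)
        [results remaining]))))

((optional* (term* \a)) "abc")      ;; => [["a"] "bc"]
((optional* (term* \a)) "xyz")      ;; => [[] "xyz"]
((none-or-more* (term* \a)) "aab")  ;; => [["a" "a"] "b"]
```

Note that `none-or-more*` as sketched would loop forever on a parser that can succeed without consuming input.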

Then we also define two record types for some special combinators.

```
(deftype Ignore [expr])

(defn ignore [expr]
  (free (Ignore. expr)))
```

`Ignore` is a special combinator that is unique to Applicative parsers. It goes ahead and matches the parser `expr` in the source text, but discards the result.

```
(deftype Rule [name expr handler-fn])

(defmacro defrule
  ([sym]
   `(def ~sym (effects.free/free (->Rule '~sym nil nil))))
  ([sym rule]
   `(let [~sym (effects.free/free (->Rule '~sym nil nil))]
      (def ~sym (effects.free/free (->Rule '~sym ~rule nil)))))
  ([sym rule handler-fn]
   `(let [~sym (effects.free/free (->Rule '~sym nil nil))]
      (def ~sym (effects.free/free (->Rule '~sym ~rule ~handler-fn))))))
```

`Rule` is how we’re going to assign names to parsers so we can refer to them later in other parsers. We’ll talk more about `handler-fn` later. `defrule` is just some syntactic sugar to make things look nice. One interesting point to note about `defrule` is that it binds the given name to a `Rule` record that has nothing in it before expanding to a `def` of that name to the given parser expression. This lets you write recursive rules in a natural fashion, as we’ll see in a little bit.

Now that we have the combinators, we need something to compose. The Free Applicative gives us the rest of the combinators we need. First off, terminal characters are the simplest parser possible: they succeed if they are the next character in the text being parsed. We can use the `Pure` type to represent those terminals.

```
(defn term [v]
  (pure (str v)))
```

Once we have terminals, we need a way to sequentially compose them. We’ll use the Free implementation of `fapply*` to do this. What `all` does is `concat` all the results of the given parsers together. This implies that a parser result is not just a single value, but a sequence of values.

```
(defn all [& parsers]
  (fapply* (pure concat) parsers))
```
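Sequential composition with concatenated results can be modeled the same dependency-free way as before (a parser as a function from a string to nil or [results remaining]; `all*` and `term*` are illustrative names, not the library’s):

```clojure
(defn term* [c]
  (fn [text]
    (when (and (seq text) (= (first text) c))
      [[(str c)] (subs text 1)])))

;; Run each parser in turn; fail if any fails, otherwise concat all results.
(defn all* [& parsers]
  (fn [text]
    (reduce (fn [[results remaining] parser]
              (if-let [[rs more] (parser remaining)]
                [(into results rs) more]
                (reduced nil)))   ; short-circuit the whole sequence on failure
            [[] text]
            parsers)))

((all* (term* \a) (term* \b)) "abc")  ;; => [["a" "b"] "c"]
((all* (term* \a) (term* \b)) "axc")  ;; => nil
```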

For our final combinator, we need to be able to try a parser, and if it fails try another and another until we find one that matches. So far, we haven’t seen any Free structure that would fit the bill. But if you think about it, what we really want is just a Free structure that holds a list of parsers. And the simplest Free Monoid is just such a list. So we add the `FreePlus` record type.

```
(defn one-of [& parsers]
  (free-plus parsers))
```
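Alternation in that dependency-free model is just “first success wins” (again with hypothetical starred names, not the library’s):

```clojure
(defn term* [c]
  (fn [text]
    (when (and (seq text) (= (first text) c))
      [[(str c)] (subs text 1)])))

;; Try each alternative in order; `some` returns the first non-nil result.
(defn one-of* [& parsers]
  (fn [text]
    (some #(% text) parsers)))

((one-of* (term* \a) (term* \b)) "bc")  ;; => [["b"] "c"]
((one-of* (term* \a) (term* \b)) "zz")  ;; => nil
```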

And one more convenience combinator will round out our parser library.

```
(defn one-or-more [parser]
  (all parser (none-or-more parser)))
```

Here’s a small grammar to parse some very simple s-expressions. You’ll notice that some of the rules have handler functions specified.

```
(defrule upper-case (apply one-of (map term "ABCDEFGHIJKLMNOPQRSTUVWXYZ")))

(defrule lower-case (apply one-of (map term "abcdefghijklmnopqrstuvwxyz")))

(defrule letter (one-of upper-case lower-case))

(defrule integer (all (optional (one-of (term \+) (term \-)))
                      (apply one-of (map term "123456789"))
                      (none-or-more (apply one-of (map term "0123456789"))))
  (fn [& digits]
    (read-string (apply str digits))))

(defrule whitespace (apply one-of (map term " \n\t\f")))

(defrule word (one-or-more letter)
  (fn [& chars]
    (->> chars
         (apply str)
         symbol)))

(defrule form)

(defrule s-expr (all (ignore (term \())
                     (ignore (none-or-more whitespace))
                     (none-or-more
                      (all form
                           (ignore (none-or-more whitespace))))
                     (ignore (term \))))
  list)

(defrule form
  (one-of integer
          word
          s-expr))
```

One thing to notice is the way `form` and `s-expr` are mutually recursive, and how that is described by pre-declaring an empty rule for `form` first, then supplying the definition later. Also note that `form` is self-recursive.
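The same forward-declaration trick can be imitated in plain Clojure outside the Free machinery. Here is a dependency-free sketch (hypothetical starred names, modeling a parser as a function from a string to nil or [results remaining]) of mutually recursive digit and s-expression parsers:

```clojure
;; form* is only declared here; because Clojure vars are late-bound,
;; s-expr* can still call it -- just as defrule pre-binds an empty Rule.
(declare form*)

(defn digit* [text]
  (when (and (seq text) (Character/isDigit (first text)))
    [[(str (first text))] (subs text 1)]))

;; Matches a single form wrapped in parentheses.
(defn s-expr* [text]
  (when (and (seq text) (= \( (first text)))
    (when-let [[rs more] (form* (subs text 1))]
      (when (and (seq more) (= \) (first more)))
        [rs (subs more 1)]))))

(defn form* [text]
  (or (digit* text) (s-expr* text)))

(form* "(7)")  ;; => [["7"] ""]
(form* "7x")   ;; => [["7"] "x"]
```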

At this point, the most important thing to keep in mind is that `form` can’t actually parse anything. It is only a data structure that encodes the structure of the parser. By adding some `toString` implementations to the various record types, you could explore what this data structure looks like.

Before we can convert `form` to an actual parsing function, we have to have some kind of Applicative data type that implements parsing. Parsing involves two things: a text that gets consumed, and the possibility that a parser will fail. Each of these two aspects is represented by a different effect.

The repo I’ve been using for these investigations is built around the idea of effects which have different expressions. An effect may be any or all of: Endofunctor, Applicative, Monad, Comonad, or Arrow. Each of these aspects is represented as a Clojure protocol, and the various effects implement the protocols that are appropriate.

One of these effects is `maybe`, which implements the semantics of conditional success. Another is the `state` effect, which implements the semantics of mutable state.

The main idea I’m investigating with this repo is how these various effects can be composed to produce other effects. The idea is that when two effects are composed, the composite effect implements the expressions that are valid. For instance, since `maybe` and `state` both have Monad expressions (and by implication Endofunctor and Applicative), their composition also has Endofunctor and Applicative expressions, since both of those are closed under composition. The composition of two effects that are Monads might also have a Monad expression, but that’s not guaranteed, since Monad composition is not closed.

I have much more to learn about the composition of these kinds of effects, so stay tuned.

So, how do we compose `state` and `maybe` to make a parsing effect? It couldn’t be simpler, since all effects implement the Monoid interface:

```
(def state-maybe (plus state maybe))
```

But realize that in this case, `plus` is not commutative. That is, `(plus state maybe)` is not the same as `(plus maybe state)`. They both might be useful in different cases, but they’re not equivalent.
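To make the asymmetry concrete, here is a sketch under my own modeling assumptions (hypothetical names, not the `effects` implementation): the `(plus state maybe)` ordering behaves like a function from a state to a maybe-wrapped [value new-state], while the `(plus maybe state)` ordering behaves like a function from a state to [maybe-value new-state]:

```clojure
;; state-maybe ordering: s -> nil | [value new-state]
;; A failed step yields nil, so the caller still holds the ORIGINAL state --
;; exactly the backtracking a parser needs when an alternative fails.
(defn sm-read-char [text]
  (when (seq text)
    [(first text) (subs text 1)]))

;; maybe-state ordering: s -> [maybe-value new-state]
;; A failed step still hands back a (possibly modified) state.
(defn ms-read-char [text]
  (if (seq text)
    [(first text) (subs text 1)]
    [nil text]))

(sm-read-char "")  ;; => nil
(ms-read-char "")  ;; => [nil ""]
```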

Also, much of the usefulness of an effect lies in additional functions. In particular, the `state` effect has two of these: `update-state` and `set-val`. If you look back up at the top of this post, you’ll see that we `:exclude`d these two functions when we brought in the `effects.state` namespace. That’s because we’re going to have to reimplement them for the composed `state-maybe` effect. I haven’t figured out a way to do this cleanly yet, though I have some ideas. For now, we’ll just do it the hard way. This involves some implementation details that I’ll skip.

```
(defn update-state [f]
  (->State maybe
           (fn [s] (maybe (list s (f s))))
           (pr-str "update-state")))

(defn set-val [k v]
  (update-state #(assoc % k v)))
```

`update-state` lets us update the mutable state value by providing a function `f` that takes a state value and returns an updated one. `set-val` then uses `update-state` to implement a stateful key/value store with a hash-map.
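Modeling a stateful value as a plain function from a state to [value new-state] (a sketch with hypothetical starred names, not the `effects` types), the two helpers come down to:

```clojure
;; An always-succeeding state action: state -> [value new-state].
(defn update-state* [f]
  (fn [s] [s (f s)]))  ; the OLD state is returned as the value

(defn set-val* [k v]
  (update-state* #(assoc % k v)))

((update-state* inc) 41)  ;; => [41 42]
((set-val* :a 1) {})      ;; => [{} {:a 1}]
```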

The first step in evaluating Free structures is to write a function to evaluate the `Pure` values. In our case, our `Pure` values are either functions, which we’ll ignore, or terminal characters. This function must accept a value and return a `state-maybe` value. If the input value is a function, we’ll just wrap it in `state-maybe`.

If it’s a terminal, we need to return a `state-maybe` value that checks whether the text in the state, which is going to be a string, starts with the terminal. The rest of the string then needs to become the new value of the state, which is accomplished using Clojure’s substring function `subs`.

```
(defn pure-parser [term]
  (if (fn? term)
    (state-maybe term)
    (let [term-len (count term)]
      (for [text (update-state identity)
            :when (.startsWith text term)
            _ (update-state #(subs % term-len))]
        [term]))))
```

In this function, we’re using the `for` macro from the `effects` repo, which implements generalized monad comprehension, not the one from `clojure.core`, which implements comprehension only for sequences. Also, `update-state` returns the previous state value, so binding `text` to `(update-state identity)` gives us the current text without modifying it.
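As a rough analogy, modeling a parser as a plain function from a string to either nil or [value remaining], the comprehension is ordinary monadic sequencing. A hand-desugared sketch of the terminal branch (all names here are hypothetical; the real `for` macro does this expansion against the `effects` protocols):

```clojure
;; A parser is: string -> nil | [value remaining].
(defn bind* [parser f]
  (fn [text]
    (when-let [[v more] (parser text)]
      ((f v) more))))

(defn term-parser* [term]
  (let [term-len (count term)]
    ;; "read" the current text, guard on the prefix, then drop it
    (bind* (fn [text] [text text])                   ; like (update-state identity)
           (fn [text]
             (fn [remaining]
               (when (.startsWith ^String text term) ; like the :when guard
                 [[term] (subs remaining term-len)]))))))

((term-parser* "ab") "abc")  ;; => [["ab"] "c"]
((term-parser* "ab") "xyz")  ;; => nil
```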

Now we can get down to the business of converting `form` to an actual parser.

First we need a protocol function we can use to extend our basic combinators so they generate parsers.

```
(defprotocol GenParser
  (gen-parser [_]))

(extend-type Opt
  GenParser
  (gen-parser [p]
    (plus (evaluate (.expr p) pure-parser gen-parser)
          (state-maybe []))))
```

The first thing we do for the `Opt` combinator is extract the `expr` field from the `Opt` record value `p`. This parser expression is a Free structure, so we call `evaluate` on it to convert it to an actual parser. With that parser in hand, we then take advantage of the Monoid expression of the `state-maybe` effect by `plus`ing it with a parser that always succeeds but doesn’t contribute any parsed value.

Next up, we need to generate a parser from a `NoneOrMore` record. But first we need a little helper function. `repeat-parser` takes `parser` and attempts to match it, capturing the successful match in `x`. Then it calls itself recursively to attempt to match `parser` again. Eventually `parser` will fail, and the stack built by the recursion unwinds with all the results `concat`ed together.

```
(defn repeat-parser [parser]
  (for [x parser
        xs (plus (repeat-parser parser)
                 (state-maybe []))]
    (concat x xs)))

(extend-type NoneOrMore
  GenParser
  (gen-parser [p]
    (plus (repeat-parser (evaluate (.expr p) pure-parser gen-parser))
          (state-maybe []))))
```

And then the implementation for `NoneOrMore` uses `plus` to either repeatedly match the parser constructed from `(.expr p)` or the empty-results parser.

`Ignore` has an interesting implementation of `gen-parser`. It uses the Applicative expression of `state-maybe` to first match a parser constructed from `p`. Then it throws the result away and always returns the empty results value.

```
(extend-type Ignore
  GenParser
  (gen-parser [p]
    (fapply (constantly []) (evaluate (.expr p) pure-parser gen-parser))))
```
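Modeled as plain functions from a string to nil or [results remaining] (hypothetical starred names, not the library’s), the same “match but contribute nothing” behavior is a one-liner:

```clojure
(defn term* [c]
  (fn [text]
    (when (and (seq text) (= (first text) c))
      [[(str c)] (subs text 1)])))

;; Match the parser, but replace its results with the empty vector.
(defn ignore* [parser]
  (fn [text]
    (when-let [[_ more] (parser text)]
      [[] more])))

((ignore* (term* \()) "(a")  ;; => [[] "a"]
```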

Generating a parser from a rule is the most complex of all the combinators, but it’s still pretty easy. First, we have to extract the `expr` and `handler-fn` from the Free parser structure `p`.

Then, if `expr` is nil, we use Clojure’s `resolve` function to look up the symbol for `p`’s name to get the Free parser structure to evaluate. Otherwise, we just evaluate the Free structure `expr` to build a parser. In either case, we use the `for` macro to generate a parser that matches the just-created parser and captures the successful match in `parsed`. Then the handler function is applied to the parsed items, so that other kinds of data values can be returned rather than just character sequences.

```
(extend-type Rule
  GenParser
  (gen-parser [p]
    (let [expr (.expr p)
          handler-fn (or (.handler-fn p)
                         identity)]
      (if (nil? expr)
        (for [parsed (evaluate (deref (resolve (symbol (.name p))))
                               pure-parser gen-parser)]
          [(apply handler-fn parsed)])
        (for [parsed (evaluate expr pure-parser gen-parser)]
          [(apply handler-fn parsed)])))))
```

And finally, a little function to generate a parser when given a Free parser expression.

```
(defn parser [expr]
  (evaluate expr pure-parser gen-parser))
```

```
(prn (extract ((parser form) "(str (add 15 92))")))
;; => ([(str (add 15 92))] "")
```

Let’s break this down. `form` is a Free parser expression, as defined above. We create a parser function from it using `parser`. This function takes a string as input, so we call it with one. The value returned by the parser is a `maybe` effect value that contains a `state` effect value. To unwrap the `maybe`, we use `extract`. This leaves a `state` effect value, which is just a list of two items. The second item is the text remaining to parse; in this case, the entire input string was parsed, so nothing is left.

The first item of the `state` value is a vector containing the final parsed value, which in this case is a nested list structure.

Since you’re still with me, I’ll show the real payoff of using the Free Applicative parser.

With a normal parser, all you have is a function that parses a string. The only thing you can do with it is use it to parse strings. A useful result, of course, but we can do much better. Here’s just one example. You should have enough understanding now to figure out how it works. But if not, be sure to ask in the comments.

```
(defprotocol GenEBNF
  (gen-ebnf [_]))

(deftype EBNF [x]
  Applicative
  (fapply* [_ args]
    (EBNF. (fapply* (state-maybe (fn [& args]
                                   (apply str (interpose " , " args))))
                    (map #(.x %) args))))

  Monoid
  (plus* [v vs]
    (EBNF. (fapply* (state-maybe (fn [& args]
                                   (apply str (interpose " | " args))))
                    (map #(.x %) (cons v vs))))))

(defn pure-ebnf [v]
  (EBNF. (state-maybe (condp = v
                        "\n" "\"\\n\""
                        "\t" "\"\\t\""
                        " " "\" \""
                        "\f" "\"\\f\""
                        (format "\"%s\"" v)))))

(extend-type Opt
  GenEBNF
  (gen-ebnf [p]
    (EBNF. (for [expr-ebnf (.x (evaluate (.expr p) pure-ebnf gen-ebnf))]
             (str "[ " expr-ebnf " ]")))))

(extend-type NoneOrMore
  GenEBNF
  (gen-ebnf [p]
    (EBNF. (for [expr-ebnf (.x (evaluate (.expr p) pure-ebnf gen-ebnf))]
             (str "{ " expr-ebnf " }")))))

(extend-type Rule
  GenEBNF
  (gen-ebnf [p]
    (if (.expr p)
      (EBNF. (for [expr-ebnf (.x (evaluate (.expr p) pure-ebnf gen-ebnf))
                   _ (set-val (.name p) (str expr-ebnf " ;"))]
               (.name p)))
      (EBNF. (state-maybe (.name p))))))

(extend-type Ignore
  GenEBNF
  (gen-ebnf [p]
    (evaluate (.expr p) pure-ebnf gen-ebnf)))

(defn ebnf [rule]
  (second (extract ((.x (evaluate rule pure-ebnf gen-ebnf)) {}))))

(defn print-ebnf [rule]
  (doseq [[name rule] (reverse (ebnf rule))]
    (println name ":=" rule)))

(print-ebnf form)
```

Generating an EBNF grammar from a parser is pretty cool. But that’s just the beginning of how you can interpret Free Applicative parsers. Next time I plan on showing some of those. But you might consider how you could interpret a Free parser structure to implement optimized parsers depending on the grammar specified.

Jim Duey 14 June 2014