Evaluating Parsers

I’m finally ready to end this detour and get back to my main side project and there’s one last detail to address. So let’s get down to it. This post is going to be very short, even with the code.

Picking up

In the last post, I defined a grammar as

(defrule upper-case (apply one-of (map term "ABCDEFGHIJKLMNOPQRSTUVWXYZ")))
(defrule lower-case (apply one-of (map term "abcdefghijklmnopqrstuvwxyz")))
(defrule letter (one-of upper-case lower-case))

(defrule integer (all (optional (one-of (term \+) (term \-)))
                      (apply one-of (map term "123456789"))
                      (none-or-more (apply one-of (map term "0123456789"))))
  (fn [& digits]
    (read-string (apply str digits))))

(defrule whitespace (apply one-of (map term " \n\t\f")))

(defrule word (one-or-more letter)
  (fn [& chars]
    (->> chars
         (apply str)
         symbol)))

(defrule form)
(defrule s-expr (all (ignore (term \())
                     (ignore (none-or-more whitespace))
                     (none-or-more
                      (all form
                           (ignore (none-or-more whitespace))))
                     (ignore (term \))))
  list)

(defrule form
  (one-of integer
          word
          s-expr))

form then, is just a Free data structure than can be evaluated in various ways. We saw two of those interpretations last time.

Simple Samples

Generating a parser from form to parse a string can be thought of as running the parser forwards. Is it possible to run it backwards and generate random strings that conform to the grammar? Its not only possible, its almost trivial. And it is a good example of to evaluate Free Applicative structures.

The first thing you need is a protocol with a single function that you can use to extend the primitive record types of the parser; Opt, NoneOrMore, Rule and Ignore.

(defprotocol GenString
  (gen-string [_ depth]))

Then you need a record type that implements the Applicative protocol. And also the Monoid protocol if you used plus like we did for our parser library. The final value returned from the evaluation is going to be of this type. The implementations of fapply* and plus* should do whatever is appropriate.

(deftype SampleString [x]
  Applicative
  (fapply* [_ args]
    (SampleString. (apply str (map #(.x %) args))))

  Monoid
  (plus* [v vs]
    (nth (cons v vs) (rand-int (inc (count vs))))))

For our purposes, fapply* just concats all the strings contained in args. plus* selects one of it’s argument’s strings at random.

Evaluating the pieces

Next, we need functions to convert the various pieces of our parser into SampleString record types. For terminal characters, its trivial.

(defn pure-string [v]
  (SampleString. v))

For the other pieces, we construct appropriate funcitons

(extend-type Opt
  GenString
  (gen-string [p depth]
    (if (or (> depth 600)
            (= 0 (rand-int 2)))
      (SampleString. "")
      (evaluate (.expr p) pure-string #(gen-string % (inc depth))))))

The protocol function gen-string takes an extra parameter in addtion to the Opt value. This parameter keeps track of how deep the recursion is. I found it was very easy to blow the stack if there wasn’t a limit. So if the depth exceeds 600, then don’t add the optional substring and also skip it if the random integer is 0.

But if the random integer is not 0, then generate a SampleString value from the parser expr by recursively calling evaluate, incrementing the depth by one.

(extend-type NoneOrMore
  GenString
  (gen-string [p depth]
    (if (or (> depth 600)
            (= 0 (rand-int 2)))
      (SampleString. "")
      (fapply str
              (evaluate (.expr p) pure-string #(gen-string % (inc depth)))
              (gen-string p (inc depth))))))

NoneOrMore is very similar except it keeps concatenating the generated sub-strings if the depth hasn’t been exeeded or the random int is not 0.

(extend-type Rule
  GenString
  (gen-string [p depth]
    (let [expr (or (.expr p)
                   (deref (resolve (symbol (.name p)))))]
      (evaluate expr pure-string #(gen-string % (inc depth))))))

(extend-type Ignore
  GenString
  (gen-string [p depth]
    (evaluate (.expr p) pure-string #(gen-string % (inc depth)))))

Rule and Ignore are even easier. They just generate a sub-string from their parser expr’s.

And finally, a function to kick off the process at the top level.

(defn sample-string [rule]
  (.x (evaluate rule pure-string #(gen-string % 1))))

Making use of the samples

So now, we can generate a bunch of sample strings from form and then run them through the parser.

(dotimes [_ 100]
  (let [s (sample-string form)
        p (extract ((parser form) s))]
    (if (nil? p)
      (prn s))))

That shouldn’t print any strings out because they should all be parseable. I did this right after I pushed out my last post and immediately found 3 or 4 bugs. A couple of tweaks later and they were fixed. Generative testing for the win!

But you can introduce a bug into the parser and see what happens. Try this one on for size

(defn pure-parser [term]
  (if (fn? term)
      (state-maybe term)
      (let [term-len (count term)]
        (for [text (update-state identity)
              :when (not= term "x")
              :when (.startsWith text term)
              _ (update-state #(subs % term-len))]
          [term]))))

Then re-evaluate form and re-run the test. It should print out a series of strings that all have x in them. This is a very crude form of generative, or property based, testing. But it shows how easy it is to do using Free Applicatives.

Extra Credit

I thought of some other ideas for evaluating the grammar, but decided to not pursue them.

I first saw the idea of running a parser backwards in a ‘The Reasoned Schemer’ and then saw Dan Friedman and Will Byrd wow the crowd with a demonastration at Clojure/Conj. So it should be very easy to generate a core.logic program from any grammar. Let me know if you want some pointers on how to do that.

Also, from my work on implementing a logic programming system using an Arrow instead of Monad, I showed you could visualize a logic program as a tree. It should likewise be trivial to do with any grammar.

And then there’s the idea of generating various optimized parsers from a grammar depending on the structure of the grammar.

Pick one and let me know the results.

Now it’s back to working on a native Clojure compiler.

Jim Duey 16 June 2014
blog comments powered by Disqus