# Building Trees

After all the posts on the logic arrow, we’re finally to the point of implementing a rudimentary core.logic. Way back at the start of this series, I said that this is only a proof of concept and that hasn’t changed. I’ve got other things to work on that I find more interesting, but there’s a huge potential here for someone to make core.logic the best logic programming system out there. Well, it’s already that, but to make it even better. So, if you’re inclined in that direction and want to really expand your abilities with monads, arrows and core.logic, let me know and I’ll be glad to assist. I would love to see someone take this an run with it.

This is quite long, but I wanted to finish this series and move on to something else. And if you think it’s too much trouble, keep in mind that we are implementing a way to automatically eliminate many of the errors that plague logic programming thereby saving enormous amounts of programmer time and frustration.

### The magic

To start with, take a look at this code and corresponding tree:

(defno v [v] (== v 15))
(defno w [w] (== w 10))
(defno x [x] (== x :x))
(defno y [y] (== y :y))
(defno z [z] (== :z z))
(def p (fresh [a b c q]
(z a)
(conde
[(x b)]
[(y b)])
(w q)
(v q)))

This is nonsense code used strictly as a means to produce the tree. Expression p can be read as the call to z followed by the calls to x and y in parallel with both results being fed to w and then to v.

Or, you can visualize an empty substitution dropping into the top of the tree and falling through each node as it moves towards the leaves, picking up unifications on the way. At the branching node, a copy is passed to each branch.

So even though w and v only appear once in the code, they appear twice in the tree because they follow the conde expression which has two clauses. And our task in evaluating the code is to build a representation of the tree using the logic arrow.

To do this, notice that each of the functions v through z are created by defno. I’ll explain that later, but for now it will suffice to say that they each take their argument and (sort of) return an arrow value. So fresh somehow will use arrow-seq to combine those arrow values sequentially. But it has to work from the bottom up, so it knows that v follows w follows the conde expression, etc. This is a classic use of the continuation monad. So what functions defined by defno return are actually arrow values wrapped in the continuation monad. Simple, no?

### Except …

It’s not that simple because we have another requirement and it’s actually the point of this whole exercise.

Consider this code and the resulting tree:

(declare top)
(defno bottom [x]
(conde
[(top x)]
[(v x)]))

(defno top [x]
(bottom x))

top and bottom are mutually recursive and v is same as above. In this case, bottom is the continuation of top, but top is used in the definition of bottom. To eliminate divergence, you need to detect this recursion. And you can only do this by building the tree from the root downward and keeping track of which functions have been called, but not completed, before being called again. Which is easy to do by using the state monad to keep track of which functions are ‘active’. And when you see an active function being called again, you know it’s a recursive call so you can tag the resulting arrow with a :recursive meta data tag. But how to do this?

### The Magic

And here is why I love monads, we have two requirements, each with their own form of complexity and each being solved by a different monad. It would be possible to brute force our way through using some kind of mutable state and a stack, and hacking a path through the untracked jungle on our own.

Or we can use a monad, which gives a map (no pun intended :)) of how to proceed. And even though the path may be difficult, it is there and there are trail markers to tell us when we’ve lost our way.

So, we create a monad that combines both the state and continuation monad:

(def tree-m (m/state-t m/cont))

This just uses the state monad transformer to generate the needed monad.

With this in hand, we can revisit our basic goals. First failure:

(def fail-p (logic-arr. (constantly logic-zero)
{:results 0}))
(def fail (tree-m fail-p))

We assign the logic arrow value to fail-p so we can use it later. And our fail goal is just that value wrapped in our tree monad.

And our unification goal creator is just:

(defn == [u v]
(tree-m
(logic-arr. (fn [a]
(if-let [ap (unify a u v)]
(logic-m ap)
logic-zero))
{})))

### Some complexity

Here’s where the path gets a little steep. Any time you use a continuation monad, you’ve got to have a way to get at the continuation when you need to or else there’s no point in using the monad in the first place. For us, this is complicated by the fact the our continuation monad is wrapped in the state monad. But it’s fairly straight forward what needs to written.

(defn tree-cc [f]
(state-transformer. m/cont nil
(tree-m nil)
(fn [_]
(fn [m]
(fn [c]
(f m c))))
nil))

There are two pieces of contexts our tree monad is managing for us; the state m and the continuation c. So tree-cc simply takes a function f that accepts as parameters both m and c and returns a tree-m monadic value. By supplying various kinds of functions to tree-cc, we can create values in our tree monad that have access to m and c so we can do whatever the heck we want. This is where the power of monads really shows up.

One thing to keep at the forefront of your thinking is that c is a function that builds the rest of the tree from the present node down to the leaves. While m is just whatever value we want to pass through as our state. But if c is a function, what is it’s argument? Since it is a continuation, it must take whatever value we would ordinarily return from whatever function we’re in. But since our tree monad also has the state context, we need to pass that on as well. As a result, c is a function that takes a vector of our current return value and the current state.

(c [value state])

And it will proceed to build out the rest of the tree.

Given that, how do we actually build the tree for a given logic program?

At the top level, a logic program is created by a call to defno. And defno returns a value in our tree monad. So given something like

(defno z [q]
(== q 10))

We could build our tree with

(first ((z beginning-state) identity))

We’ll discuss which value to use for beginning-state in a bit. But calling z with it returns a function that expects to be called with a continuation. By passing identity as our continuation, we’re indicating we just want to get whatever the final result is. And since, for us, continuations always accept a vector, that’s what is going to be returned to us. But we’re not interested in what final state value was, so we use first to get what we’re interested in, which is a logic arrow value. All this follows pretty mechanically from the definition of our tree monad

(def tree-m (m/state-t m/cont))

Since we’re going to be building trees in several different places, we’ll make function to do that for us.

(defn build-tree
([tree-fn] (build-tree tree-fn #{}))
([tree-fn m] (build-tree tree-fn m identity))
([tree-fn m c] ((tree-fn m) c)))

If we’re given just a tree building function, we’re going to use an empty hash-set as our initial state value.

Remember that in building our tree, we want to keep track of ‘active’ functions. That is, functions that are called, but whose continations haven’t been called yet. Whether a function is active or not is a binary option, so a function will only appear in our state at most once. And we’ll need to check whether a given function is in our state, so a set is the natural choice.

### Glue

Given our tree monad, fail and ==, we need a way to compose them to build logic programs. minKanren/core.logic provides a set of operators to that.

First off, we want to combine goals sequentially.

(defn all [& steps]
(m/bind (first steps)
(m/chain (map (fn [step]
(fn [p]
(tree-cc
(fn [m c]
(let [[step-p new-m] (build-tree step m c)]
[(a/seq p step-p) new-m])))))
(rest steps)))))

Oh boy. That looks complicated, so let’s break it up. We know that all needs to accept a set of goals or steps. These goals are values in our tree monad and all needs to return a tree-m value as well. Also, we want to compose these goals sequentially and that is done using m/bind which always returns a monadic value.

While the first parameter to m/bind is a monadic value, the second is a function that takes a logic arrow value (since those are the only kinds of values we’ll be wrapping with our tree monad) and returns a monadic value. So all of the goals except the first one have to somehow be scrunched into a single function.

m/chain will do what we want because it takes a list of functions that each take a plain value and return a monadic value and produces a function that has the effect of chaining them together sequentially.

So we have to create such a sequence out of the rest of the steps and the call to map does this. For reference, here’s the function we’re mapping with.

(fn [step]
(fn [p]
(tree-cc
(fn [m c]
(let [step-p (first (build-tree step m c))]
[(a/seq p step-p) nil])))))

This function has to take a step and return a function. That returned function has to take a logic arrow value p and return a tree monadic value. Hence the call to tree-cc.

tree-cc takes a function that accepts a state m and continuation c. And now we finally come to heart of the matter.

We have a logic value p which are all the preceding steps composed sequentially. And we have a continuation with which to build a logic value that is the rest of the tree from this point on. And we have the current step. So we build the step-p which is the current step plus the rest of the tree, using the current state m and the continuation. Then we use the sequential arrow combinator a/seq to compose those two arrow values and return that value as the first element of a pair.

### Branches

The next major core.logic combinator is conde for specifying alternatives, or branches, in our search tree

(defn conde [& clauses]
(let [clauses (map #(apply all %) clauses)]
(tree-cc
(fn [m c]
(let [clauses-p (->> clauses
(map #(build-tree % m c))
(map first)
(apply a/all))]
[clauses-p nil])))))

It’s a little shorter. conde accepts a list of clauses where each clause is a list of goals. Like all it needs to return a logic arrow value wrapped in the tree monad and so it uses tree-cc.

The first thing we need to do is combine each clause into a single goal, which we can do immediately with all we just defined. Then we can call tree-cc with a function that closes over clauses which waits until we’re given a state m and continuation c at tree building time.

When that happens, we build the sub tree corresponding to each branch of the conde using the same state and continuation for each. Since each call to build-tree returns a pair, we use first to extract the logic arrow values.

And finally, we use the arrow combinator a/all to compose the clause arrow values into a single logic arrow value, which we return in a pair. That final arrow value will see to it that each branch of the search tree is traversed and all the answers collected for a final answer. Magic!

fresh is not all that interesting, so I’ll just include it for completeness

(defmacro fresh [[& lvars] & goals]
`(let [~@(lvar-binds lvars)]
(clojure.core.logic/all ~@goals)))

### The final piece

And the final bit of magic is building these logic functions that return goals using defno. Buckle up.

Since this is just a proof of concept, instead of doing the right thing when we hit a recursive call, we’re going to punt and do nothing since all we really care about is the meta data on the arrow. So every recursive logic arrow value is going to be:

(defn recursive-p [sym]
(logic-arr. (constantly logic-zero)
{:recursive #{sym}}))

And finally, defno itself:

(defn embed-tree [sym expr m c]
(let [[p new-m] (build-tree expr
(conj m sym)
identity)]
(if (and (= (get (meta p) :results 0) 0)
(= (get (meta p) :recursive) #{sym}))
[fail-p m]
(c [p (disj new-m sym)]))))

(defn immediate-recursive [sym]
(constantly (tree-cc
(fn [m c]
[(recursive-p sym) m]))))

(defmacro defno [sym args expr]
`(defn ~sym ~args
(let [~sym (clojure.core.logic/immediate-recursive '~sym)]
(clojure.core.logic/tree-cc
(fn [m# c#]
(if (contains? m# '~sym)
[(clojure.core.logic/recursive-p '~sym) m#]
(clojure.core.logic/embed-tree '~sym ~expr m# c#)))))))

Starting with defno, it has to be a macro because it’s going to be generating definitions in the dictionary. It takes a symbol sym for the function to define, the args to the function. And finally, the goal expression expr.

The only real trick here is to define a value for sym inside the function definition for sym so that any immediately recursive call to sym in expr will not recur but will instead return value from immediate-recursive. And that value is just a tree monadic value that terminates the tree building process at that point, returning the recursive-p logic arrow value.

With that taken care of, we define a function for sym that uses tree-cc to return a tree monadic value anytime it is called. At tree building time, when given m# and c#, we check to see if sym is in m# indicating that this a recursive call to sym and if so, terminating the building.

But if it’s not a recursive call, then we need to build the tree from this point on using m# and c# which is done in embed-tree.

embed-tree adds sym to the state m signifying that this function is now ‘active’. And then it builds the tree for expr using the updated state.

If the resulting logic arrow value doesn’t return any results and it only has recursive calls to sym, then we know that there is no point searching that part of the tree because it will never produce any results.

But if the tree fragment for expr (that is p) can produce results, we take the resulting state new-m, remove sym from it and pass it along with p to the continuation c to build the rest of the tree.

And we’re done. Finally.

### Coda

Since this is just a proof of concept, there are so many things that could be done.

You could add tabling to goals that are called recursively, and only those goals. That would eliminate a class of divergence with out incurring the penalty of tabling every goal.

If substitutions are combinable, then tree fragments that have no recursion could be reduced down to a list of sub-substitutions at compile time. When the search hits that point, instead of having to walk the tree, you just combine each of the sub-substitutions with the current one to see if you get any results.

Also, any paths from the root to a leaf that contains mutually exclusive substitutions could be eliminated at compile time.

Or you could convert every path from the root to each leaf in to a set of unifications. Then rearrange them so that common prefixes of unifications could be done just once, improving performance.

And there are probably other optimizations that are possible that I haven’t thought of.

What I most appreciate about monads and the other abstractions from Category Theory is that they let you isolate the various pieces of essential complexity into independent pieces. Here, the logic monad dealt with backtracking, the logic arrow dealt with composing expressions and static optimizations while the state and continuation monad each dealt with their piece of the building of the arrow values. It certainly wasn’t easy figuring all that out, but it was very satisfying to see all the pieces come together.

### Next time …

I’m going to go back to the basics with monoids.

24 October 2013