New Project

For the last month, I’ve been getting up to speed on a new project. For quite some time, I’ve been following the graal project. It’s a Java bytecode compiler that’s written in Java. I’ve always wanted to know the details of how the JVM works, so I’ve started porting Graal to Clojure. It’s a large project, so this is going to take a while. I’m calling this project ‘Crawl’, mostly because it rhymes with Graal and also because you have to crawl before you can walk. I’m going to keep the discussion of the JVM internals to a minimum. Instead, I’m going to focus on the differences between OOP and Functional Programming styles and use this code as an opportunity to explain certain concepts.

Compiling

The Java compiler takes Java source code and transforms it into Java bytecode that the JVM can execute. You can think of the JVM as a virtual CPU with its own particular instruction set. To actually execute the bytecode, it has to be transformed into machine language for the actual CPU the JVM is running on. This second transformation is what Graal does.

The JVM doesn’t actually compile all the methods in a program. It starts off by interpreting the bytecodes on the fly as needed. This is fairly slow, so when it detects that a method has been executed a number of times, it hands that method off to the bytecode compiler. The compiler takes the bytecodes of the method and produces native machine code which the runtime then uses when that method is called in the future. This is referred to as JIT (Just In Time) compiling.

Parsing

The interesting thing about methods/functions is that they can be represented as graphs. Not the kind of graphs you get out of Excel. These kinds of graphs.. Converting the sequence of bytecodes to a graph is the first step of compilation and what I’m covering in this post. This graph will consist of nodes which are called basic blocks. These are sequences of bytecode instructions that are guaranteed to be exectuted from start to finish. There are no jumps of any kind in a basic block and execution never jumps to a bytecode index in the middle of a basic block, only to the first instruction of a basic block. The edges of the graph indicate the sequence the blocks must be executed in. So each block has a list of successor blocks and a block must be executed before any one of its successors are. This is often referred to as a Control Flow Graph (CFG).

Anytime you have to take an undifferentiated stream of somethings and turn them into some kind of structure, you’re parsing. And when I think parsing, the first tool I reach for is the parser monad. You can read an explanation of this monad here

So we need to parse this sequence of bytes and turn it into a graph. And this is where the standard Java style diverges from a functional style. In the Java style, it’s natural to define a class for the nodes of the graph. If different types of nodes are needed, sub-classes of the basic node class can be created. In the basic node class, you can have a field that is a list of pointers to the nodes that are the successor of an instantiation of this block. That way, adding an edge to the graph involves mutating an object to add a pointer to the successors list.

In my port to Clojure, I chose to use a nested hash map to store the graph. Each node is assigned a unique key in the hash map and all information about a node is stored as a hash map value at that key. The successors of a node are kept as a list of keys of nodes.

The problem with the Java style is that when you mutate a node in any way, you lose the previous version of the graph. This means that you have to be very careful to only mutate the graph when you really want to. With the Clojure hash map, you’re free to change the graph speculatively because you can keep old versions until you’re sure you don’t need them.

Actual code

So how does this play out in actual code? Using a hash map in that fashion lets us build up the graph of basic blocks using the parser monad because the partially built graph can be part of the state passed along behind the scenes. Let’s start by looking at the initial state that needs to be created before parsing.

The function that gets called to parse the bytecode into basic blocks accepts a method object and looks like this:

(defn make-hir [method options]
  (let [initial-state {:method method
                       :bytecode (.code method)
                       :bci 0
                       :block-map (vector-of :int)}
        blocks (->> (basic-blocks initial-state)
                second
                :blocks)]
    blocks))

This function is eventually going to make the complete High-level Intermediate Representation, of which getting the basic blocks is the first step. We initialize initial-state with the method, the bytecode from the method, the starting bytecode index and an empty vector of ints that we’ll see used later. The basic-blocks function returns a vector, the second item being a new state hash map which has the parsed basic block graph at the :blocks key. The symbol blocks gets bound to this value. For now, we’ll just return this value back to the caller.

Now let’s see what basic-blocks looks like:

(def basic-blocks
  (do-state
    [method (fetch-val :method) 
     _ (m-map #(exception-block (.handlerBCI %))
              (.exceptionHandlers method))
     _ (none-or-more basic-block)]
    nil))

The do-state is boiler plate. fetch-val is a function that pulls a value out of the state hash map that’s being passed around behind the scenes, in this case, the value of the :method key which is the method object itself. This value is bound to the method symbol. The next clause uses m-map to create an exception block for each exception handler of the method at the bytecode index that the exception handler appears at in the bytecode of the method. Then the bytecode is parsed to extract 0 or more basic blocks.

(defn exception-block [bci]
  (with-state
    (new-block bci)
    (update-val :blocks assoc-in [bci :exception-block] true)))

Creating an exception block is very straight forward. Again, with-state is boiler plate. new-block just creates a minimal basic block in the graph at the given BCI. Remember that the graph is stored under the :blocks key in the state hash map being passed around, so the call to update-val sets the :exception-block flag in the block at the given bci to true, marking it as an exception block.

(defn new-block [bci]
  (with-state
    (no-block-at? bci) 
    (update-val :block-map (partial expand-block-map bci))
    (update-val :block-map assoc bci bci) 
    (update-val :blocks assoc-in [bci :start-bci] bci)))

The first thing new-block does is make sure a block doesn’t already exists that starts at this bci with a call to no-block-at?. Then new-block updates the :block-map vector by first ensuring that it’s long enough to cover the current bci, then adding a mapping for the current bci to it. The last line does the actual addition of a new block to the :blocks graph. Here is where we see that blocks added to the graph are indexed by the bci they start at and that they are a hash map that contains a key, :start-bci, which is its starting bci.

So we’ve seen that there’s a hash map being passed around which is modified using update-val and read using fetch-val and that the graph being built up out of the parsed bytecode is associated with the :blocks key. Now we get to the task of actually parsing the bytecode.

Parsing the regular basic blocks

In the basic-blocks function, we declared that a method has 0 or more basic blocks with the

(none-or-more basic-block)

line. This is the interesting thing about parsing using the parser monad. Other than some primitives like new-block, you end up just declaring what each parsed item is composed of. It’s like specifying a grammar for a language, which is exactly what we’re doing. We’re creating a DSL to declare the grammar for a valid bytecode sequence which is composed of basic blocks. The cool thing about this is that this grammar we’re specifying is actually executable code. So we get the luxury of declaring what we want to parse rather than how we want to parse it.

(def basic-block
  (with-state
    start-block
    (none-or-more instruction)
    block-terminator
    end-block))

And here’s where we start to see the power of this approach. A basic block is just declared to be started, then a sequence of zero or more non-block-terminator instructions followed by a single block terminator and then ended. The start-block and end-block functions just do some bookkeeping.

(def start-block
  (do-state
    [bci (fetch-val :bci)
     _ (add-block bci)
     _ (set-val :current-block bci)]
    bci))

(def end-block
  (do-state
    [{:keys [bci current-block]} (fetch-state)
     _ (update-val :blocks assoc-in [current-block :end-bci] (dec bci))
     _ (set-val :current-block nil)]
    nil))

The start-block function just adds a block at the current bci and sets the :current-block key. The end-block function just sets the :end-bci key in the block that has just been parsed to the bci that it ended on and then clears the :current-block value.

(defn add-block [bci]
  (with-state
    (m-plus (new-block bci) 
            (existing-block bci)
            (split-block bci))))

Here we see something new. Functions like basic-block, add-block and new-block can be thought of as parsers. That is, they define how to parse some bytecode. So far, the only way of composing parsers is sequentially. So a parser like basic-block is composed of a series of parsers that operate sequentially on the bytecode as they each read it.

But m-plus is a conditional composition. It works by trying each parser in sequence until it finds the first one that succeeds. In this case, add-block first tries to add a new block, but if a block already includes that bci, new-block fails. Then existing-block checks to see if a block begins at that bci and returns that block if it does. But if a block does not start at the given bci, but does include it, then that block is split into two blocks at the given bci by split-block.

(def instruction
  (do-state
    [bci (fetch-val :bci)
     blocks (fetch-val :blocks)
     current-block (fetch-val :current-block)
     opcode consume-opcode 
     :when (and (not (block-end-bytecode opcode))
                (or (= current-block bci)
                    (not (contains? blocks bci))))
     _ (m-seq (repeat (dec (instruction-len opcode)) 
                      consume-bytecode))]
    nil))

The instruction function looks complicated, but it can easily be broken down. First, we get the bci, the current graph (:blocks) and block index that we’re currently building from the state. Then we get the next opcode. The :when clause just takes an expression that evaluates to a boolean. If the the result is false, the instruction parser fails at that point and any modifications made to the state are discarded. This is where the power of functional programming really shows up. We are free to modify the state any way we choose, knowing that we can just discard those changes if they don’t work out. The final m-seq clause just consumes the rest of the bytecodes of the instruction.

So this parser reads the next instruction from the bytecode stream. But if that instruction should end the block, it fails so that another parser can parse that instruction properly. This parser also fails if the bci of the opcode that was just read already is known to start another block. Which would be the case if it was the target of a branch instruction, which would also end the current block.

(def block-terminator
  (with-state
    (m-plus cross-block-boundary
            return-inst
            throw-inst
            if-inst
            goto-inst
            tableswitch-inst
            lookupswitch-inst
            jsr-inst
            ret-inst
            invoke-inst)))

And we have one more parser to look at. The block-terminator parser extracts an opcode from the bytecode stream that will terminate this basic block. But it first checks to see if we’ve crossed into a block that has already been created because it was the target of a jump or branch instruction.

Remainders

There’s quite a bit more code to fully specify the grammar to correctly parse a bytecode stream into a Control Flow Graph. But it all uses the same DSL we’ve described so far.

From those and some primitive parsers like consume-bytecode, you can declare a grammar to parse a bytecode stream into a graph of basic blocks. And you do little in the way of telling ‘how’ the parsing should happen and just describe the ‘what’ you want parsed. It’s a pretty powerful concept that leads to very clean code. Which would you rather debug? A declaration of a grammar that you can reason about in one file. Or a bunch of classes spread over multiple files that you need a complicated IDE to track state as you step through the code to find out what’s going on.

Performance

This code as written is going to run poorly. There’s a lot of overhead in the standard monad implementation. However, because it’s written using a DSL, all we have to do is re-write the implementation of the DSL and we get fast code. The grammar we define doesn’t have to change because it’s not describing how we parse a bytecode stream, but rather what we want parsed.

Finally

The code for this project is here on github. It’s not going to do much yet since there’s so much more work to do. But you can track my progress if you’re interested.

Jim Duey 01 May 2012
blog comments powered by Disqus