Haskell & LLVM Talk

I did a talk at the Linux Users’ Group of Davis yesterday evening that went very well. The QA and discussion afterwards was much better than previous talks I’ve given to students, the attendees from the Bay Area Haskell Users’s group were very sharp and had good comments.

Photos and slides are posted here and the slides are also viewable below.

The Basics of LLVM (Compiler Series Part X)

For the first pass at using LLVM for code generation we’re only generating code for simple expression of the form “1 + 2″ etc. This frees us from having to consider many many issues such as how to deal with the memory layout of objects. In future passes we relax the conditions on code generation.

I advise you to read the LLVM Haskell documentation at http://hackage.haskell.org/package/llvm (it’s under the modules section) to familiarize yourself with the API. There are two parts to the LLVM Haskell bindings, a low level interface to the C API, and a high level wrapper. I tried using the high level wrapper but the type declarations confused me. I’m not that fluent in Haskell and the high level wrapper makes extensive use of type classes, monads, and a few other constructs. I decided to use the low level wrapper (Core/Util.hs) around the C API, which uses less of Haskell’s advanced type features.

With the above said, I want to dive in and create some LLVM IR code. A simple NewL program is:

class Main {
    public static void main(string[] args) {
        System.out.println("");
    }
}
class Foo 
{
   public int foo()
   { 
      return 1 + 2;
   }
}

For now I’m ignoring the “Main” class and concentrating only on:

   public int foo()
   { 
      return 1 + 2;
   }

Our plan is to traverse the parse tree and generate code at each node. The parse tree for “foo” is shown below:

To use LLVM to generate code we need to know some LLVM concepts. To use LLVM we need to create a module by calling “Util.createModule.” Util is a wrapper that provides some syntactic sugar for calling the C API of LLVM. One note is that almost any call to the LLVM returns “IO returnType” that is all the return types are in the IO monad. The reason is that most LLVM calls are not pure functions.

To create function a function LLVM two things are needed the module, m, that the function will reside in and the function type definition. The below code creates the function type defintion.

getLLVMType :: TypeNames.Type -> FFI.TypeRef
getLLVMType TypeInt = FFI.int64Type
getLLVMType TypeBool = FFI.int1Type
getLLVMType _ = error("Unimplemented type")

getArgTypes :: FormalList -> [FFI.TypeRef]
getArgTypes FEmpty = []
getArgTypes (FormalList theType ident FEmpty) = [(getLLVMType theType)]
getArgTypes (FormalList theType ident rest) =
    [(getLLVMType theType)] ++ (getArgTypes rest) 

methodType = Util.functionType False (getLLVMType theType) (getArgTypes formalList)

With “methodType” defined we add the function (which is so far empty) to the module, m, by calling

  method <- Util.addFunction m FFI.ExternalLinkage methodName methodType

We now have an empty function called “method.” We create a builder to emit instructions and add a basic block to the method using the below code:

  bld <- U.createBuilder
  entry <- U.appendBasicBlock method "entry"
  U.positionAtEnd bld entry

With the basic block added it is time to emit the code to compute the expression “return 1+2;” We do this with the following code.

  withBuilder bld $ \bldRef ->
      do
        expVal <- codeGenExp exp bld context
        FFI.buildRet bldRef expVal 

The call to “withBuilder” takes some explaining. It takes a “Builder” object and a function of type “BuilderRef -> IO a” and returns “IO a” where “a” is any type. We need “withBuilder” because the function “FFI.buildRet” has type “BuilderRef -> ValueRef -> IO ValueRef.” That is we can’t pass a “Builder” object to “FFI.buildRet” directly we have to pass a “BuilderRef” to it.

If you’re paying attention you notice that only the line “FFI.buildRet bldRef expVal ” actually emits any code. Emitting the code to generate the value of the expression “exp” is left the to the “codeGenExp” function. That code is not interesting and you can view it in the code listing at the end of this post. You are probably interested in what kind of LLVM IR code we generate from the function. See the below listing for what it looks like.

define i64 @foo()
{
entry:
  ret i64 3
}

The “i64″ symbol means that the function returns a 64 bit integer. And “ret i64 3″ means return “3″ as a 64 bit integer.

In my next post I’ll continue explore using LLVM to generate code. Specifically I want to show how to represent strings and other composite objects in LLVM.

The code listing for the entire code generation module is shown below. If you want to compile the code check it out of the github repository at http://github.com/bjwbell/NewL-Compiler and read the README file in the llvm_code_generation1 directory.

module CodeGen where
import TypeNames
import Data.Word
import Data.Int(Int32)
import Data.Typeable as T
import Foreign.C.String as CS
import System.IO.Unsafe
import LLVM.FFI.Core as FFI
import LLVM.Core.Util as U

codeGen :: Program -> IO String
codeGen (Program mainClass classDeclList) = codeGenClassDeclList classDeclList

codeGenClassDeclList :: ClassDeclList -> IO String
codeGenClassDeclList CEmpty = return "Ok"
codeGenClassDeclList (ClassDeclList classDecl classDeclList)  = 
    do
      result <- codeGenClassDecl classDecl
      if result /= "Ok"
        then return "fail: codeGenClassDeclList"
        else codeGenClassDeclList classDeclList
      return "Ok"

codeGenClassDecl :: ClassDecl -> IO String
codeGenClassDecl (ClassDecl className varDeclList methodDeclList) = 
    do
      m <- U.createModule className
      codeGenMethodDeclList methodDeclList m [("this", className)]

codeGenMethodDeclList :: MethodDeclList -> U.Module -> [(String, String)] -> IO String 
codeGenMethodDeclList MEmpty m context = return "Ok" -- no methods so automatically successful
codeGenMethodDeclList (MethodDeclList methodDecl methodDeclList) m context =
    do
      result <- codeGenMethodDecl methodDecl m context
      if result /= "Ok"
        then return "fail: codeGenMethodDeclList"
        else codeGenMethodDeclList methodDeclList m context


getLLVMType :: TypeNames.Type -> FFI.TypeRef
getLLVMType TypeInt = FFI.int64Type
getLLVMType TypeBool = int1Type
getLLVMType _ = error("Unimplemented type")

getArgTypes :: FormalList -> [FFI.TypeRef]
getArgTypes FEmpty = []
getArgTypes (FormalList theType ident FEmpty) = [(getLLVMType theType)]
getArgTypes (FormalList theType ident rest) =
    [(getLLVMType theType)] ++ (getArgTypes rest) 

codeGenMethodDecl :: MethodDecl -> U.Module -> [(String, String)] -> IO String
codeGenMethodDecl (MethodDecl theType methodName formalList varDeclList statementList exp) m context = do
  let methodType = U.functionType False (getLLVMType theType) (getArgTypes formalList)
  method <- U.addFunction m FFI.ExternalLinkage methodName methodType
  bld <- U.createBuilder
  entry <- U.appendBasicBlock method "entry"
  U.positionAtEnd bld entry
  withBuilder bld $ \bldRef ->
      do
        expVal <- codeGenExp exp bld context
        FFI.buildRet bldRef expVal 
  FFI.dumpValue method
  return "Ok"

codeGenExp :: Exp -> U.Builder -> [(String, String)] -> IO FFI.ValueRef
codeGenExp (ExpOp exp1 char exp2) bld context = 
    do
      val1 <- codeGenExp exp1 bld context
      val2 <- codeGenExp exp2 bld context
      case char of 
        '+' -> U.withBuilder bld $ \ bldRef ->
               withCString "" $ \ cString -> buildAdd bldRef val1 val2 cString
        '-' -> U.withBuilder bld $ \ bldRef ->
               withCString "" $ \ cString -> buildAdd bldRef val1 val2 cString
        '*' -> U.withBuilder bld $ \ bldRef ->
               withCString "" $ \ cString -> buildAdd bldRef val1 val2 cString
        '/' -> U.withBuilder bld $ \ bldRef ->
               withCString "" $ \ cString -> buildAdd bldRef val1 val2 cString
        _ -> error ("Unrecognized operator, " ++ (show char) ++ " in ExpOp")

codeGenExp (ExpInt value) bld context = 
    do
      return (constInt int64Type (fromIntegral value) (fromIntegral 64)) 
codeGenExp (ExpBool True) bld context = 
    do
      return (constInt int1Type (fromIntegral 1) (fromIntegral 1)) 
codeGenExp (ExpBool False) bld context = 
    do
      return (constInt int1Type (fromIntegral 0) (fromIntegral 1)) 

Code Generation With LLVM (Compiler Series Part IX)

With type checking done, it’s time to generate code from the parse tree. Code generation typically consists of the following steps:
1. Generate intermediate code.
2. Generate the machine or byte code from the intermediate code.

The following are options for doing the above two tasks:

  • Generate intermediate and machine code for x86 or RISC using our own methods (tedious and error prone).
  • Generate the intermediate code and then byte code for either the JVM or CLI.
  • Outsource the problem to LLVM.

I’m a lazy fellow so I pick choice #3 of outsourcing the code generation to LLVM. This is a particularly apt choice given the exciting intersection of Haskell with LLVM recently via David Terei’s honors thesis on using LLVM as the backend for the GHC compiler. This work is ongoing at http://hackage.haskell.org/trac/ghc/wiki/Commentary/Compiler/Backends/LLVM and for an explanation of why LLVM is used as the backend see http://blog.llvm.org/2010/05/glasgow-haskell-compiler-and-llvm.html.

Bryan O’Sullivan put together bindings to LLVM at http://code.haskell.org/llvm/. Bryan and the other author, Lennart Augustsson, also posted examples of using the bindings at http://www.serpentine.com/blog/2008/01/03/llvm-bindings-for-haskell/, http://augustss.blogspot.com/2009/01/llvm-llvm-low-level-virtual-machine-is.html, and http://augustss.blogspot.com/2009/01/llvm-arithmetic-so-we-want-to-compute-x.html.

I’m reading up on LLVM and will post more as I learn more about LLVM in general and the Haskell LLVM bindings in particular.