My goal is to write a series of posts on creating a compiler for a restricted subset, NewL, of the Java language in Haskell. My tools are Haskell, Alex for scanning, and Happy for parsing.
Alex and Happy are basically the Haskell equivalent of Lex and Yacc. I know most of my compiler theory and practices from the Dragon book so I’ll be making references to it for the techniques used in semantic analysis and code generation. I’ve learned everything I know about Haskell from the really good book “Real World Haskell“, I cannot recommend it highly enough.
I plan to cover the following topics with Haskell source code and make files included.
1. Go over the grammar of NewL (this post).
2. Scanner for NewL using Alex.
3. Parser for NewL using Happy.
4. Error recover and reporting for the scanner and parser (I’m keeping it simple and abort on the first error and print the line and column number)
5. Symbol table and how it’s used in semantic analysis and code generation.
6. Revised NewL language and more symbol tables.
7. Type checking.
8. Intermediate code generation.
9. Machine code generation.
10. Garbage collection.
As you can see that’s a lot of stuff to cover. I’m warning you ahead of time that I’m very much a newbie when it comes to writing Haskell. I don’t even understand monads, so be forewarned the code get ugly compared to a true Haskell programmer.
With the above out of the way I’d like to go over the grammar of NewL. Please note that the semantics of NewL are similar to Java so if there’s any question about what some operation should do, we follow what Java does.
Terminal symbols are bolded and single character ones are in double quotes.
NewL Grammar
1. Program ::= MainClass { ClassDecl }
2. MainClass :== class Ident “{” public static void main “(” String “[" "]“ Ident “)” “{” Statement “}” “}“
3. ClassDecl ::= class Ident “{” {VarDecl} {MethodDecl} “}“
4. VarDecl ::= Type Ident “;“
5. MethodDecl ::= public Type Ident “(” FormalList “)” “{” {VarDecl} {Statement} return Exp “;” “}“
6. FormalList :== Type Ident { FormalRest } | ε
7. FormalRest :== “,” Type Ident
8. Type :== int “[" "]“ | boolean | int | String | Ident
9. Statement :== “{” { Statement } “}” | if “(“ Exp “)” Statement else Statement | while “(” Exp “)” Statement | System.out.println “(” Exp “)” “;” | Ident = Exp “;” | Ident “[" Exp "]” “=” Exp “;“
10. Exp ::= Exp Op Exp | Exp “[" Exp "]” | Exp “.” length | Exp “.” Ident “(” ExpList “)” |
Integer_Literal | String_Literal | true | false | Ident | this | new int “[" Exp "]” | new Ident “(” “)” | “!” Exp | “(” Exp “)“
11. ExpList :== Exp { ExpRest } | ε
12. ExpRest ::= “,” Exp
13. Ident ::= Letter { Letter | Digit | “_” }
14. Integer_Literal ::= Digit { Digit }
15. String_Literal ::= ““” { Character } “““
16. Op ::= && | < | +| - | *
The classic Hello World for NewL
class HelloWorld {
public static void main(String[] a) {
System.out.println(new Hello().World());
}
}
class Hello {
public String World() {
return "Hello World";
}
}
I found Parsec way more powerful, more expressive and easier to use than Alex and Happy. Maybe you should check it out: http://legacy.cs.uu.nl/daan/parsec.html
I am really looking forward to this.
This looks like it will be very interesting.
I think you have a typo in line 13 of the grammar. “Lestter” should be “Letter”.
Thanks George I’ve corrected it.
I thought about using Parsec but most of code for these posts I’m taking from my project for a compilers course I took last fall and I don’t want to rewrite the code.
very nice! looking forward to this!
i know parsec but i don’t have any experience using alex and happy so that will be neat to watch as you go along!
Something new?
Pingback: Having Fun with Happy (Compiler Series Part III) « A blog on math
Excellent series. I hope you will continue with this all the way to the end. Hoping to see a complete compiler written in Haskell soon.
Thanks Rohit. The next part is on doing machine code generation using LLVM. It should be exciting with the recent buzz around GHC and LLVM. http://donsbot.wordpress.com/2010/02/21/smoking-fast-haskell-code-using-ghcs-new-llvm-codegen/
So when do we get to see LLVM based codegen in action?
I posted the first part today. I’m reading up on LLVM in general but my free time is limited. I plan on posting the LLVM code generation for a simple function by the end of this week.
—Cheers
Bryan
Pingback: 2010 in review « Math and More