We’re going to use Alex to write the scanner for NewL. Alex is the Haskell equivalent of Lex. The Alex manual is reasonably thorough and has several excellent examples of simple scanners written using it.
The basic idea for Alex is to pass it your Haskell code written up in the special Alex syntax and it spits out a plain Haskell source code file. There are several wrapper options available for using Alex as (1) a standalone scanner or (2) a module that gets compiled in with your parser. For this post we’ll be using Alex as a standalone scanner that takes input from standard in and outputs a list of tokens.
Alex like all other scanner generators uses regular expressions for specifying tokens. The basic format of an Alex file is
regexp { haskell-code }
where haskell-code is an anonymous function that takes a string (the string that matched the regexp) and returns a token. A typically example is
[0-9]+ { \s -> TIntLiteral s }
The \s is Haskell’s way of defining an anonymous function with one parameter named s. The “->” signals the start of the function body. Thus “\s -> TIntLiteral s” is a function that takes a String s as the argument and returns TIntLiteral s.
To define our tokens we make a new data type called Token with a list of the possible tokens.
data Token = TIntLiteral String |
TIdentifier String
The above code defines two token types one is a literal int such as “10″ and the other is an identifier such as “myVariable”.
Alex has a nice option %wrapper "basic" that defines the function alexScanTokens which takes a string and returns the list of tokens. Using this we define our main function as
main = do
s <- getContents
print (alexScanTokens s)
The main function gets the string from the standard input and puts it in s and then uses
print (alexScanTokens s) to generate the list of tokens and print them out.To test the scanner run
echo "10 \"StringLiteral\"" | ./a.out
the above command echos 10 “StringLiteral” to the standard output and then pipes it into the scanner, a.out (I assume a.out is the scanner executable). The output should be [TIntLiteral 10, TStringLiteral "StringLiteral"].
I’ve included a listing of the NewL Alex scanner below. I strongly recommend reading the Alex manual introduction. For compiling the listing use “alex Scanner.x” to generate the Haskell code and then run “ghc Scanner.hs” to generate the executable.
{
module Main (main) where
}
%wrapper "basic"
$digit = 0-9 -- digits
$alpha = [a-zA-Z] -- alphabetic characters
$graphic = $printable # $white
@string = \" ($graphic # \")* \"
tokens :-
$white+ ;
"class" { \s -> TClass }
"new" { \s -> TNew }
"String" { \s -> TString }
"static" { \s -> TStatic }
"void" { \s -> TVoid }
"main" { \s -> TMain }
"return" { \s -> TReturn }
"public" { \s -> TPublic }
"extends" { \s -> TExtend }
"int" { \s -> TInt }
"boolean" { \s -> TBool }
"if" { \s -> TIf }
"else" { \s -> TElse }
"true" { \s -> TTrue }
"false" { \s -> TFalse }
"this" { \s -> TThis }
"length" { \s -> TLength }
"while" { \s -> TWhile }
$digit+ { \s -> TIntLiteral (read s) }
"." { \s -> TPeriod }
"&&" { \s -> TOp (head s) }
"!" { \s -> TNot }
[\+\-\*\/] { \s -> TOp (head s) }
"<" { \s -> TComOp (head s) }
"=" { \s -> TEquals }
";" { \s -> TSemiColon }
"(" { \s -> TLeftParen }
")" { \s -> TRightParen }
$alpha[$alpha $digit \_ \']* { \s -> TIdent s }
@string { \s -> TStringLiteral (init (tail s)) -- remove the leading " and trailing " }
"{" { \s -> TLeftBrace }
"}" { \s -> TRightBrace }
"," { \s -> TComma }
"[" { \s -> TLeftBrack }
"]" { \s -> TRightBrack }
"System.out.println" { \s -> TPrint }
{
-- Each action has type :: String -> Token
-- The token type:
data Token =
TLeftBrace |
TRightBrace |
TComma |
TLeftBrack |
TRightBrack |
TClass |
TPublic |
TString |
TStatic |
TVoid |
TMain |
TExtend |
TInt |
TBool |
TIf |
TElse |
TTrue |
TFalse |
TThis |
TLength |
TWhile |
TNew |
TOp Char |
TComOp Char |
TNot |
TEquals |
TPeriod |
TSemiColon |
TLeftParen |
TRightParen |
TIdent String |
TPrint |
TIntLiteral Int |
TStringLiteral String |
TReturn
deriving (Eq,Show)
main = do
s <- getContents
print (alexScanTokens s)
}

[...] Go over the grammar of NewL (this post). 2. Scanner for NewL using Alex. 3. Parser for NewL using Happy. 4. Error recover and reporting for the scanner and parser [...]
Pingback by Writing A Compiler In Haskell (Compiler Series Part I) « A blog on math — April 6, 2010 @ 9:27 pm |
[...] my previous post on the scanner I mentioned that we used the “basic” wrapper for Alex which provided a [...]
Pingback by Playing With Alex Again (Compiler Series Part IV) « A blog on math — April 13, 2010 @ 9:58 pm |
Your sources @github include the binaries too. It unnecessarily inflates the repo and download size. You might want to get rid of them.
Comment by Rohit Garg — July 23, 2010 @ 5:50 pm |
I was lazy and didn’t exclude them from the source versioning. I’ll make a .gitignore file to remove the binary files.
Comment by Bryan Bell — July 26, 2010 @ 9:56 pm |
[...] Playing with Alex (Compiler Series Part II) January 20104 comments [...]
Pingback by 2010 in review « Math and More — January 2, 2011 @ 8:58 am |