Programming assignments 2 through 5 will direct you to design and build an interpreter for Cool. Each assignment will cover one component of the interpreter: lexical analysis, parsing, semantic analysis, and operational semantics. Each assignment will ultimately result in a working interpreter phase which can interface with the other phases.
For this assignment, you will be programming in Java and performing lexical analysis.
You must work in a team of two people for this assignment. You do not need to keep the same teammate, but you are encouraged to do so.
For this assignment you will write a lexical analyzer, also called a scanner, using a lexical analyzer generator. You will describe the set of tokens for Cool in an appropriate input format and the analyzer generator will generate actual code. You will then write additional code to serialize the tokens for use by later interpreter stages.
You must create four artifacts:
1. A program that takes a single command-line argument (e.g., file.cl). That argument will be an ASCII text Cool source file. Your program must either indicate that there is an error in the input (e.g., a malformed string) or emit a CL-Lex-formatted serialized list of Cool tokens. Your program's main lexer component must be constructed by a lexical analyzer generator. The "glue code" for processing command-line arguments and serializing tokens should be written by hand. If your Java project compiles to lexer.jar, the following two commands should produce the same output (for a valid Cool program):

   java -jar lexer.jar file.cl > file.cl-lex
   cool --lex file.cl

   Note that > performs output redirection to save standard output to a file. Your program will consist of a number of Java files.

2. readme.txt: a file describing your design decisions and choice of test cases. See the grading rubric. A few paragraphs should suffice.

3. references.txt: a file providing a citation for each resource you used to complete the assignment (excluding class notes and assigned readings). For example, if you found a Stack Overflow answer helpful, provide a link to it. Additionally, provide a brief description of how the resource helped you.

4. Two test cases, good.cl and bad.cl. The first should lex correctly and yield a sequence of tokens. The second should contain an error.
You must use JFlex (or a similar tool or library). Do not write your entire lexer by hand. Parts of it must be tool-generated from regular expressions you provide.
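The handout splits the work: the scanner itself is generated, while the glue code (argument handling and serialization) is written by hand. Below is a minimal sketch of the argument-handling half. It assumes a hypothetical entry class named LexerMain (use whatever class your build.xml names as main-class). The usage message and exit status are our own choices and are not part of the handout.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical entry point; use whatever class your build.xml names as main-class.
public class LexerMain {

    /** Returns the single Cool source path, or null if the argument count is wrong. */
    public static String sourcePath(String[] args) {
        return args.length == 1 ? args[0] : null;
    }

    public static void main(String[] args) throws IOException {
        String path = sourcePath(args);
        if (path == null) {
            // The usage text and exit status are our own choices; the handout does not specify them.
            System.err.println("usage: java -jar lexer.jar file.cl");
            System.exit(1);
        }
        // The input is promised to be ASCII text, so decode it as US-ASCII.
        try (Reader in = Files.newBufferedReader(Paths.get(path), StandardCharsets.US_ASCII)) {
            // Pass `in` to your JFlex-generated scanner here. Its class name and
            // token type come from your own %class and %type directives, so the
            // scanning loop is intentionally left out of this sketch.
        }
    }
}
```

Keeping the argument check in its own method makes it easy to test without touching the file system.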
The first line in a file is line 1. Each successive '\n' (newline) character increments the line count. Your lexer is responsible for keeping track of the current line number.
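JFlex can count lines for you. The %line directive enables the yyline variable, which starts at 0, so the handout's line number would be yyline + 1. There is one caveat worth confirming in the JFlex manual for your version: JFlex treats several characters besides '\n' as line terminators, including carriage return, form feed, and vertical tab. All three are legal Cool whitespace, so yyline can disagree with the rule above. Below is a minimal sketch of counting only '\n' yourself, assuming you call it from your lexer actions. The class LineTracker is hypothetical.

```java
/**
 * Hypothetical helper that applies the handout's rule: the first line is
 * line 1, and only '\n' advances the count.
 */
public class LineTracker {
    private int line = 1; // the first line in a file is line 1

    /** The 1-based line number at the current scanning position. */
    public int current() {
        return line;
    }

    /**
     * Advances past text the scanner just matched (tokens, whitespace,
     * comments, strings). Only '\n' counts; carriage return, form feed,
     * and vertical tab do not.
     */
    public void advance(CharSequence matched) {
        for (int i = 0; i < matched.length(); i++) {
            if (matched.charAt(i) == '\n') {
                line++;
            }
        }
    }
}
```

A token is reported on the line where it starts. In each action, you would therefore read current() for the token first and then call advance(yytext()).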
To report an error, write the string

   ERROR: line_number: Lexer: message

to standard output (System.out) and terminate the program. You may write whatever you want in the message, but it should clearly indicate what went wrong. Your program should not print any tokens if there is a lexing error.
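Below is a minimal sketch of the error path. It assumes the glue code buffers every serialized token until the end of input, so nothing has been printed when an error is found. The class name LexError and the exit status 1 are our own choices; the handout only says to terminate the program.

```java
/** Hypothetical helper for the required lexer error format. */
public class LexError {

    /** Builds "ERROR: line_number: Lexer: message" in the format the handout specifies. */
    public static String format(int line, String message) {
        return "ERROR: " + line + ": Lexer: " + message;
    }

    /**
     * Prints the error to standard output and stops the program. Because the
     * caller has only buffered tokens so far (never printed them), no token
     * output escapes when this runs.
     */
    public static void fail(int line, String message) {
        System.out.println(format(line, message));
        System.exit(1); // nonzero status is an assumption; the handout only says "terminate"
    }
}
```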
If there are no errors in file.cl, your program should print a list of scanned tokens in CL-Lex format to standard output (System.out).
This is the same format used by the reference compiler to create a file.cl-lex when invoked with the --lex flag. Each token is represented by a pair (or triplet) of lines. The first line holds the line number. The second line gives the name of the token. The optional third line holds additional information (i.e., the lexeme) for identifiers, integers, strings, and types. For example, for an integer token the third line should contain the decimal integer value.
Example .cl-lex output:
The official list of token names is:
In general the intended token is evident. For the more exotic names:
The CL-Lex format is exactly the same as the one generated by the reference compiler when you specify --lex. In addition, the reference compiler (and your upcoming PA3 parser!) will accept CL-Lex-formatted input in place of .cl files.
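The CL-Lex layout maps directly onto a small serializer in the glue code. This is a sketch, not the reference implementation. The Token class and the names ClLex and serialize are hypothetical, and the token names used in main() are only illustrative, so check them against the official list. Lines end with an explicit '\n' rather than println, because println uses the platform line separator and the output would otherwise vary between systems.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical token class plus a CL-Lex serializer for the glue code. */
public class ClLex {

    public static final class Token {
        final int line;       // 1-based line on which the token starts
        final String name;    // token name from the official list
        final String lexeme;  // optional third line, or null for tokens without one

        public Token(int line, String name, String lexeme) {
            this.line = line;
            this.name = name;
            this.lexeme = lexeme;
        }
    }

    /** Two lines per token (line number, name), plus a third for the lexeme when present. */
    public static String serialize(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        for (Token t : tokens) {
            out.append(t.line).append('\n');
            out.append(t.name).append('\n');
            if (t.lexeme != null) {
                out.append(t.lexeme).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Illustrative token names only; check them against the official list.
        List<Token> tokens = Arrays.asList(
            new Token(1, "class", null),
            new Token(1, "type", "Main"),
            new Token(2, "integer", "42"));
        // Print only after the whole file has been scanned without errors.
        System.out.print(serialize(tokens));
    }
}
```

Building the whole string before printing also makes it easy to follow the rule that no tokens appear when the input contains an error.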
You must use a lexical analyzer generator or similar library for this assignment. In class, we discuss JFlex, a lexical analyzer generator for Java. You will find its documentation particularly helpful. Because you are producing a Java program for this assignment, you are strongly encouraged to use JFlex.
For your reference, similar tools exist for other programming languages:
- OCaml: ocamllex, which comes with any OCaml distribution.
- lexeme is available, but you must download it yourself.
- JavaScript: jison is available, but you must download it yourself.
- Python: ply is available, but you must download it yourself. (We will use this for PA3.)
All of these lexical analyzer generators are derived from lex, the original lexical analyzer generator for C, or from its widely used successor, flex. You may therefore find it handy to refer to the Lex paper or the Flex manual. As you read, mentally translate the C code references into your own language.
A copy of JFlex is available here.
To work with the provided ant build file, place this jar in the lib directory at the top level of your project.
You can do basic testing with something like the following:
cool --out reference --lex file.cl
ant
java -jar lexer.jar file.cl > file.cl-lex
diff -b -B -E -w file.cl-lex reference.cl-lex
Note that the reference lexer is explicitly instructed to save its output to reference.cl-lex (the --out flag). Since P2 prints CL-Lex to standard output (rather than writing it to a file), we use a redirect to save this output to a file (> file.cl-lex). Finally, diff is a command-line tool for comparing the contents of two files. Its -b, -B, -E, and -w flags tell it to ignore differences in whitespace and blank lines. You may also find Diffchecker helpful.
You may find the reference compiler's --unlex option useful for debugging your .cl-lex files.
Need more test cases? Any Cool file you have (including the one you wrote for PA1) works fine. The contents of cool-examples.zip should be a good start. There is also one among the PA1 hints. You will want to write more complicated test cases. In particular, write negative test cases (e.g., test cases with malformed string constants).
If you are still stuck, you can post on the forum or approach the professor.
You must turn in a tar.gz file containing these files:
- readme.txt: your README file
- references.txt: your file of citations
- team.txt: a file listing only the SLU email IDs of both team members (see below)
- good.cl: a novel positive test case
- bad.cl: a novel negative test case
- build.xml: your ant build file for this project
- Main.java (contains the main method)
- some_file.flex (or similar)
If your regular expressions and lexer definition are in some other files (e.g., lexer.flex), be sure to include those as well!
Do not include any pre-compiled jar files in your submission. This includes, but is not limited to, your lexer, JFlex, etc.
The following directory layout is recommended for your tar.gz file. It is the default layout generated by ant (see below) if you follow the same project structure as the Lexer example from class.
fullsubmit.tar.gz
-- bad.cl
-- build.xml
-- good.cl
-- readme.txt
-- references.txt
-- src/
   -- lexer.flex
   -- Main.java
   -- ...
-- team.txt
Ant Build File
Ant is a build system for Java (much like make or dune).
You may download a (mostly) pre-configured build file (build.xml) here. Add this file to the root of your Java project.
Update the following entries in build.xml:
- the main-class property (line 8), to the name of the class in your project containing main()
- the flex-file property (line 9), to the name of your JFlex specification

The project-name property must remain lexer for SLUGS to correctly compile and run your code.
There are three (3) key operations that build.xml supports:
- ant dist: compile your project (this is the same as just running ant)
- ant clean: remove all generated Java files and compiled Java classes
- ant fullsubmit: generate the tar.gz submission archive
Note that ant will print the list of files included in this archive. Review this list to be sure you are submitting what you think you are submitting!
You must complete this assignment in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.
Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the instructor will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment.
Both team members should submit to SLUGS. The submission should include the file team.txt, a two-line, two-word flat ASCII text file that contains the email IDs of both teammates. Don't include the @stlawu.edu bit. For example, if sjtime10 and kaangs10 are working together, both kaangs10 and sjtime10 would submit fullsubmit.tar.gz with a team.txt file that contains:
kaangs10
sjtime10
Then, sjtime10 and kaangs10 will both receive the same grade for that submission.
This seems minor, but in the past students have failed to format this file correctly. You now earn a point on this assignment for formatting this file correctly.
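Because a correctly formatted team.txt is worth a point, a quick local check before running ant fullsubmit can help. This is a sketch under stated assumptions. The class TeamFileCheck is hypothetical, and it assumes email IDs consist only of letters and digits (like kaangs10). It is also deliberately stricter than the handout in rejecting Windows \r\n line endings.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/**
 * Hypothetical pre-submission check for team.txt: exactly two lines,
 * one email ID per line, without the @stlawu.edu suffix.
 */
public class TeamFileCheck {

    public static boolean isValid(String text) {
        // Allow (but do not require) a single newline at the very end of the file.
        String body = text.endsWith("\n") ? text.substring(0, text.length() - 1) : text;
        // A limit of -1 keeps empty lines, so blank lines cause a rejection.
        String[] lines = body.split("\n", -1);
        if (lines.length != 2) {
            return false;
        }
        for (String id : lines) {
            // Assumes IDs are letters and digits only; this rejects "@stlawu.edu",
            // extra words, stray spaces, and Windows '\r' line endings.
            if (!id.matches("[A-Za-z0-9]+")) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        // Reads team.txt from the current directory (the project root in the recommended layout).
        String text = new String(Files.readAllBytes(Paths.get("team.txt")), StandardCharsets.US_ASCII);
        System.out.println(isValid(text) ? "team.txt looks well formed" : "team.txt is malformed");
    }
}
```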
P2 Grading (out of 50 points):
- team.txt file
- good.cl and bad.cl files
- lexer.flex (the file defining your lexer)