Programming assignments 2 through 5 will direct you to design and build an interpreter for Cool. Each assignment will cover one component of the interpreter: lexical analysis, parsing, semantic analysis, and operational semantics. Each assignment will ultimately result in a working interpreter phase which can interface with the other phases.
You will complete this assignment using Python and implement the parsing component of an interpreter.
You may work in a team of two people for this assignment. You may work in a team for any or all subsequent programming assignments. You do not need to keep the same teammate. The course staff are not responsible for finding you a willing teammate.
For this assignment you will write a parser using a parser generator. You will describe the Cool grammar in an appropriate input format and the parser generator will generate actual code (in Python). You will also write additional code to unserialize the tokens produced by the lexer stage and to serialize the abstract syntax tree produced by your parser.
You must create four artifacts:
A program that reads CL-Lex format (as described in P2) from standard input. The CL-Lex data will always be well-formed (i.e., there will be no syntax errors in the CL-Lex file itself). However, the CL-Lex file may describe a sequence of Cool tokens that do not form a valid Cool program. Your program must either indicate that there is an error in the Cool program described by the CL-Lex data (e.g., a parse error in the Cool file) or emit a CL-AST-formatted serialized Cool abstract syntax tree. Your program's main parser component must be constructed by a parser generator. The "glue code" for unserializing tokens and serializing the resulting abstract syntax tree should be written by hand.
Your main program should be in a module named parser. Thus, the following two commands should produce the same output (for a valid Cool program):
python3 parser.py < file.cl-lex > file.cl-ast
cool --parse file.cl
Note that < performs input redirection to load standard input from a file, and > performs output redirection to save standard output to a file. Your program will consist of a number of Python files.
readme.txt describing your design decisions and choice of test cases. See the grading rubric. A few paragraphs should suffice.
references.txt providing a citation for each resource you used (excluding class notes and assigned readings) to complete the assignment. For example, if you found a Stack Overflow answer helpful, provide a link to it. Additionally, provide a brief description of how the resource helped you.
good.cl and bad.cl. The first should parse correctly and yield an abstract syntax tree. The second should contain an error.
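The hand-written glue code for unserializing tokens might start from a sketch like the one below. The CL-Lex layout is defined in P2, not here, so the layout used is an assumption: each token is a line number on one line, a token name on the next, and, for tokens that carry text, the lexeme on a third line. The names Token, LEXEME_KINDS and read_tokens, and the token name "le", are hypothetical.

```python
from typing import Iterable, Iterator, NamedTuple, Optional

# ASSUMPTION: only these token kinds are followed by a lexeme line.
# Check the P2 specification for the authoritative list and spelling.
LEXEME_KINDS = {"identifier", "type", "integer", "string"}


class Token(NamedTuple):
    line: int               # source line number recorded by the lexer
    kind: str               # token name, e.g. "while" or "identifier"
    lexeme: Optional[str]   # text for lexeme-carrying tokens, else None


def read_tokens(lines: Iterable[str]) -> Iterator[Token]:
    """Yield Tokens from CL-Lex lines, using the assumed layout above."""
    it = iter(lines)
    for raw in it:
        line_no = int(raw.rstrip("\r\n"))
        kind = next(it).rstrip("\r\n")
        # Lexeme-carrying tokens have one extra line holding their text.
        lexeme = next(it).rstrip("\r\n") if kind in LEXEME_KINDS else None
        yield Token(line_no, kind, lexeme)


# Hypothetical CL-Lex fragment for: while x <= 99 (with 99 on the next line)
sample = ["5\n", "while\n", "5\n", "identifier\n", "x\n",
          "5\n", "le\n", "6\n", "integer\n", "99\n"]
tokens = list(read_tokens(sample))
```

In the real program the same function could be handed sys.stdin instead of a list, since a file object is also an iterable of lines. Stripping both "\r" and "\n" keeps the reader tolerant of either line-ending style.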
The line number for an expression is the line number of the first token that is part of that expression.
while x <=
99 loop
x <- x + 1
pool
Assuming the while keyword in this example sits on line 5 of its file: the while expression is on line 5, the x <= 99 expression is on line 5, the 99 expression is on line 6, and the x <- x + 1 and x + 1 expressions are on line 7. The line numbers for tokens are present in the serialized token .cl-lex file.
Your parser is responsible for keeping track of the line numbers (both for the output syntax tree and for error reporting).
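One way to honor the "first token" rule is to have every AST node carry a line number that is fixed when the node is built. The sketch below is illustrative only: Expr, make_binary, make_while and make_assign are hypothetical helpers, and the kind strings "le", "plus", "assign" and "while" are placeholders, not names confirmed by this handout.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Expr:
    line: int                           # line of the first token of this expression
    kind: str                           # illustrative node kind
    children: Tuple["Expr", ...] = ()   # sub-expressions, if any


def make_binary(kind: str, left: Expr, right: Expr) -> Expr:
    # A binary expression begins where its left operand begins.
    return Expr(left.line, kind, (left, right))


def make_while(while_line: int, predicate: Expr, body: Expr) -> Expr:
    # A while expression begins at the `while` keyword token itself.
    return Expr(while_line, "while", (predicate, body))


def make_assign(name_line: int, value: Expr) -> Expr:
    # An assignment begins at the identifier token being assigned to.
    return Expr(name_line, "assign", (value,))


# The while example from the text, with `while` on line 5.
ninety_nine = Expr(6, "integer")
cond = make_binary("le", Expr(5, "identifier"), ninety_nine)
plus = make_binary("plus", Expr(7, "identifier"), Expr(7, "integer"))
assign = make_assign(7, plus)
loop = make_while(5, cond, assign)
```

With a parser generator, constructors like these would be called from the grammar actions, passing along the line number of the first symbol on the right-hand side of each rule.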
To report an error, write the string
ERROR: line_number: Parser: message
to standard output and terminate the program. You may write whatever you want in the message, but it should be fairly indicative.
class Cons inherits List + IO {
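A minimal sketch of error reporting in the required format follows. The function names are hypothetical, and the exit status is an assumption, since the assignment only says to print the message and terminate.

```python
import sys


def format_error(line_number: int, message: str) -> str:
    """Build the error string in the format required by the assignment."""
    return f"ERROR: {line_number}: Parser: {message}"


def report_error(line_number: int, message: str) -> None:
    """Print the error to standard output and terminate the program."""
    print(format_error(line_number, message))
    # ASSUMPTION: the handout does not specify an exit status; adjust if
    # the grading environment expects a particular one.
    sys.exit(1)


# Hypothetical line number and message, for illustration only.
example = format_error(12, "syntax error near +")
```

A parser generator's error hook could call report_error with the line number of the offending token, so that no partial CL-AST output is printed after the error.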
If there are no errors in file.cl-lex your program should print the abstract syntax tree in CL-AST format to standard output (the default output of print()). This is the same format used by the reference compiler to create a file.cl-ast file when invoked with the --parse flag.
The general format of CL-AST follows the Cool Reference Manual Syntax chart. Basically, we do a pre-order traversal of the abstract syntax tree, writing down every node as we come to it.
We will now describe exactly what to output for each kind of node. You can view this as specifying a set of mutually-recursive tree-walking functions. The notation "superclass:identifier" means "output the superclass using the rule (below) for outputting an identifier". The notation "\n" means "output a newline".
class List {
-- Define operations on lists.
cons(i : Int) : List {
(new Cons).init(i, self)
};
};
CL-AST output with comments:
1 -- number of classes
3 -- line number of class name identifier
List -- class name identifier
no_inherits -- does this class inherit?
1 -- number of features
method -- what kind of feature?
6 -- line number of method name identifier
cons -- method name identifier
1 -- number of formal parameters
6 -- line number of formal parameter identifier
i -- formal parameter identifier
6 -- line number of formal parameter type identifier
Int -- formal parameter type identifier
6 -- line number of return type identifier
List -- return type identifier
7 -- line number of body expression
dynamic_dispatch -- kind of body expression
7 -- line number of dispatch receiver expression
new -- kind of dispatch receiver expression
7 -- line number of new-class identifier
Cons -- new-class identifier
7 -- line number of dispatch method identifier
init -- dispatch method identifier
2 -- number of arguments in dispatch
7 -- line number of first argument expression
identifier -- kind of first argument expression
7 -- line number of the identifier
i -- what is the identifier?
7 -- line number of second argument expression
identifier -- kind of second argument expression
7 -- line number of the identifier
self -- what is the identifier?
The CL-AST format is quite verbose, but it is particularly easy for later stages (e.g., the type checker) to read in again without having to go through all of the trouble of "actually parsing". It will also make it particularly easy for you to notice where things are going awry if your parser is not producing the correct output.
Writing the code to output a CL-AST file given an AST may take a bit of time but it should not be difficult; the reference compiler implementation (OCaml, so not directly comparable) does it in 116 lines and cleaves closely to the structure given above.
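The mutually-recursive tree-walking functions can be sketched as follows. This is a partial illustration, not the reference implementation: it handles only the node kinds that appear in the List example (identifier, new, dynamic_dispatch, methods, and classes without a superclass), and every class and function name in it is hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Ident:
    line: int
    name: str


@dataclass
class Expr:
    line: int
    kind: str                          # "identifier", "new" or "dynamic_dispatch"
    ident: Optional[Ident] = None      # used by "identifier" and "new"
    receiver: Optional["Expr"] = None  # used by "dynamic_dispatch"
    method: Optional[Ident] = None     # used by "dynamic_dispatch"
    args: List["Expr"] = field(default_factory=list)


@dataclass
class Formal:
    name: Ident
    type_id: Ident


@dataclass
class Method:
    name: Ident
    formals: List[Formal]
    return_type: Ident
    body: Expr


@dataclass
class Class:
    name: Ident
    features: List[Method]


def emit_identifier(out: List[str], ident: Ident) -> None:
    out += [str(ident.line), ident.name]


def emit_list(out: List[str], items: list, emit_item: Callable) -> None:
    # A list is written as its length followed by each element in order.
    out.append(str(len(items)))
    for item in items:
        emit_item(out, item)


def emit_expr(out: List[str], e: Expr) -> None:
    out += [str(e.line), e.kind]
    if e.kind in ("identifier", "new"):
        emit_identifier(out, e.ident)
    elif e.kind == "dynamic_dispatch":
        emit_expr(out, e.receiver)
        emit_identifier(out, e.method)
        emit_list(out, e.args, emit_expr)
    else:
        raise NotImplementedError(e.kind)  # other kinds are left out of this sketch


def emit_formal(out: List[str], f: Formal) -> None:
    emit_identifier(out, f.name)
    emit_identifier(out, f.type_id)


def emit_method(out: List[str], m: Method) -> None:
    out.append("method")
    emit_identifier(out, m.name)
    emit_list(out, m.formals, emit_formal)
    emit_identifier(out, m.return_type)
    emit_expr(out, m.body)


def emit_class(out: List[str], c: Class) -> None:
    emit_identifier(out, c.name)
    out.append("no_inherits")  # classes with a superclass are not handled here
    emit_list(out, c.features, emit_method)


def serialize(classes: List[Class]) -> str:
    out: List[str] = []
    emit_list(out, classes, emit_class)
    return "\n".join(out) + "\n"


# Hand-built AST for the List example above.
body = Expr(7, "dynamic_dispatch",
            receiver=Expr(7, "new", ident=Ident(7, "Cons")),
            method=Ident(7, "init"),
            args=[Expr(7, "identifier", ident=Ident(7, "i")),
                  Expr(7, "identifier", ident=Ident(7, "self"))])
cons = Method(Ident(6, "cons"), [Formal(Ident(6, "i"), Ident(6, "Int"))],
              Ident(6, "List"), body)
ast_text = serialize([Class(Ident(3, "List"), [cons])])
print(ast_text, end="")
```

Collecting lines in a list and joining them once keeps each emit function small and makes the output easy to compare against the reference in a test.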
You must use a parser generator or similar library for this assignment. In class, we discussed Ply, a parser generator for Python. You will find the documentation to be particularly helpful.
There exist similar tools for other programming languages:
All of these parser generators are derived from yacc (or bison), the original parser generator for C. Thus you may find it handy to refer to the Yacc paper or the Bison manual. When you're reading, mentally translate the C code references into Python.
You can do basic testing as follows:
cool --lex file.cl
cool --out reference --parse file.cl
python3 parser.py < file.cl-lex > file.cl-ast
diff -b -B -E -w file.cl-ast reference.cl-ast
Note that the reference compiler is being explicitly instructed to save its output to reference.cl-ast (the --out flag). Since P3 prints CL-AST to standard output (rather than writing it to a file), we use a redirect to save this output to a file (> file.cl-ast). Finally, diff is a command line tool for comparing the contents of two files. You may also find Diffchecker to be helpful.
You may find the reference compiler's --unparse option useful for debugging your .cl-ast files.
If you are failing every negative test case, it is likely that you are not handling cross-platform compatibility (for example, differing line endings such as \r\n versus \n) correctly on all of your inputs and outputs.
If you are still stuck, you can post on the forum or approach the professor.
You must turn in a tar.gz file containing these files:
readme.txt: your README file
references.txt: your file of citations
team.txt: a file listing only the SLU email IDs of both team members (see below).
good.cl: a novel positive testcase
bad.cl: a novel negative testcase
parser.py (the entrypoint for your parser) 3aeee2ffb6bdcec698011572b6bbcaf180807419 in a comment
parsetab.py
The following directory layout is recommended for your tar.gz file. This is the default layout generated by make (see below), should you follow the same project structure as the Lexer example from class.
fullsubmit.tar.gz
-- bad.cl
-- good.cl
-- Makefile
-- parser.py
-- readme.txt
-- references.txt
-- team.txt
Note that you can use the CS-364 Makefile to generate a submission archive:
make fullsubmit
The Makefile is available on Sakai and here. Be sure to update the IDENTIFIER and EXECUTABLE variables appropriately.
Note that you do not need to run make to compile anything for this project; we are just using a part of this script to generate the tarball that you can submit.
You must complete this assignment in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.
Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the instructor will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment.
Both team members should submit to SLUGS. The submission should include the file team.txt, a two-line, two-word flat ASCII text file that contains the email IDs of both teammates. Don't include the @stlawu.edu part. Example: if sjtime10 and kaangs10 are working together, both kaangs10 and sjtime10 would submit fullsubmit.tar.gz with a team.txt file that contains:
kaangs10
sjtime10
Then, sjtime10 and kaangs10 will both receive the same grade for that submission.
This seems minor, but in the past students have failed to format this file correctly. Thus you now get a point on this assignment for formatting this file correctly.
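Since the format is easy to get wrong, a quick self-check before submitting may help. The sketch below encodes only the rules stated above (two lines, one word per line, plain ASCII, no @ part). The function name is hypothetical, and this is not the grader's actual check.

```python
def check_team_file(text: str) -> bool:
    """Return True if text looks like a valid team.txt as described above."""
    lines = text.splitlines()
    if len(lines) != 2:
        return False                    # must be exactly two lines
    for line in lines:
        if not line.isascii():
            return False                # flat ASCII only
        if len(line.split()) != 1 or line != line.strip():
            return False                # exactly one word, no stray spaces
        if "@" in line:
            return False                # email ID only, no @stlawu.edu part
    return True


ok = check_team_file("kaangs10\nsjtime10\n")
```

To check a real file, the text could be read with open("team.txt").read() and passed to the same function.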
P3 Grading (out of 50 points):
team.txt file
good.cl and bad.cl files