CS-364 P3 - Kevin Angstadt :: Teaching

Programming assignments 2 through 5 will direct you to design and build an interpreter for Cool. Each assignment will cover one component of the interpreter: lexical analysis, parsing, semantic analysis, and operational semantics. Each assignment will ultimately result in a working interpreter phase which can interface with the other phases.

You will complete this assignment using Python and implement the parsing component of an interpreter.

You may work in a team of two people for this assignment. You may work in a team for any or all subsequent programming assignments. You do not need to keep the same teammate. The course staff are not responsible for finding you a willing teammate.

Goal

For this assignment you will write a parser using a parser generator. You will describe the Cool grammar in an appropriate input format and the parser generator will generate actual code (in Python). You will also write additional code to unserialize the tokens produced by the lexer stage and to serialize the abstract syntax tree produced by your parser.

Specification

You must create four artifacts:

A program that reads CL-Lex format (as described in P2) from standard input. The CL-Lex data will always be well-formed (i.e., there will be no syntax errors in the CL-Lex file itself). However, the CL-Lex file may describe a sequence of Cool tokens that do not form a valid Cool program.

Your program must either indicate that there is an error in the Cool program described by the CL-Lex data (e.g., a parse error in the Cool file) or emit a CL-AST-formatted serialized Cool abstract syntax tree. Your program's main parser component must be constructed by a parser generator. The "glue code" for unserializing tokens and serializing the resulting abstract syntax tree should be written by hand.

Your main program should be in a module named parser. Thus, the following two commands should produce the same output (for a valid Cool program):
```
python3 parser.py < file.cl-lex > file.cl-ast
cool --parse file.cl
```
Note that < performs input redirection to load standard input from a file, and > performs output redirection to save standard output to a file. Your program will consist of a number of Python files.
A plain ASCII text file called readme.txt describing your design decisions and choice of test cases. See the grading rubric. A few paragraphs should suffice.
A plain ASCII text file called references.txt providing a citation for each resource you used (excluding class notes, and assigned readings) to complete the assignment. For example, if you found a Stack Overflow answer helpful, provide a link to it. Additionally, provide a brief description of how the resource helped you.
Testcases good.cl and bad.cl. The first should parse correctly and yield an abstract syntax tree. The second should contain an error.

You must use ply (or a similar tool or library). Do not write your entire parser by hand. Parts of it must be tool-generated from context-free grammar rules you provide.
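The hand-written "glue code" for unserializing tokens might look like the sketch below. It assumes the CL-Lex layout from P2 is a line number, then a token name, then (only for identifier, integer, string, and type tokens) one extra lexeme line. That layout, the `Token` and `TokenStream` names, and the upper-case terminal names are illustrative assumptions and are not specified in this handout.

```python
# Assumption: these token kinds carry an extra lexeme line in CL-Lex.
TOKENS_WITH_LEXEME = {"identifier", "integer", "string", "type"}


class Token:
    """Minimal token object with the attributes a PLY parser reads."""

    def __init__(self, kind, value, lineno):
        self.type = kind.upper()  # hypothetical convention: upper-case terminals
        self.value = value
        self.lineno = lineno
        self.lexpos = 0           # unused here, but PLY expects the attribute


def read_tokens(lines):
    """Turn an iterable of CL-Lex lines into a list of Token objects."""
    # Strip line endings explicitly so "\r\n" input is handled as well.
    lines = [line.rstrip("\r\n") for line in lines]
    tokens = []
    i = 0
    while i + 1 < len(lines):
        lineno = int(lines[i])
        kind = lines[i + 1]
        i += 2
        value = kind
        if kind in TOKENS_WITH_LEXEME:
            value = lines[i]
            i += 1
        tokens.append(Token(kind, value, lineno))
    return tokens


class TokenStream:
    """Adapter exposing token(), the method PLY calls to fetch the next token."""

    def __init__(self, tokens):
        self._tokens = iter(tokens)

    def token(self):
        return next(self._tokens, None)  # None signals end of input
```

With PLY, an object like `TokenStream(read_tokens(sys.stdin))` could then be passed as the `lexer` argument of the generated parser's `parse` method, so that no second lexing pass is needed.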

Line Numbers

The line number for an expression is the line number of the first token that is part of that expression.

Example:


while x <= 
       99 loop 
  x <- x + 1 
pool

Assuming the while keyword in this example sits on line 5 of its source file, the while expression is on line 5, the x <= 99 expression is on line 5, the 99 expression is on line 6, and the x <- x + 1 and x + 1 expressions are on line 7. The line numbers for tokens are present in the serialized token .cl-lex file.

Your parser is responsible for keeping track of the line numbers (both for the output syntax tree and for error reporting).
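One way to apply the first-token rule is to let every AST node take its line number from its first token or first child. The sketch below is only an illustration: the `Exp` tuple and the helper names are hypothetical, and it rebuilds the while example under the assumption that the while keyword is on line 5.

```python
from collections import namedtuple

# Hypothetical AST node: line number, kind, and child parts.
Exp = namedtuple("Exp", ["line", "kind", "parts"])


def leaf(token_line, kind, text):
    """A leaf expression starts at its only token."""
    return Exp(token_line, kind, (text,))


def binary(kind, left, right):
    """A binary expression starts where its left operand starts."""
    return Exp(left.line, kind, (left, right))


def while_loop(keyword_line, predicate, body):
    """A while expression starts at the 'while' keyword token."""
    return Exp(keyword_line, "while", (predicate, body))


# The example above, with 'while' assumed to be on line 5.
pred = binary("le", leaf(5, "identifier", "x"), leaf(6, "integer", "99"))
rhs = binary("plus", leaf(7, "identifier", "x"), leaf(7, "integer", "1"))
assign = Exp(7, "assign", ("x", rhs))  # starts at the identifier token 'x'
loop = while_loop(5, pred, assign)

print(loop.line, pred.line, pred.parts[1].line, assign.line, rhs.line)
# prints: 5 5 6 7 7
```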

Error Reporting

To report an error, write the string

ERROR: line_number: Parser: message

to standard output and terminate the program. You may write whatever you want in the message, but it should be fairly indicative.

Example erroneous input (on line 70 of the source file):


class Cons inherits List + IO {

Example error report output:

ERROR: 70: Parser: syntax error near +
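A small helper can keep the error format in one place. The sketch below is a suggestion rather than required structure: the function names are hypothetical, the exit status is an assumption (this handout does not specify one), and `p_error` follows the shape of the hook that PLY calls with the offending token, or with `None` at end of input.

```python
import sys


def format_error(line_number, message):
    """Build the error string in the required format."""
    return "ERROR: {}: Parser: {}".format(line_number, message)


def report_error(line_number, message):
    """Print the error to standard output and stop the program."""
    print(format_error(line_number, message))
    sys.exit(0)  # assumption: the handout does not state an exit status


def p_error(token):
    """Syntax-error hook in the style PLY expects."""
    if token is None:
        # Unexpected end of input: there is no token to point at. The 0 is a
        # placeholder; which line to report here is left open in this sketch.
        report_error(0, "syntax error at end of input")
    else:
        report_error(token.lineno, "syntax error near " + str(token.value))
```

For the example above, `format_error(70, "syntax error near +")` yields exactly the line shown in the example error report.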

The CL-AST File Format

If there are no errors in file.cl-lex your program should print the abstract syntax tree in CL-AST format to standard output (the default output of print()). This is the same format used by the reference compiler to create a file.cl-ast file when invoked with the --parse flag. The general format of CL-AST follows the Cool Reference Manual Syntax chart. Basically, we do a pre-order traversal of the abstract syntax tree, writing down every node as we come to it.

We will now describe exactly what to output for each kind of node. You can view this as specifying a set of mutually-recursive tree-walking functions. The notation "superclass:identifier" means "output the superclass using the rule (below) for outputting an identifier". The notation "\n" means "output a newline".

To Output an AST. A Cool AST is a list of classes. Output the list of classes.
To Output a List (of classes, or features, or whatever). Output the number of elements, then a newline, then output each list element in turn.
To Output a Class. Output the class name as an identifier. Then, output either:
- no_inherits \n
- inherits \n superclass:identifier
Then output the list of features.
To Output an Identifier. Output the source-file line number, then a newline, then the identifier string, then a newline.
To Output a Feature. Output the name of the feature and then a newline and then any subparts, as given below:
- attribute_no_init \n name:identifier type:identifier
- attribute_init \n name:identifier type:identifier init:exp
- method \n name:identifier formals-list \n type:identifier body:exp
To Output a Formal. Output the name as an identifier on a line and then the type as an identifier on a line.
To Output an Expression. Output the line number of the expression and then a newline. Output the name of the expression and then a newline and then any subparts, as given below:
- assign \n var:identifier rhs:exp
- dynamic_dispatch \n e:exp method:identifier args:exp-list
- static_dispatch \n e:exp type:identifier method:identifier args:exp-list
- self_dispatch \n method:identifier args:exp-list
- if \n predicate:exp then:exp else:exp
- while \n predicate:exp body:exp
- block \n body:exp-list
- new \n class:identifier
- isvoid \n e:exp
- plus \n x:exp y:exp
- minus \n x:exp y:exp
- times \n x:exp y:exp
- divide \n x:exp y:exp
- lt \n x:exp y:exp
- le \n x:exp y:exp
- eq \n x:exp y:exp
- not \n x:exp
- negate \n x:exp
- integer \n the_integer_constant \n
- string \n the_string_constant \n
- identifier \n variable:identifier (note that this is not the same as the integer and string cases above)
- true \n
- false \n
To Output a let Expression. (Output the line number, as usual.) Output let \n. Then output the binding list. To output a binding, do either:
- let_binding_no_init \n variable:identifier type:identifier
- let_binding_init \n variable:identifier type:identifier value:exp
Finally, output the expression that is the body of the let.
To Output a case Expression. (Output the line number, as usual.) Output case \n. Then output the case expression. Then output the case-elements list. To output a case-element, output the variable as an identifier, then the type as an identifier, then the case-element-body as an exp.

Example input (the class List line is line 3 of the source file):




class List {
  -- Define operations on lists.

  cons(i : Int) : List {
		(new Cons).init(i, self)
	};

};

Example CL-AST output with comments:

1                      -- number of classes                   
3                      --  line number of class name identifier
List                   --  class name identifier
no_inherits            --  does this class inherit? 
1                      --  number of features
method                 --   what kind of feature? 
6                      --   line number of method name identifier
cons                   --   method name identifier
1                      --   number of formal parameters
6                      --    line number of formal parameter identifier
i                      --    formal parameter identifier
6                      --    line number of formal parameter type identifier
Int                    --    formal parameter type identifier
6                      --   line number of return type identifier
List                   --   return type identifier
7                      --    line number of body expression 
dynamic_dispatch       --    kind of body expression 
7                      --     line number of dispatch receiver expression 
new                    --     kind of dispatch receiver expression  
7                      --      line number of new-class identifier 
Cons                   --      new-class identifier
7                      --     line number of dispatch method identifier
init                   --     dispatch method identifier
2                      --     number of arguments in dispatch 
7                      --      line number of first argument expression
identifier             --      kind of first argument expression
7                      --       line number of the identifier
i                      --       what is the identifier? 
7                      --      line number of second argument expression
identifier             --      kind of second argument expression
7                      --       line number of the identifier
self                   --       what is the identifier?

The CL-AST format is quite verbose, but it is particularly easy for later stages (e.g., the type checker) to read in again without having to go through all of the trouble of "actually parsing". It will also make it particularly easy for you to notice where things are going awry if your parser is not producing the correct output.

Writing the code to output a CL-AST file given an AST may take a bit of time but it should not be difficult; the reference compiler implementation (OCaml, so not directly comparable) does it in 116 lines and cleaves closely to the structure given above.
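The output rules can be sketched as a handful of mutually recursive functions. The version below covers only the node kinds needed for the List example and uses a hypothetical tuple-based AST: identifiers are `(line, name)` pairs and expressions are `(line, kind, parts...)` tuples. It follows the worked example, which shows no blank line between the formals list and the return type.

```python
def out_list(items, out_item, out):
    """Output the element count, then each element in turn."""
    out.append(str(len(items)))
    for item in items:
        out_item(item, out)


def out_identifier(ident, out):
    line, name = ident
    out.append(str(line))
    out.append(name)


def out_formal(formal, out):
    name, type_name = formal
    out_identifier(name, out)
    out_identifier(type_name, out)


def out_exp(exp, out):
    line, kind, *parts = exp
    out.append(str(line))
    out.append(kind)
    if kind == "dynamic_dispatch":
        receiver, method, args = parts
        out_exp(receiver, out)
        out_identifier(method, out)
        out_list(args, out_exp, out)
    elif kind in ("new", "identifier"):
        out_identifier(parts[0], out)
    elif kind in ("integer", "string"):
        out.append(str(parts[0]))
    else:
        raise NotImplementedError(kind)  # remaining kinds omitted in this sketch


def out_feature(feature, out):
    kind = feature[0]
    out.append(kind)
    if kind == "method":
        _, name, formals, return_type, body = feature
        out_identifier(name, out)
        out_list(formals, out_formal, out)
        out_identifier(return_type, out)
        out_exp(body, out)
    else:
        raise NotImplementedError(kind)  # attributes omitted in this sketch


def out_class(cls, out):
    name, parent, features = cls
    out_identifier(name, out)
    if parent is None:
        out.append("no_inherits")
    else:
        out.append("inherits")
        out_identifier(parent, out)
    out_list(features, out_feature, out)


def serialize(classes):
    """Return the CL-AST text for a list of classes."""
    out = []
    out_list(classes, out_class, out)
    return "\n".join(out) + "\n"


# The List example from above, written in the hypothetical tuple form.
body = (7, "dynamic_dispatch", (7, "new", (7, "Cons")), (7, "init"),
        [(7, "identifier", (7, "i")), (7, "identifier", (7, "self"))])
cons = ("method", (6, "cons"), [((6, "i"), (6, "Int"))], (6, "List"), body)
example_ast = [((3, "List"), None, [cons])]
```

Calling `print(serialize(example_ast), end="")` on this tree should reproduce the left-hand column of the commented example output, one value per line.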

Parser Generators

You must use a parser generator or similar library for this assignment. In class, we discussed Ply, a parser generator for Python. You will find the documentation to be particularly helpful.
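As a rough illustration of what PLY grammar rules look like, the fragment below handles only integer constants, addition, and multiplication. The token names and the tuple-shaped AST are assumptions made for this sketch; a real Cool grammar needs every token and production from the Cool Reference Manual. Each rule is an ordinary Python function whose docstring holds the production.

```python
# Hypothetical token names; a real grammar would list every Cool token.
tokens = ("INTEGER", "PLUS", "TIMES")

# Lowest precedence first; both operators are left-associative.
precedence = (
    ("left", "PLUS"),
    ("left", "TIMES"),
)


def p_exp_plus(p):
    "exp : exp PLUS exp"
    # The expression starts at its left operand, so reuse that line number.
    p[0] = (p[1][0], "plus", p[1], p[3])


def p_exp_times(p):
    "exp : exp TIMES exp"
    p[0] = (p[1][0], "times", p[1], p[3])


def p_exp_integer(p):
    "exp : INTEGER"
    # p.lineno(1) is the line PLY recorded for the first symbol of the rule.
    p[0] = (p.lineno(1), "integer", p[1])


def p_error(p):
    # PLY calls this on a syntax error; p is None at end of input.
    raise SyntaxError("parse error")


# With PLY installed, the parser would be generated from the rules above:
#   import ply.yacc as yacc
#   parser = yacc.yacc()
#   ast = parser.parse(lexer=token_stream)  # token_stream: hand-written adapter
```

The rule functions only read from and write to `p`, so they can be exercised in isolation before the full grammar is in place.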

There exist similar tools for other programming languages:

The OCaml parser generator is called ocamlyacc and it comes with any OCaml distribution.
Haskell uses the Happy parser generator. You could also use a parser combinator library. Happy is part of the Haskell Platform.
A JavaScript parser generator called jison is available. You must download it yourself.
- Alternate tool: JS/CC.
A Ruby parser generator, called racc, is available, but you must download it yourself.

All of these parser generators are derived from yacc (or bison), the original parser generator for C. Thus you may find it handy to refer to the Yacc paper or the Bison manual. When you're reading, mentally translate the C code references.

Commentary

You can do basic testing as follows:

Example testing

            
              cool --lex file.cl
              cool --out reference --parse file.cl
              python3 parser.py < file.cl-lex > file.cl-ast
              diff -b -B -E -w file.cl-ast reference.cl-ast

Note that the reference compiler is explicitly instructed to save its output to reference.cl-ast (the --out flag). Since your P3 parser prints CL-AST to standard output (rather than writing it to a file), we use a redirect to save this output to a file (> file.cl-ast). Finally, diff is a command-line tool for comparing the contents of two files. You may also find Diffchecker to be helpful.

You may find the reference compiler's --unparse option useful for debugging your .cl-ast files.

Hint

If you are failing every negative test case, it is likely that you are not handling cross-platform compatibility correctly on all of your inputs and outputs.

If you are still stuck, you can post on the forum or approach the professor.

What to Turn In For P3

You must turn in a tar.gz file containing these files:

readme.txt: your README file
references.txt: your file of citations
team.txt: a file listing only the SLU email IDs of both team members (see below).
good.cl: a novel positive testcase
bad.cl: a novel negative testcase
source_files including:
- parser.py (the entrypoint for your parser)
- All python files should contain the identifier 3aeee2ffb6bdcec698011572b6bbcaf180807419 in a comment
- DO NOT include parsetab.py

The following directory layout is recommended for your tar.gz file. This is the default layout generated by make (see below) should you follow the same project structure as the Lexer example in class.

        
        fullsubmit.tar.gz
        -- bad.cl
        -- good.cl
        -- Makefile
        -- parser.py
        -- readme.txt
        -- references.txt
        -- team.txt

Note, you can use the CS-364 Makefile to generate a submission archive:

          
          make fullsubmit

The Makefile is available on Sakai and here. Be sure to update the IDENTIFIER and EXECUTABLE variables appropriately. Note that you do not need to run make to compile anything for this project; we are just using a part of this script to generate the tarball that you can submit.

Working In Pairs

You must complete this assignment in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.

Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the instructor will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment.

Both team members should submit to SLUGS. The submission should include the file team.txt, a two-line, two-word flat ASCII text file that contains the email IDs of both teammates. Don't include the @stlawu.edu bit. Example: If sjtime10 and kaangs10 are working together, both kaangs10 and sjtime10 would submit fullsubmit.tar.gz with a team.txt file that contains:

          
          kaangs10
          sjtime10

Then, sjtime10 and kaangs10 will both receive the same grade for that submission.

This seems minor, but in the past students have failed to correctly format this file. Thus you now get a point on this assignment for formatting this file correctly.

Grading Rubric

P3 Grading (out of 50 points):

37 points: autograder tests
- All tests are weighted equally for this assignment
1 point: a correct team.txt file
- 1 — file contains two SLU IDs (e.g., kaangs10 and sjtime10)
- 0 — file is present but contains something other than the two SLU IDs, each on separate lines
4 points: a clear description in your README and references
- 4 — thorough discussion of design decisions (e.g., the handling of let) and choice of test cases; a few paragraphs of coherent English sentences should be fine. Citations provided are well-formatted.
- 2 — vague or hard to understand; omits important details. Citations provided are well-formatted.
- 0 — little to no effort, or submitted an RTF/DOC/PDF file instead of plain TXT. Citations do not provide correct information.
4 points: valid and novel good.cl and bad.cl files
- 4 — wide range of test cases added, stressing most Cool features and an error condition, novel files
- 2 — added some tests, but the scope is not sufficiently broad
- 0 — little to no effort, or submitted an RTF/DOC/PDF file instead of plain TXT, or submitted part of course files as test cases
4 points: code cleanliness
- 4 — code is mostly clean and well-commented
- 2 — code is sloppy and/or poorly commented in places
- 0 — little to no effort to organize and document code
-5 points: neglecting to include the grammar definition
- 0 — included grammar definition in submission (e.g., PLY file defining grammar)
- -5 — only submitted machine-generated parser; failed to submit grammar from which parser was generated

P3: The Parser (Python)

Due: 2021-03-05 at 11:50pm