Project 4 — Parser (Python)

CS-364: Programming Languages

Fall 2024

Programming assignments 3 through 5 will direct you to design and build an interpreter for snail. Each assignment will cover one component of the interpreter: lexical analysis, parsing, and operational semantics. Each assignment will ultimately result in a working interpreter phase which can interface with the other phases.

For this assignment you will write a parser using a parser generator. You will describe the snail grammar in an appropriate input format and the parser generator will generate actual code (in Python). You will also write additional code to unserialize the tokens produced by the lexer stage and to serialize the abstract syntax tree produced by your parser.

You will be implementing this stage of the interpreter in the Python programming language. You may work in a team of two people for this assignment.

Requirements

You must create a minimum of four artifacts:

  1. A python program that reads SL-Lex format (as described the language specification) from standard input and produces an SL-AST-formated abstract syntax tree.
    • The SL-Lex data will always be well-formed (i.e., there will be no syntax errors in the SL-Lex file itself). However, the SL-Lex file may describe a sequence of snail tokens that do not form a valid snail program.
    • Your program must either indicate that there is an error in the snail program described by the SL-Lex data (e.g., a parse error in the snail file) or emit a SL-AST-formatted serialized snail abstract syntax tree.
    • Your program's main parser component must be constructed by a parser generator. The "glue code" for unserializing tokens and serializing the resulting abstract syntax tree should be written by hand.
    • Your main program should be in a module named parser. Thus, the following two commands should produce the same output (for a valid snail program):
                            
                              python3 parser.py < file.sl-lex > file.sl-ast
                              snail --parse file.sl
                            
                          
      Note that < performs input redirection to load standard input from a file, and > performs output redirection to save standard output to a file.
    • Your program will consist of a number of Python files.
  2. A plain UTF-8 text file called readme.txt describing your design decisions and choice of test cases. See the grading rubric. A few paragraphs should suffice.
  3. A plain UTF-8 text file called references.txt providing a citation for each resource you used (excluding class notes, and assigned readings) to complete the assignment. For example, if you found a Stack Overflow answer helpful, provide a link to it. Additionally, provide a brief description of how the resource helped you.
  4. A suite of test cases (test-1-XXX.sl through test-N-XXX.sl>, which exercise parsing corner cases. Your test cases will be evaluated against buggy implementations of the snail parser.
Warning! You must use Ply (or a similar tool or library). Do not write your entire parser by hand. Parts of it must be tool-generated from context-free grammar rules you provide.

Specification

Your parser should conform to the syntax and SL-AST specification provided by the snail documentation.

Error Reporting

To report an error, write the string

ERROR: line_number:column_number: Parser: message

to standard output and terminate the program with exit code 0. You may write whatever you want in the message, but it should be fairly indicative.

Example erroneous input:

                class Cons : List + IO {
                
Example error report output:
ERROR: 70:19: Parser: syntax error near +

The SL-AST Format

If there are no errors in file.sl-lex your program should print the abstract syntax tree in SL-AST format to standard output (the default output of print()). This is the same format used by the reference compiler to create a file.sl-ast file when invoked with the --parse flag. You can find a detailed specification of this format here.

Note that the reference implementation (along with your upcoming P5 interpreter!) will read SL-AST input instead of .sl files.

The SL-AST format is quite verbose, but it is particularly easy for the interpreter to read in again using library code for JSON. It will also make it particularly easy for you to notice where things are going awry if your parser is not producing the correct output.

Outputting SL-AST should be fairly trivial using Python's json module, assuming you construct reasonable objects or dictionaries as part of parsing.

Note that SLUGS assumes that objects in arrays are emitted in the same order they appear in the source file. This is more strict than the SL-AST spec, but it makes for easier comparison with your output.

Parser Generators

You must use a parser generator or similar library for this assignment. In class, we discussed Ply, a parser generator for Python. You will find the documentation to be particularly helpful.

Installing Ply

Ply is pre-installed in the course development environment. Outside of this, the easiest way to get a recent version of Ply is to download and install it using Python's package index:

              
                python3 -m pip install --user ply
              
            

Test Suite Development

In this project, we continue to employ a form of test-driven development or mutation testing into our software development process and require you to construct a high-quality test suite.

The goal is to leave you with a high-quality test suite of programs that you can use to evaluate your own P4 Parser. While you you can check for correct "positive" behavior by comparing your lexer's output to the reference interpreters's output on the usual "good" snail programs, it is comparatively harder to check for "corner case" behavior.

SLUGS contains 9 variants of the Python-based parser, each with a secret intentionally-introduced defect related to Parsing. A high-quality test suite is one that reveals each introduced defect by showing a difference between the behavior of the true reference interpreter and the corresponding buggy verison. You desire a high-quality test suite to help you gain confidence in your own P4 submission.

For each bug, if one of your tests causes the reference and the buggy version to produce different output (that is different stdout), you win: that test has revealed that bug. For full credit your tests must reveal all 9 unknown defects.

SLUGS will tell you the correct output for test cases you submit that reveal bugs in your own implementation of P4. This is the same information you can determine by comparing your output with that of the reference compiler.

Tests should include a snail program named test-n-XXX.sl (XXX can be anything) where 1 ≤ n ≤ 20.

Your test files may contain no more than 2048 characters in any one file (including comments and whitspace). You may submit up to 20 tests (though it is possible to get full credit with fewer). Note that the tests the autograder runs on your solution are NOT limited to 2048 characters in a file, so your solution should not impose any size limits (as long as sufficient system memory is available).

Commentary

You can do basic testing as follows:

              
                snail --lex file.sl
                snail --parse file.sl 
                python3 parser.py < file.sl-lex > generated.sl-ast
              
            

Since P4 prints SL-AST to standard output (rather than writing it to a file), we use a redirect to save this output to a file (> generated.sl-ast).

You may find the reference interpreter's --unparse option useful for debugging your .sl-ast files.

Validating SL-AST Conforms to the Spec

You can use a python module to verify that your output conforms to the schema provided in the snail specification. Be sure to download an place a copy of the schema in your project.

              
                python3 -m pip install --user jsonschema

                python3 -m jsonschema --instance generated.sl-ast sl-ast.schema.json
              
            

Diffing JSON Files

There are various options for comparing your output to the reference implementation. The easiest approach is likely to use the JSON Diff tool.

There are also a command-line tool, jq, which can help with comparing JSON files (it can be installed in the development container with sudo apt install jq). Then you can run:

              
                diff -b -B -E -w <(jq -S . file.sl-ast) <(jq -S . generated.sl-ast)
              
            

Video Guides

Video guides are provided to help you get started with various aspects of this project. Note that there may be errors and mistakes in these videos. You should not blindly follow the steps taken in these videos!

Handling Regex in Grammars

Checking for Errors

TokenReader and Empty Lists

What to Submit

You must turn in the following files:

  • readme.txt: your README file
  • references.txt: your file of citations
  • source_files:
    • parser.py (contains the main entry point for your parser)
    • All other files containing code needed for your program to run
    • DO NOT include parsetab.py
    • These source files must include a comment with the project identifier: 201967cc8d6a85b5befe76602ccbb86bd55df57f
  • test_files:
    • Up to 20 test files named test-N-XXX.sl where N is a number in the range 1 through 20 and XXX is any descriptive text without spaces
    • Each testcase you submit must be at most 2048 characters (i.e., wc -c yourtest.sl says 2048 or less). You want each of your testcases to be meaningful so that it helps you narrow down a bug later.

Submitting These Files to Gradescope

You will submit your files to Gradescope. The easiet way to do this for the current assignment is to drag the individual files to the submission page on Gradescope.

Working in Pairs

You may complete this assignment in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.

Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the instructor will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment.

If you are working in a team, one team member should updload a single submission to Gradescope and add both members to the submission. There is an option to add members to a group once you upload a submission. Both students in the team will receive the same grade for their submission.

Grading Rubric

P4 Grading (out of 60 points):

  • 43 points: autograder tests
  • 9 points: test files (1 point per bug revealed)
  • 4 points: clear description in your README and References
    • 4 — thorough discussion of design decisions and choice of test cases; a few paragraphs of coherent English sentences should be fine. Citations provided are properly formatted.
    • 2 — vague or hard to understand; omits important details. Citations provided are properly formatted.
    • 0 — little to no effort. Citations do not provide correct information.
  • 4 points: code cleanliness
    • 4 — code is mostly clean and well-commented
    • 2 — code is sloppy and/or poorly commented in places
    • 0 — little to no effort to organize and document code
  • -5 points: neglecting to include the grammar definition
    • 0 — included grammar definition in submission (e.g., Ply file defining lexer)
    • -5 — only submitted machine-generated parser; failed to submit grammar from which parser was generated