CS-364 P2 - Kevin Angstadt :: Teaching

Programming assignments 2 through 5 will direct you to design and build an interpreter for Cool. Each assignment will cover one component of the interpreter: lexical analysis, parsing, semantic analysis, and operational semantics. Each assignment will ultimately result in a working interpreter phase which can interface with the other phases.

For this assignment, you will be programming in Java and performing lexical analysis.

You must work in a team of two people for this assignment. You do not need to keep the same teammate, but you are encouraged to do so.

Goal

For this assignment you will write a lexical analyzer, also called a scanner, using a lexical analyzer generator. You will describe the set of tokens for Cool in an appropriate input format and the analyzer generator will generate actual code. You will then write additional code to serialize the tokens for use by later interpreter stages.

Specification

You must create four artifacts:

A Java program that takes a single command-line argument (e.g., file.cl). That argument will be an ASCII text Cool source file. Your program must either indicate that there is an error in the input (e.g., a malformed string) or emit a CL-Lex-formatted serialized list of Cool tokens. Your program's main lexer component must be constructed by a lexical analyzer generator. The "glue code" for processing command-line arguments and serializing tokens should be written by hand. If your Java project compiles to lexer.jar, the following two commands should produce the same output (for a valid Cool program):
```
java -jar lexer.jar file.cl > file.cl-lex
cool --lex file.cl
```
Note that > performs output redirection to save standard output to a file. Your program will consist of a number of Java files.
A plain ASCII text file called readme.txt describing your design decisions and choice of test cases. See the grading rubric. A few paragraphs should suffice.
A plain ASCII text file called references.txt providing a citation for each resource you used (excluding class notes and assigned readings) to complete the assignment. For example, if you found a Stack Overflow answer helpful, provide a link to it. Additionally, provide a brief description of how the resource helped you.
Testcases good.cl and bad.cl. The first should lex correctly and yield a sequence of tokens. The second should contain an error.

You must use JFlex (or a similar tool or library). Do not write your entire lexer by hand. Parts of it must be tool-generated from regular expressions you provide.

Line Numbers

The first line in a file is line 1. Each successive '\n' newline character increments the line count. Your lexer is responsible for keeping track of the current line number.
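
As a rough illustration of the counting rule above, the sketch below starts at line 1 and adds one for every '\n' in the text it is given. The class name LineTracker and its methods are hypothetical and are not part of JFlex or the course materials.

```java
// Hypothetical helper: LineTracker is an illustrative name, not part of JFlex
// or the course materials.
final class LineTracker {
    // The first line in a file is line 1.
    private int line = 1;

    // Advance the count once for every '\n' in the text just matched.
    void consume(String matchedText) {
        for (int i = 0; i < matchedText.length(); i++) {
            if (matchedText.charAt(i) == '\n') {
                line++;
            }
        }
    }

    // Line number reached after all text consumed so far.
    int currentLine() {
        return line;
    }
}
```

JFlex can also count lines for you through its %line option, which exposes a zero-based yyline field. If you rely on it, check that its notion of a line terminator agrees with the rule above.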

Error Reporting

To report an error, write the string

ERROR: line_number: Lexer: message

to standard output (System.out) and terminate the program. You may write whatever you want in the message, but it should clearly indicate what went wrong. Your program should not print any tokens if there is a lexing error.

Example erroneous input:

					Backslash not allowed \
				

Example error report output:

					ERROR: 1: Lexer: invalid character: \ 
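
A minimal sketch of producing the error line in the format above. The class name LexError, its method names, and the exit status of 1 are assumptions made for illustration; the specification only requires printing the message to standard output and terminating.

```java
// Hypothetical helper: LexError and its methods are illustrative names.
final class LexError {
    // Build the message in the required "ERROR: line_number: Lexer: message" shape.
    static String format(int lineNumber, String message) {
        return "ERROR: " + lineNumber + ": Lexer: " + message;
    }

    // Print the error to standard output and stop the program.
    // Assumption: the exit status 1 is our choice; the assignment does not specify one.
    static void report(int lineNumber, String message) {
        System.out.println(format(lineNumber, message));
        System.exit(1);
    }
}
```

One way to guarantee that no tokens are printed when an error occurs is to collect tokens in a list while scanning and print them only after the whole file has been scanned successfully.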
				

The CL-Lex Format

If there are no errors in file.cl your program should print a list of scanned tokens in CL-Lex format to standard output (System.out). This is the same format used by the reference compiler to create a file.cl-lex when invoked with the --lex flag. Each token is represented by a pair (or triplet) of lines. The first line holds the line number. The second line gives the name of the token. The optional third line holds additional information (i.e., the lexeme) for identifiers, integers, strings and types. For example, for an integer token the third line should contain the decimal integer value.

Example input:

					Backslash not
					   allowed

Corresponding .cl-lex output:

					1
					type
					Backslash
					1
					not
					2
					identifier
					allowed
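
The two-or-three-line layout can be sketched in Java as follows. This is only an illustration under stated assumptions: the class ClLexWriter, its nested Token class, and the convention that a null lexeme means "no third line" are all hypothetical and not prescribed by the assignment.

```java
import java.util.List;

// Hypothetical serializer: ClLexWriter and Token are illustrative names.
final class ClLexWriter {
    // One scanned token. lexeme is null for tokens that carry no third line.
    static final class Token {
        final int line;
        final String name;
        final String lexeme;

        Token(int line, String name, String lexeme) {
            this.line = line;
            this.name = name;
            this.lexeme = lexeme;
        }
    }

    // Emit each token as a line number, a token name, and an optional lexeme line.
    static String serialize(List<Token> tokens) {
        StringBuilder out = new StringBuilder();
        for (Token t : tokens) {
            out.append(t.line).append('\n');
            out.append(t.name).append('\n');
            if (t.lexeme != null) {
                out.append(t.lexeme).append('\n');
            }
        }
        return out.toString();
    }
}
```

Passing the three tokens from the example above (type Backslash on line 1, not on line 1, identifier allowed on line 2) to serialize and writing the result with System.out.print would reproduce the eight output lines shown.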

The official list of token names is:

at case class colon comma divide dot else equals esac false fi identifier if in inherits integer isvoid larrow lbrace le let loop lparen lt minus new not of plus pool rarrow rbrace rparen semi string then tilde times true type while

In general the intended token is evident. For the more exotic names:

at = @, larrow = <-, lbrace = {, le = <=, lparen = (, lt = <, rarrow = =>, rbrace = }, semi = ;, tilde = ~.
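
For reference, the ten symbol names listed above can be written down as a lookup table. This is a sketch only: the class name SymbolNames is hypothetical, and in a generated lexer each rule would more likely return its token name directly rather than consult a map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical lookup table covering only the symbols spelled out above.
final class SymbolNames {
    private static final Map<String, String> NAMES = new HashMap<>();

    static {
        NAMES.put("@", "at");
        NAMES.put("<-", "larrow");
        NAMES.put("{", "lbrace");
        NAMES.put("<=", "le");
        NAMES.put("(", "lparen");
        NAMES.put("<", "lt");
        NAMES.put("=>", "rarrow");
        NAMES.put("}", "rbrace");
        NAMES.put(";", "semi");
        NAMES.put("~", "tilde");
    }

    // Returns the CL-Lex token name for a symbol, or null if it is not in the table.
    static String nameOf(String symbol) {
        return NAMES.get(symbol);
    }
}
```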

The CL-Lex format is exactly the same as the one generated by the reference compiler when you specify --lex. In addition, the reference compiler (and your upcoming PA3 parser!) will read CL-Lex formatted input instead of .cl files.

Lexical Analyzer Generators

You must use a lexical analyzer generator or similar library for this assignment. In class, we discuss JFlex, a lexical analyzer generator for Java. You will find the documentation to be particularly helpful. Because you are producing a Java program for this assignment, it is highly encouraged that you use JFlex.

For your reference, there exist similar tools for other programming languages:

The OCaml lexical analyzer generator is called ocamllex and it comes with any OCaml distribution.
Haskell uses the Alex lexical analyzer generator. It comes with the Haskell Platform.
A Ruby lexical analyzer generator called lexeme is available, but you must download it yourself.
A JavaScript lexical analyzer generator called jison is available. You must download it yourself.
- Alternate tool: JS/CC.
A Python lexical analyzer generator called ply is available, but you must download it yourself. (We will use this for PA3.)

All of these lexical analyzer generators are derived from lex (or flex), the original lexical analyzer generator for C. Thus you may find it handy to refer to the Lex paper or the Flex manual. When you're reading, mentally translate the C code references.

Downloading JFlex

A copy of JFlex is available here. To work with the provided ant build file, you should place this jar in the lib directory at the top level of your project.

Commentary

You can do basic testing with something like the following:

Example testing

            
              cool --out reference --lex file.cl
              ant
              java -jar lexer.jar file.cl > file.cl-lex
              diff -b -B -E -w file.cl-lex reference.cl-lex

Note that the reference lexer is being explicitly instructed to save the output to reference.cl-lex (the --out flag). Since P2 prints CL-Lex to standard output (rather than writing it to a file), we use a redirect to save this output to a file (> file.cl-lex). Finally, diff is a command line tool for comparing the contents of two files. You may also find Diffchecker to be helpful.

You may find the reference compiler's --unlex option useful for debugging your .cl-lex files.

Need more testcases? Any Cool file you have (including the one you wrote for PA1) works fine. The contents of cool-examples.zip should be a good start. There's also one among the PA1 hints. You'll want to make more complicated test cases—in particular, you'll want to make negative testcases (e.g., testcases with malformed string constants).

If you are still stuck, you can post on the forum or approach the professor.

Video Guides

What to Submit for P2

You must turn in a tar.gz file containing these files:

readme.txt: your README file
references.txt: your file of citations
team.txt: a file listing only the SLU email IDs of both team members (see below).
good.cl: a novel positive testcase
bad.cl: a novel negative testcase
build.xml: your ant build file for this project
source_files: including
- Main.java (contains the main method)
- some_file.flex (or similar)
- All other files containing code needed for your program to run
If your regular expressions and lexer definition are in some other files (e.g., lexer.flex), be sure to include those as well!

Do not submit any pre-compiled jar files with your submission. This includes, but is not limited to, your lexer, JFlex, etc.

The following directory layout is recommended for your tar.gz file. This is the default layout generated by ant (see below) if you follow the same project structure as the Lexer example in class.

        
        fullsubmit.tar.gz
        -- bad.cl
        -- build.xml
        -- good.cl
        -- readme.txt
        -- references.txt
        -- src/
           -- lexer.flex
           -- Main.java
           -- ...
        -- team.txt

Using the `Ant` Build File

Ant is a build system for Java (much like make or dune). You may download a (mostly) pre-configured build file (build.xml) here. Add this file to the root of your Java project.

Update the following entries in build.xml:

The main-class property (line 8) to be the name of the class in your project containing main()

The flex-file property (line 9) to be the name of your JFlex specification

Note: the project-name must remain lexer for SLUGS to correctly compile and run your code.

There are three (3) key operations that build.xml supports:

ant dist: compile your project (this is the same as just running ant)

ant clean: remove all generated Java files and compiled Java classes

ant fullsubmit: generate the tar.gz submission archive. Note that ant will print out the list of files that are found in this archive. Review this list to be sure you are submitting what you think you are submitting!

Working In Pairs

You must complete this assignment in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.

Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the instructor will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment.

Both team members should submit to SLUGS. The submission should include the file team.txt, a two-line, two-word flat ASCII text file that contains the email IDs of both teammates. Don't include the @stlawu.edu bit. Example: If sjtime10 and kaangs10 are working together, both kaangs10 and sjtime10 would submit fullsubmit.tar.gz with a team.txt file that contains:

          
          kaangs10
          sjtime10

Then, sjtime10 and kaangs10 will both receive the same grade for that submission.

This seems minor, but in the past students have failed to correctly format this file. Thus you now get a point on this assignment for formatting this file correctly.

Grading Rubric

P2 Grading (out of 50 points):

37 points: autograder tests
- All tests are weighted equally for this assignment
1 point: a correct team.txt file
- 1 — file contains two SLU IDs (e.g., kaangs10 and sjtime10)
- 0 — file is present but contains something other than the two SLU IDs, each on separate lines
4 points: clear description in your README and References
- 4 — thorough discussion of design decisions (including handling of strings and comments) and choice of test cases; a few paragraphs of coherent English sentences should be fine. Citations provided are well-formed.
- 2 — vague or hard to understand; omits important details. Citations provided are well-formed.
- 0 — little to no effort. Citations do not provide correct information.
4 points: valid and novel good.cl and bad.cl files
- 4 — wide range of test cases added, stressing most Cool features and an error condition, novel files
- 2 — added some tests, but the scope is not sufficiently broad
- 0 — little to no effort, or submitted an RTF/DOC/PDF file instead of plain TXT, or submitted part of course files as test cases
4 points: code cleanliness
- 4 — code is mostly clean and well-commented
- 2 — code is sloppy and/or poorly commented in places
- 0 — little to no effort to organize and document code
-5 points: neglecting to include the lexer definition
- 0 — included lexer definition in submission (e.g., lexer.flex file defining lexer)
- -5 — only submitted machine-generated lexer; failed to submit definition from which lexer was generated

P2: The Lexer (Java)

Due: 2021-02-15 at 11:50pm

