Programming assignments 3 through 5 will direct you to design and build an interpreter for snail. Each assignment will cover one component of the interpreter: lexical analysis, parsing, and operational semantics. Each assignment will ultimately result in a working interpreter phase which can interface with the other phases.
For this assignment you will write a lexical analyzer, also called a scanner, using a lexical analyzer generator. You will describe the set of tokens for snail in an appropriate input format and the analyzer generator will generate actual code. You will then write additional code to serialize the tokens for use by later interpreter stages.
You will be implementing this stage of the interpreter in the Java programming language. You may work in a team of two people for this assignment.
You must create a minimum of four artifacts:

A program that reads from standard input (System.in) and prints to standard output (System.out). The input will be UTF-8 source code for a snail program. Your program must either indicate that there is an error in the input (e.g., a malformed string) or emit an SL-Lex-formatted serialized list of snail tokens. Your program's main lexer component must be constructed by a lexical analyzer generator. The "glue code" for serializing tokens should be written by hand. If your Java project compiles to lexer.jar, the following two commands should produce the same output (for a valid snail program):

java -jar lexer.jar < file.sl > file.sl-lex
snail --lex file.sl

Note that > performs output redirection to save standard output to a file. Your program will consist of a number of Java files.
A file readme.txt describing your design decisions and choice of test cases. See the grading rubric; a few paragraphs should suffice.

A file references.txt providing a citation for each resource you used (excluding class notes and assigned readings) to complete the assignment. For example, if you found a Stack Overflow answer helpful, provide a link to it. Additionally, provide a brief description of how the resource helped you.

Test cases test-1-XXX.sl through test-N-XXX.sl, which exercise lexing corner cases. Your test cases will be evaluated against buggy implementations of the snail lexer.
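The stdin-to-stdout wiring for the first artifact can be sketched as below. This is only a sketch: the commented-out Lexer class and token loop are placeholders for whatever JFlex generates from your spec, and the real serialization must follow the SL-Lex format rather than the echo placeholder shown here.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class Main {
    // Read an entire character stream into a String. Used here to
    // sanity-check the UTF-8 wiring from System.in.
    static String readAll(Reader r) throws IOException {
        StringBuilder sb = new StringBuilder();
        int c;
        while ((c = r.read()) != -1) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // Wrap standard input in a UTF-8 reader, per the assignment.
        Reader in = new InputStreamReader(System.in, StandardCharsets.UTF_8);
        // Hypothetical: Lexer is the class JFlex generates from your .flex file.
        // Lexer lexer = new Lexer(in);
        // ... loop over lexer.yylex(), collect tokens, then serialize them
        // in SL-Lex format to System.out (see the SL-Lex specification).
        System.out.print(readAll(in)); // placeholder: just echo the input
    }
}
```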
Your lexer should conform to the lexical analysis and SL-Lex specification provided by the snail documentation.
To report an error, write an error message to standard output (System.out) and terminate the program with exit code 0. You may write whatever you want in the message, but it should be fairly indicative. Your program should print all tokens that were successfully discovered before the error.
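A tiny illustration of the shape this can take. The message format below is made up, not mandated; the assignment only requires something "fairly indicative":

```java
public class LexError {
    // Illustrative message format only: the assignment lets you choose
    // the wording, as long as it is "fairly indicative".
    static String errorMessage(int line, String what) {
        return "ERROR: " + line + ": Lexer: " + what;
    }

    public static void main(String[] args) {
        // In the real lexer: first print every token successfully scanned,
        // then the error line, then terminate with exit code 0 (not nonzero).
        System.out.println(errorMessage(3, "malformed string constant"));
    }
}
```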
If there are no errors in file.sl, your program should print a list of scanned tokens in SL-Lex format to standard output (System.out). This is the same format used by the reference interpreter to create a file.sl-lex when invoked with the --lex flag. You can find a detailed specification of this format here. Note that the reference implementation (along with your upcoming P4 parser!) will read SL-Lex input instead of .sl files.
You must use a lexical analyzer generator or similar library for this assignment. In class, we discuss JFlex, a lexical analyzer generator for Java. You will find the documentation to be particularly helpful. Because you are producing a Java program for this assignment, it is highly encouraged that you use JFlex.
A copy of JFlex is available here. To work with the provided ant build file, you should place this jar in the lib directory at the top level of your project.
In this project, we introduce a form of test-driven development and mutation testing into our software development process and require you to construct a high-quality test suite.

The goal is to leave you with a high-quality test suite of programs that you can use to evaluate your own P3 lexer. While you can check for correct "positive" behavior by comparing your lexer's output to the reference interpreter's output on the usual "good" snail programs, it is comparatively harder to check for "corner case" behavior.

SLUGS contains 9 variants of the Java-based lexer, each with a secret, intentionally introduced defect related to lexing. A high-quality test suite is one that reveals each introduced defect by showing a difference between the behavior of the true reference interpreter and the corresponding buggy version. You want a high-quality test suite to help you gain confidence in your own P3 submission.

For each bug, if one of your tests causes the reference and the buggy version to produce different output (that is, different stdout), you win: that test has revealed that bug. For full credit, your tests must reveal all 9 unknown defects.
SLUGS will tell you the correct output for test cases you submit that reveal bugs in your own implementation of P3. This is the same information you can determine by comparing your output with that of the reference compiler.
Tests should include a snail program named test-n-XXX.sl (XXX can be anything, but must be the same for both files) where 1 ≤ n ≤ 20. Your test files may contain no more than 2048 characters in any one file (including comments and whitespace). You may submit up to 20 tests (though it is possible to get full credit with fewer). Note that the tests the autograder runs on your solution are NOT limited to 2048 characters in a file, so your solution should not impose any size limits (as long as sufficient system memory is available).
You can do basic testing with something like the following:
snail --lex file.sl
ant
java -jar lexer.jar < file.sl > generated.sl-lex
diff -b -B -E -w file.sl-lex generated.sl-lex
Since P3 prints SL-Lex to standard output (rather than writing it to a file), we use a redirect (> generated.sl-lex) to save this output to a file. Further, diff is a command-line tool for comparing the contents of two files. You may also find Diffchecker, VSCode's built-in comparison, or IntelliJ's file comparison helpful.
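As a quick, self-contained illustration of what the whitespace-ignoring diff flags buy you (the file names here are throwaway):

```shell
# Two files that differ only in the amount of whitespace.
printf 'IDENT foo\n'    > a.txt
printf 'IDENT    foo\n' > b.txt

# Plain diff would report a difference, but -b/-w ignore whitespace-only
# changes, so this diff exits 0 and the message prints.
if diff -b -w a.txt b.txt > /dev/null; then
    echo "files match (ignoring whitespace)"
fi
rm -f a.txt b.txt
```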
You may find the reference interpreter's --unlex option useful for debugging your .sl-lex files.
JFlex handles counting lines a little differently from the reference interpreter (i.e., more characters count as newlines than the specification allows). To handle this, you can remove the %line directive from your JFlex specification and instead modify yyline directly in rules that match newlines.
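A sketch of what such a rule could look like. The newline set below is illustrative; take the exact set from the snail specification, and if your JFlex version does not expose yyline once %line is removed, declare your own counter instead:

```
%{
  // If yyline is unavailable without %line, declare your own counter
  // here instead, e.g.: private int line = 1;
%}
%%
(\r\n|\n)    { yyline++; /* count only the newline forms the spec allows */ }
```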
You will benefit from using lexer states in this project. This will likely require reading the JFlex documentation to understand how they work.
Need more test cases? Any snail file you have (including the one you wrote for P2) works fine. The snail-examples repository should also be a good start. There's also one among the P2 resources. You'll want to make more complicated test cases; in particular, you'll want to make negative test cases (e.g., test cases with malformed string constants).
If you are still stuck, you can post on the forum or approach the professor.
Video guides are provided to help you get started with various aspects of this project. Note that there may be errors and mistakes in these videos. You should not blindly follow the steps taken in these videos!
You must turn in a tar.gz file containing these files:

- readme.txt: your README file
- references.txt: your file of citations
- team.txt: a file listing only the SLU email IDs of both team members (see below). If you are working alone, you should turn in a file with just your email ID.
- build.xml: your ant build file for this project
- main.java (contains the main method)
- some_file.flex (or similar)
- test-N-XXX.sl, where N is a number in the range 1 through 20 and XXX is any descriptive text without spaces. Each test file must contain at most 2048 characters (i.e., wc -c yourtest.sl says 2048 or less). You want each of your test cases to be meaningful so that it helps you narrow down a bug later.

If your project needs any other source files to build (e.g., Lexer.flex), be sure to include those as well! Do not include any jar files with your submission. This includes, but is not limited to, your lexer, JFlex, etc.
The following directory layout is recommended for your tar.gz file. This is the default layout generated by ant (see below) should you follow the same project structure from the Lexer example in class.
tar tzf fullsubmit.tar.gz
(out)src/Lexer.flex
(out)src/Main.java
(out)build.xml
(out)readme.txt
(out)references.txt
(out)team.txt
(out)test-1-xxx.sl
(out)test-2-something.sl
Ant Build File

Ant is a build system for Java (much like make or dune).
You may download a (mostly) pre-configured build file (build.xml) here. Add this file to the root of your Java project. Update the following entries in build.xml:
- the main-class property (line 8): the name of the class in your project containing main()
- the flex-file property (line 9): the name of your JFlex specification

The project-name property must remain lexer for SLUGS to correctly compile and run your code.
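Concretely, and assuming these are ordinary Ant property entries in the provided build.xml, the edited lines might look like the following (the class and file names are placeholders for your own):

```xml
<!-- line numbers refer to the provided build.xml -->
<property name="main-class"   value="Main"/>        <!-- line 8 -->
<property name="flex-file"    value="Lexer.flex"/>  <!-- line 9 -->
<property name="project-name" value="lexer"/>       <!-- must stay "lexer" -->
```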
There are three (3) key operations that build.xml supports:

- ant dist: compile your project (this is the same as just running ant)
- ant clean: remove all generated Java files and compiled Java classes
- ant fullsubmit: generate the tar.gz submission archive

Note that ant will print out the list of files that are found in this archive. Review this list to be sure you are submitting what you think you are submitting!
You may complete this assignment in a team of two. Teamwork imposes burdens of communication and coordination, but has the benefits of more thoughtful designs and cleaner programs. Team programming is also the norm in the professional world.
Students on a team are expected to participate equally in the effort and to be thoroughly familiar with all aspects of the joint work. Both members bear full responsibility for the completion of assignments. Partners turn in one solution for each programming assignment; each member receives the same grade for the assignment. If a partnership is not going well, the instructor will help to negotiate new partnerships. Teams may not be dissolved in the middle of an assignment.
If you are working in a team, both team members should submit to SLUGS. All submissions should include the file team.txt, a two-line, two-word flat UTF-8 text file that contains the email IDs of both teammates. Don't include the @stlawu.edu bit. For example, if sjtime10 and kaangs10 are working together, both kaangs10 and sjtime10 should submit fullsubmit.tar.gz with a team.txt file that contains:

kaangs10
sjtime10

Then, sjtime10 and kaangs10 will both receive the same grade for that submission.
P3 Grading (out of 60 points):
Lexer.flex
file defining lexer)