CS 364 - Lexical Analyzer for Clite

In this assignment you will complete the lexical analyzer (a.k.a. lexer) for Clite that we began in class. The project is due on Friday, March 7 at 5 PM. Put your assignment in a folder named Clite on the T: drive. I assume you are using Java. If you are not, that's okay, but you need to tell me what you are using and give very specific instructions for how to run it. Also make sure you turn in a stapled printout. Watch for bad line breaks: split long lines of code and comments across lines nicely.

Here are the requirements. Your lexer should ...

  1. recognize all of the Clite tokens described in the grammar on pages 37-40. Clite lexical tokens are treated in detail on pages 60-61.
  2. read the names of the files to process from the command line. The user should be allowed to specify one or more file names.
  3. exit gracefully with an appropriate error message if a file being processed does not exist.
  4. have nicely formatted output listing the token recognized, the name of the token, and the line number the token was found on. For example:
    File test1.c
    
    Token            Name                      Line Number
    ------------------------------------------------------
    Keyword          class                     1
    Left paren       (                         1
    Right paren      )                         2
    identifier       count                     2
    greater-equal    >=                        4
    equal-equal      ==                        4
    assignment       =                         8
    
  5. recognize // comments properly.
  6. ignore (toss) all whitespace and comment tokens.
  7. correctly identify lexical errors. What are they?
  8. provide one public function named nextToken that returns a Token.
  9. Your program should use data abstraction (classes and functions used appropriately). In particular, your nextToken method should not get too big. For example, your lexer needs to peek ahead in the input to distinguish between tokens such as > and >=. After reading >, you still don't know whether the next character is =. If the lookahead is =, recognize >= as a token; if it is not, recognize > and put the lookahead back. The best way to do this is to use a one-place buffer (queue). You should hide all of the logic that reads a line of text and gets input.
  10. Make sure that the class that contains the main method is named Main. Here is an example sequence of commands that should work to run your lexer from the command line.
    javac *.java
    java Main test1.c test2.c test3.c test4.c
    
  11. Every function should be commented appropriately with its purpose and pre/post conditions.
  12. Your lexer should use access modifiers appropriately (private, public, protected).
  13. Your program should produce the correct output.
  14. Your program should not crash or hang.
  15. Your program should fail gracefully with an appropriate usage error message if the user supplies the wrong number of command line arguments.
  16. All error messages should be user friendly and should not refer to the implementation. For example, Illegal token "123abc" on line 5 is a decent message, whereas nextToken() returned an error IOException at token 123abc is unfriendly.
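
The one-place buffer mentioned in requirement 9 can be sketched roughly as follows. This is only an illustration of the put-back idea, not the required design: the names CharSource, putBack, scanGreater, and LookaheadDemo are hypothetical and do not come from the class code, and your own lexer will need to track line numbers and handle the other two-character tokens as well.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

/** Hypothetical demo class; in your submission the class with main must be named Main. */
class LookaheadDemo {
    /**
     * Purpose: decide between the tokens > and >=.
     * Pre: the character '>' has just been read from src.
     * Post: returns ">=" or ">"; any character that is not part of the
     *       token has been put back for the next read.
     */
    static String scanGreater(CharSource src) throws IOException {
        int c = src.next();
        if (c == '=') {
            return ">=";
        }
        src.putBack(c);   // not ours: leave it for whoever reads next
        return ">";
    }

    public static void main(String[] args) throws IOException {
        CharSource src = new CharSource(new StringReader(">=>x"));
        src.next();                              // consume the first '>'
        System.out.println(scanGreater(src));    // prints >=
        src.next();                              // consume the second '>'
        System.out.println(scanGreater(src));    // prints >
        System.out.println((char) src.next());   // prints x (it was put back)
    }
}

/** Hypothetical helper: wraps a Reader and lets one character be put back. */
class CharSource {
    private static final int EMPTY = -2;   // marker meaning "buffer holds nothing"
    private final Reader in;
    private int buffered = EMPTY;          // the one-place buffer

    CharSource(Reader in) {
        this.in = in;
    }

    /** Returns the next character, or -1 at end of input. */
    int next() throws IOException {
        if (buffered != EMPTY) {
            int c = buffered;
            buffered = EMPTY;
            return c;
        }
        return in.read();
    }

    /** Puts one character back. Pre: the buffer is currently empty. */
    void putBack(int c) {
        if (buffered != EMPTY) {
            throw new IllegalStateException("one-place buffer is already full");
        }
        buffered = c;
    }
}
```

Keeping the buffer inside a small class like this means nextToken never touches the Reader directly, which is one way to hide the input logic and keep nextToken short. The standard library class java.io.PushbackReader offers similar behavior if you would rather not write your own.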