Project 3 — Test Coverage and Automation

CS-340: Software Engineering

Summer 2021

For Project 3, you will manually create two high-coverage test suites, one for each of two subject programs spanning two different application domains and programming languages. You will then use a tool to automatically create a high-coverage test suite for one of these programs. You will also write a report reflecting on this activity.

Two of the key properties found in industrial software engineering, but not in most undergraduate academic coursework, are that the program is large and that you did not write it. If you are hired by Microsoft as a software engineer, they will not ask you to write Office from scratch. Instead, as discussed in class, the vast majority of software engineering is the comprehension and maintenance of large legacy codebases (i.e., old code you did not write).

Thus, there is no particular assumption that you are familiar with any of the subject programs (or languages!) in this assignment. Indeed, that is the point: you will be applying testing concepts to unknown software.

You may use files, benchmarks, or resources from the Internet or from the instructors (unlike in many other classes), provided you cite them in your written report.

Requirements

This project is not-so-secretly three mini projects with common themes. Thus, we will define the requirements for each part separately.

  • Project 3A:
    • A suite of no more than 50 .avl files that are inputs to avl.py.
  • Project 3B:
    • A suite of no more than 50 .png files that are inputs to pngtest.
  • Project 3C:
    • A PDF report detailing your experiences with test input generation (both manual and automatic).

Subject Programs and Tools

Project 3A — AVL Trees (Python)

The first subject program is a simple Python implementation of an abstract data type: the AVL tree container data structure.

Test cases for this program take the form of sequences of instructions for manipulating a single global AVL tree:

  • iN — insert the number N into the tree (e.g., i7)
  • dN — delete the number N from the tree (e.g., d4)
  • p — print the current tree
For example, if the text file test1.avl contains i1 i2 i3 i4 i5 d4 p, then python avl.py test1.avl produces the output:

                  2.
                 /  \
                1   5
                /\  /\
                   3
                   /\

We compute coverage for this program using the coverage package for Python. Both statement and branch coverage are considered. (For more information about statement and branch coverage, read here.)
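
To make the distinction concrete, consider this toy Python example (not part of avl.py; the function and test values are invented for illustration):

                def clamp_negative(x):
                    if x < 0:
                        x = 0
                    return x

                clamp_negative(-1)  # executes every statement: 100% statement coverage
                # ...but the False outcome of `x < 0` is never taken, so branch
                # coverage is only 50%. A second test covers the other branch:
                clamp_negative(5)

A suite can therefore reach 100% statement coverage while still missing branch outcomes, which is why both metrics are measured for 3A.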

The reference implementation is available here. The program is about 300 lines long and is a self-contained single file.

A test suite for this application is a collection of .avl files. The reference starter test suite is available here.
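
If you want raw material beyond the starter suite, one optional approach is to generate random instruction files and then prune them. The sketch below is hypothetical: the i/d/p instruction format comes from this assignment, but the file names, operation mix, and key range are arbitrary assumptions.

                # Hypothetical generator for random .avl test files.
                import random

                def random_avl_test(n_ops=20, max_key=50):
                    ops = []
                    for _ in range(n_ops):
                        op = random.choice(["i", "i", "d", "p"])  # bias toward inserts
                        ops.append(op if op == "p" else op + str(random.randint(0, max_key)))
                    return " ".join(ops)

                for k in range(5):
                    with open("gen%d.avl" % k, "w") as f:
                        f.write(random_avl_test() + "\n")

Random sequences alone rarely maximize coverage; reading avl.py and handcrafting inputs that trigger specific rotations and deletion cases will get you further.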

Python Build, Installation, and Coverage Details

 Download avl.py  Download avl-tests.zip

This assignment assumes you are using your Ubuntu (Linux) server. The following lines should install the relevant system and Python packages.

              
                sudo apt update
                sudo apt install build-essential python3-pip python3-dev
                sudo pip3 install --upgrade pip
                sudo pip3 install coverage
              
            

Note that apt is the primary package manager on Debian/Ubuntu distributions of Linux. We use this tool to update the list of available packages and to install several. The pip program is used to manage Python packages.

After completing this installation, you can use the coverage utility on multiple test cases and view combined coverage results.

              
                coverage run --append avl.py simple1.avl
                 1
                / \
                   2
                  / \

                coverage report
                Name     Stmts   Miss  Cover
                ----------------------------
                avl.py     182     99    46%

                coverage run --append avl.py simple2.avl
                  2.
                 /  \
                1   5
                /\  /\
                   3
                   /\

                coverage report
                Name     Stmts   Miss  Cover
                ----------------------------
                avl.py     182     54    70%

                coverage erase
              
            

Note how the measured coverage of only the first test is low (46%), but the combined coverage result for the two tests is higher (70%). The --append option is necessary to avoid overwriting the stored coverage data with each new test run.

To measure branch coverage, simply add --branch:

              
                coverage run --append --branch avl.py simple1.avl > /dev/null

                coverage run --append --branch avl.py simple2.avl > /dev/null

                coverage report
                Name     Stmts   Miss Branch BrPart  Cover
                ------------------------------------------
                avl.py     182     54     90     19    66%

                coverage report -m
                Name     Stmts   Miss Branch BrPart  Cover   Missing
                ----------------------------------------------------
                avl.py     182     54     90     19    66%   82-85, 88, 101, 109-112, 121,
                123-127, 139-141, 145, 189, 201-202, 211, 216, 223-238, 244-248, 253-254,
                284, 286-292, 304-306, 81->82, 87->88, 100->101, 107->109, 120->121,
                122->123, 138->139, 144->145, 171->exit, 210->211, 212->214, 215->216,
                243->244, 250->253, 283->284, 285->286, 301->303, 303->304, 319->312

                coverage erase
              
            

The additional columns give branch-specific information (e.g., about partial branches; see the manual), and the final column now gives the branch coverage. The columns may look different on other setups; that is fine. Note also how the Missing column guides you to line numbers that are not exercised by the current suite.

There is documentation for this coverage utility. Finally, note that this utility can also produce XML and HTML reports, which you may find easier to interpret.
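
As a sketch (assuming a recent release of the coverage package), those reports can also be produced from the recorded data via the package's Python API; the command-line equivalents are coverage html and coverage xml.

                # Render reports from an existing .coverage data file.
                import coverage

                cov = coverage.Coverage()
                cov.load()                              # read the .coverage data file
                cov.html_report(directory="htmlcov")    # then open htmlcov/index.html
                cov.xml_report(outfile="coverage.xml")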

Project 3B — PNG Graphics (C)

The second subject program is the portable network graphics file format reference library, libpng. It is used for image manipulation and conversion.

We will be using the developer-provided pngtest driver. It takes as input a single .png file.

We compute coverage for this program using the gcov test coverage program for gcc. Only statement coverage is considered.

The reference implementation (version 1.6.34) is available here. It contains about 85,000 lines of code spread over about 70 files.

A test suite for this application is a collection of .png files. The reference implementation comes with over 150 such example files. Feel free to use an image manipulation or conversion program to create your own files or to use files or benchmarks you find on the Internet.
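
For example, a few lines of Python using the Pillow library (an assumption: install it with sudo pip3 install Pillow) can create small images with different color types, which tend to exercise different libpng code paths:

                # Hypothetical image generator; file names and sizes are arbitrary.
                from PIL import Image

                Image.new("1", (16, 16)).save("bilevel.png")                     # 1-bit
                Image.new("L", (16, 16), color=128).save("gray.png")             # grayscale
                Image.new("P", (16, 16)).save("palette.png")                     # palette
                Image.new("RGB", (16, 16), color=(200, 30, 30)).save("rgb.png")
                Image.new("RGBA", (16, 16), color=(0, 255, 0, 64)).save("alpha.png")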

C Build, Installation, and Coverage Details

 Download libpng-1.6.34.tar.gz

The gcov utility usually comes with gcc (a common compiler for C), so you should not need to do anything specific to install it (apart from making sure you have gcc installed). We use a Python wrapper around gcov called gcovr to give improved output. Further, our subject program depends on the development version of a compression library. The following commands should install all of the needed software on your Linux server. (Note that some of these packages were already installed for 3A, but are listed here for completeness.)

              
                sudo apt update
                sudo apt install build-essential libz-dev python3-pip python3-dev
                sudo pip3 install --upgrade pip
                sudo pip3 install gcovr
              
            

In addition, you will have to compile the project with coverage instrumentation (taking care to use static linking or to otherwise make all of the libpng source files, and not just the test harness, visible to coverage).

              
                tar xzf libpng-1.6.34.tar.gz
                cd libpng-1.6.34
                sh ./configure CFLAGS="--coverage -static"
                make clean ; make 

                ./pngtest pngtest.png
                (out)Testing libpng version 1.6.34
                (out)        ...

                gcovr -s -e contrib -e intel -e mips -e powerpc -r .
                (out)------------------------------------------------------------------------------
                (out)               GCC Code Coverage Report
                (out)Directory: .
                (out)------------------------------------------------------------------------------
                (out)File                                       Lines    Exec  Cover   Missing
                (out)------------------------------------------------------------------------------
                (out)png.c                                       1267     478    37%   47,49,53-54,56
                (out)  # your numbers may differ!
                (out)        ...
                (out)------------------------------------------------------------------------------
                (out)TOTAL                                      11201    3086    27%  # your numbers may differ!
                (out)------------------------------------------------------------------------------
                (out)lines: 27.6% (3086 out of 11201)     # your numbers may differ!
                (out)branches: 20.8% (1597 out of 7687)   # your numbers may differ!

                ./pngtest contrib/gregbook/toucan.png
                (out)        ...

                gcovr -s -e contrib -e intel -e mips -e powerpc -r .
                (out)        ...
                (out)------------------------------------------------------------------------------
                (out)TOTAL                                      11206    3301    29%
                (out)------------------------------------------------------------------------------
                (out)lines: 29.5% (3301 out of 11206)      # your numbers may differ
                (out)branches: 23.0% (1770 out of 7687)    # and everything is still fine!                             

                # delete coverage and test output
                rm *.gcda pngout.png
              
            

Note how gcovr gives a separate report for each source file, but also gives an overall total for statement coverage (the final lines: row). Your coverage numbers may be slightly different from the example ones listed above. The output also lists lines from each file that were not covered.

While gcovr reports branch coverage, we will not be using it for this project.

Note that pngtest creates pngout.png. (This can confuse some students, because if you collect coverage after running on *.png and then do it again, you may get a different answer the second time if you're not careful because of the newly-added file!)
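
To collect the combined coverage of a whole suite without typing each command, a small driver script can help. The sketch below assumes it is run from the libpng-1.6.34 build directory and that your test images live in a hypothetical suite/ subdirectory.

                # Hypothetical driver: run pngtest on each image, then summarize.
                import glob
                import os
                import subprocess

                for png in sorted(glob.glob("suite/*.png")):
                    subprocess.run(["./pngtest", png], stdout=subprocess.DEVNULL)

                subprocess.run(["gcovr", "-s", "-e", "contrib", "-e", "intel",
                                "-e", "mips", "-e", "powerpc", "-r", "."])

                # Remove pngtest's output so it cannot pollute a later run.
                if os.path.exists("pngout.png"):
                    os.remove("pngout.png")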

There is documentation for this coverage utility. Finally, note that this utility can also produce JSON, XML, and HTML reports, which you may find easier to interpret.

Project 3C — PNG Graphics (C) + American Fuzzy Lop

We now revisit libpng from 3B. Instead of manually creating test images like you did for 3B, you will use a test input generator. We will use American Fuzzy Lop, version 2.52b. A mirror copy of the AFL tarball is here, but you should visit the project webpage for documentation. As the (delightful) name suggests, it is a fuzz testing tool.

You will use AFL to create a test suite of png images to exercise the pngtest program, starting from all of the png images provided with the original tarball. Note that you can terminate AFL as soon as you reach 510 paths_total (also called total paths in the GUI) — AFL does not stop on its own.

DANGER! The tools you must use for this assignment may literally take multiple hours to run. Some students reported that it took over 14 hours to run AFL on libpng! (However, others were able to finish in about five minutes. Regardless of when you finish, everything is fine.) Even if you are fast and can finish your work at the last minute, the tools are not and cannot. On my high-powered server-class research machine with a high-speed SSD it took 10 minutes, but it ran significantly longer on my cs-linuxlab virtual server. However, as soon as you get enough data (see below) you can stop early.

AFL Build, Installation, and Running Details

 Download afl-2.52b.tar.gz  Download libpng-1.6.34.tar.gz

AFL claims that one of its key advantages is ease of use. Just follow along with their quick start guide. First, extract the AFL tarball and run make. Then, change into a fresh copy of the libpng directory (you will need to configure it differently from 3B) and compile it with a configuration akin to the following.

              
                CC=/REPLACE/THIS/TEXT/WITH/YOUR/PARTICULAR/PATH/TO/afl-gcc_(don't_just_copy_this_in_unchanged) ./configure --disable-shared CFLAGS="-static" 
                make clean ; make
              
            

The "CC" variable will look something like (but maybe different for you) CC=/home/kangstadt/p3c/afl-2.52b/afl-gcc — note that there is no trailing slash. If you see configure: error: C compiler cannot create executables, double-check your spelling here.

Note that you are not using "coverage" or gcov for this homework assignment. I recommend re-extracting the libpng tarball into a new, separate directory just to make sure that you are not mistakenly leaving around any gcov-instrumented files. We only want AFL instrumentation for this part of the assignment.

Now, make a new subdirectory and copy all of the test images (including those in subdirectories) to it. These will be the starting inputs from which AFL randomly generates new test inputs. You may also use all of your manually-created test images (from 3B); both are full-credit options. Also, make a new subdirectory for all of the AFL output (e.g., findings_dir). You can now launch AFL with a command similar to the following.

WARNING! Be sure to run this command inside of a tmux session so that you can disconnect if needed without killing the process.
              
                /REPLACE/THIS/TEXT/WITH/YOUR/path/to/afl-fuzz -i testcase_dir -o findings_dir -- /path/to/pngtest_(not_.c_nor_.png_but_the_executable_you_built) @@
                
                # the path to afl-fuzz will be something like
                #     /home/kangstadt/p3c/afl-2.52b/afl-fuzz
                # the path to pngtest will be something like
                #     /home/kangstadt/p3c/libpng-1.6.34/pngtest
              
            

Go back and double-check the end of the previous line for @@. It is required, it is not a typo, and if you did not type it in (to tell AFL where its randomly-created arguments to pngtest go) you are likely to "get stuck" when enumerating paths and tests (see FAQ below).

Note that you must stop afl-fuzz yourself; otherwise it will run forever (it does not stop on its own). Read the Report instructions below for information on the stopping condition and knowing "when you are done".
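
One low-effort way to know when you are done is to poll AFL's fuzzer_stats file (a plain key : value text file that AFL 2.52b keeps in its output directory). The sketch below assumes your output directory is named findings_dir and uses the 510-path threshold from this assignment.

                # Hypothetical monitor: print paths_total once a minute and exit
                # once it passes 510, so you know it is safe to stop afl-fuzz.
                import time

                def paths_total(stats="findings_dir/fuzzer_stats"):
                    with open(stats) as f:
                        for line in f:
                            key, _, value = line.partition(":")
                            if key.strip() == "paths_total":
                                return int(value.strip())
                    return 0

                while paths_total() < 510:
                    print("paths_total:", paths_total())
                    time.sleep(60)
                print("Reached 510 paths_total; you may stop afl-fuzz now.")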

Note also that you can resume afl-fuzz if it is interrupted or stopped in the middle (you don't "lose your work"). When you try to re-run it, it will give you a helpful message like:

              
                To resume the old session, put '-' as the input directory in the command
                line ('-i -') and try again.
              
            

Just follow its directions. Warning: when you resume AFL it will overwrite your findings/plot_data file (which you need for the final report), so be sure to save a copy of that somewhere before resuming.

Note that afl-fuzz will likely abort the first few times you run it and ask you to change some system settings (e.g., echo core | sudo tee /proc/sys/kernel/core_pattern, echo core >/proc/sys/kernel/core_pattern, etc.). For example, on Ubuntu systems it often asks twice. Note that sudo may not work for some of the commands (e.g., sudo echo core >/proc/sys/kernel/core_pattern will fail because bash performs the > redirection before sudo runs, while you do not yet have permissions), so just become root (e.g., sudo sh) and then execute the commands in a root shell. You can leave the root shell by typing exit or Ctrl-D.

The produced test cases are in the findings/queue/ directory. They will not have the .png extension (instead, they will have names like 000154,sr...pos/36,+cov), but you can rename them if you like.
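
For example, a short script can copy the queue entries into a fresh directory with a .png extension (the directory names here are assumptions):

                # Hypothetical renaming sketch: AFL's queue file names contain
                # ':' and ',' characters and no extension; copy them as .png.
                import os
                import shutil

                src, dst = "findings_dir/queue", "afl_pngs"
                os.makedirs(dst, exist_ok=True)
                for i, name in enumerate(sorted(os.listdir(src))):
                    path = os.path.join(src, name)
                    if os.path.isfile(path):  # skips the .state subdirectory
                        shutil.copy(path, os.path.join(dst, "afl_%06d.png" % i))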

While AFL is running, read the technical whitepaper to learn about how it works and compare the techniques it uses to the basic theory discussed in class.

Written Report

You must write a multi-paragraph report for Project 3C. The first part is a short, two-paragraph reflection on your experiences manually creating high-coverage tests for this assignment. Consider addressing points such as:

  • How did you go about manually creating test cases?
  • Did you use or modify any of the tests we provided or tests from the Internet as part of your answer? (Reminder: You are allowed to do that in this class!)
  • What was harder than you expected or different than you expected?
  • What was similar or different between the two subject programs?

The remainder of your report should reflect on your experiences with automatic test generation. In particular:

  1. In a few sentences, your report should describe one test case that AFL created that covered a part of libpng that your manual test suite from 3B did not, and vice versa. (If no such tests exist, for example because you seeded AFL with all of your manual tests, instead describe what did happen; in that case AFL's output will be a superset of yours.) You should also indicate how many tests your run of AFL created, your total runtime, and the final coverage of the AFL-created test suite (use the technique described in 3B to compute coverage; note that AFL will include all of the original tests as well — yes, you should include those).
  2. Your report should include, as inlined images, one or two "interesting" valid PNG files that AFL created and a one-sentence explanation of why they are "interesting".
  3. Your report should include a scatterplot in which the x axis is "seconds" (or some other notion of total execution time, such as unix_time) and the y axis is paths_total as reported in the findings/plot_data CSV file produced by your run of AFL. You can create this plot yourself using the program of your choice (a minimal plotting sketch appears after this list). Your scatterplot must include data reaching at least 510 paths_total on the y axis, which means you must run AFL until you have at least that much data. (See here for plot examples that include much more detail than you are asked to include here.) Include a sentence that draws a conclusion from this scatterplot. If you do not like scatterplots, you can make a line graph (etc.) as long as it conveys the same information.
    • Note that it does not matter how many rows are in your plot_data file or if you are missing some rows at the start or middle as long as you got up to 510 paths_total (also called total paths in the GUI) — it is common to have fewer than 510 rows.
    • Note that if you suspended afl-fuzz you may have a big time gap in your plot. We do not care how you handle that (e.g., ugly graph, big gap, fudge the times, whatever) — any approach is full credit.
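
As a starting point for the plot in item 3 above, here is a minimal sketch; it assumes matplotlib is installed and that your plot_data follows AFL 2.52b's layout, in which unix_time is the first column and paths_total the fourth.

                # Hypothetical plotting script for findings_dir/plot_data.
                import matplotlib.pyplot as plt

                times, paths = [], []
                with open("findings_dir/plot_data") as f:
                    for line in f:
                        if line.startswith("#"):
                            continue  # skip the header row
                        fields = line.split(",")
                        times.append(int(fields[0]))   # unix_time
                        paths.append(int(fields[3]))   # paths_total

                start = times[0]
                plt.scatter([t - start for t in times], paths, s=8)
                plt.xlabel("seconds since start")
                plt.ylabel("paths_total")
                plt.savefig("afl_paths.png")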

Commentary

As befitting a 300-level elective, this assignment is somewhat open-ended. There is no "one true path" to manually creating high-coverage test suites.

You may find yourself thinking something like "I know what to do at a high level, but I don't know specifically what to create here. I could try to read and understand the program to make tests that cause it to do different things, but reading and understanding the program takes a long time, especially since I didn't write this code and I don't really care about it and I'm only doing it because someone is telling me to." Such concerns are, in some sense, the point: they are indicative of industrial software engineering.

Finally, you might wonder if it is possible to get 100% coverage on 3A. While it should be possible to get 100% coverage for the general AVL tree algorithm, this particular implementation of the algorithm may well have branches that you are unable to reach, especially given this testing setup.

Recommended Approach

You will almost certainly need to inspect the source code of the various programs (i.e., white-box testing) to construct high-coverage test suites.

For each program, start with any test cases we have provided. You might also start with image files (for 3B) you find on the Internet. Because of limits on autograding submissions, and so that you learn how coverage computations work across various systems (a relevant skill for interviews, etc.), you will almost certainly want to install the coverage utilities yourself. As you read each project's source code and expand your test suite, recompute the coverage locally. When you perceive that it has gone up significantly, submit to the grading server for a more official check.

Note the increasing size and complexity of the subject programs. Splitting your time evenly between them is probably not the optimal allocation. The first project is due earlier to prevent students from putting things off until the last minute.

Supported Platforms

If you are having trouble using your cs-linuxlab server, I will help you get things working. I will not support any other platform. In particular, MacOS is notorious for poor compatibility with Project 3B.

Project 3C is Different

This assignment is perhaps a bit different than the usual CS homework: instead of you, yourself, doing the "main activity" (i.e., creating test suites), you are asked to invoke tools that carry out that activity for you. This sort of automation (from testing to refactoring to documentation, etc.) is indicative of modern industrial software engineering.

Asking you to submit the generated tests is, in some sense, uninteresting (e.g., we could rerun the tool on the grading server if we just wanted tool-generated tests). Instead, you are asked to write a report that includes information and components that you will only have if you used the tools to generate the tests. Writing reports (e.g., for consumption by your manager or team) is also a common activity in modern industrial software engineering.

What to Submit for P3

Be sure that you are submitting to the correct assignment on Gradescope! Further, be sure to read the output provided by Gradescope after your submission is run by the grading scripts. Don't wait until the last minute to submit.

For each of the sub-projects, submit the following:

  • Project 3A:
    • *.avl: no more than 50 inputs for testing AVL trees
  • Project 3B:
    • *.png: no more than 50 images for testing the PNG library
  • Project 3C:
    • A PDF containing your written report and your references for ALL parts of this assignment.

You can submit these by dragging the files into the submission window on the Gradescope interface.

Grading Rubric

P3 Grading (out of 45 points):

  • 16 points: 3A coverage
    • 50% (one point) to 94% (nine points) statement coverage
    • 50% (one point) to 89% (seven points) branch coverage
  • 14 points: 3B coverage
    • 27.5% (one point) to 36% (fourteen points) statement coverage
  • 15 points: 3C report
    • 5 — two paragraphs describing your manual inputs
    • 5 — discussion of AFL vs. manual coverage
    • 2 — "interesting" image and explanation
    • 3 — graph of progress and discussion
    • -1 — English prose or grammatical errors
    • -all points — submission does not provide a references or works-cited section.

FAQ and Troubleshooting

This section details common pitfalls, questions, and resolutions.

  1. Q: When I type the command sudo apt-get install python-pip python-dev build-essential I am prompted for a password — which should I use?

    A: The password should be the one associated with your user account (the one you used to log in to the OS).

  2. Q: Can I really use images or code snippets I found online to help with this assignment?

    A: Yes, provided that you cite them at the end of your report. This class is about "everything except writing code", in some sense, and in the real world people do everything they can to get high-coverage test suites (including paying people to construct them and using special tools). So you can use image files or benchmarks you find on the Internet if you like — but you'll still have to do the work of paring things down to a small number of high-coverage files. You'll likely get the most out of the assignment if you use a combination of white-box testing, black-box testing and searching for resources — but there are many ways to complete it for full credit.

  3. Q: In 3B, I get 0% code coverage when running pngtest on my Mac.

    A: Sorry, I will not be supporting any platform other than the cs-linuxlab servers. If you run into trouble with the server, I'll be more than happy to help!

  4. Q: For Project 3B:

                        
                          ./.libs/libpng16.a(pngrutil.o): In function `png_inflate_claim':
                          pngrutil.c:(.text+0xd5b): undefined reference to `inflateValidate'
                          collect2: error: ld returned 1 exit status
                          Makefile:1005: recipe for target 'pngfix' failed
                        
                      
    or
                        
                          Note, selecting 'zlib1g-dev' instead of 'libz-dev'
                        
                      

    A: You do not have libz-dev installed correctly.

  5. Q: I get this error

                        
                          make: *** No targets specified and no makefile found.  Stop.
                        
                      

    A: You need to run configure before running make. Double-check the build and installation instructions above.

  6. Q: When I try to run AFL, I get:

                        
                          [-] PROGRAM ABORT : No instrumentation detected
                        
                      

    A: You are pointing AFL to the wrong pngtest executable. Double-check the instructions near CC=/path/to/afl-gcc ./configure --disable-shared CFLAGS="-static", rebuild pngtest using "make", and then point to exactly that executable and not a different one.

  7. Q: When I try to run configure with AFL via something like CC=/home/kangstadt/p3c/afl-2.52b/afl-gcc/ ./configure --disable-shared CFLAGS="-static", I get:

                        
                          checking whether the C compiler works... no
                          configure: error: in `/home/vagrant/eecs481/p2/libpng-1.6.34':
                          configure: error: C compiler cannot create executables
                        
                      

    A: You need to specify afl-gcc, not afl-gcc.c or afl-gcc/ (note trailing slash!).

  8. Q: When I am running AFL, it gets "stuck" at 163 (or 72, or another small number) paths.

    A: In multiple instances, students have forgotten the @@ at the end of the AFL command. Double check the command you are running! Other students report that using large image files (100s of KB is considered large here) slows down the search significantly.

  9. Q: When I try to compile libpng with AFL, I get:

                        
                          configure: error: C compiler cannot create executables
                        
                      

    A: You need to provide the full path to the afl-gcc executable, not just the path to p3c/afl-2.52b/.

  10. Q: I can't open the files produced by AFL.

    A: You will almost certainly find that AFL's output queue folder does not contain files with the .png extension. In addition, you will almost certainly find that most of the files produced by AFL are "invalid" PNG files that cause program executions to cover error cases (e.g., libpng read error).

    This is normal. If you haven't tried already, consider appending .png to the files generated by AFL.

    Double-check all of the instructions here, and the explanations in the course lectures for how these tools work: there's no need to panic. In addition, you might try alternate image viewers (rather than just the default one). For example, uploading apparently-invalid images to GitHub (no, really) works well for viewing them.

  11. Q: Some of the so-called "png" files that AFL produces cannot be opened by my image viewer and may not even be valid "png" files at all!

    A: Yes, you are correct. (Thought question: why are invalid inputs sometimes good test cases for branch coverage?)

Acknowledgements

This assignment has been adapted, updated, and revised (with permission) from a pair of assignments used in Wes Weimer's EECS 481 Software Engineering course at the University of Michigan [1, 2].

Special thanks to Will Hawkins for reviewing this assignment.