For project 3, you will manually create two high-coverage test suites, one for each of two subject programs spanning two different application domains and programming languages. You will then use a tool to automatically create a high-coverage test suite for one of these programs. You will also write a report reflecting on this activity.
Two of the key properties found in industrial software engineering, but not in most undergraduate academic coursework, are that the program is large and that you did not write it. If you are hired by Microsoft as a software engineer, they will not ask you to write Office from scratch. Instead, as discussed in class, the vast majority of software engineering is the comprehension and maintenance of large legacy codebases (i.e., old code you did not write).
Thus, there is no particular assumption that you are familiar with any of the subject programs (or languages!) in this assignment. Indeed, that is the point: you will be applying testing concepts to unknown software.
You may use files, benchmarks, or resources from the Internet or from the instructors (unlike in many other classes), provided you cite them in your written report.
This project is not-so-secretly three mini projects with common themes. Thus, we will define the requirements for each part separately.
.avl files that are inputs to avl.py
.png files that are inputs to pngtest
The first subject program is a simple Python implementation of an abstract data type: the AVL tree container data structure.
Test cases for this program take the form of sequences of instructions for manipulating a single global AVL tree:
i number — insert number into the tree
d number — delete number from the tree
p — print the current tree
If test1.avl contains i1 i2 i3 i4 i5 d4 p, then python avl.py test1.avl produces the output:
2.
/ \
1 5
/\ /\
3
/\
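For instance, a hypothetical test file of your own (not part of the starter suite) containing
i2 i1 i3 d2 p
would insert a balanced three-node tree, delete the root, and print the result; deletions at different positions tend to exercise different rebalancing code. (The exact printed shape depends on how this implementation replaces a deleted root, so inspect the output rather than assuming it.)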
We compute coverage for this program using the coverage package for Python. Both statement and branch coverage are considered. (For more information about statement and branch coverage, read here.)
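As a minimal illustration of the difference (a hypothetical snippet, not taken from avl.py), a single test can execute every statement of a function yet still miss a branch outcome:
def clamp(x, limit):    # hypothetical example, not from avl.py
    if x > limit:       # this condition has two outcomes
        x = limit
    return x
clamp(5, 3)             # runs every statement: 100% statement coverage,
                        # but the x <= limit outcome is never taken,
                        # so branch coverage stays below 100%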
The reference implementation is available here. The program is about 300 lines long and is a self-contained single file.
A test suite for this application is a collection of .avl files. The reference starter test suite is available here.
Download avl.py
Download avl-tests.zip
This assignment assumes you are using your Ubuntu (Linux) server. The following lines should install the relevant system and Python packages.
sudo apt update
sudo apt install build-essential python3-pip python3-dev
sudo pip3 install --upgrade pip
sudo pip3 install coverage
Note that apt is the primary package manager on Debian/Ubuntu distributions of Linux. We use this tool to update the list of available packages and install several. The pip program is used to manage Python packages.
After completing this installation, you can use the coverage utility on multiple test cases and view combined coverage results.
coverage run --append avl.py simple1.avl
1
/ \
2
/ \
coverage report
Name Stmts Miss Cover
----------------------------
avl.py 182 99 46%
coverage run --append avl.py simple2.avl
2.
/ \
1 5
/\ /\
3
/\
coverage report
Name Stmts Miss Cover
----------------------------
avl.py 182 54 70%
coverage erase
Note how the measured coverage of only the first test is low (46%) but the
combined coverage result for the two tests is higher. The --append
option is necessary to avoid overwriting the stored coverage data with each
new test run.
To measure branch coverage, simply add --branch:
coverage run --append --branch avl.py simple1.avl > /dev/null
coverage run --append --branch avl.py simple2.avl > /dev/null
coverage report
Name Stmts Miss Branch BrPart Cover
------------------------------------------
avl.py 182 54 90 19 66%
coverage report -m
Name Stmts Miss Branch BrPart Cover Missing
----------------------------------------------------
avl.py 182 54 90 19 66% 82-85, 88, 101, 109-112, 121,
123-127, 139-141, 145, 189, 201-202, 211, 216, 223-238, 244-248, 253-254,
284, 286-292, 304-306, 81->82, 87->88, 100->101, 107->109, 120->121,
122->123, 138->139, 144->145, 171->exit, 210->211, 212->214, 215->216,
243->244, 250->253, 283->284, 285->286, 301->303, 303->304, 319->312
coverage erase
The additional columns give branch-specific information (e.g., about partial branches; see the manual), but the final column now gives the branch coverage. The columns may look different on other setups — that's fine. Note also how the Missing column guides you to line numbers that are not exercised by the current suite.
There is documentation for this coverage utility. Finally, note that this utility can also produce XML and HTML reports, which you may find easier to interpret.
The second subject program is the Portable Network Graphics (PNG) file format reference library, libpng. It is used for image manipulation and conversion.
We will be using the developer-provided pngtest driver. It takes as input a single .png file.
We compute coverage for this program using the gcov test coverage program for gcc. Only statement coverage is considered.
The reference implementation (version 1.6.34) is available here. It contains about 85,000 lines of code spread over about 70 files.
A test suite for this application is a collection of .png
files. The reference implementation comes with over 150 such example
files. Feel free to use an image manipulation or conversion program to
create your own files or to use files or benchmarks you find on the
Internet.
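If you want to script the creation of varied images, here is a minimal sketch using the Pillow library (an assumption: Pillow is not required by the assignment; install it with pip3 install pillow). Different color modes tend to exercise different decoding paths in libpng.
# generate_pngs.py — hypothetical helper script, assumes Pillow is installed
from PIL import Image
Image.new("RGB", (16, 16), (255, 0, 0)).save("rgb.png")         # truecolor
Image.new("RGBA", (16, 16), (0, 255, 0, 128)).save("rgba.png")  # truecolor with alpha
Image.new("L", (16, 16), 128).save("gray.png")                  # grayscale
Image.new("P", (16, 16), 3).save("palette.png")                 # palette-based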
The gcov utility usually comes with gcc (a common compiler for C), so you should not need to do anything specific to install it (apart from making sure you have gcc installed). We use a Python wrapper around gcov called gcovr to give improved output. Further, our subject program depends on the development version of a compression library (zlib). The following commands should install all needed software on your Linux server. (Note that some of these packages are already installed in 3A, but are listed here for completeness.)
sudo apt update
sudo apt install build-essential libz-dev python3-pip python3-dev
sudo pip install --upgrade pip
sudo pip install gcovr
In addition, you will have to compile the project with coverage
instrumentation (taking care to use static linking or to otherwise
make all of the libpng
source files, and not just the test
harness, visible to coverage).
tar xzf libpng-1.6.34.tar.gz
cd libpng-1.6.34
sh ./configure CFLAGS="--coverage -static"
make clean ; make
./pngtest pngtest.png
(out)Testing libpng version 1.6.34
(out) ...
gcovr -s -e contrib -e intel -e mips -e powerpc -r .
(out)------------------------------------------------------------------------------
(out) GCC Code Coverage Report
(out)Directory: .
(out)------------------------------------------------------------------------------
(out)File Lines Exec Cover Missing
(out)------------------------------------------------------------------------------
(out)png.c 1267 478 37% 47,49,53-54,56
(out) # your numbers may differ!
(out) ...
(out)------------------------------------------------------------------------------
(out)TOTAL 11201 3086 27% # your numbers may differ!
(out)------------------------------------------------------------------------------
(out)lines: 27.6% (3086 out of 11201) # your numbers may differ!
(out)branches: 20.8% (1597 out of 7687) # your numbers may differ!
./pngtest contrib/gregbook/toucan.png
(out) ...
gcovr -s -e contrib -e intel -e mips -e powerpc -r .
(out) ...
(out)------------------------------------------------------------------------------
(out)TOTAL 11206 3301 29%
(out)------------------------------------------------------------------------------
(out)lines: 29.5% (3301 out of 11206) # your numbers may differ
(out)branches: 23.0% (1770 out of 7687) # and everything is still fine!
# delete coverage and test output
rm *.gcda pngout.png
Note how gcovr gives a separate report for each source file specified on the command line, but also gives a sum total for statement coverage (the final lines: figure reported). Your coverage numbers may be slightly different from the example ones listed above. The output also lists lines from each file that were not covered.
While gcovr reports branch coverage, we will not be using it for this project.
Note that pngtest creates pngout.png. (This can confuse some students, because if you collect coverage after running on *.png and then do it again, you may get a different answer the second time if you're not careful because of the newly-added file!)
There is documentation for this coverage utility. Finally, note that this utility can also produce JSON, XML, and HTML reports, which you may find easier to interpret.
We now revisit libpng
from 3B. Instead of manually
creating test images like you did for 3B, you will use a test input
generator. We will use American Fuzzy Lop, version 2.52b. A mirror copy
of the AFL tarball is here,
but you should visit the project webpage for documentation. As
the (delightful) name suggests, it is a fuzz
testing tool.
You will use AFL to create a test suite of png
images to
exercise the pngtest
program, starting from all of the
png
images provided with the original tarball. Note that you
can terminate AFL as soon as you reach 510 paths_total
(also
called total paths
in the GUI) — AFL does not stop on
its own.
Some students have reported that it can take hours to run AFL on libpng! (However, others were able to finish in about five minutes. Regardless of when you finish, everything is fine.) Even if you are fast and can finish your work at the last minute, the tools are not and cannot. On my high-powered server-class research test machine with a high-speed SSD it took 10 minutes, but significantly longer to run on my cs-linuxlab virtual server. However, as soon as you get enough data (see below) you can stop early.
Download afl-2.52b.tar.gz
Download libpng-1.6.34.tar.gz
AFL claims that one of its key advantages is ease of use. Just follow along with their quick start guide. First, extract the AFL tarball and run make. Then, change into a fresh copy of the libpng directory (you will need to configure it differently from 3B) and compile it with a configuration akin to the following.
CC=/REPLACE/THIS/TEXT/WITH/YOUR/PARTICULAR/PATH/TO/afl-gcc_(don't_just_copy_this_in_unchanged) ./configure --disable-shared CFLAGS="-static"
make clean ; make
The "CC" variable will look something like (but maybe different for you)
CC=/home/kangstadt/p3c/afl-2.52b/afl-gcc
— note that
there is no trailing slash. If you see configure: error: C compiler
cannot create executables
, double-check your spelling here.
Note that you are not using
"coverage" or gcov
for this homework assignment. I
recommend re-extracting the libpng
tarball into a new,
separate directory just to make sure that you are not mistakenly
leaving around any gcov
-instrumented files. We only want
AFL instrumentation for this part of the assignment.
Now, make a new subdirectory and copy all of the test images (including those in subdirectories) to it. These will be the starting inputs for AFL to randomly generate test input. You may also use all of your manually-created test images (from 3B); both are full-credit options. Also, make a new subdirectory for all of the AFL output (e.g., findings_dir). You can now launch AFL with a command similar to the following. We recommend launching it inside a tmux session so that you can disconnect if needed without killing the process.
/REPLACE/THIS/TEXT/WITH/YOUR/path/to/afl-fuzz -i testcase_dir -o findings_dir -- /path/to/pngtest_(not_.c_nor_.png_but_the_executable_you_built) @@
# the path to afl-fuzz will be something like
# /home/kangstadt/p3c/afl-2.52b/afl-fuzz
# the path to pngtest will be something like
# /home/kangstadt/p3c/libpng-1.6.34/pngtest
Go back and double-check the end of the previous line for
@@
. It is required, it
is not a typo, and if you did not type it in (to tell AFL where its
randomly-created arguments to pngtest
go) you are likely
to "get stuck" when enumerating paths and tests (see FAQ below).
Note that you must stop afl-fuzz
yourself, otherwise
it will run forever — it does not stop on its own. Read the
Report instructions below for information on the stopping condition
and knowing "when you are done".
Note also that you can resume afl-fuzz
if it is
interrupted or stopped in the middle (you don't "lose your work").
When you try to re-run it, it will give you a helpful message like:
To resume the old session, put '-' as the input directory in the command
line ('-i -') and try again.
Just follow its directions. Warning: when you resume AFL it will
overwrite your findings/plot_data
file (which you need
for the final report), so be sure to save a copy of that somewhere
before resuming.
Note that afl-fuzz
will likely abort the first few times
you run it and ask you to change some system settings (e.g.,
echo core | sudo tee /proc/sys/kernel/core_pattern
,
echo core >/proc/sys/kernel/core_pattern
etc.). For
example, on Ubuntu systems it often asks twice. Just become
root
and execute the commands. Note that
sudo
may not work for some of the commands (e.g.,
sudo echo core >/proc/sys/kernel/core_pattern
will
fail because bash
will do the > redirection before
running sudo
so you will not yet have permissions, etc.)
— so just become root
(e.g., sudo sh
)
and then execute the commands in a root shell. You can leave the root shell
by typing exit
or ctrl + d.
The produced test cases are in the findings/queue/
directory. They will not have
the .png
extension (instead, they will have names like
000154,sr...pos/36,+cov
), but you can rename them if you
like.
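If you would like to script the renaming, here is a minimal sketch in Python (the directory names are hypothetical; adjust them for your run):
# copy_queue.py — hypothetical helper; adjust paths for your setup
from pathlib import Path
import shutil
queue = Path("findings_dir/queue")
out = Path("afl_pngs")
out.mkdir(exist_ok=True)
for i, f in enumerate(sorted(queue.iterdir())):
    if f.is_file():                       # skips AFL's .state subdirectory
        shutil.copy(f, out / f"{i:06d}.png")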
While AFL is running, read the technical whitepaper to learn about how it works and compare the techniques it uses to the basic theory discussed in class.
You must write a multi-paragraph report for Project 3C. The first part is a short, two-paragraph reflection on your experiences manually creating high-coverage tests for this assignment. Consider addressing points such as:
The remainder of your report should reflect on your experiences with automatic test generation. In particular:
Describe whether AFL created tests that covered parts of libpng that your manual test suite from 3B did not, and vice versa. (If no such tests exist, for example because you seeded AFL with all of your manual tests, instead describe what did happen. For example, if you seeded AFL with all of your manual tests, then AFL's output will be a superset of yours.) You should also indicate how many tests your run of AFL created, your total runtime, and the final coverage of the AFL-created test suite (use the technique described in 3B to compute coverage; note that AFL will include all of the original tests as well — yes, you should include those).
Include a scatterplot of elapsed time (x axis) against paths_total (y axis) as reported in the findings/plot_data CSV file produced by your run of AFL. You can create this plot yourself (using the program of your choice); a minimal plotting sketch appears after this list. Your scatterplot must include data reaching at least 510 paths_total on the y axis — which means you must run AFL until you have at least that much data. (See here for plot examples that include much more detail than you are asked to include here.) Include a sentence that draws a conclusion from this scatterplot. If you do not like scatterplots you can make a line graph (etc.) as long as it conveys the same information.
It is fine if there are fewer than 510 rows in your plot_data file, or if you are missing some rows at the start or middle, as long as you got up to 510 paths_total (also called paths total in the GUI) — it is common to have fewer than 510 rows.
If you stopped and resumed afl-fuzz, you may have a big time gap in your plot. We do not care how you handle that (e.g., ugly graph, big gap, fudge the times, whatever) — any approach is full credit.
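Here is the minimal plotting sketch mentioned above, in Python with matplotlib. It assumes the AFL 2.52b plot_data layout (a '#' header row and comma-separated columns, with unix_time first and paths_total fourth) and a hypothetical findings_dir path; verify both against your own file.
# plot_paths.py — hypothetical helper; adjust the path for your run
import matplotlib.pyplot as plt
times, paths = [], []
with open("findings_dir/plot_data") as fh:
    for line in fh:
        if line.startswith("#"):              # skip the header row
            continue
        cols = [c.strip() for c in line.split(",")]
        times.append(int(cols[0]))            # unix_time
        paths.append(int(cols[3]))            # paths_total
start = times[0]
plt.scatter([(t - start) / 60 for t in times], paths, s=8)
plt.xlabel("minutes since start")
plt.ylabel("paths_total")
plt.savefig("paths_total.png")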
As befitting a 300-level elective, this assignment is somewhat open-ended. There is no "one true path" to manually creating high-coverage test suites.
You may find yourself thinking something like "I know what to do at a high level, but I don't know specifically what to create here. I could try to read and understand the program to make tests that cause it to do different things, but reading and understanding the program takes a long time, especially since I didn't write this code and I don't really care about it and I'm only doing it because someone is telling me to." Such concerns are, in some sense, the point: they are indicative of industrial software engineering.
Finally, you might wonder if it is possible to get 100% coverage on 3A. While it should be possible to get 100% coverage for the general AVL tree algorithm, this particular implementation of the algorithm may well have branches that you are unable to reach — especially given this testing setup.
You will almost certainly need to inspect the source code of the various programs (i.e., white-box testing) to construct high-coverage test suites.
For each program, start with any test cases we have provided. You might also start with image files (for 3B) you find on the Internet. Because of limits on autograding submissions, and so that you learn how coverage computations work across various systems (a relevant skill for interviews, etc.), you will almost certainly want to install the coverage utilities yourself. As you read each project's source code and expand your test suite, recompute the coverage locally. When you perceive that it has gone up significantly, submit to the grading server for a more official check.
Note the increasing size and complexity of the subject programs. Splitting your time evenly between them is probably not the optimal allocation. The first project is due earlier to prevent students from putting things off until the last minute.
If you are having trouble using your cs-linuxlab server, I will help you get things working. I will not support any other platform. In particular, macOS is notorious for poor compatibility with Project 3B.
This assignment is perhaps a bit different than the usual CS homework: instead of you, yourself, doing the "main activity" (i.e., creating test suites), you are asked to invoke tools that carry out that activity for you. This sort of automation (from testing to refactoring to documentation, etc.) is indicative of modern industrial software engineering.
Asking you to submit the generated tests is, in some sense, uninteresting (e.g., we could rerun the tool on the grading server if we just wanted tool-generated tests). Instead, you are asked to write a report that includes information and components that you will only have if you used the tools to generate the tests. Writing reports (e.g., for consumption by your manager or team) is also a common activity in modern industrial software engineering.
Be sure that you are submitting to the correct assignment on Gradescope! Further, be sure to read the output provided by Gradescope after your submission is run by the grading scripts. Don't wait until the last minute to submit.
For each of the sub-projects, submit the following:
*.avl: no more than 50 inputs for testing AVL trees
*.png: no more than 50 images for testing the PNG library
A PDF containing your written report and your references for ALL parts of this assignment
You can submit these by dragging the files into the submission window on the Gradescope interface.
P3 Grading (out of 45 points):
This section details common pitfalls, questions, and resolutions.
Q: When I type the command sudo apt-get install python-pip python-dev
build-essential
I am prompted for a password — which should I use?
A: The password should be the one associated with your user account (the one you used to log in to the OS).
Q: Can I really use images or code snippets I found online to help with this assignment?
A: Yes, provided that you cite them at the end of your report. This class is about "everything except writing code", in some sense, and in the real world people do everything they can to get high-coverage test suites (including paying people to construct them and using special tools). So you can use image files or benchmarks you find on the Internet if you like — but you'll still have to do the work of paring things down to a small number of high-coverage files. You'll likely get the most out of the assignment if you use a combination of white-box testing, black-box testing and searching for resources — but there are many ways to complete it for full credit.
Q: In 3B, I get 0% code coverage when running pngtest on my Mac.
A: Sorry, I will not be supporting any platform other than the cs-linuxlab
servers. If you run into trouble with the server, I'll be more than happy to help!
Q: For Project 3B:
./.libs/libpng16.a(pngrutil.o): In function `png_inflate_claim':
pngrutil.c:(.text+0xd5b): undefined reference to `inflateValidate'
collect2: error: ld returned 1 exit status
Makefile:1005: recipe for target 'pngfix' failed
or
Note, selecting 'zlib1g-dev' instead of 'libz-dev'
A: You do not have libz-dev
installed correctly.
Q: I get this error
make: *** No targets specified and no makefile found. Stop.
A: You need to run configure
before running
make
. Double-check the build and installation instructions above.
Q: When I try to run AFL, I get:
[-] PROGRAM ABORT : No instrumentation detected
A: You are pointing AFL to the wrong pngtest
executable.
Double-check the instructions near CC=/path/to/afl-gcc ./configure
--disable-shared CFLAGS="-static"
, rebuild pngtest
using
"make", and then point to exactly that executable and not a different
one.
Q: When I try to run configure with AFL via
something like CC=/home/kangstadt/p3c/afl-2.52b/afl-gcc/
./configure --disable-shared CFLAGS="-static"
, I get:
checking whether the C compiler works... no
configure: error: in `/home/vagrant/eecs481/p2/libpng-1.6.34':
configure: error: C compiler cannot create executables
A: You need to specify afl-gcc
, not
afl-gcc.c
or afl-gcc/
(note trailing slash!).
Q: When I am running AFL, it gets "stuck" at 163 (or 72, or another small number) paths.
A: In multiple instances, students have forgotten the @@
at the end of the AFL command. Double check the command you are running!
Other students report that using large image files (100s of KB is considered
large here) slows down the search significantly.
Q: When I try to compile libpng with AFL, I get:
configure: error: C compiler cannot create executables
A: You need to provide the full path to the afl-gcc
executable, not just the path to p3c/afl-2.52b/
.
Q: I can't open the files produced by AFL.
A: You will almost certainly find that AFL's output queue folder does not
contain files with the .png
extension. In addition, you will
almost certainly find that most of the files produced by AFL are "invalid"
PNG files that cause program executions to cover error cases
(e.g., libpng read error
).
This is normal. If you haven't tried already, consider appending .png
to the files generated by AFL.
Double-check all of the instructions here, and the explanations in the course lectures for how these tools work: there's no need to panic. In addition, you might try alternate image viewers (rather than just the default one). For example, uploading apparently-invalid images to GitHub (no, really) works well for viewing them.
Q: Some of the so-called "png" files that AFL produces cannot be opened by my image viewer and may not even be valid "png" files at all!
A: Yes, you are correct. (Thought question: why are invalid inputs sometimes good test cases for branch coverage?)
This assignment has been adapted, updated, and revised (with permission) from a pair of assignments used in Wes Weimer's EECS 481 Software Engineering course at the University of Michigan [1, 2].
Special thanks to Will Hawkins for reviewing this assignment.