O.O.M.P.H.
Object Oriented Mutations for Protocol Hardening.
Seriously though, I want to write about the process through which we create mutations and how they ultimately get linked into an assorted set of seemingly unrelated protocols. The takeaways (for the impatient) are that mutations are really unit tests, but have their origins in the following:
- Protocol specification
- Code reviews
- Known vulnerabilities that other people have found
- Expertise
- Structure
- Semantics
- State
Let’s take IPv6 address strings as an example. We are going to go through the process of identifying a set of inputs to the IPv6 parsing routine (inet_pton6) that stresses all parts of the code. Incidentally, this code was written by Paul Vixie back in 1996.
First, the RFC that describes the set of valid strings for an IPv6 address is rfc-2373. The first reaction when creating mutations is to stick a bunch of ‘:’ in there or, even worse, a string of A’s. But we are going to approach the problem in a more metric-driven way. The ultimate goal, of course, is to exercise most, if not all, code paths in the parser and potentially find problems.
Here’s the basic test program that essentially contains the code for the IPv6 string address parser along with a simple driver that reads from stdin and passes the strings to the parser. Note that we are not interested in conformance or the correctness of the parser, though that’s fairly easy to write as well.
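For readers who want something to compile, here is a minimal sketch of what such a driver could look like. It is an assumption on my part, not the original test program: it calls the C library's inet_pton() with AF_INET6 rather than a private copy of inet_pton6, and the name feed_lines is made up for illustration.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>

/* Feed each line of `in` to the IPv6 string parser and return how many
 * lines were accepted as valid addresses.
 * Assumption: we use libc's inet_pton() here, not the inet_pton6() copy
 * that the post compiles with coverage enabled. Lines longer than the
 * buffer are handed to the parser in chunks. */
int feed_lines(FILE *in)
{
    char line[256];
    struct in6_addr addr;
    int accepted = 0;

    assert(in != NULL);
    while (fgets(line, sizeof line, in) != NULL) {
        line[strcspn(line, "\r\n")] = '\0';   /* drop the line terminator */
        if (inet_pton(AF_INET6, line, &addr) == 1)
            accepted++;
    }
    return accepted;
}
```

Calling feed_lines(stdin) from main gives a filter that can sit at the end of a pipe, which is all the driver needs to do.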
So how can we go about measuring the impact of our mutations to this parsing routine? We are going to be using gcov which is a code coverage tool. First we compile this program with code coverage enabled:
% gcc -fprofile-arcs -ftest-coverage -g inet_pton6.c
which produces a.out in the same directory. Let’s try the simplest input to this program, which also happens to be a valid IPv6 address:
% rm -f inet_pton6.gcda
% echo "::" | ./a.out
% gcov inet_pton6.c
File 'inet_pton6.c'
Lines executed:41.38% of 87
inet_pton6.c:creating 'inet_pton6.c.gcov'
Not bad for the very first attempt. The inet_pton6.gcda file is something that a.out updates each time it’s run so we can measure aggregate coverage counts across multiple runs. For now, we only care about the result of a single run so that we can see if we can get closer to 100% coverage.
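The annotated inet_pton6.c.gcov listing that gcov just created marks executable lines that never ran with ##### in the count column. As a rough sketch (the function name count_unexecuted is hypothetical, and it assumes no listing line exceeds the buffer), those lines can be pulled out like this:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Count the lines of an annotated .gcov listing whose execution-count
 * field is "#####", which is how gcov marks executable lines that were
 * never hit. Each such line is also echoed to stdout. */
int count_unexecuted(FILE *gcov_listing)
{
    char line[1024];
    int missed = 0;

    assert(gcov_listing != NULL);
    while (fgets(line, sizeof line, gcov_listing) != NULL) {
        /* The count field is right-justified, so skip its padding. */
        const char *field = line + strspn(line, " \t");
        if (strncmp(field, "#####:", 6) == 0) {
            missed++;
            fputs(line, stdout);
        }
    }
    return missed;
}
```

Running it over the listing after each batch of inputs gives a quick to-do list of code paths that still need a mutation.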
By using intuition, the protocol specification, as well as looking at the code, let’s create a file that contains the following IPv6 strings:
% cat > v6.txt
::
:
:::
::::
fffffffffffffffffffffffffff
ff
aa:bb:cc:dd:ee:ff:11:22:
ffffffff::::
ffffffffffffffff::::
::%
%x::%x::%x::%x::%x::%x::%x::%x
aa%x::bb%x::cc%x::dd%x::ee%x::ff%x::11%x::22%x
.::.::.::.::.
::.::.::.::.::.
::.::.::.::.::.::
::abcdefghijklmnopqrstuvwxyz
aaaa:bbbb:cccc:dddd:eeee:ffffffffffffffff
and check our coverage again:
% rm -f inet_pton6.gcda
% cat v6.txt | ./a.out
% gcov inet_pton6.c
File 'inet_pton6.c'
Lines executed:74.71% of 87
inet_pton6.c:creating 'inet_pton6.c.gcov'
Not bad at all. We are now up to covering 74.71% of the parsing code. What are we missing? When we run gcov, it outputs an inet_pton6.c.gcov file that marks the lines of code that were never hit during the run. After inspecting them, and also realizing that you can have IPv4 addresses as part of IPv6 addresses, let’s add a few more mutations:
% cat >> v6.txt
::.
::127.
::127.0.0.1
::256.0.0.1
::256.0.0.
::.256.0.0.
::.256/0.0
::1.2.3.4.5.6.7.9.10
::123456789.123456789.123456789.123456789
::ffff::.
::ffff::127.
::ffff::127.0.0.1
::ffff::256.0.0.1
::ffff::256.0.0.
::ffff::.256.0.0.
::ffff::.256/0.0
::ffff::1.2.3.4.5.6.7.9.10
::ffff::123456789.123456789.123456789.123456789
and running the program again with this input shows:
% rm -f inet_pton6.gcda
% cat v6.txt | ./a.out
% gcov inet_pton6.c
File 'inet_pton6.c'
Lines executed:96.55% of 87
inet_pton6.c:creating 'inet_pton6.c.gcov'
Nice. I’m sure we can keep pushing the envelope to reach 100%, but that just becomes the starting point for mutations. The parser has loops, branches, integers and buffers that we can go after, and the basic set of mutations we have created so far is the jumping-off point for those. The set of inputs for the parser now becomes a mutated object in the system that carries with it the knowledge gained so far.
One caveat: code coverage is only one metric for the quality of the mutations. We might hit every branch in the code and still have vulnerabilities, since the mutations have to take into account much more than coverage. A simple example is the strcpy function. Any non-empty string you pass to it will result in 100% coverage, but that doesn’t take into account the length of the input string, which might cause an overflow. Regardless, it’s still a great starting point.
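To make the strcpy caveat concrete, here is a small sketch. The names fits and store_label and the 16-byte buffer are made up for illustration; the point is only that a short and a long input execute exactly the same lines, while just one of them stays in bounds.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* True when `src` plus its terminating NUL fits in `dst_size` bytes,
 * i.e. when strcpy() into a buffer of that size would stay in bounds. */
bool fits(const char *src, size_t dst_size)
{
    assert(src != NULL);
    return strlen(src) < dst_size;
}

/* Hypothetical target with a single straight-line path: any one call
 * gives it 100% line and branch coverage, whatever the input length. */
void store_label(char dst[16], const char *src)
{
    strcpy(dst, src);   /* overflows whenever !fits(src, 16) */
}
```

Both "::1" and the last entry in v6.txt would light up every line of store_label, but fits() only returns true for the first one. Coverage cannot tell those two inputs apart; the mutation set has to.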
So where are IPv6 string addresses actually used? Let’s find out:
% query-rfc.rb --back-refs 2373 references
Hmmm, 79 RFCs seem to point to rfc-2373 for the syntax of IPv6 address strings. See the interconnectedness? So when we start attacking these protocols, all we really have to do is use the mutated object that we already have in the appropriate places in the protocol, and we spread the knowledge gained from a simple old IPv6 string.
The key point is that we’ve modeled the IPv6 address string as a live object and connected it up to the other protocols. Any new attack on IPv6 address strings can be added in a single place and the other protocols are all automatically updated to reflect it. One person does the research and lots of people benefit from it. Basic software modularization at work.
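A rough sketch of that idea in C, under stated assumptions: the slot structure, the render function and the two protocol slots (an HTTP Host header and an SMTP EHLO address literal) are my own illustrations and are not taken from the 79 RFCs above. The mutation strings themselves come from v6.txt.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The shared "object": a few IPv6 string mutations taken from v6.txt.
 * New attacks get added here and nowhere else. */
static const char *const ipv6_mutations[] = {
    "::", ":::", "::256.0.0.1", "::ffff::127.0.0.1",
};
#define N_MUTATIONS (sizeof ipv6_mutations / sizeof ipv6_mutations[0])

/* A place in a protocol message where an IPv6 string is expected.
 * Both slots are illustrative assumptions. */
struct slot {
    const char *protocol;
    const char *prefix;
    const char *suffix;
};

static const struct slot slots[] = {
    { "http", "Host: [",     "]\r\n" },
    { "smtp", "EHLO [IPv6:", "]\r\n" },
};
#define N_SLOTS (sizeof slots / sizeof slots[0])

/* Render mutation `m` into slot `s`. Returns the length reported by
 * snprintf(), or -1 if either index is out of range. */
int render(size_t s, size_t m, char *out, size_t out_size)
{
    assert(out != NULL);
    if (s >= N_SLOTS || m >= N_MUTATIONS)
        return -1;
    return snprintf(out, out_size, "%s%s%s",
                    slots[s].prefix, ipv6_mutations[m], slots[s].suffix);
}
```

Looping over every (slot, mutation) pair yields the full set of test messages. Appending one string to ipv6_mutations reaches every protocol that has a slot, which is the single-place update described above.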
Somehow, mutations feel like unit tests, don’t they?
Well, they are.
Did we just fuzz on this code?
Yes, we did.
Is this negative testing?
Absolutely.
The difference is one of quality and coverage, and of how targeted the set of inputs is towards exposing vulnerabilities in the code.