Fieldomatic Complexity

May 23rd, 2008 by kowsik

If you’ve gone through my CanSecWest slides, I talk a lot about Field’s and how they are the fundamental units of protocols (network or file formats). The linkage information between the Field’s and across messages is a pretty powerful way to infer the cyclomatic complexity of the code that parses these messages. When generating test cases (fuzzing being one kind), we can leverage these structural and semantic linkages to generate systematic constraint violations that ultimately exercise the various branches taken in the parser.


There are two kinds of field dependencies within and across protocol messages. The first one is structural where the value of field is a function of another field. Common examples of such structural dependencies are checksum.of, length.of, crc.of, etc. Here the length field’s value is computed based on some other field in the same message. When the field pointed to changes (empty, truncated, overflow, etc), then the length needs to be recomputed. We see this a lot with protocols like SNMP and ASN in general, where the outermost SEQUENCE’s length depends on something deep inside of it.

The second kind of dependency is semantic. This is not always obvious with a grammar that describes the protocol, but it’s typically expressed as verbiage in the specification. A simple example of semantic dependency is the width and height field’s in any image format. The parser that reads the file format typically allocates memory using something like width * height * bits_per_pixel. Thusly changing these fields together in some meaningful way results in much better branch coverage than changing them one at a time.

Being an ex-graphics guy, I love seeing things. So here are some images auto generated from our engine after parsing basic file formats. For the various images, the black arrow indicate parent-child relationship while the red arrow indicate structural dependencies.

PNG

png.png

Here you can see that each CHUNK in the file format there are two depedencies: length.of and crc32.of. It’s also pretty obvious that crc32 doesn’t include the length of the data, which is interesting. The semantic dependency of course is that color-type field has a contraint imposed on the PLTE chunk. If it’s an 8-bit color type, then the PLTE chunk is mandatory. Of course the obvious test case is to have an 8-bit color type with no PLTE chunk and ensure that the code handles these corner case.

Quicktime

quicktime.png is an interesting one, where the the structural depdencies point upwards. In other words each ATOM in the quick time file contains a size field which is the size of the whole ATOM include the size itself! This is pretty common in protocols and what this means really is that minimum size expected has to be 4. Point really is that by expressing these dependencies declaratively, the test case generation can be made super intelligent and automatic. You don’t have to remember to add the test case that sets the size to 0, 1, 2 and 3. You see something very similar in IP options where the the length includes the 1-byte type and the 1-byte length which means the minimum value is 2.

Java class

class.png (2.5MBytes) is super interesting and has lots in common with executable file formats like PE, ELF, etc. If you look at the number of red edges, the majority of them point towards the constant-pool field which is a string table. All method, class, file, variable names are all stored here and referenced from all the other places. This essentially means that the parser has to do tons of looks up into this table to fetch these strings. So fuzzing the indexes into the constant pool systematically (and automatically!) all over the file is super interesting since the class name (for example) could actually be pointing to something that’s not a UTF8 string.

The semantic dependency appears all over the place as well. For example, the most obvious one is the this_class and the super_class. Obviously you can’t inherit from yourself or you can’t create cycles in the inheritance. Well, maybe you can’t do this from within Eclipse, but from the file format perspective you absolutely can.

So what?

Point really is that by modeling structural and semantic dependencies between fields and messages, you can fuzz a whole lot better and more importantly automatically. I really don’t want to remember that both IP options and the Quicktime size fields include themselves and hence I need to generate values the make sense in this situation. Laziness is truly a virtue.

Posted in Mutations, Research | Permalink | Trackback

Leave a Comment

Please note: Comment moderation is enabled and may delay your comment. There is no need to resubmit your comment.