PERversity at its worst

November 14th, 2007 by kowsik

Every now and then you look back and think about all the time you spent working on something that was so pointlessly convoluted and intentionally perverse, you wonder what’s wrong with the world. You heard me kvetch about ASN.1. Well, it’s another incarnation of the same beast, except it’s PER. It, BTW, stands for Perverse Encoding Rules. The true 50-ways-to-encode-your-lover.


Okay, I’m half kidding about the perversity of it all. It stands for Packed Encoding Rules. But, by golly, it’s the ultimate let-me-make-this-so-complex-so-i-look-cool encoding scheme. Even scarier is that a big part of the world uses this encoding for their VoIP infrastructure. *shiver*

To quote Lev Walkin, the author of asn1c (which is a C compiler for ASN.1):

The main reason why ASN.1 is still alive is that too much time and effort
is necessary for learning it more or less adequately, thus creating a gut
necessity to demonstrate that acquired skill everywhere afterwards.
No, I am not going to explain what the following stuff is.

Well, I saw this in his code. Can’t really be sure if he wrote it or someone else did. But I couldn’t stop laughing when I read it. I’ve been poring over ASN.1 Complete (which, BTW, is one of the best reference books for ASN.1), hexdumps, lots of ruby %b’s and a bunch of pcaps.

The number of rules, exceptions to the rules, rules because the moon was aligned [sic] and rules that are there just to $#*@ with your mind is absolutely mind-boggling.

My own thought on PER is:

If you think hard enough about it, it might just barely be intuitive!

Thinking about the encoding rules from the consumer side (input parsing) helps just a tad more. But assuming that the entire world is bouncing packets from place to place through daisy-chained satellites operating at a hyper speed of 9600 baud really brings you much closer to understanding PER. Not.

The following set of blog posts is for those who have to deal with this nastiness. It also helps me clear the cruft in my mind. For the most part, we’ll be looking at the encoding from the consumer side.

Aligned and Unaligned

There are two variations of PER: aligned and unaligned. Unaligned is one big bit mess. Everything’s jammed together with absolute disregard to the bytes. Aligned is one that every now and then [sic] pads entries so they start at the byte boundary. There are like a gazillion exceptions that tell you when you should be octet-aligned, as the parlance goes.
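Here’s a tiny sketch of the difference, in Python. It models the bit stream as a string of '0' and '1' characters, and `append_field` is a made-up helper, not something from the spec or from any ASN.1 library. It also assumes you already know that the field being appended is one of those that must be octet-aligned in the aligned variant, which is exactly the part the gazillion exceptions are about.

```python
def append_field(bits: str, field_bits: str, aligned: bool) -> str:
    """Append field_bits to the bit stream.

    In the aligned case, zero bits are added first so the field starts
    on a byte boundary. In the unaligned case the field is jammed in
    right where the previous one ended.
    """
    if aligned:
        # -len(bits) % 8 is the number of bits missing to the next boundary.
        bits += "0" * (-len(bits) % 8)
    return bits + field_bits


# One bit already written, followed by an 8-bit field.
unaligned = append_field("1", "00000011", aligned=False)
aligned = append_field("1", "00000011", aligned=True)
print(unaligned)  # 9 bits, no padding
print(aligned)    # 16 bits, 7 padding bits in the middle
```

Same two values, nine bits in one case and sixteen in the other.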

For one thing, you need the original specification baked into the parser to be able to make sense of what’s hitting you. Typically this is done by going through an ASN.1 compiler that generates the appropriate language bindings. PER tries to hyper-optimize the bits transmitted, but also tries to be smart about protocol version mismatches between the sender and the receiver. The classic you say potayto, I say potahto.

We’ll start with the easy ones first. Just remember that the total number of bits that make up a message may not be a multiple of 8. Obviously TCP and UDP don’t know how to transmit half or one-quarter of a byte. So you pad the bits to round it up. What if the message, for some reason, ends up with zero bits? Well, you still send one byte with all zeroes.
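The rounding-up can be sketched in a few lines of Python. Again the bit stream is just a string of '0' and '1' characters, and `pad_to_octets` is a name I made up for illustration. The sketch assumes the padding goes at the tail end as zero bits.

```python
def pad_to_octets(bits: str) -> bytes:
    """Pack a string of '0'/'1' characters into whole bytes."""
    if not bits:
        # An empty message still goes out as a single all-zero byte.
        return b"\x00"
    # Zero-pad the tail up to the next multiple of 8 bits.
    padded = bits + "0" * (-len(bits) % 8)
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


print(pad_to_octets("").hex())           # the empty message
print(pad_to_octets("1").hex())          # one bit, seven bits of padding
print(pad_to_octets("101000001").hex())  # nine bits spill into a second byte
```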

NULL

This one’s the simplest one of all. If in the specification, within a SET or SEQUENCE, you encounter a field whose type is NULL, you send absolutely nothing to the other side. Zippo, nada, nothing. The receiver knows what’s coming so it auto-fills the field with NULL and moves on to the next one. If only the rest of the encoding was this trivial. *sigh*

BOOLEAN

Okay, the next easy one. 1-bit (could be anywhere on the byte) that’s set to 1 if you feel lucky and set to 0 if it sucks for you. In other words TRUE = 1 and FALSE = 0. Remember, it’s just a single little bit.
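From the consumer side, that means you need something that hands out bits one at a time and remembers where it is inside the byte. A minimal sketch in Python follows. The `BitReader` class and its method names are my own invention for illustration, and it assumes bits are consumed most-significant bit first.

```python
class BitReader:
    """Reads individual bits out of a byte string, most-significant bit first."""

    def __init__(self, data: bytes):
        self.data = data
        self.pos = 0  # bit offset from the start of data

    def read_bit(self) -> int:
        byte = self.data[self.pos // 8]
        bit = (byte >> (7 - self.pos % 8)) & 1
        self.pos += 1
        return bit

    def read_boolean(self) -> bool:
        # A BOOLEAN is exactly one bit: 1 is TRUE, 0 is FALSE.
        return self.read_bit() == 1


reader = BitReader(bytes([0b10100000]))
flags = [reader.read_boolean() for _ in range(3)]
print(flags, reader.pos)
```

Three BOOLEANs in a row cost three bits, and the reader is left sitting in the middle of the byte, which is where the next field starts unless something forces alignment.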

OBJECT IDENTIFIER

Another easy one. This is pretty much like the BER encoding, except you don’t transmit the type. It’s just the length and the standard object id encoding.

"1.2.3.4" => L:P03 C:2a 03 04

That’s 0x03 for the length of what follows, 0x2a (40*1 + 2) for the OID prefix compression and 0x03 0x04 for the rest of the OID. The whole shebang needs to be forced to start from a byte-boundary. We are using the ASN.1 Complete way to indicate that “L” is for the length, “P” indicates that what follows is padded to start at a byte-boundary and “C” indicates the content.
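For the producer side, here is a rough Python sketch of that encoding. The function names are made up, the sketch only handles content shorter than 128 bytes (so the length fits in a single byte), and it assumes the caller has already padded the stream to a byte boundary. Each arc after the first two is written in base 128, with the top bit set on every byte except the last, same as in BER.

```python
def encode_subidentifier(value: int) -> bytes:
    """Base-128 encoding; every byte but the last has the high bit set."""
    out = [value & 0x7F]
    value >>= 7
    while value:
        out.append((value & 0x7F) | 0x80)
        value >>= 7
    return bytes(reversed(out))


def encode_oid_per(dotted: str) -> bytes:
    """Length byte plus OID content. Assumes the stream is already byte-aligned."""
    arcs = [int(a) for a in dotted.split(".")]
    if len(arcs) < 2:
        raise ValueError("an OID needs at least two arcs")
    # The first two arcs are squeezed into one subidentifier: 40 * first + second.
    content = encode_subidentifier(40 * arcs[0] + arcs[1])
    for arc in arcs[2:]:
        content += encode_subidentifier(arc)
    if len(content) >= 128:
        raise ValueError("this sketch only handles single-byte lengths")
    return bytes([len(content)]) + content


print(encode_oid_per("1.2.3.4").hex(" "))
```

Running it on "1.2.3.4" gives back the same four bytes as the L/C breakdown above.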

We’ll go over encoding INTEGERs in the next blog. Notice the plurality?
