Writing Sub-Byte Data in C

13 Jan 2022

When programming, I often find it necessary to write sub-byte data (raw bits), yet most languages unfortunately fall short of this capability. It’s actually incredibly easy to accomplish the same goal in just about any language using either math or bitwise operations, and I’ll use C to demonstrate this. This kind of ability will become very useful if you want to implement a protocol such as DNS or ARP from scratch, or maybe you just need to squeeze every single bit for maximum storage efficiency, or maybe you have some other motivating factor. Whatever your reason may be, it’s rather easy to do.

The Problem

Suppose that you’re implementing IPv4 and need to write 4 bits version, 4 bits header length, 8 bits type of service, etc. but you run into a problem: 8 bits for the type of service is fine since it’s exactly 1 byte, but how do you put 4 bits version and 4 bits header length next to each other? The minimum size that you can write is a char (1 byte, or 8 bits). You need some way to stuff two pieces of data in one byte.

The Mathematical Approach

The mathematical approach is to multiply each “chunk” by powers of two based on the sum of the length of each succeeding chunk. For example, suppose that the version was 0100 (4) and the header length was 0110 (6). Concatenated, we expect the output to be 01000110 (70). We can simply mathematically express this as \((2^2 \times 2^4) + (2^2 + 2^1)\). This is the first chunk of data (0100 = \(2^2\)) multiplied by 2 raised to the power of the size of the second chunk of data (0110 is 4 bits, therefore \(2^4\)), plus the second chunk of data (0110 = \(2^2 + 2^1\)). We can generalize this for any number of chunks of data. Note that the final chunk is implicitly multiplied by \(2^0 = 1\) as there are 0 bits following it.

I mathematically expressed this explicitly in powers of two to better illustrate the nature of binary math, but the same logic in this example could be expressed as \((4 \times 2^4) + 6\), again with 6 being implicitly \(6 \times 2^0\) as there are 0 bits following it (that is to say, the next chunk is 0 bits in size).

#include <stdio.h>
#include <math.h>

int main()
{
	char byte = (4 * pow(2, 4)) + 6;
	printf("byte = 0x%x\n", byte);

	return 0;
}

[skat@anubis:/tmp/tmp.iWcTwEIsL8] $ gcc source.c
[skat@anubis:/tmp/tmp.iWcTwEIsL8] $ ./a.out
byte = 0x46

I don’t like this method. It’s a mouthful and it’s a pain, and it’s also computationally inefficient. I prefer the bitwise approach.

The Bitwise Approach

Bit shifts and bitwise OR operations are how I prefer to approach this problem instead. This is mathematically the same as the previous approach, but is much more efficient and takes advantage of the fact that a shifting left n many bits is equivalent to multiplying by \(2^n\). From there, we can efficiently stuff the next chunk of data with a bitwise OR operation.

#include <stdio.h>

int main()
{
	char byte = (4 << 4) | 6;
	printf("byte = 0x%x\n", byte);

	return 0;
}

[skat@anubis:/tmp/tmp.4x5AwTdFiP] $ gcc source.c
[skat@anubis:/tmp/tmp.4x5AwTdFiP] $ ./a.out
byte = 0x46

Generalizing, you could easily stuff as many “chunks” of bits you’d like by first “clearing” space for the bits through bitwise shift left operations, and then writing the bits through an OR operation.

Happy hacking!

shawn@home:~/blog >_

shawnd.xyz archive