Mathematicians may call carryless multiplication "multiplication in GF(2^n)", but that doesn't explain how it works - recall the shift-and-add algorithm for multiplication:

static uint mul(uint a, uint b) { uint r = 0; while (b != 0) { if ((a & 1) != 0) r += b; a >>= 1; b <<= 1; } return r; }Carryless multiplication is a very simple variation on that: do the addition without carries. That's just a XOR.

static uint cl_mul(uint a, uint b) { uint r = 0; while (b != 0) { if ((a & 1) != 0) r ^= b; // carryless addition is xor a >>= 1; b <<= 1; } return r; }It has some applications in complicated cryptography related algorithms, but it also seems like this should be an interesting and powerful operation when working with bits, and it may well be, but I know of almost no uses for it (

As an aside, using associativity, it can be shown that the parallel suffix with XOR (which does have some known uses in bitmath, for example in implementing compress_right in software), code shown below, is equivalent to a carryless multiplication by -1.

// parallel suffix with XOR x ^= x << 1; x ^= x << 2; x ^= x << 4; x ^= x << 8; x ^= x << 16;Every step is clearly a carryless multiplication, by 3, 5, 17, 257, and 65537 respectively. So it's equivalent to:

`clmul(clmul(clmul(clmul(clmul(x, 3), 5), 17), 257), 65537)`

which can be rearranged (using associativity) to:`clmul(x, clmul(clmul(clmul(clmul(3, 5), 17), 257), 65537))`

which works out to `clmul(x, -1)`

. Of course it was supposed to work out that way, because every bit of the result should be the XOR of all bits up to (and including) that bit, but it's nice that it also follows easily from a basic property. Incidentally if you have a full-width carryless multiplication, multiplying by -1 also computes the parallel *prefix*with XOR in the upper bits (the upper bit of the low word, which is the parity of the input, is shared by the suffix and the prefix.)

Carryless multiplication also shares an other property with normal multiplication: there are multiplicative inverses modulo 2

^{n}(and also modulo other numbers, but 2

^{n}is of particular interest since we're working in that by default anyway). Again there are only inverses for odd numbers, and it's equally obvious (as for normal multiplication) why that should be so - an even multiplier will throw out at least one high order bit. First, here's an example of carrlessly multiplying

`x`

by -1 and then carrylessly multiplying that by 3.x = {d}{c}{b}{a} // the letters are bits y = cl_mul(x, -1) = {d^c^b^a}{c^b^a}{b^a}{a} z = cl_mulinv(-1) = 0011 cl_mul(y, z) = {d^c^b^a ^ c^b^a}{c^b^a ^ b^a}{b^a ^ a}{a} = {d}{c}{b}{a}Ok, so that worked out well, and it also gives part the answer to exercise 3 in chapter 5 of Hacker's Delight (about whether parallel prefix/suffix with XOR is invertible and how) because a carryless multiplication by -1 is the same as the parallel suffix with XOR. A carryless multiplication of

`y`

by 3 is of course just `y ^ (y << 1)`

.But back to actually computing the inverse. The inverse had better be odd, so bit 0 is already known, and for all the other bits follow these steps

- if the remainder is 1, stop
- if bit
`k`

is 0, go to step 5 - set bit
`k`

in the inverse - xor the remainder with
`input << k`

- increment k and go to step 1

`k`

is 32" (or some other number, depending on how many bits your ints are wide), which is easier to unroll and better suited for a hardware implementation (not that I've seen this operation implemented in hardware anywhere).For example, in C# it could look like this:

static uint clmulinv(uint x) { uint inv = 1; uint rem = x; for (int i = 1; i < 32; i++) { if (((rem >> i) & 1) != 0) { rem ^= x << i; inv |= 1u << i; } } return inv; }

A variation of the algorithm to find a multiplicative inverse modulo a power of two (see

`inv`here) also works, which is useful when clmul is fast:

static uint clmulinv(uint d) { uint x = 1; for (int i = 0; i < 5; i++) { x = clmul(x, clmul(x, d)); } return x; }

The first iteration sets x to d, that can be done immediately to skip an iteration.

Some sample inverses

1 0x00000001 3 0xFFFFFFFF 5 0x55555555 7 0xDB6DB6DB 9 0x49249249 11 0x72E5CB97 13 0xD3A74E9D 15 0x33333333

The definition of `clmul` at the start of the post was meant to be just that, a faster way to emulate it is this:

static uint clmul(uint a, uint b) { uint r = 0; do { r ^= a * (b & (0 - b)); b &= b - 1; r ^= a * (b & (0 - b)); b &= b - 1; r ^= a * (b & (0 - b)); b &= b - 1; r ^= a * (b & (0 - b)); b &= b - 1; } while (b != 0); return r; }This works by extracting a bit from

`b`and multiplying by it (which just shifts

`a`left), then resetting that bit. This can be unrolled safely since when

`b == 0`, no further changes are made to

`r`automatically. The

`0 - b`thing is due to an unfortunate misfeature of C#, negating unsigned integers converts it to a

`long`.

A similar trick works for the inverse:

static uint clmulinv(uint x) { uint inv = 1; uint rem = x - 1; while (rem != 0) { uint m = rem & (0 - rem); rem ^= x * m; inv += m; } return inv; }

By the way, why is this post so popular? Please let me know in the comments down below.