Bits, Math and Performance(?): grevmul

grev (generalized bit-reverse) is an operation that implements bit-permutations corresponding to XOR-ing the indices by some value. It has been proposed to be part of the Zbp extension of RISC-V, with this reference implementation (source: release v0.93)

uint32_t grev32(uint32_t rs1, uint32_t rs2)
{
    uint32_t x = rs1;
    int shamt = rs2 & 31;
    if (shamt &  1) x = ((x & 0x55555555) <<  1) | ((x & 0xAAAAAAAA) >>  1);
    if (shamt &  2) x = ((x & 0x33333333) <<  2) | ((x & 0xCCCCCCCC) >>  2);
    if (shamt &  4) x = ((x & 0x0F0F0F0F) <<  4) | ((x & 0xF0F0F0F0) >>  4);
    if (shamt &  8) x = ((x & 0x00FF00FF) <<  8) | ((x & 0xFF00FF00) >>  8);
    if (shamt & 16) x = ((x & 0x0000FFFF) << 16) | ((x & 0xFFFF0000) >> 16);
    return x;
}

grev looks in some ways similar to bit-shifts and rotates: the left and right operands have distinct roles with the right operand being a mask of k bits if the left operand has 2^k bits^[1].

Carry-less multiplication normally has a left-shift in it, grevmul is what you get when that left-shift is replaced with grev.

uint32_t grevmul32(uint32_t x, uint32_t y)
{
    uint32_t r = 0;
    for (int k = 0; k < 32; k++) {
        if (y & (1 << k))
            r ^= grev32(x, k);
    }
    return x;
}

grevmul is, at its core, very similar to clmul: take single-bit products (logical AND) of every bit of the left operand with every bit of the right operand, then do some XOR-reduction. The difference is in which partial products are grouped together. For clmul, the partial products that contribute to bit k of the result are pairs with indices i,j such that i + j = k. For grevmul, it's the pairs with indices such that i ^ j = k. This goes back to grev permuting the bits by XOR-ing their indices by some value, and that value is k here.

Now that grevmul has been defined, let's look at some of its properties, comparing it to clmul and plain old imul.

	grevmul	clmul	imul
zero^[2]	0	0	0
identity	1	1	1
commutative	yes	yes	yes
associative	yes	yes	yes
distributes over	xor	xor	addition
op(x, 1 << k) is	grev(x, k)	x << k	x << k
x has inverse if	popcnt(x) & 1	x & 1	x & 1
op(x, x) is	popcnt(x) & 1	pdep(x, 0x55555555)	x²

What is the "grevmul inverse" of `x`?

Time for some algebra. Looking just at the table above, and forgetting the actual definition of grevmul, can we say something about the solutions of grevmul(x, y) == 1? Surprisingly, yes.

Assuming we have some x with odd hamming weight (numbers with even hamming weight do not have inverses, so let's ignore them for now), we know that grevmul(x, x) == 1. The inverse in a monoid is unique so x is not just some inverse of x, it is the (unique) inverse of x.

Since the "addition operator" is XOR (for which negation is the identity function), this is a non-trivial example of a ring in which x = -x = x^-1, when x^-1 exists. Strange, isn't it?

We also have that f(x) = grevmul(x, c) (for appropriate choices of c) is a (non-trivial) involution, so it may be a contenter for the "middle operation" of an involutary bit finalizer, but probably useless without an efficient implementation.

~~I was going to write about implementing grevmul by an 8-bit constant with two GF2P8AFFINEQBs but I've had enough for now, maybe later.~~ E: see Implementing grevmul with GF2P8AFFINEQB where I went ahead and implemented the whole thing, not only the "multiply by 8-bit constant" case.

[1] The right operand of a shift is often called the shift count, but it can also be interpreted as a mask indicating some subset of shift-by-2ⁱ operations to perform. That interpretation is useful for example when implementing a shift-by-variable operation on a machine that only has a shift-by-constant instruction, following the same pattern as the reference implementation of grev32.

[2] This looks like a joke, but I mean that the numeric value 0 acts as the zero element of the corresponding semigroup.

Monday, 22 May 2023

grevmul

What is the "grevmul inverse" of x?

What is the "grevmul inverse" of `x`?