Sunday, 18 November 2012

The basics of working with the signbit

this is a filler (in that it is much easier than the usual material), but it seems like most readers only read the fillers anyway

When I write signbit, I mean the upper bit in a bit string that is interpreted as a two's complement signed integer.

Central to working with the signbit is the idea that signed shift right aka arithmetic shift right copies the signbit to other bits, and specifically, a signed shift right by 31 (or 63 or in general, one less than the size of your numbers) broadcasts the signbit to all other bits.

Perhaps the most obvious thing you can do with that is broadcasting an arbitrary bit to all other bits. Simply shift that bit into the signbit, and then shift right by 31:
static int broadcastbit(int value, int bitindex)
{
    // put the target bit in the sign
    int temp = value << (31 - bitindex);
    // copy it to all bits
    return temp >> 31;
}
In C, that's undefined behaviour (UB). Letting a left shift overflow (which could easily happen here) is UB, and signed right shift is UB in any case. But this is C# code (the source of this page will tell you so) where it's perfectly well-defined. And anyway, this is the kind of UB that is safe to use; the expected thing happens when you combine a sane compiler with a typical platform (say, MSVC on x86). But, of course, purists won't like it and on platforms without arithmetic right shift it's probably not going to work.

That actually applies to most of this blog, I suppose.

On to other tricks. This one is slightly harder to grasp, but more useful: calculating the absolute value of an integer without branching. First, the simple to understand version.
static int abs(int value)
{
    // make a mask that is all ones if negative, or all zeroes if non-negative
    int mask = value >> 31;
    // select -value if negative, or value if non-negative
    return (mask & -value) | (~mask & value);
}
That's just the usual branchless selection between two things.

The better way to do this has to do with how negation works. The negation of a number x is ~x + 1 (first definition) or ~(x - 1) (second definition). Those definitions are, of course, equivalent. The trick (and you may have seen this coming), is to make the complement and the increment/decrement conditional based on the mask.
static int abs(int value)
{
    // make a mask that is all ones if negative, or all zeroes if non-negative
    int mask = value >> 31;
    // conditionally complement and subtract -1 (first definition)
    return (value ^ mask) - mask;
    // conditionally add -1 and complement (second definition)
    return (value + mask) ^ mask;
}
I've heard that the version of abs using the first definition is patented. That probably doesn't hold up (there will be a mountain of prior art and it's an obvious trick that anyone could derive), and no one's going to find out you're using it much less sue you for it, but you could use the version using the second definition just to be on the safe side.

One good thing about the simple version of abs is that it's using a generic branchless selection. That means you're not limited to choosing between value and -value, you can select anything. For example, you can subtract two numbers and use the sign of the difference to select the (unsigned) smallest one. That doesn't always work. The subtraction must not overflow, otherwise it selects the wrong one. The problem goes away if the inputs are smaller than ints, for example if they are bytes.
static byte min(byte x, byte y)
{
    int difference = x - y;
    // make a mask that is all ones if x < y, or all zeroes if x >= y
    int mask = difference >> 31;
    // select x if x < y, or y if x >= y
    return (byte)((mask & x) | (~mask & y));
    // alternative: use arithmetic to select the minimum
    return (byte)(y + (difference & mask));
}
The weird mixing of signed and unsigned may be confusing. Try to think of numbers as pure bit strings and only look at the type when an operator depends on it. That's closer to what actually happens in a computer, and it's less confusing that way.

The problem also goes away if you can use the carry flag instead of the signbit, because then you're not using a bit of the result to hold a flag but a separate thing, and thus doesn't "eat into the range of values". But high level languages are too good for the carry flag or something like that, and don't enable you to use it. So here's min in x86 assembly:
    ; inputs are in eax and edx, result in eax
    sub eax, edx
    sbb ecx, ecx    ; makes ecx all ones if carry (ie. if eax < edx)
    and eax, ecx
    add eax, edx
Whether this or the more usual branchless version with cmov is faster depends on the processor.

And that has nothing to do with the signbit anymore, I know.

These tricks, and many others, also extend to tesseral arithmetic, which I'll cover in my next post, which isn't a filler.