Thursday, 13 September 2012

Calculating the lower bound of the bitwise OR of two bounded variables

What does that even mean?

Suppose you have the variables x in [a, b] and y in [c, d]. The question then is: what is the lowest possible value of x | y where x and y are both in their corresponding ranges. In other words, evaluate
\begin{equation} \min _{x \in [a, b], y \in [c, d]} x | y \end{equation} At a maximum of 264 iterations, direct evaluation is clearly not an option for 32-bit integers.

Fortunately, there is an algorithm that has a complexity linear in the number of bits, given by Warren in Hackers Delight, Propagating Bounds through Logical Operations, which the license permits me to show here:
unsigned minOR(unsigned a, unsigned b, 
               unsigned c, unsigned d) {
   unsigned m, temp; 
   m = 0x80000000; 
   while (m != 0) {
      if (~a & c & m) {
         temp = (a | m) & -m; 
         if (temp <= b) {a = temp; break;} 
      else if (a & ~c & m) {
         temp = (c | m) & -m; 
         if (temp <= d) {c = temp; break;} 
      m = m >> 1; 
   return a | c; 

So let's break down what it's doing. It starts at the MSB, and then it searches for either the highest bit that is zero a and one in c such that changing a to have that bit set and all bits the right of it unset would not make the new a higher than b, or, the highest bit that is zero c and one in a such that changing c to have that bit set and all bits the right of it unset would not make the new c higher than d, whichever one comes first.

That's literally easier to code than to explain, and I haven't even explained yet why it works.
Suppose the highest such bit is found in a. Setting that bit in a does not affect the value of a | c, after all, that bit must have been set in c already so it was already set in a | c, too. However, resetting the bits to the right of that bit however can lower a | c. Notice that it is pointless to continue looking at lower bits - in a there are no more bits to reset, and for c there are no more bits that have the corresponding bit in a set.

Warren notes that m could start at 0x80000000 >> nlz(a ^ c) (where nlz is the "number of leading zeros" function), meaning it starts looking at the first bit that is different in a and c. But we can do better. Not only can we start at the first bit which is different in a and c, we could look at only those bits. That requires frequent invocation of the nlz function (or bsr, bit scan reverse, giving the index of the leftmost bit), but it maps to a fast instruction on many platforms.
uint minOR(uint a, uint b, uint c, uint d)
    uint bits = a ^ c;
    while (bits != 0)
        // get the highest bit
        uint m = 1u << (nlz(bits) ^ 31);
        // remove the bit
        bits ^= m;
        if ((a & m) == 0)
            uint temp = (a | m) & -m;
            if (temp <= b) { a = temp; break; }
            uint temp = (c | m) & -m;
            if (temp <= d) { c = temp; break; }
    return a | c;
One interesting consequence of looking only at the bits that are different is that the second if disappears - the case where the bits are equal is ruled out by looking only at the different bits in the first place.

But that is not all. The bit positions at which the <= operators could return true, are precisely all those at and to the right of one important point: the highest set bit in a ^ b (or c ^ d for the other bound). Why? Well the upper bounds are not lower than the lower bounds, so the first bit at which they differ must be the first position at which the lower bound has a zero where the upper bound has a one. Setting that bit to one and all bits to the right to zero in the lower is clearly valid (ie doesn't make it higher than the upper bound), but whether that bit can actually be set depends on the other lower bound as well.

What that means in practical terms, is that the value of m that first passes the tests is directly computable. No loops required. Also, because the test to check whether the new bound is still less than or equal to the upper bound isn't necessary anymore (by construction, that test always passes), the bit doesn't even have to be set anymore - without the test the new value isn't really needed, and the entire idea was that setting that bit would not change the result, so setting it is pointless.
uint minOR(uint a, uint b, uint c, uint d)
    uint settablea = (a ^ b) == 0 ? 0 : 0xFFFFFFFF >> nlz(a ^ b);
    uint settablec = (c ^ d) == 0 ? 0 : 0xFFFFFFFF >> nlz(c ^ d);
    uint candidatebitsa = (~a & c) & settablea;
    uint candidatebitsc = (a & ~c) & settablec;
    uint candidatebits = candidatebitsa | candidatebitsc;

    uint target = candidatebits == 0 ? 0 : 1u << bsr(candidatebits);
    uint targeta = c & target;
    uint targetc = a & target;

    uint newa = a & ~(targeta == 0 ? 0 : targeta - 1);
    uint newc = c & ~(targetc == 0 ? 0 : targetc - 1);
    return newa | newc;
Sadly, there's an awful lot of conditionals in there, which could be branches. But they could also be conditional moves. And on x86 at least, both bsr and lzcnt set a nice condition flag if the input was zero, so it's really not too bad in practice. It is, in my opinion, a pity that there aren't more instruction to deal with leftmost bits, while instruction that deal with the rightmost bit are being added left and right. They are nice, I will admit, but the rightmost bit could already be efficiently dealt with, while the leftmost bit is somewhat problematic.

Next post, well, I'm not sure yet. I'll work on something similar for the upper bound instead of the lower bound, but who knows.
edit: As it turns out, the upper bound was fairly easy to do as well.

No comments:

Post a Comment