## Sunday, 16 September 2012

### Calculating the lower and upper bound of the bitwise OR of two variables that are bounded and may have bits known to be zero

This new problem clearly is related to two of my previous posts. But this time, there is slightly more information. It may look like a contrived, purely theoretical, problem, but it actually has applications in abstract interpretation. Static knowledge about the values that variables could have at runtime often takes the form of a range and a number that the variable is known to be a multiple of, which is most commonly a power of two.

The lower bound will be $$\min _{x \in [a, b] \wedge m\backslash x, y \in [c, d] \wedge n\backslash y} x | y$$ And the upper bound will be $$\max _{x \in [a, b] \wedge m\backslash x, y \in [c, d] \wedge n\backslash y} x | y$$ Where m\x means "x is divisible by m".

So how can we calculate them faster than direct evaluation? I don't know, and to my knowledge, no one else does either. But if sound (ie only overapproximating) but non-tight bounds are OK, then there is a way. Part of the trick is constraining m and n to be powers of two. It's safe to use m = m & -m. That should look familiar - it's extracting the rightmost bit of m. An other explanation of "the rightmost bit of m" is "the highest power of two that divides m". That doesn't rule out any values of x that were valid before, so it's a sound approximation.

Strangely, for minOR, if the bounds are pre-rounded to their corresponding powers of two, there is absolutely no difference in the code whatsoever. It is possible to set a bit that is known to be zero in that bound, but that can only happen if that bit is one in the other bound anyway, so it doesn't affect the result. The other case, setting a bit that is not known to be zero, is the same as it would be with only the range information.

maxOR is a problem though. In maxOR, bits at the right are set which may be known to be zero. Some of those bits may have to be reset. But how many? To avoid resetting too many bits, we have to round the result down to a multiple of min(m, n). That's clearly sound - if a bit can't be one in both x and n, obviously it can't be one in the result. But it turns out not to be tight - for example for [8, 9] 1\x and [0, 8] 4\y, it computes 0b1111, even though the last two bits can only be 0b00 or 0b01 (y does not contribute to these bits, and the range of x is so small that the bits only have those values) so the tight upper bound is 0b1101. If that's acceptable, the code would be
static uint maxOR(uint a, uint b, uint c, uint d, uint m, uint n)
{
uint resettableb = (a ^ b) == 0 ? 0 : 0xFFFFFFFF >> nlz(a ^ b);
uint resettabled = (c ^ d) == 0 ? 0 : 0xFFFFFFFF >> nlz(c ^ d);
uint resettable = b & d & (resettableb | resettabled);
uint target = resettable == 0 ? 0 : 1u << bsr(resettable);
uint targetb = target & resettableb;
uint targetd = target & resettabled & ~resettableb;
uint newb = b | (targetb == 0 ? 0 : targetb - 1);
uint newd = d | (targetd == 0 ? 0 : targetd - 1);
uint mask = (m | n) & (0 - (m | n));
return (newb | newd) & (0 - mask);
}
Which also uses a sneaky way of getting min(m, n) - by ORing them and then taking the rightmost bit. Because why not.

I haven't (yet?) found a nice way to calculate the tight upper bound. Even if I do, that still leaves things non-tight when the old m or n were not powers of two.

Next post, xor, which has some unique difficulties.

## Friday, 14 September 2012

### Calculating the lower and upper bounds of the bitwise AND of two bounded variables

This post is the closely related the previous post and the post before it, so I strongly suggest you read those two first.

It's the same idea as before, but with bitwise AND instead of OR. That leads to some interesting symmetries. First, the definitions. The lower bound will be $$\min _{x \in [a, b], y \in [c, d]} x \& y$$ And the upper bound will be $$\max _{x \in [a, b], y \in [c, d]} x \& y$$ The algorithms given by Warren are
unsigned minAND(unsigned a, unsigned b,
unsigned c, unsigned d) {
unsigned m, temp;

m = 0x80000000;
while (m != 0) {
if (~a & ~c & m) {
temp = (a | m) & -m;
if (temp <= b) {a = temp; break;}
temp = (c | m) & -m;
if (temp <= d) {c = temp; break;}
}
m = m >> 1;
}
return a & c;
}
unsigned maxAND(unsigned a, unsigned b,
unsigned c, unsigned d) {
unsigned m, temp;

m = 0x80000000;
while (m != 0) {
if (b & ~d & m) {
temp = (b & ~m) | (m - 1);
if (temp >= a) {b = temp; break;}
}
else if (~b & d & m) {
temp = (d & ~m) | (m - 1);
if (temp >= c) {d = temp; break;}
}
m = m >> 1;
}
return b & d;
}
Obviously, the follow the same basic idea. Try to set a bit so you can reset the bits to the right of it in the lower bound, or try to reset a bit so you can set the bits to the right of it in the upper bound. The same reasoning about starting at 0x80000000 >> nlz(~a & ~c) or 0x80000000 >> nlz(b ^ d) applies, and the same reasoning about "bits at and to the right of a ^ b" applies as well. I'll skip the "sparse loops" this time, they're nice enough but mainly instructive, and repeating the same idea twice doesn't make it twice as instructive. So straight to the loopless algorithms:
static uint minAND(uint a, uint b, uint c, uint d)
{
uint settablea = (a ^ b) == 0 ? 0 : 0xFFFFFFFF >> nlz(a ^ b);
uint settablec = (c ^ d) == 0 ? 0 : 0xFFFFFFFF >> nlz(c ^ d);
uint settable = ~a & ~c & (settablea | settablec);
uint target = settable == 0 ? 0 : 1u << bsr(settable);
uint targeta = target & settablea;
uint targetc = target & settablec & ~settablea;
uint newa = a & (targeta == 0 ? 0xFFFFFFFF : 0-targeta);
uint newc = c & (targetc == 0 ? 0xFFFFFFFF : 0-targetc);
return newa & newc;
}
static uint maxAND(uint a, uint b, uint c, uint d)
{
uint resettableb = (a ^ b) == 0 ? 0 : 0xFFFFFFFF >> nlz(a ^ b);
uint resettabled = (c ^ d) == 0 ? 0 : 0xFFFFFFFF >> nlz(c ^ d);
uint candidatebitsb = b & ~d & resettableb;
uint candidatebitsd = ~b & d & resettabled;
uint candidatebits = candidatebitsb | candidatebitsd;
uint target = candidatebits == 0 ? 0 : 1u << bsr(candidatebits);
uint targetb = target & b;
uint targetd = target & d & ~b;
uint newb = b | (targetb == 0 ? 0 : targetb - 1);
uint newd = d | (targetd == 0 ? 0 : targetd - 1);
return newb & newd;

Symmetry everywhere. But not really anything to new to explain.

Next post, something new to explain.

### Calculating the upper bound of the bitwise OR of two bounded variables

This post is the closely related the previous one, so I strongly suggest you read that one first.

The only difference with the previous post, is that this time, we're interested in the upper bound instead of the lower bound. In other words, evaluate
$$\max _{x \in [a, b], y \in [c, d]} x | y$$ The algorithm given by Warren in Hackers Delight is
unsigned maxOR(unsigned a, unsigned b,
unsigned c, unsigned d) {
unsigned m, temp;

m = 0x80000000;
while (m != 0) {
if (b & d & m) {
temp = (b - m) | (m - 1);
if (temp >= a) {b = temp; break;}
temp = (d - m) | (m - 1);
if (temp >= c) {d = temp; break;}
}
m = m >> 1;
}
return b | d;
}
And it's really the same sort of idea as the algorithm to calculate the minimum, except this time we're looking for a place where both b and d are one, so we can try to reset that bit and set all the bits to the right of it.

Warren notes that m can start at 0x80000000 >> nlz(b & d), and once again the same principle holds: it's enough to only look at those bits which are one in b & d, and they can be visited from high to low with bsr
static uint maxOR(uint a, uint b, uint c, uint d)
{
while (bits != 0)
{
uint m = 1u << bsr(bits);

uint temp;
temp = (b - m) | (m - 1);
if (temp >= a) { b = temp; break; }
temp = (d - m) | (m - 1);
if (temp >= c) { d = temp; break; }

bits ^= m;
}
return b | d;
}
And also, again, we can use that the bit we're looking for in b must be at or to the right of the leftmost bit in a ^ b (c ^ d for d), and that the selected bit doesn't actually have to be changed.
static uint maxOR(uint a, uint b, uint c, uint d)
{
uint resettableb = (a ^ b) == 0 ? 0 : 0xFFFFFFFF >> nlz(a ^ b);
uint resettabled = (c ^ d) == 0 ? 0 : 0xFFFFFFFF >> nlz(c ^ d);
uint candidatebits = b & d & (resettableb | resettabled);
uint target = candidatebits == 0 ? 0 : 1u << bsr(candidatebits);
uint targetb = target & resettableb;
uint targetd = target & resettabled & ~resettableb;
uint newb = b | (targetb == 0 ? 0 : targetb - 1);
uint newd = d | (targetd == 0 ? 0 : targetd - 1);
return newb | newd;
}
Most of the code should be obvious after a moments thought, but something interesting and non-symmetric happens for targetd. There, I had to make sure that a change is not made to both bounds (that would invalidate the whole idea of "being able to make the change without affecting that bit in the result"). In minOR that happened automatically because it looked at positions where the bits were different, so both targets couldn't both be non-zero. Here, one of the bounds has to be explicitly prioritized before the other.

Next post, maybe the same sort of thing but for bitwise AND. Then again, maybe not. I'll see what I can come up with.
edit: bitwise AND it is.

## Thursday, 13 September 2012

### Calculating the lower bound of the bitwise OR of two bounded variables

What does that even mean?

Suppose you have the variables x in [a, b] and y in [c, d]. The question then is: what is the lowest possible value of x | y where x and y are both in their corresponding ranges. In other words, evaluate
$$\min _{x \in [a, b], y \in [c, d]} x | y$$ At a maximum of 264 iterations, direct evaluation is clearly not an option for 32-bit integers.

Fortunately, there is an algorithm that has a complexity linear in the number of bits, given by Warren in Hackers Delight, Propagating Bounds through Logical Operations, which the license permits me to show here:
unsigned minOR(unsigned a, unsigned b,
unsigned c, unsigned d) {
unsigned m, temp;

m = 0x80000000;
while (m != 0) {
if (~a & c & m) {
temp = (a | m) & -m;
if (temp <= b) {a = temp; break;}
}
else if (a & ~c & m) {
temp = (c | m) & -m;
if (temp <= d) {c = temp; break;}
}
m = m >> 1;
}
return a | c;
}

So let's break down what it's doing. It starts at the MSB, and then it searches for either the highest bit that is zero a and one in c such that changing a to have that bit set and all bits the right of it unset would not make the new a higher than b, or, the highest bit that is zero c and one in a such that changing c to have that bit set and all bits the right of it unset would not make the new c higher than d, whichever one comes first.

That's literally easier to code than to explain, and I haven't even explained yet why it works.
Suppose the highest such bit is found in a. Setting that bit in a does not affect the value of a | c, after all, that bit must have been set in c already so it was already set in a | c, too. However, resetting the bits to the right of that bit however can lower a | c. Notice that it is pointless to continue looking at lower bits - in a there are no more bits to reset, and for c there are no more bits that have the corresponding bit in a set.

Warren notes that m could start at 0x80000000 >> nlz(a ^ c) (where nlz is the "number of leading zeros" function), meaning it starts looking at the first bit that is different in a and c. But we can do better. Not only can we start at the first bit which is different in a and c, we could look at only those bits. That requires frequent invocation of the nlz function (or bsr, bit scan reverse, giving the index of the leftmost bit), but it maps to a fast instruction on many platforms.
uint minOR(uint a, uint b, uint c, uint d)
{
uint bits = a ^ c;
while (bits != 0)
{
// get the highest bit
uint m = 1u << (nlz(bits) ^ 31);
// remove the bit
bits ^= m;
if ((a & m) == 0)
{
uint temp = (a | m) & -m;
if (temp <= b) { a = temp; break; }
}
else
{
uint temp = (c | m) & -m;
if (temp <= d) { c = temp; break; }
}
}
return a | c;
}
One interesting consequence of looking only at the bits that are different is that the second if disappears - the case where the bits are equal is ruled out by looking only at the different bits in the first place.

But that is not all. The bit positions at which the <= operators could return true, are precisely all those at and to the right of one important point: the highest set bit in a ^ b (or c ^ d for the other bound). Why? Well the upper bounds are not lower than the lower bounds, so the first bit at which they differ must be the first position at which the lower bound has a zero where the upper bound has a one. Setting that bit to one and all bits to the right to zero in the lower is clearly valid (ie doesn't make it higher than the upper bound), but whether that bit can actually be set depends on the other lower bound as well.

What that means in practical terms, is that the value of m that first passes the tests is directly computable. No loops required. Also, because the test to check whether the new bound is still less than or equal to the upper bound isn't necessary anymore (by construction, that test always passes), the bit doesn't even have to be set anymore - without the test the new value isn't really needed, and the entire idea was that setting that bit would not change the result, so setting it is pointless.
uint minOR(uint a, uint b, uint c, uint d)
{
uint settablea = (a ^ b) == 0 ? 0 : 0xFFFFFFFF >> nlz(a ^ b);
uint settablec = (c ^ d) == 0 ? 0 : 0xFFFFFFFF >> nlz(c ^ d);
uint candidatebitsa = (~a & c) & settablea;
uint candidatebitsc = (a & ~c) & settablec;
uint candidatebits = candidatebitsa | candidatebitsc;

uint target = candidatebits == 0 ? 0 : 1u << bsr(candidatebits);
uint targeta = c & target;
uint targetc = a & target;

uint newa = a & ~(targeta == 0 ? 0 : targeta - 1);
uint newc = c & ~(targetc == 0 ? 0 : targetc - 1);
return newa | newc;
}
Sadly, there's an awful lot of conditionals in there, which could be branches. But they could also be conditional moves. And on x86 at least, both bsr and lzcnt set a nice condition flag if the input was zero, so it's really not too bad in practice. It is, in my opinion, a pity that there aren't more instruction to deal with leftmost bits, while instruction that deal with the rightmost bit are being added left and right. They are nice, I will admit, but the rightmost bit could already be efficiently dealt with, while the leftmost bit is somewhat problematic.

Next post, well, I'm not sure yet. I'll work on something similar for the upper bound instead of the lower bound, but who knows.
edit: As it turns out, the upper bound was fairly easy to do as well.

### Divisibility and modular multiplication, even divisors

As promised, I will now expand the divisibility testing by modular multiplication algorithm to handle even divisors.

Recall that a number y that has a rightmost bit can be written as y = d * 2n where d is odd. A number x is divisible by y = d * 2n iff it is divisible by 2n and by d. And both of those problems have already been solved in earlier posts, so:
bool IsDivisibleBy(uint32_t x, uint32_t divisor)
{
uint32_t poweroftwo = divisor & -divisor;
uint32_t d = divisor >> bsf(divisor);
return (x & (poweroftwo - 1)) == 0 &&
(d == 1 || IsDivisibleByOdd(x, d));
}

Pretty straightforward. Except perhaps the d == 1 bit. Recall that IsDivisibleByOdd doesn't want the divisor to be one, so that case has to be avoided. And if d is one, that means the divisor was a power of two. It even works if divisor is one; poweroftwo would also be one, and x & 0 is clearly always zero.

And bsf is not defined. The implementation would be strongly platform dependent, and not particularly enlightening.

Now, on to the performance part. Does this help? The answer is the same as last time - no, usually not. Except in some select cases, such as when you need to test a whole array for divisibility by the same number, which is not a compile-time constant.

Added on 29 January 2014: as Jasper Neumann pointed out, if right-rotate can be used, the part that tests whether the lower bits are zero can be merged with the other test, as described in, for example, Hacker's Delight chapter 10-16 (Test for Zero Remainder after Division by a Constant) and gmplib.org/~tege/divcnst-pldi94.pdf.

That concludes the divisibility series (for now, anyway). Next post, something completely different.

### The basics of working with the rightmost bit

note: I rewrote this post because of its popularity.

In this post I will assume that the reader knows the bitwise operators and what they do (if not, see Low Level Bit Hacks You Absolutely Must Know), to avoid having to cover the basics.

The rightmost bit (not to be confused with the least-significant bit), also called "rightmost 1", is the lowest bit that is set (rightmost zero is the lowest bit that is not set). So zero, unlike all other numbers, doesn't have a rightmost bit. The rightmost bit is interesting because surprisingly many useful operations can be done on it.

Here's a small selection of potentially useful basic "rightmost bit/zero operations":

 x - 1 Remove the rightmost bit and smear it to the right.(zero is interpreted as having a rightmost bit just beyond the msb) x & (x - 1) Remove the rightmost bit. x & -x Isolate the rightmost bit. x | (x + 1) Set the rightmost zero. x | ~(x + 1) Isolate the rightmost zero (as an inverted mask).

# How it works

Manipulation of the rightmost bit makes use of the properties of the carry/borrow process, namely that it propagates from the lsb to the msb, changes the bits it touches, and can be stopped. For example, the simplest operation operation, x - 1 just runs a borrow through the zeroes on the right, changing all of them to ones, the borrow is stopped by the rightmost one (which is changed to a zero). Effectively it inverted the part of the number that includes the rightmost bit and spans to the right, and left the rest alone. ANDing that number with x (as in the "remove the rightmost bit" operation) then sets that rightmost part to all zeroes (because x & ~x = 0) and leaves the rest of the number alone (because x & x = x).

Since -x = ~(x - 1), clearly negation is a kind of "opposite" of subtracting 1; the rightmost part (including the rightmost 1) is not change, and the leftmost part is changed. So x & -x also gives the opposite thing of x & (x - 1), namely just the rightmost bit (instead of everything else).

The other two operations from the table can be derived by taking ~operation(~x) and simplifying it:

~(~x & (~x - 1)) =
// use the definition of subtraction: a - b = ~(~a + b)
~(~x & ~(x + 1)) =
// use De Morgan's law
x | (x + 1)
~(~x & -~x) =
// use the definition of negation: -a = ~(a - 1)
~(~x & ~(~x - 1)) =
// use De Morgan's law
x | (~x - 1) =
// use the definition of subtraction: a - b = ~(~a + b)
x | ~(x + 1)

Using the same principles, more complicated operations can be constructed. For example (by chaining two operations on the rightmost bit), by first smearing the rightmost 1 to the right, the rightmost run of ones is now in a good position to get rid of it (by adding one and ANDing):
(x | (x - 1)) + 1 & x

The same can also be accomplished differently (no better, just different), by instead of smearing the rightmost bit to the right so that it can be affected by the +1, adding a big enough number - that number is, of course, the isolated rightmost bit, so we get this:
x + (x & -x) & x

A good overview of the basic rightmost-bit operations can be found here.

Next time, why rightmost bits are relevant in testing divisibility by even numbers.

### Divisibility and modular multiplication

An other post from CodeProject (with small changes). I promise I'll post something new next time.

Typically, all computer math is done modulo 232 (or 2something, which the rest of my post trivially generalizes to). This leads to sometimes surprising effects, such as that the sum of a and b, while both positive, can end up being lower than min(a, b), which is rather well-known and known as "overflow" (often treated like something bad, which it can be). Less well known is that it also means that some numbers have a multiplicative inverse, ie a number x-1 such that x-1x=1. As mentioned in the wikipedia article, the numbers which have multiplicative inverses are precisely those coprime to the modulo. The modulo is a power of two, which means that a number has a multiplicative inverse iff it is odd.

And it turns out to be actually useful, too. One application is, as you might guess from the title, divisibility testing.

Multiplying a number by an odd number is reversible - you can take the multiplicative inverse of the odd number and then multiply by it to get the original number back. Put differently, the function f(x) = x * k (for k odd) is a bijection.

Modular multiplication is associative, so a multiple of k, say n * k multiplied by inv(k) (the multiplicative inverse of k), is n, because

(n * k) * inv(k) =
// use associativity
n * (k * inv(k)) =
// use definition of multiplicative inverse
n * 1 =
// multiplicative identity
n
That means that in x * inv(k), the multiples of k "use up" the results from 0 through (232-1)/k, they can't be used twice because it's a bijection, leaving just the numbers bigger than (232-1)/k for non-multiples-of-k. Which suggest a very simple divisibility test:
static bool IsDivisibleByOdd(uint x, uint divisor)
{
if ((divisor & 1) == 0)
throw new ArgumentException("divisor must be odd");
uint d_inv = inv(divisor);
uint biggest = uint.MaxValue / divisor;  // problem right here
return (x * d_inv) <= biggest;
}

static uint inv(uint d)
{
// see Hacker's Delight,
// Computing the Multiplicative Inverse by Newton's Method
// use extra iteration when extending to 64 bits
uint x = (d * d) + d - 1;
uint t = d * x;
x *= 2 - t;
t = d * x;
x *= 2 - t;
t = d * x;
x *= 2 - t;
return x;
}
And there is a problem. It didn't avoid the division, it turned it into an other division. So this doesn't actually help - in fact it's slower than the usual way. The left hand side of the division is a constant though, so unlike in the usual x % divisor == 0, the slow operation (division and modulo are slow relative to multiplication) can be done at compile time. Unfortunately, that doesn't actually help - it's one of the most common optimizations that compilers do, so it already happened automatically.

But. And there is a but. If you want to test for a lot of number whether they are divisible by some odd number1 that doesn't vary in that test, you only have one division, one inv, and the cost per item is only a multiplication and a test. What's more, no compiler I know of automatically does this for you, so this can actually save time.

1. Except 1, but why would you test for divisibility by 1? I will extend this method to even numbers after the next post, which will cover some concept (bits again, I haven't forgotten about them) that are necessary for it.

## Wednesday, 12 September 2012

### Divisibility and digital roots

To start off this blog, I'll start with a subject that I wrote about on CodeProject, divisibility and digital roots.

It is well known that a number is divisible by 9 if and only if its digital root is 9. Less well known is that a similar trick kind applies to numbers other than 9, but doesn't really work out.

In order to make this trick "work" (I'll get to why it sometimes doesn't) for number k, the digit at position i has to be multiplied by base^i - [the biggest multiple of k <= base^i] before adding it to the (modified) digital sum.

For example for k = 7, base = 10, you'd multiply the ones position by 3, the tens position by 2, the hundreds position by 6, and so forth (3, 2, 6, 4, 5, 8, then it repeats).

It does transform every multiple of 7 into a multiple of 7 (and every non-multiple-of-7 into a non-multiple-of-7), but it can be the same number, for example 14: 3 * 4 + 2 * 1 = 14, or it can even be a bigger number, for example 9.

But we're programmers, so the base isn't 10. It can be 16. 6 * 6 = 36, so every (positive integer) power of 16 ends in a 6, which means that the nearest lower multiple of 5 is only 1 away. So for k = 5, it works out to a factor of 1 at every position.

Even better, 16^n-1 is divisible by 15, so for base 16, k = 15 works out well too, with a factor of 1 at every position. This leads to the following algorithm:

static bool IsMultipleOf15(int x)
{
// lookup table to speed up last step
const ulong lookuptable = 0x1000200040008001;
int t = (x & 0x0F0F0F0F) + ((x & unchecked((int)0xF0F0F0F0)) >> 4);
t = (t & 0x001F001F) + ((t & 0x1F001F00) >> 8);
t = (t & 0x0000003F) + ((t & 0x003F0000) >> 16);
t = (t & 0xF) + ((t & 0x70) >> 4);
return ((lookuptable >> t) & 1) != 0;
}
15, of course, has factors 3 and 5, so the same code works to test for divisibility by 3 or 5 just by changing the lookup table to 0x9249249249249249 or 0x1084210842108421, respectively (the two of those ANDed together gives the lookup table for 15, of course). I haven't encountered a situation where this is useful; modulo by a constant is optimized by every sane compiler so this is never an optimization, just a curiosity (or perhaps something to torture interviewees with).

In the next post, I'll cover a divisibility testing algorithm that is actually useful.