Bits, Math and Performance(?): Tesseral arithmetic

Introductions are boring, feel free to skip to the interesting stuff

Tesseral arithmetic is a type of arithmetic that operates on interleaved coordinates. That may not seem very useful, so first, when would you want to do that?

The Z-order curve is a space-filling curve (also known as Morton order, Morton coordinates, etc) that is closely related to quad trees (and octrees) and (in some contexts) improves the locality of reference when working with multidimensional data.

In essence, it maps multidimensional coordinates to single-dimensional coordinates, which can be used to address memory, and it does so in a way that sometimes leads to better locality of reference than concatenating the parts of a coordinate into a longer one. The trick is to interleave the bits. While that is not the best (ie. optimal locality of reference) mapping, but it's interesting that it works so well for such a simple trick.

But where it really gets interesting is when you have interleaved coordinates and you want to do math with them. You could unpack them, do your math, and then repack, but if you follow the previous link you can see that while unpacking and packing are simple and fast relative to the mappings of other space-filling curves, unpacking and packing would add a lot of overhead to what would otherwise be simple math.

That's where tesseral arithmetic comes in.

Bitwise AND, OR and XOR still work the same way, because the bits of the result only depend on the corresponding bits in the inputs. Shifts are simple - the shift count must be multiplied by two. So for example x ^ (x << 1) becomes x ^ (x << 2) in tesseral arithmetic.

Addition is more trouble. The carries in normal addition propagate into bits they shouldn't be affecting in tesseral arithmetic. But consider what would happen if the bit pairs at odd positions would each sum to 1. A carry coming into an odd position would always be passed on, and no extra carries would be generated from odd positions. So if the bits at odd positions are just right, the bits at the even positions are summed tesserally, with the carry moving two places instead of one. Obviously this extends to the odd bits as well, when the bits at even positions are just right. This actually makes tesseral addition quite simple:

static uint TesseralAdd(uint z, uint w)
{
    uint xsum = (z | 0xAAAAAAAA) + (w & 0x55555555);
    uint ysum = (z | 0x55555555) + (w & 0xAAAAAAAA);
    return (xsum & 0x55555555) | (ysum & 0xAAAAAAAA);
}

Unsurprisingly, the same principle applies to subtraction. In subtraction, borrows are passed on unmodified through a pair of bits if they sum to zero, or in other words, if both are zero. In a way that's conceptually even simpler than addition.

static uint TesseralSubtract(uint z, uint w)
{
    uint xdiff = (z & 0x55555555) - (w & 0x55555555);
    uint ydiff = (z & 0xAAAAAAAA) - (w & 0xAAAAAAAA);
    return (xdiff & 0x55555555) | (ydiff & 0xAAAAAAAA);
}

But multiplication isn't that nice. The problem is that multiplication is basically build out of a lot of shifts and additions (it's not implemented that way in hardware anymore) and the additions aren't tesseral nor can they be made tesseral.
Unless, of course, we implement multiplication in software:

static uint TesseralMultiply(uint z, uint w)
{
    uint x = z & 0x55555555;
    uint y = w & 0x55555555;
    uint xres = 0;
    while (x != 0)
    {
        if ((x & 1) != 0)
            xres = (xres | 0xAAAAAAAA) + y;
        y <<= 2;
        x >>= 2;
    }

    x = z & 0xAAAAAAAA;
    y = w & 0xAAAAAAAA;
    uint yres = 0;
    while (x != 0)
    {
        if ((x & 2) != 0)
            yres = (yres | 0x55555555) + y;
        y <<= 2;
        x >>= 2;
    }

    return (xres & 0x55555555) | (yres & 0xAAAAAAAA);
}

But that doesn't achieve the goal of being faster than unpacking, doing math, and repacking. If anyone has a better idea, please let me know.

So ok, no tricks multiplication or division. But we're not done. As I hinted in my previous post, many bitwise tricks extend to tesseral arithmetic. For example, taking the absolute value of both parts of the coordinate simultaneously, using the same trick as in my previous post (working with the signbit). The basic principle is simple: replace all operations by their tesseral counterparts. Then look for simplifications and other improvements.

static uint TesseralAbs(uint z)
{
    uint maskx = (uint)((int)z << 1 >> 31);
    uint masky = (uint)((int)z >> 31);

    // this is a simplified tesseral addition (followed by a xor)
    uint xabs = (z & 0x55555555) + maskx ^ maskx;
    uint yabs = (z & 0xAAAAAAAA) + masky ^ masky;

    return (xabs & 0x55555555) | (yabs & 0xAAAAAAAA);
}

The mask is known to be either all ones or all zeroes. It may seem at first as though that means we'd have to OR it with something to make the "in between" bits sum to one, but when the mask is zero there are no carries to pass on anyway. So the OR can be skipped.

But calculating absolute values of coordinates doesn't happen that often. So let's calculate an element-wise minimum, using the same basic principle as before, replace normal operators by tesseral operators. This time however, a substantial improvement over the non-tesseral version is possible.

static uint TesseralMin(uint z, uint w)
{
    // these are tesseral subtractions, of course
    uint xdiff = (z & 0x55555555) - (w & 0x55555555);
    uint ydiff = (z >> 1 & 0x55555555) - (w >> 1 & 0x55555555);

    uint maskx = (uint)((int)xdiff >> 31);
    uint masky = (uint)((int)ydiff >> 31);

    uint xmin = (maskx & z) | (~maskx & w);
    uint ymin = (masky & z) | (~masky & w);

    return (xmin & 0x55555555) | (ymin & 0xAAAAAAAA);
}

And there's something very nice about how that worked out. In the normal min, there was a problem with overflow. That doesn't happen here, because for xdiff there was an extra bit anyway, and for ydiff that extra bit could easily be arranged by shifting right by 1. That makes the comparison unsigned, though, because the "extra bit" is zero, not a sign-extended bit.

So that's it for this post. Many other bitwise tricks can be extended to tesseral math, using the same basic principle. And of course this all generalizes to higher dimensions as well.

In the next post, I'll have some more useful snippets for tesseral arithmetic.

There are some other references for this type of arithmetic or its generalizations, for example The Art of Computer Programming volume 4A which calls this "working with fragmented fields" and Morton-order Matrices Deserve Compilers’ Support which calls this the "algebra of dilated integers".

By the way I originally wrote this post thanks to (or maybe due to?) this article, which I found by searching how to do coordinate arithmetic in a quad tree with Morton order. That's where the title comes from. Unfortunately the article didn't really say how to actually do it, so I worked that out (though the algebra of dilated integers had been explored before, I did not know it went by that name) and posted it for the benefit of other people who perhaps traversed the same steps up to that point.

Tuesday, 27 November 2012

Tesseral arithmetic