Tuesday, 3 January 2023

Weighted popcnt

Since I showed the code below on Twitter, and some people understood it and some didn't, I suppose I should explain how it works and what the more general technique is.

// sum of indexes of set bits
int A073642(uint64_t n)
{
    return __popcnt64(n & 0xAAAAAAAAAAAAAAAA) +
          (__popcnt64(n & 0xCCCCCCCCCCCCCCCC) << 1) +
          (__popcnt64(n & 0xF0F0F0F0F0F0F0F0) << 2) +
          (__popcnt64(n & 0xFF00FF00FF00FF00) << 3) +
          (__popcnt64(n & 0xFFFF0000FFFF0000) << 4) +
          (__popcnt64(n & 0xFFFFFFFF00000000) << 5);
}

A sum of several multi-digit numbers, such as 12 + 34, can be rewritten to make the place-value notation explicit, giving 1*10 + 2*1 + 3*10 + 4*1. Then the distributive property of multiplication can be used to group digits of equal place-value together, giving (1 + 3)*10 + (2 + 4)*1. I don't bring up something so basic as an insult to the reader, the basis of that possibly-intimidating piece of code really is this basic.

The masks 0xAAAAAAAAAAAAAAAA, 0xCCCCCCCCCCCCCCCC, etc, represent nothing more than the numbers 0..63, each written vertically in binary, starting with the least significant bit. Each column of bits contains its own index. AND-ing the masks with n leaves, in each column, either zero (if the corresponding bit of n was zero) or the index of that column.

The first popcnt then adds together all the digits of the "filtered" indexes with a place-value of 1, the second popcnt adds together the digits with a place-value of 2, and so on. The result of each popcnt is shifted to match the corresponding place-value, and all of them are added together.

Generalizing the technique

The numbers in the columns did not have to be 0..63, the columns can contain arbitrary numbers, even negative numbers. This gives some sort of weighted popcnt: rather than each bit having a weight of 1, we are free to choose an arbitrary weight for each bit.

In general, starting with some arbitrary numbers, interpret them as a binary matrix. Transpose that matrix. Every row of the transposed matrix is the mask that goes into the corresponding popcnt-step.

Some simplifications may be possible:

  • If a row of the matrix is zero, the corresponding step can be left out.
  • If two or more rows are equal, their popcnt-steps can be merged into one with a weight that is the sum of the weights of the steps that are merged.
  • If a row is a linear combination of some set of other rows, the popcnt corresponding to that row can be computed as a linear combination of the popcnts corresponding to those other rows.
  • If a row has exactly one bit set, its step can be implemented with an AND and a shift, without popcnt, by shifting the bit to its target position.
  • If every row has exactly one bit set, we're really dealing with a bit-permutation and there are better ways to implement them.

For example, let's construct "the sum of squares of indexes of set bits" (with 1-indexed indexes) aka oeis.org/A003995. Transposing the squares 1,2,4,9..4096 gives these masks:

0x5555555555555555
0x0000000000000000
0x2222222222222222
0x1414141414141414
0x0d580d580d580d58
0x0335566003355660
0x00f332d555a66780
0x555a5b6666387800
0x66639c78783f8000
0x787c1f807fc00000
0x7f801fff80000000
0x7fffe00000000000
0x8000000000000000

The second mask is zero because a square cannot be congruent to 2 or 3 modulo 4. The last mask has exactly one bit set, so its step can be implemented without popcnt. Putting it together:

int A003995(uint64_t n)
{
    return __popcnt64(n & 0x5555555555555555) +
          (__popcnt64(n & 0x2222222222222222) << 2) +
          (__popcnt64(n & 0x1414141414141414) << 3) +
          (__popcnt64(n & 0x0d580d580d580d58) << 4) +
          (__popcnt64(n & 0x0335566003355660) << 5) +
          (__popcnt64(n & 0x00f332d555a66780) << 6) +
          (__popcnt64(n & 0x555a5b6666387800) << 7) +
          (__popcnt64(n & 0x66639c78783f8000) << 8) +
          (__popcnt64(n & 0x787c1f807fc00000) << 9) +
          (__popcnt64(n & 0x7f801fff80000000) << 10) +
          (__popcnt64(n & 0x7fffe00000000000) << 11) +
          ((n & 0x8000000000000000) >> 51);
}

Aside from implementing strange sequences from the OEIS, possible uses of this technique may include

  • Summing the face-values of a hand of cards. Unfortunately this cannot take "special combinations" into account, unlike other techniques.
  • A basic evaluation of a board state in Reversi/Othello, based on weights for each captured square that differ by their position.
  • Determining the price for a pizza based on a set of selected toppings, I don't know, I'm grasping at straws here.