Saturday 10 July 2021

Integer promotion does not help performance

There is a rule in the C language which roughly says that arithmetic operations on short integer types implicitly convert their operands to normal-sized integers, and also give their result as a normal-sized integer. For example in C:

If an int can represent all values of the original type (as restricted by the width, for a bit-field), the value is converted to an int; otherwise, it is converted to an unsigned int. These are called the integer promotions. All other types are unchanged by the integer promotions.

Various other languages have similar rules. For example C#, the specification of which is not as jargon-infested as the C specifiction:

In the case of integral types, those operators (except the ++ and -- operators) are defined for the int, uint, long, and ulong types. When operands are of other integral types (sbyte, byte, short, ushort, or char), their values are converted to the int type, which is also the result type of an operation.

There may be various reasons to include such a rule in a programming language (and some reasons not to), but one that is commonly mentioned is that "the CPU prefers to work at its native word size, other sizes are slower". There is just one problem with that: it is based on an incorrect assumption about how the compiler will need to implement "narrow arithmetic".

To give that supposed reason the biggest benefit that I can fairly give it, I will be using MIPS for the assembly examples. MIPS completely lacks narrow arithmetic operations.

Implementing narrow arithmetic as a programmer

Narrow arithmetic is often required, even though various languages make it a bit cumbersome. C# and Java both demand that you explicitly convert the result back to a narrow type. Despite that, code that needs to perform several steps of narrow arithmetic is usually not littered with casts. The usual pattern is to do the arithmetic without intermediate casts, then only in the end use one cast just to make the compiler happy. In C, even that final cast is not necessary.

For example, let's reverse the bits of a byte in C#. This code was written by Igor Ostrovsky, in his blog post Programming job interview challenge. It's not a unique or special case, and I don't mean that negatively: it's good code that anyone proficient could have written, a job well done. Code that senselessly casts back to byte after every step is also sometimes seen, perhaps because in that case, the author does not really understand what they are doing.

// Reverses bits in a byte 
static byte Reverse(byte b)
{
    int rev = (b >> 4) | ((b & 0xf) << 4);
    rev = ((rev & 0xcc) >> 2) | ((rev & 0x33) << 2);
    rev = ((rev & 0xaa) >> 1) | ((rev & 0x55) << 1); 

    return (byte)rev;
}

Morally, all of the operations in this function are really narrow operations, but C# cannot express that. A special property of this code is that none of the intermediate results exceed the limits of a byte, so in a language without integer promotion it could be written in much the same way, but without going through int for the intermediate results.

Implementing narrow arithmetic as a compiler

The central misconception that (I think) gave rise to the myth that integer promotion helps performance, is the assumption that without integer promotion, the compiler must implement narrow operations by inserting an explicit narrowing operation after every arithmetic operation. But that's not true, a compiler for a language that lacks integer promotion can use the same approach that programmers use to implement narrow arithmetic in languages that do have integer promotion. For example, what if two bytes were added together (from memory, and storing the result in memory) in a hypothetical language that lacks integer promotion, and what if that code was compiled for MIPS? The assumption is that it will cost an additional operation to get rid of the "trash bits", but it does not:

        lbu     $2,0($4)
        lbu     $3,0($5)
        addu    $2,$2,$3
        sb      $2,0($4)

The sb instruction does not care about any "trash" in the upper 24 bits, those bits simply won't be stored. This is not a cherry-picked case. Even if there were more arithmetic operations, in most cases the "trash" in the upper bits could safely be left there, being mostly isolated from the bits of interest by the fact that carries only propagate from the least significant bit up, never down. For example, let's throw in a multiplication by 42 and a shift-left by 3 as well for good measure:

        lbu     $6,0($4)
        li      $3,42
        lbu     $2,0($5)
        mul     $5,$3,$6
        addu    $2,$5,$2
        sll     $2,$2,3
        sb      $2,0($4)

What is true, is that before some operations, the trash in the upper bits must be cleared. For example before a division, shift-right, or comparison. That is not an exhaustive list, but the list of operations that require the upper bits to be clean is shorter (and have less "total frequency") than the list of operations that do not require that, see for example which 2's complement integer operations can be used without zeroing high bits in the inputs, if only the low part of the result is wanted? "Before some operations" is not the same thing as "after every operation", but that still sounds like an additional cost. However, the trash-clearing operations that a compiler for a language that lacks integer promotion would have to insert are not additional operations: they are the same ones that a programmer would write explicitly in a language with integer promotion.

It may be possible to construct a contrived case in which a human would know that the high bits of an integer are clean, while a compiler would struggle to infer that. For example, a compiler may have more trouble reasoning about bits that were cancelled by an XOR, or worse, by a multiplication. Such cases are not likely to be the reason behind the myth. A more likely reason is that many programmers are not as familiar with basic integer arithmetic as they perhaps ought to be.

So integer promotion is useless?

Integer promotion may prevent accidental use of narrow operations where wide operations were intended, whether that is worth it is another question. All I wanted to say with this post, is that "the CPU prefers to work at its native word size" is a bogus argument. Even when it is true, it is irrelevant.