Beware of the sign bit

Signedness can trip you up in unexpected ways.

One example of this is the fairly innocuous islower function in the C standard library. This function takes an int and returns a nonzero value if it represents a lowercase character, or zero otherwise.

The prototype for this function is:

int islower(
   int c 
);

Unfortunately, due to how C works, this function is likely to introduce subtle bugs depending on the input you give it. For instance, if you have a char array and iterate through it checking whether each character is lowercase, you might naively write a program like so:

char ch[] = "...";
for (int i = 0; ch[i]; i++)
{
    if (islower(ch[i]))
    {
        ...
    }
}

This code has a nasty bug in it, though. If your compiler treats char as an 8-bit signed type by default (as most mainstream compilers on mainstream platforms do nowadays), and you are given a character value that needs more than 7 significant bits (say, 150), you will go off into undefined-behavior-land, because the compiler will sign-extend ch[i] to a negative int value of -106 instead of an int value of 150. The C standard only defines islower for arguments that are representable as an unsigned char or equal to EOF. Depending on the implementation of islower, a negative argument could have various (bad) effects; for the Microsoft C implementation, islower indexes into an array based on the given argument, so you’ll underrun an array and get garbage results back.
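The usual way around this is to cast the character to unsigned char before handing it to islower, so that the value is zero-extended rather than sign-extended. The following is only a minimal sketch of that idea. The helper names as_promoted, as_unsigned and count_lower are hypothetical and exist purely for illustration, and the values in the comments assume an 8-bit char.

```c
#include <ctype.h>

/* Hypothetical helpers for illustration; not part of any library. */

/* The int that islower receives when a signed 8-bit char is passed
   directly: the value is sign-extended. */
int as_promoted(signed char c)
{
    return c;
}

/* The int that islower receives when the char is first cast to
   unsigned char: the value is zero-extended into 0..UCHAR_MAX. */
int as_unsigned(signed char c)
{
    return (unsigned char)c;
}

/* Counts lowercase characters in s, casting each one to unsigned char
   so the argument to islower is always in the range it is defined for. */
int count_lower(const char *s)
{
    int n = 0;
    for (int i = 0; s[i]; i++)
    {
        if (islower((unsigned char)s[i]))
        {
            n++;
        }
    }
    return n;
}
```

Under those assumptions, as_promoted(-106) stays at -106, while as_unsigned(-106) gives 150, which is a value islower is defined for.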

3 Responses to “Beware of the sign bit”

  1. dispensa says:

    s/islower/tolower/ or the reverse…

  2. Andrew Rogers says:

    Note that 150 == -106 (mod 256) == 0x96, and that -150 (mod 256) == 106 == 0x6A.

    When you sign-extend 0x96, as the top bit is set, you get 0xFF96, 0xFFFFFF96, 0xFFFFFFFFFFFFFF96, etc. These correspond to signed integer values of -106, *not* -150 (which would be 0xFF6A, 0xFFFFFF6A, etc.). -150 as a char is 0x6A, so it would be sign-extended to 0x6A no matter how wide you made it!

    What happens with 150 is still just as bad, but this is just more precise on how it’s bad… ;-)