C++17-Style Hex Floats (And How To Parse Them) [JoshJers' Ramblings]

2024-12-30

C++17-Style Hex Floats (And How To Parse Them)

C++17 added support for hex float literals, so you can put more bit-accurate floating point values into your code. They’re handy to have, and I wanted to be able to parse them from a text file in a C# app I was writing.

Some examples:

0x1.0p0         // == 1.0
0x1.8p1         // == 3
0x8.0p-3        // == 1.0
0x0.8p1         // == 1.0
0xAB.CDEFp-10   // == 0.16777776181697846
0x0.0000000ABp0 // == 2.4883775040507317E-09

What Am I Even Looking At Here?

(Before reading this, if you don’t have a good feel for how floats work, maybe check my previous post about floats (and float rounding))

I had a bit of a mental block on this number format for a bit - like, what does it even mean to have fractional hex digits? But it turns out it’s a concept that we already use all the time and my brain just needed some prodding to make the connection.

With our standard base 10 numbers, moving the decimal point left one digit means dividing the number by 10:

12.3 == 1.23 * 10^1 == 0.123 * 10^2

Hex floats? Same deal, just in 16s instead:

0x1B.C8 == 0x1.BC8 * 16^1 == 0x0.1BC8 * 16^2

Okay, so now what’s the “p” part in the number? Well, that’s the start of the exponent. A standard float has an exponent starting with ‘e’:

1.3e2 == 1.3 * 10^2

But ‘e’ is a hex digit, so you can’t use ‘e’ anymore as the exponent starter, so they chose ‘p’ instead (why not ‘x’, the second letter? Probably because a hex number starts with ‘0x’, so ‘x’ also already has a use - but ‘p’ is free so it wins)

The exponent for a hex float is in powers of 2 (so it corresponds perfectly to the exponent as it is stored in the value), so:

0x1.ABp3 == 0x1.AB * 2^3

So that’s how a hex float literal works! Here’s a quick breakdown:

<hex-digits>[ '.' [fractional-hex-digits]] 'p' <exponent>

(Parsing code below the fold)

Okay Now What’s This About Parsing Them?

C++ conveniently has functions to parse these (std::strtod/strtof handle this nicely). However, if you’re (hypothetically) making a parser, and you happen to be writing it in C# which does not have an inbuilt way to parse these, then you’ll have to parse your own.

It ended up being a little more complicated than I thought for a couple reasons.

Arbitrarily long hex strings seem to be supported, which means you need to track both where the top-most non-zero hex value starts (i.e. skip any leading zeros) but also properly handle bits that are extra tiny which may affect rounding.
In order to properly handle said rounding, running through the hex digits in reverse and pushing them in from the top ends up being a nice strategy, because float rounding works via a sticky bit that stays set as things right-shift down through it.

Ultimately I ended up with the following algorithm (in C#) to parse a hex float, which I’m definitely sure is ~perfect~ and has absolutely no bugs whatsoever. It is also absolutely the most efficient version of this possible, with no room for improvement. Yep.

I’m throwing it in here in case anyone ever finds it useful.

static bool IsDigit(char c) => (c >= '0' && c <= '9');
static bool IsHexDigit(char c) 
  => IsDigit(c) 
    || (c >= 'A' && c <= 'F') 
    || (c >= 'a' && c <= 'f');

double ParseHexFloat(string s)
{ 
  // This doesn't handle a negative sign, if only because the parser I have only
  // needed to support positive values, but it'd be easy to add
  if (s.Length < 2 || s[0] != '0' || char.ToLowerInvariant(s[1]) != 'x')
    { throw new FormatException("Missing 0x prefix"); }

  if (s.Length < 3 || !IsHexDigit(s[2]))
    { throw new FormatException(
        "Hex float literal must contain at least one whole part digit"); }
    
  int i = 2;
  int decimalPointIndex = -1;
  int firstNonZeroHexDigitIndex = -1;

  // Scan through our digits, looking for the index of the first set (non-zero)
  //  hex value and the decimal point (if we have one).
  while (i < s.Length && (IsHexDigit(s[i]) || s[i] == '.'))
  {
    if (s[i] == '.')
    {
      // Found the decimal point! Hopefully there wasn't already one!
      if (decimalPointIndex >= 0)
        { throw new FormatException("Too many decimal points"); }

      decimalPointIndex = i;
    }
    else if (s[i] != '0' && firstNonZeroHexDigitIndex < 0)
      { firstNonZeroHexDigitIndex = i; } // Here's our top-most set hex value.

    i++;
  }

  // Also make a note of where our last hex digit was (usually the digit before
  //  the 'p' that we should be at right now)
  int lastHexDigitIndex = i - 1;

  // ... but if the previous character was the decimal point, the last hex
  //  digit is before that.
  if (lastHexDigitIndex == decimalPointIndex)
    { lastHexDigitIndex--; }

  // If we didn't find a decimal point, it's EFFECTIVELY here, at the 'p'
  if (decimalPointIndex < 0)
    { decimalPointIndex = i; }

  // Validate and skip the 'p' character
  if (i >= s.Length || char.ToLowerInvariant(s[i]) != 'p')
    { throw new FormatException("Missing exponent 'p'"); }
  i++;

  // Grab the sign if we have one
  bool negativeExponent = false;
  if (i < s.Length && (s[i] == '+' || s[i] == '-'))
  {
    negativeExponent = (s[i] == '-');
    i++;
  }

  if (i >= s.Length)
    { throw new FormatException("Missing exponent digits"); }

  // Parse the exponent!
  int exponent = 0;
  while (i < s.Length && IsDigit(s[i]))
  {
    if (int.MaxValue / 10 < exponent)
      { throw new FormatException("Exponent overflow"); }
    
    exponent *= 10;
    exponent += (int)(s[i] - '0');
    i++;
  }

  if (negativeExponent)
    { exponent = -exponent; }

  // If we had no non-zero hex digits, there's no point in continuing, it's
  //  zero. 
  if (firstNonZeroHexDigitIndex < 0)
    { return 0.0; }
    
  if (i != s.Length)
    { throw new FormatException("Unexpected characters at end of string"); }
    
  // We have the supplied exponent, but we want to massage it a bit. In a
  //  (IEEE) floating-point value, the mantissa is entirely fractional - that
  //  is, the value is 1.mantissa * 2^(exponent) - there's an implied 1
  //  (excluding subnormal floats, which we'll handle properly, but we can
  //  ignore for the moment). Two things need to happen here:
  //   1. We need to adjust the exponent based on the position of the first
  //      non-zero hex digit, to match the fact that we're parsing hex digits
  //      such that the top hex digit is sitting in the top 4 bits of our 64-
  //      bit int.
  //   2. But we EXPECT a single bit to be above the mantissa (the implied
  //      1) so subtract 1 from our adjustment to take into account that
  //      there will be 4 bits in that hex, so if we had parsed a single "1"
  //      (from, say, 0x1p0, which just equals 1.0) our effective exponent
  //      should be 3 (which we will later shift back down to 0 to position
  //      the 1s bit at the very top)
  if (decimalPointIndex >= firstNonZeroHexDigitIndex)
    { exponent += ((decimalPointIndex - firstNonZeroHexDigitIndex) * 4) - 1; }
  else
    { exponent += ((decimalPointIndex - firstNonZeroHexDigitIndex + 1) * 4) - 1; }

  // Now that we have the exponent and know the bounds of our hex digits,
  //  we can parse backwards through the hex digits, shifting them in from
  //  the top. We do this so that we can easily handle the rounding to the
  //  final 53 bits of significand (by ensuring that we don't ever shift
  //  any 1s off the bottom)
  ulong mantissa = 0;
  for (i = lastHexDigitIndex; i >= firstNonZeroHexDigitIndex; i--)
  {
    // Skip the '.' if there was one.
    if (s[i] == '.')
      { continue; }

    char c = char.ToLowerInvariant(s[i]);
    ulong v = (c >= 'a' && c <= 'f') 
      ? (ulong)(c - 'a' + 10) 
      : (ulong)(c - '0');

    // Shift the mantissa down, but keep any 1s that happen to be in the bottom
    //  4 bits (this is a reasonably-efficient emulation of the "sticky bit"
    //  that is used to round a floating point number properly.
    mantissa = (mantissa >> 4) | (mantissa & 0xf);

    // Add in our parsed hex value, putting its 4 bits at the very top of the
    //  mantissa ulong.
    mantissa |= (v << 60);
  }

  // We know the mantissa is non-zero (checked earlier), and we want to position
  //  the highest set bit at the top of our ulong so shift up until the top bit
  //  is set (and adjust our exponent down 1 to compensate).
  while ((mantissa & 0x8000_0000_0000_0000ul) == 0)
  {
    mantissa <<= 1;
    exponent--;
  }

  const int DoubleExponentBias = 1023;
  const int MaxBiasedDoubleExponent = 1023 + DoubleExponentBias;
  const ulong MantissaMask = 0x000f_ffff_ffff_fffful;
  const ulong ImpliedOneBit = 0x0010_0000_0000_0000ul;
  const int ExponentShift = 52;
  const int MantissaShiftRight = sizeof(double)*8 - ExponentShift - 1;

  // Exponents are stored in a biased form (they can't go negative) so add our
  //  bias now.
  exponent += DoubleExponentBias;

  if (exponent <= 0)
  {
    // We have a subnormal value, which means there is no implied 1, so first
    //  we need to shift our mantissa down by one to get rid of the implied 1.
    //  (note that we're not letting any 1s shift off the bottom, keeping them
    //  sticky)
    mantissa = (mantissa >> 1) | (mantissa & 1);

    // Continue to denormalize the mantissa until our exponent reaches zero
    while (exponent < 0)
    {
      mantissa = (mantissa >> 1) | (mantissa & 1);
      exponent++;
    }
  }

  // Now to do the actual rounding of the mantissa. Sometimes floating point
  //  rounding needs 3 bits (guard, round, sticky) to do rounding, but in our
  //  case, two will suffice: one bit that represents the uppermost bit that
  //  shifts right off of the edge of the mantissa (i.e. the "0.5" bit) and
  //  then "literally any 1 bit underneath that" (which is why we've been
  //  holding on to extra 1s when shifting right) that is the tiebreaker
  bool roundBit      = (mantissa & 0b10000000000) != 0;
  bool tiebreakerBit = (mantissa & 0b01111111111) != 0;

  // Now that we have those bits, we can shift our mantissa down into its
  //  proper place (as the lower 53 bits of our 64-bit ulong).
  mantissa >>= MantissaShiftRight;
  if (roundBit)
  {
    // If there's a tiebreaker, we'll increment the mantissa. Otherwise,
    //  if there's a tie (could round either way), we round so that the
    //  mantissa value is even (lowest bit in the double is 0)
    if (tiebreakerBit || (mantissa & 1) != 0)
      { mantissa++; }

    // If we have a subnormal float we may have overflowed into the implied 1
    //  bit, otherwise we might have overflowed into the ... I guess the
    //  implied *2* bit?
    ulong overflowedMask = ImpliedOneBit << ((exponent == 0) ? 0 : 1);
    if ((mantissa & overflowedMask) != 0)
    {
      // Shift back down one. This is not going to drop a 1 off the bottom
      //  because if we overflowed it means we were odd, and added one to
      //  become even.
      exponent++;
      mantissa >>= 1;
    }
  }

  // It's possible that the truncation we ended up with a 0 mantissa after all,
  //  so our final value has rounded allll the way down to 0.
  if (mantissa == 0)
    { return 0.0; }

  // If our exponent is too large to be represented, this value is infinity.
  if (exponent > MaxBiasedDoubleExponent)
    { return double.PositiveInfinity; }

  // Mask off the implied one bit (if we have one)
  mantissa &= ~ImpliedOneBit;

  // Alright assemble the final double's bits, which means shifting and
  //  adding the exponent into its proper place.
  //  (if we had a sign to apply we'd apply it to the top bit). 
  ulong assembled = mantissa | (((ulong)exponent) << ExponentShift);
  double result = BitConverter.UInt64BitsToDouble(assembled);
  return result;
}