Number Representation

By Alexander Yee

 

(Last updated: February 4, 2019)

 


 

Large Integers

 

Unsigned Integers:

 

y-cruncher represents unsigned integers as base 2^32 or 2^64 numbers stored in arrays of 32-bit or 64-bit integers in little-endian order:

BigInteger = A[0] + A[1]*base^1 + A[2]*base^2 + A[3]*base^3 + ...

The arrays are little-endian in the sense that A[0] is the least-significant word and significance increases with the index.
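
As a minimal sketch of this layout (the names here are hypothetical, not y-cruncher's actual classes), evaluating a two-word base-2^32 array looks like this:

#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // Each element is one base-2^32 word, least-significant word first.
    std::vector<std::uint32_t> A = {0x89abcdefu, 0x01234567u};

    // BigInteger = A[0] + A[1]*2^32  (+ higher terms for longer arrays)
    std::uint64_t value = static_cast<std::uint64_t>(A[0])
                        + (static_cast<std::uint64_t>(A[1]) << 32);

    std::printf("%016llx\n", static_cast<unsigned long long>(value));  // prints 0123456789abcdef
}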

 

The decision to use little-endian was mostly due to ease of programming. But it also has a desirable effect: the data layout of a base-2^64 number is compatible with that of the same number in base 2^32 (more on this below).

 

y-cruncher supports both word-sizes for both 32-bit and 64-bit hardware, but to state the obvious: performance is best when the word size matches the native word size of the hardware.

 

Although y-cruncher can be compiled to use either 32-bit or 64-bit word sizes, not all of the low-level code is able to operate on both sizes.

Everything is required to support 32-bit word sizes. But 64-bit is optional. Anything that doesn't natively support 64-bit simply calls the 32-bit version with a few pointer casts and length adjustments. This is possible because the little-endian order ensures that the data layout is compatible.
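
A sketch of how that fallback can work (the function names are hypothetical, and real code must also mind C++ strict-aliasing rules):

#include <cstdint>
#include <cstddef>

// 32-bit kernel: a += b, where both are arrays of base-2^32 words.
void add32(std::uint32_t* a, const std::uint32_t* b, std::size_t words) {
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < words; i++) {
        carry += static_cast<std::uint64_t>(a[i]) + b[i];
        a[i]   = static_cast<std::uint32_t>(carry);
        carry >>= 32;
    }
    // (carry out of the top word omitted for brevity)
}

// 64-bit wrapper: on little-endian hardware, an array of N base-2^64 words
// has the same byte layout as an array of 2N base-2^32 words, so the 32-bit
// kernel can be reused with pointer casts and a length adjustment.
void add64(std::uint64_t* a, const std::uint64_t* b, std::size_t words) {
    add32(reinterpret_cast<std::uint32_t*>(a),
          reinterpret_cast<const std::uint32_t*>(b),
          words * 2);
}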

 

In particular, any operation that gains no performance benefit from 64-bit words provides only a 32-bit implementation. Most of the large multiplication algorithms treat their operands as data streams rather than arrays of integers, so there is no difference between the 32-bit and 64-bit representations.

 

 

 

Signed Integers:

 

Believe it or not, y-cruncher does not have a large signed integer object. Why? It doesn't need it.

 

 

Floating-Point

 

Floating-point numbers are represented as unsigned integers with a sign boolean and a signed 64-bit exponent.

BigFloat = sign * ( A[0] + A[1]*base^1 + A[2]*base^2 + A[3]*base^3 + ... ) * base^exp

Nothing special here. It is intentionally simple: no infinities, no NaNs. The only minor complication is that special handling is required for zero.
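
A minimal sketch of such an object, assuming a 32-bit word size (the names are hypothetical, and y-cruncher's actual class is more elaborate):

#include <cstdint>
#include <vector>

struct BigFloat {
    bool sign;                     // false = positive, true = negative
    std::int64_t exp;              // signed 64-bit exponent, in units of the base
    std::vector<std::uint32_t> A;  // little-endian mantissa, base 2^32

    // value = sign * (A[0] + A[1]*2^32 + ...) * (2^32)^exp
    bool is_zero() const { return A.empty(); }  // zero is the special case
};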

 

 

Considerations

 

The packed representation used in y-cruncher is used almost universally for binary representations. But it has a couple of very major drawbacks: carry-propagation is inherently sequential, and it makes additions and subtractions difficult to vectorize.

In the average case, carry-propagation is parallelizable. But the worst case remains bad unless you start using something like a Kogge-Stone adder in software.
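
For example, incrementing a packed number is a serial loop. A sketch using the little-endian word array from above:

#include <cstdint>
#include <vector>

// Add 1 to a packed little-endian number. Each word may carry into the next,
// forming a sequential dependency chain. On average the loop exits after a
// word or two, but an all-0xFFFFFFFF input ripples through the entire array.
void increment(std::vector<std::uint32_t>& a) {
    for (std::size_t i = 0; i < a.size(); i++) {
        if (++a[i] != 0)
            return;        // no carry-out: done (the common case)
        // a[i] wrapped to zero, so the carry continues upward
    }
    a.push_back(1);        // carry out of the most-significant word
}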

 

 

Redundant Representation:

 

It is possible to use a base that is smaller than the word size (e.g., 2^60 for a 64-bit word).

The idea with this approach is to delay carry-propagation until it is actually needed. This allows additions and subtractions to be vectorized.
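
A sketch of the idea with base 2^60 in 64-bit words (not y-cruncher's code): the word-wise additions have no carry chain and vectorize trivially, and the deferred carries are resolved in a separate normalization pass.

#include <cstdint>
#include <vector>

// Word-wise addition with deferred carries. With base 2^60 in a 64-bit word,
// 4 spare bits absorb the carries, so up to 15 additions of normalized
// operands can be applied before the words risk overflowing.
void add_lazy(std::vector<std::uint64_t>& a, const std::vector<std::uint64_t>& b) {
    for (std::size_t i = 0; i < a.size(); i++)
        a[i] += b[i];              // independent per-word adds: vectorizable
}

// Propagate the deferred carries when a canonical form is actually needed.
void normalize(std::vector<std::uint64_t>& a) {
    const std::uint64_t MASK = (1ull << 60) - 1;
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size(); i++) {
        std::uint64_t t = a[i] + carry;
        a[i]  = t & MASK;          // keep the low 60 bits
        carry = t >> 60;           // defer the rest upward
    }
    // (carry out of the top word omitted for brevity)
}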

 

But this approach has its own drawbacks: fewer payload bits per word means more words (and therefore more memory and bandwidth) for the same number, and the deferred carries must still be resolved before anything that needs a canonical representation.

These are all pretty bad. I've done a small feasibility study and found that the increase in memory usage and bandwidth demand alone outweighs any benefit of faster additions/subtractions (which are already memory-bound). There may be special cases within the CPU L1 cache where it is beneficial, but overall, a redundant representation is not worth the hassle.

 

Lastly, very little work is actually done directly on the number representation. Most of the time is spent in the large multiplications. But all of the FFT-based multiply algorithms do their computation using their own internal representations: they convert the operands to that representation, do their work, and convert back. It is not possible to choose a number representation that matches the large multiply algorithms because each of those algorithms has a different (and variable) internal representation.

 

So in the end, the full-word (packed) representation remains the best because it is simple and optimal for memory.

 

 

Other Domains

 

The packed implementation is the "standard" representation used in the program. It is sort of the "English" for interfacing between different components. Nevertheless, other representations are used in various parts of the program.

 

 

Radix Conversion:

 

The radix conversion module converts digits from binary to a different radix. The output for such a conversion is base 10^9 or 10^19 depending on whether the word-size is 32-bit or 64-bit. The digit viewer that processes the digits will convert this representation to an ASCII string before dumping it to a massive .txt file.
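
Assuming the 32-bit case, the final step is straightforward since each base-10^9 word holds exactly 9 decimal digits. A sketch (with hypothetical names):

#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Convert a non-empty little-endian array of base-10^9 words to a decimal
// string. Every word except the most-significant one is zero-padded to 9 digits.
std::string to_ascii(const std::vector<std::uint32_t>& words) {
    std::string out = std::to_string(words.back());       // top word: no padding
    char buf[16];
    for (std::size_t i = words.size() - 1; i-- > 0; ) {
        std::snprintf(buf, sizeof(buf), "%09u", static_cast<unsigned>(words[i]));
        out += buf;
    }
    return out;
}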

 

 

FFT Multiplication:

 

The floating-point FFT multiply will convert the operands to base 2^k, where k can be anywhere from 8 to 20 depending on the size of the convolution.

Inside the FFT itself, the number is just an array of double-precision floating-point values aligned to the SIMD vector size.
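
A sketch of the conversion step, using k = 16 so that each base-2^32 word splits evenly into two coefficients (general k requires bit extraction across word boundaries):

#include <cstdint>
#include <vector>

// Split a little-endian array of base-2^32 words into base-2^16 coefficients
// stored as doubles, zero-padded out to the FFT length (which is assumed to
// be at least twice the word count).
std::vector<double> to_fft_points(const std::vector<std::uint32_t>& words,
                                  std::size_t fft_length) {
    std::vector<double> points(fft_length, 0.0);
    for (std::size_t i = 0; i < words.size(); i++) {
        points[2 * i]     = static_cast<double>(words[i] & 0xFFFF);  // low 16 bits
        points[2 * i + 1] = static_cast<double>(words[i] >> 16);     // high 16 bits
    }
    return points;
}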

 

The rest of the FFT algorithms similarly have their own internal representations.