Number Representation

By Alexander Yee

 

(Last updated: February 4, 2019)

 


 

Large Integers

 

Unsigned Integers:

 

y-cruncher represents unsigned integers as base 2^32 or 2^64 numbers stored in arrays of 32-bit or 64-bit integers in little-endian order:

BigInteger = A[0] + A[1]*base^1 + A[2]*base^2 + A[3]*base^3 + ...

The arrays are little-endian in the sense that A[0] is the least-significant word and significance increases with the index.
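
As a minimal sketch of this layout (the names here are hypothetical, not y-cruncher's actual classes), evaluating a two-word base-2^32 array looks like this:

#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    // Each element is one base-2^32 word, least-significant word first.
    std::vector<std::uint32_t> A = {0x89abcdefu, 0x01234567u};

    // BigInteger = A[0] + A[1]*2^32  (+ higher terms for longer arrays)
    std::uint64_t value = static_cast<std::uint64_t>(A[0])
                        + (static_cast<std::uint64_t>(A[1]) << 32);

    std::printf("%016llx\n", static_cast<unsigned long long>(value));  // prints 0123456789abcdef
}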

 

The decision to use little-endian was mostly due to ease of programming. But it also has a desirable effect: the data layout of a base-2^64 number is compatible with that of the same number in base 2^32 (more on this below).

 

y-cruncher supports both word-sizes for both 32-bit and 64-bit hardware, but to state the obvious: performance is best when the word size matches the native word size of the hardware.

 

Although y-cruncher can be compiled to use either 32-bit or 64-bit word sizes, not all of the low-level code is able to operate on both sizes.

Everything is required to support 32-bit word sizes. But 64-bit is optional. Anything that doesn't natively support 64-bit simply calls the 32-bit version with a few pointer casts and length adjustments. This is possible because the little-endian order ensures that the data layout is compatible.
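
A sketch of how that fallback can work (the function names are hypothetical, and real code must also mind C++ strict-aliasing rules):

#include <cstdint>
#include <cstddef>

// 32-bit kernel: a += b, where both are arrays of base-2^32 words.
void add32(std::uint32_t* a, const std::uint32_t* b, std::size_t words) {
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < words; i++) {
        carry += static_cast<std::uint64_t>(a[i]) + b[i];
        a[i]   = static_cast<std::uint32_t>(carry);
        carry >>= 32;
    }
    // (carry out of the top word omitted for brevity)
}

// 64-bit wrapper: on little-endian hardware, an array of N base-2^64 words
// has the same byte layout as an array of 2N base-2^32 words, so the 32-bit
// kernel can be reused with pointer casts and a length adjustment.
void add64(std::uint64_t* a, const std::uint64_t* b, std::size_t words) {
    add32(reinterpret_cast<std::uint32_t*>(a),
          reinterpret_cast<const std::uint32_t*>(b),
          words * 2);
}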

 

In particular, any operation that gains no performance benefit from 64-bit words provides only a 32-bit implementation. Most of the large multiplication algorithms treat their operands as data streams rather than arrays of integers, so there is no difference between the 32-bit and 64-bit representations.

 

 

 

Signed Integers:

 

Believe it or not, y-cruncher does not have a large signed integer object. Why? It doesn't need it.

 

 

Floating-Point

 

Floating-point numbers are represented as unsigned integers with a sign boolean and a signed 64-bit exponent.

BigFloat = sign * ( A[0] + A[1]*base^1 + A[2]*base^2 + A[3]*base^3 + ... ) * base^exp

Nothing special here. It is intentionally simple: no infinities, no NaNs. The only minor complication is that special handling is required for zero.
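
A minimal sketch of such an object, assuming a 32-bit word size (the names are hypothetical, and y-cruncher's actual class is more elaborate):

#include <cstdint>
#include <vector>

struct BigFloat {
    bool sign;                     // false = positive, true = negative
    std::int64_t exp;              // signed 64-bit exponent, in units of the base
    std::vector<std::uint32_t> A;  // little-endian mantissa, base 2^32

    // value = sign * (A[0] + A[1]*2^32 + ...) * (2^32)^exp
    bool is_zero() const { return A.empty(); }  // zero is the special case
};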

 

 

Considerations

 

The packed representation used in y-cruncher is used almost universally for binary representations. But it has a couple of very major drawbacks: carry-propagation is inherently sequential, and it makes additions and subtractions difficult to vectorize.

In the average case, carry-propagation is parallelizable. But the worst case remains bad unless you start using something like a Kogge-Stone adder in software.
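
For example, incrementing a packed number is a serial loop. A sketch using the little-endian word array from above:

#include <cstdint>
#include <vector>

// Add 1 to a packed little-endian number. Each word may carry into the next,
// forming a sequential dependency chain. On average the loop exits after a
// word or two, but an all-0xFFFFFFFF input ripples through the entire array.
void increment(std::vector<std::uint32_t>& a) {
    for (std::size_t i = 0; i < a.size(); i++) {
        if (++a[i] != 0)
            return;        // no carry-out: done (the common case)
        // a[i] wrapped to zero, so the carry continues upward
    }
    a.push_back(1);        // carry out of the most-significant word
}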

 

 

Redundant Representation:

 

It is possible to use a base that is smaller than the word size (e.g., 2^60 for a 64-bit word).

The idea with this approach is to delay carry-propagation until it is actually needed. This allows additions and subtractions to be vectorized.
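
A sketch of the idea with base 2^60 in 64-bit words (not y-cruncher's code): the word-wise additions have no carry chain and vectorize trivially, and the deferred carries are resolved in a separate normalization pass.

#include <cstdint>
#include <vector>

// Word-wise addition with deferred carries. With base 2^60 in a 64-bit word,
// 4 spare bits absorb the carries, so up to 15 additions of normalized
// operands can be applied before the words risk overflowing.
void add_lazy(std::vector<std::uint64_t>& a, const std::vector<std::uint64_t>& b) {
    for (std::size_t i = 0; i < a.size(); i++)
        a[i] += b[i];              // independent per-word adds: vectorizable
}

// Propagate the deferred carries when a canonical form is actually needed.
void normalize(std::vector<std::uint64_t>& a) {
    const std::uint64_t MASK = (1ull << 60) - 1;
    std::uint64_t carry = 0;
    for (std::size_t i = 0; i < a.size(); i++) {
        std::uint64_t t = a[i] + carry;
        a[i]  = t & MASK;          // keep the low 60 bits
        carry = t >> 60;           // defer the rest upward
    }
    // (carry out of the top word omitted for brevity)
}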

 

But this approach has its own drawbacks: fewer payload bits per word means more words (and therefore more memory and bandwidth) for the same number, and the deferred carries must still be resolved before anything that needs a canonical representation.

These are all pretty bad. I've done a small feasibility study and found that the increase in memory usage and bandwidth demand alone outweighs any benefit of faster additions/subtractions (which are already memory-bound). There may be special cases within the CPU L1 cache where it is beneficial, but overall, a redundant representation is not worth the hassle.

 

Lastly, very little work is actually done directly on the number representation. Most of the time is spent in the large multiplications. But all of the FFT-based multiply algorithms do their computation using their own internal representations: they convert the operands to that representation, do their work, and convert back. It is not possible to choose a number representation that matches the large multiply algorithms because each of those algorithms has a different (and variable) internal representation.

 

So in the end, the full-word (packed) representation remains the best because it is simple and optimal for memory.

 

 

Other Domains

 

The packed implementation is the "standard" representation used in the program. It is sort of the "English" for interfacing between different components. Nevertheless, other representations are used in various parts of the program.

 

 

Radix Conversion:

 

The radix conversion module converts digits from binary to a different radix. The output for such a conversion is base 10^9 or 10^19 depending on whether the word-size is 32-bit or 64-bit. The digit viewer that processes the digits will convert this representation to an ASCII string before dumping it to a massive .txt file.
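
Assuming the 32-bit case, the final step is straightforward since each base-10^9 word holds exactly 9 decimal digits. A sketch (with hypothetical names):

#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// Convert a non-empty little-endian array of base-10^9 words to a decimal
// string. Every word except the most-significant one is zero-padded to 9 digits.
std::string to_ascii(const std::vector<std::uint32_t>& words) {
    std::string out = std::to_string(words.back());       // top word: no padding
    char buf[16];
    for (std::size_t i = words.size() - 1; i-- > 0; ) {
        std::snprintf(buf, sizeof(buf), "%09u", static_cast<unsigned>(words[i]));
        out += buf;
    }
    return out;
}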

 

 

FFT Multiplication:

 

The floating-point FFT multiply will convert the operands to base 2^k, where k can be anywhere from 8 to 20 depending on the size of the convolution.

Inside the FFT itself, the number is just an array of double-precision floating-point values aligned to the SIMD vector size.
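
A sketch of the conversion step, using k = 16 so that each base-2^32 word splits evenly into two coefficients (general k requires bit extraction across word boundaries):

#include <cstdint>
#include <vector>

// Split a little-endian array of base-2^32 words into base-2^16 coefficients
// stored as doubles, zero-padded out to the FFT length (which is assumed to
// be at least twice the word count).
std::vector<double> to_fft_points(const std::vector<std::uint32_t>& words,
                                  std::size_t fft_length) {
    std::vector<double> points(fft_length, 0.0);
    for (std::size_t i = 0; i < words.size(); i++) {
        points[2 * i]     = static_cast<double>(words[i] & 0xFFFF);  // low 16 bits
        points[2 * i + 1] = static_cast<double>(words[i] >> 16);     // high 16 bits
    }
    return points;
}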

 

The rest of the FFT algorithms similarly have their own internal representations.