y-cruncher - Algorithms and Internals

By Alexander J. Yee

(Last updated: August 20, 2023)

Back To:

Expanded Articles:

General:

Large Number Arithmetic:

Implementation (as of v0.8.2):

General Information:

y-cruncher started off as a C99 program. Now it is requires C++17.
Intel SSE and AVX compiler intrinsics are heavily used.
Some inline assembly is used.
C++ template metaprogramming is used extensively to reduce code duplication.

Libraries and Dependencies:

WinAPI (Windows Only)
POSIX (Linux Only)
Cilk Plus
Thread Building Blocks (TBB)

y-cruncher has no other non-system dependencies. No Boost. No GMP. Pretty much everything that isn't provided by C++ is built from ground up.

Furthermore, the Cilk and TBB dependencies can be trivially removed without affecting the core functionality of the program.

Compilers:

Windows: Microsoft Visual Studio + Intel Compiler
Linux: GCC

Other Internal Requirements:

Little-endian addressing
Sign-fill (arithmetic) right-shift
IEEE double-precision floating-point

Code Organization:

y-cruncher's root source tree is (roughly) broken up into the following subdirectories. They are listed in order of build dependency.

Module	Files	Lines of Code	Open Sourced?	Description
Public Libs	153	15,217	Yes	The public portion of the support libraries.
Private Libs	456	60,875	No	The private portion of the support libraries.
Dynamic Linking	3	213	No	Nothing here yet.
Launcher	10	931	Yes	The CPU dispatcher that picks the optimal binary to run. It's the module that builds the y-cruncher(.exe) binary.
Digit Viewer 2	93	11,626	Yes	The new Digit Viewer.
BBPv2	60	9,770	No	The bundled BBP digit extraction app for Pi.
Modules	2,841	484,039	No	Low-level arbitrary-precision arithmetic: Addition, subtraction, multiplication, single-word division, and checksum hashing.
Objects	84	17,469	Partial	Large number objects. (BigInt, BigFloat, etc...)
Functions	349	54,065	No	High-level mathematical code. Implementations for all functions and all the constants.
YMP Library	14	2,275	Headers Only	A public interface to the internal large number library.
Number Factory	31	3,419	Yes (outdated)	Research infrastructure and test app for the YMP library.
y-cruncher	299	43,170	No	y-cruncher itself. Top-level code that includes all the UI menus.
Experimental	69	13,876	No	Sandboxes for experimental code.
External	2	169	No	Supporting DLLs.
Misc.	9	1,933	No	Settings, versioning, and development sandbox.
Total:	4,473	719,047		Software bloat anyone?

Notes:

The code size has grown significantly in the year leading up to v0.8.1 and v0.8.2. This is due to the ongoing rewrite of nearly all the large multiplication algorithms. As of this writing, that rewrite has not yet reached the point where the code that is being rewritten can be removed yet. Thus, it is expected that the code size will shrink in the next few releases.

Roughly 70,000 lines of code are either generated, or significantly aided by code generation. Most of it is auto-generated superoptimizer tables.

There is an estimated 100,000 lines of near duplicate code that are either not similar enough to de-dupe using generic programming techniques, or will incur an unacceptable performance hit to do so.

It takes several hours and all 64GB ram on my dedicated build machine (Ryzen 7 1800x) to compile all of y-cruncher, both Windows and Linux.

Limits:

Like most other programs, there are theoretical limits to y-cruncher. Most of these limits are enforced by the program.

	32-bit	64-bit	Comments
Ram Usage	~1.8 GiB	~ 1 EiB (10¹⁸ bytes)	Limited by memory address space.
Disk Usage	~ 1 EiB		Limited by 64-bit address space.
Task Decomposition	65,536		Arbitrary limit.
RAID - Level 1	8 paths
RAID - Level 2	64 x Level 1 RAID groups		Limited by the # of bits in largest integer. Will likely be increased in the future.
Largest Multiplication	(2.02 * 10¹⁸) x (2.02 * 10¹⁸) bits (6.7 * 10¹⁷) x (6.7 * 10¹⁷) decimal digits		Small Primes Number-Theoretic Transform: 5 x 63-bit primes Transform Length: 7 * 2⁵²
Convolution Length	4.03 * 10¹⁸ bits 1.21 * 10¹⁸ decimal digits
Computation Size (for all constants)	10¹⁵ decimal digits		Limited by double-precision floating-point.*
BBP Hexadecimal Offset	2⁴⁶ - 1		Implementation-specific limitation.

*y-cruncher uses double-precision floating-point for things such as:

Computing the number of terms needed for a computation.
Determining how much precision is needed for an operation.
Determining how large the numbers will be so that enough memory will be allocated.
Finding the optimal work-partitioning for various algorithms.

The result of these calculations are generally rounded to integers and must be accurate to +/- 1 for the program to operate correctly. The problem is that double-precision floating-point only has 53 bits of precision which will run out at around 9 * 10¹⁵. Since there is round-off error, the limit will certainly be lower. The exact limit is unknown and will vary with the different constants. Therefore y-cruncher arbitrarily caps it to 10¹⁵ decimal digits. Colloquially, I call this the "float-indexing limit".

There are currently no plans to raise this limit since it is already well beyond the capability of current hardware (as of 2015).

It is worth mentioning that the float-indexing limit is the only thing left that prevents y-cruncher from going all the way up to the 64-bit limit. Without it, it should be possible to reach 6.1 * 10¹⁷ decimal digits (the limit of the Small Primes NTT).

Getting rid of the float-indexing limit will require a floating-point type with at least a 64-bit mantissa. A viable option is to use 80-bit extended-precision via the x87 FPU although some compilers don't support it. But since "float indexing" isn't exactly a performance bottleneck, any sort of software emulation will probably work as well.