y-cruncher - Algorithms and Internals

By Alexander J. Yee

 

(Last updated: October 14, 2017)

 

 

Back To:

 

Expanded Articles:

 

General:

 

Large Number Arithmetic:

 

Implementation (as of v0.7.4):

 

General Information:

 

Libraries and Dependencies:

y-cruncher has no other non-system dependencies. No Boost. No GMP. Pretty much everything that isn't provided by C++ is built from ground up.

Furthermore, the Cilk Plus dependency is not a hard dependency. It can be trivially removed without affecting the core functionality of the program.

 

 

Compilers:

 

Other Internal Requirements:

 

Code Organization:

 

y-cruncher's root source tree is (roughly) broken up into the following subdirectories. They are listed in order of build dependency.

Module Files Lines of Code Open Sourced? Description
Public Libs 100 8,583 Yes

The public portion of the support library: Console I/O, File I/O, time, exceptions, smart pointers, serialization, strings, and basic concurrency and environment info.

Arch-Specific Libs 90 9,739 No ISA-specific libraries for SSE, AVX, and AVX512.
Basic Libs 94 10,041 No Simple and (mostly) self-contained libraries and interfaces: Command line options, template containers, file readers, configuration files, checkpoint-restart, thread pool, and parallelism interface.
SystemLibs 140 14,935 No Libraries with system-specific implementations: Hardware detection, memory allocators, thread library, and parallel frameworks.
Deprecated Libs 42 7,168 No Libraries that are no longer developed or require massive overhaul: Old memory allocator, swap file interface, and multi-layer raid-file.
Dynamic Linking 3 213 No Nothing here yet.
Launcher 9 709 No

The CPU dispatcher that picks the optimal binary to run.

It's the module that builds the y-cruncher(.exe) binary.

Digit Viewer 87 9,990 Yes

The bundled Digit Viewer.

BBPv2 32 4,381 No

The bundled BBP digit extraction app for Pi.

Modules

(expanded below)

1,862 286,434 No

Low-level arbitrary-precision arithmetic: Addition, subtraction, multiplication, radix conversion, and checksum hashing.

Objects 84 13,948 Partial

Large number objects. (BigInt, BigFloat, etc...)

Functions 25 3,475 No

Non-trivial math: Division, square root, and string conversions.

YMP Library 14 2,139 Headers Only

A public interface to the internal large number library.

Number Factory 31 3,414 Yes

Research infrastructure and test app for the YMP library.

y-cruncher 324 46,947 No

y-cruncher itself. This has most of the console UI and the implementations for all the constants.

Experimental 83 11,012 No

Sandboxes for experimental code.

Misc. 10 4,184 No

Settings, versioning, and development sandbox.

Total: 3,030 437,312  

Software bloat anyone?

 

Modules
Sub-Module Files Lines of Code Description
Cache 6 348

Dead Code - unfinished experiment with caching.

Float Types 6 931 "Float-Indexing".
Intrinsics 14 1,068

Double-word multiply, bit-reversal, length conversions, etc...

Profiles 14 925

Processor-specific tuning settings.

Linear Ops 9 949

Parallel and out-of-core memset(), memcpy(), and scanning.

Checksum Hashing 12 1,333 Modular Redundancy Checks. Ram-only + parallel + out-of-core.
Random 9 710

Pseudorandom number generators. Ram-only + parallel + out-of-core.

Carryout 13 2,671 Kogge-Stone Parallel Carryout. Ram-only + parallel + out-of-core.
Addition 15 1,368 Large integer addition and subtraction. Ram-only.
Word Multiply 15 1,784

Single-word multiplication. Ram-only + parallel + out-of-core.

Parameters 11 860

Parameters structures.

Large Multiply 1,712 269,126

Large integer multiplication. Ram-only + parallel + out-of-core.

CBRv2 19 3,271 Radix Conversion via Scaled Remainder Tree. Ram-only + parallel + out-of-core.
Testers 1 264

Deprecated test scripts.

Misc. 6 826

 

Total: 1,862 286,434

 

 

 

Notes:

 

 

 

Limits:

 

Like most other programs, there are theoretical limits to y-cruncher. Most of these limits are enforced by the program.

  32-bit 64-bit Comments

Ram Usage

~1.8 GiB ~ 1 EiB (1018 bytes)

Limited by memory address space.

Disk Usage

~ 1 EiB

Limited by 64-bit address space.

Task Decomposition

65,536

Arbitrary limit.

RAID - Level 1

8 paths

 

RAID - Level 2

64 x Level 1 RAID groups

Limited by the # of bits in largest integer.

Will likely be increased in the future.

Largest Multiplication

(2.02 * 1018) x (2.02 * 1018) bits
(6.7 * 1017) x (6.7 * 1017)
decimal digits

Small Primes Number-Theoretic Transform:

  • 5 x 63-bit primes
  • Transform Length: 7 * 252

Convolution Length

4.03 * 1018 bits
1.34 * 1018 decimal digits

Computation Size

(for all constants)

1015 decimal digits

Limited by double-precision floating-point.*

BBP Hexadecimal Offset

246 - 1

Implementation-specific limitation.

*y-cruncher uses double-precision floating-point for things such as:

The result of these calculations are generally rounded to integers and must be accurate to +/- 1 for the program to operate correctly. The problem is that double-precision floating-point only has 53 bits of precision which will run out at around 9 * 1015. Since there is round-off error, the limit will certainly be lower. The exact limit is unknown and will vary with the different constants. Therefore y-cruncher arbitrarily caps it to 1015 decimal digits. Colloquially, I call this the "float-indexing limit".

 

There are currently no plans to raise this limit since it is already well beyond the capability of current hardware (as of 2015).

 

It is worth mentioning that the float-indexing limit is the only thing left that prevents y-cruncher from going all the way up to the 64-bit limit. Without it, it should be possible to reach 6.7 * 1017 decimal digits (the limit of the Small Primes NTT).

 

Getting rid of the float-indexing limit will require a floating-point type with at least a 64-bit mantissa. A viable option is to use 80-bit extended-precision via the x87 FPU although some compilers don't support it. But since "float indexing" isn't exactly a performance bottleneck, any sort of software emulation will probably work as well.