y-cruncher - Language and Algorithms
By Alexander J. Yee
December 31, 2013
Implementation (as of v0.6.3):
- y-cruncher is written in C++11, but significant portions of it will also compile as C99.
- Some inline assembly is used.
- Intel SSE and AVX compiler intrinsics are used.
- As of December 22, 2013:
- y-cruncher has about 200,000 lines of active hand-written code for a total size of approximately 6.7 MB.
- If inactive and auto-generated code is included, then the figures are 225,000 lines for about 7.5 MB.
- The external Digit Viewer has about 12,000 lines of code, but only about 1,100 lines are specific to it. The rest is shared with y-cruncher.
- y-cruncher itself is closed source, but the Digit Viewer is open-sourced on my GitHub. It includes all the necessary dependencies to compile.
Libraries and Dependencies:
- Windows: WinAPI
- Linux: POSIX
- y-cruncher has no other 3rd party dependencies.
- All Windows binaries are compiled using Microsoft Visual Studio 2012.
- All Linux binaries are compiled using GCC.
- The Intel Compiler is used in some versions of y-cruncher. But it is not used in v0.6.2 - v0.6.3.
Other Internal Requirements:
- Little-endian addressing
- Sign-fill (arithmetic) right-shift
- IEEE double-precision floating-point
y-cruncher's design is split into 3 major layers.
- Interface + Math Layer: (24% of total code: 47,000 lines, 214 files)
This is where all the constants and functions are implemented. This is done largely with simple object-oriented programming, structured mathematically so that the code closely resembles the formulas/algorithms involved.
- Object Layer: (2% of total code: 4,000 lines, 13 files)
This is where all the "number" objects are defined. (Large integer, large floating-point, etc...)
This layer is just a simple class library that wraps the modules layer into a usable interface for bignum arithmetic.
- Modules Layer: (74% of total code: 149,000 lines, 720 files)
The Modules layer implements all the core operations needed by the upper layers. This is where most of the performance-critical code lives; most of the code bloat in this layer is due to performance optimizations.
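As a rough illustration of this layering, here is a hypothetical sketch (all names are invented; y-cruncher's actual interfaces are not public): a Modules-layer routine that works on raw digit arrays, wrapped by an Object-layer class that gives the Math layer a usable bignum interface.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Modules layer (hypothetical): raw routines on digit arrays.
namespace modules {
    // Schoolbook multiply of little-endian base-10 digit arrays.
    std::vector<uint8_t> mul(const std::vector<uint8_t>& a,
                             const std::vector<uint8_t>& b) {
        std::vector<uint32_t> acc(a.size() + b.size(), 0);
        for (size_t i = 0; i < a.size(); i++)
            for (size_t j = 0; j < b.size(); j++)
                acc[i + j] += uint32_t(a[i]) * b[j];
        std::vector<uint8_t> out;
        uint32_t carry = 0;
        for (uint32_t v : acc) {            // carry pass back to single digits
            carry += v;
            out.push_back(uint8_t(carry % 10));
            carry /= 10;
        }
        while (out.size() > 1 && out.back() == 0) out.pop_back();
        return out;
    }
}

// Object layer (hypothetical): a thin class wrapper over the raw routines.
class BigInt {
public:
    std::vector<uint8_t> digits;            // little-endian decimal digits
    explicit BigInt(std::vector<uint8_t> d) : digits(std::move(d)) {}
    BigInt operator*(const BigInt& o) const {
        return BigInt(modules::mul(digits, o.digits));
    }
};
```

The Math layer would then be written entirely in terms of objects like `BigInt`, never touching the digit arrays directly.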
y-cruncher has two algorithms for each major constant that it can compute - one for computation, and one for verification.
All complexities shown assume multiplication to be O(n log(n)). Its true complexity is slightly higher, but for all practical purposes O(n log(n)) is close enough.
- Square Root of n and Golden Ratio
- 1st order Newton's Method - Runtime Complexity: O(n log(n))
Note that the final radix conversion from binary to decimal has a complexity of O(n log(n)^2).
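As a sketch of the first-order Newton iteration (in plain double precision rather than bignum arithmetic; the function name and seed handling are invented for illustration), the division-free form refines 1/sqrt(a) and multiplies by a at the end:

```cpp
#include <cmath>

// First-order Newton iteration for 1/sqrt(a):  x <- x * (3 - a*x*x) / 2.
// Each step roughly doubles the number of correct digits; a bignum version
// that raises the working precision in step with the accuracy achieves the
// O(n log(n)) bound quoted above.
double newton_sqrt(double a) {
    double x = 1.0 / std::sqrt(a);  // 53-bit hardware seed
    x *= 0.75;                      // deliberately damage the seed to show convergence
    for (int i = 0; i < 6; i++)
        x = x * (3.0 - a * x * x) * 0.5;  // refine 1/sqrt(a)
    return a * x;                         // sqrt(a) = a * (1/sqrt(a))
}
```

The golden ratio then follows directly from one square root: (1 + sqrt(5)) / 2.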
- e
- Taylor Series of exp(1): O(n log(n)^2)
- Taylor Series of exp(-1): O(n log(n)^2)
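A toy double-precision version of this pair (function name invented): sum both series term by term. The alternating series for exp(-1) gives an independent route to e, since e = 1/exp(-1), so the two results can cross-check each other via exp(1)·exp(-1) = 1.

```cpp
#include <utility>

// Sum the Taylor series of exp(1) and exp(-1) term by term.
// Returns the pair (exp(1), exp(-1)); their product should be 1.
std::pair<double, double> taylor_e() {
    double e = 0.0, inv_e = 0.0, term = 1.0;   // term = 1/k!
    for (int k = 0; k < 20; k++) {
        e += term;
        inv_e += (k % 2 == 0) ? term : -term;  // alternating series for exp(-1)
        term /= (k + 1);
    }
    return std::pair<double, double>(e, inv_e);
}
```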
- Pi
- Chudnovsky Formula: O(n log(n)^3)
- Ramanujan's Formula: O(n log(n)^3)
- Log(n)
- Machin-Like Formulas: O(n log(n)^3)
- Starting from v0.6.1, y-cruncher is able to generate and utilize machin-like formulas for the log of any small integer.
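Machin-like formulas for logarithms are built from acoth terms, since acoth(x) = (1/2) log((x+1)/(x-1)) has a rapidly converging series for integer x. A double-precision sketch using the simplest such identity, log(2) = 2·acoth(3) (the formulas y-cruncher actually generates combine several terms with much larger arguments, which converge far faster):

```cpp
// acoth(x) = sum over k >= 0 of  1 / ((2k+1) * x^(2k+1)),  convergent for x > 1.
double acoth(double x) {
    double inv = 1.0 / x, inv2 = inv * inv;
    double term = inv, sum = 0.0;              // term = x^-(2k+1)
    for (int k = 0; term > 1e-18; k++) {
        sum += term / (2 * k + 1);
        term *= inv2;                          // advance to the next odd power
    }
    return sum;
}
```

A Machin-like formula for log(n) is then an integer linear combination of several such acoth terms, e.g. log(2) = 2·acoth(3).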
- Zeta(3) - Apery's Constant
- Amdeberhan-Zeilberger Formula 2: O(n log(n)^3)
- Amdeberhan-Zeilberger Formula 1: O(n log(n)^3)
- Lemniscate
- Gauss Formula: O(n log(n)^3)
- Sebah's Formula: O(n log(n)^3)
Both of these formulas were pulled from: http://numbers.computation.free.fr/Constants/PiProgram/userconstants.html
Note that the AGM-based algorithms are probably faster. But y-cruncher currently uses these series-based formulas because:
- It lacks an implementation of sqrt(N) for arbitrary N. It only supports N being a small integer.
- There already is a series summation framework that can be easily applied to the ArcSinlemn() function.
- Catalan's Constant
- Lupas Formula: O(n log(n)^3)
- Huvent's Formula: O(n log(n)^3)
y-cruncher uses a rearrangement of Huvent's formula.
- Euler-Mascheroni Constant
- Brent-McMillan with Refinement: O(n log(n)^3)
- Brent-McMillan (alone): O(n log(n)^3)
- Note that both of these formulas are essentially the same; both run the Brent-McMillan algorithm with a free parameter n, differing only in an extra correction term.
Therefore, in order for two computations to be independent enough to qualify as a verified record, they MUST be done using different n.
For simplicity of implementation, y-cruncher only uses n that are powers of two - which serves a dual purpose of allowing the use of (the easily computed) Log(2), as well as lending itself to shift optimizations.
Like most bignum programs, y-cruncher uses different multiplication algorithms for different sizes:
As of v0.6.3, the multiplication algorithms that y-cruncher uses include the floating-point FFT*, the Hybrid NTT**, and the VST**.
*Standard libraries like GMP avoid the use of floating-point FFTs due to the difficulty of proving bounds on round-off error. But with modern SIMD, the performance of floating-point is through the roof. So rather than throwing all this computational power away, y-cruncher utilizes it to the maximum extent.
y-cruncher fully exploits floating-point algorithms and ensures their accuracy by means of conservative parameter selection and algorithmic error-detection. Although this is not proven to be correct for all inputs, the probability that y-cruncher's multiplication returns an undetected incorrect value is less than the probability of a silent hardware error. So in that sense, y-cruncher is arguably more reliable than other implementations that lack algorithmic error-detection.
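A minimal sketch of that error-detection idea (using a plain O(n^2) floating-point convolution in place of the FFT, and an invented word size of 10^4): after the transform, every coefficient must round to an integer, and the worst-case distance from an integer directly measures the accumulated round-off.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Multiply two numbers stored as little-endian base-10^4 words using a
// floating-point convolution (a stand-in for the FFT).  Rounding each
// coefficient back to an integer is only valid if round-off is small;
// max_error records the worst distance from an integer, and a real
// implementation aborts or falls back if it gets anywhere near 0.5.
std::vector<uint32_t> fp_multiply(const std::vector<uint32_t>& a,
                                  const std::vector<uint32_t>& b,
                                  double& max_error) {
    std::vector<double> c(a.size() + b.size() - 1, 0.0);
    for (size_t i = 0; i < a.size(); i++)
        for (size_t j = 0; j < b.size(); j++)
            c[i + j] += double(a[i]) * double(b[j]);

    max_error = 0.0;
    std::vector<uint32_t> out;
    uint64_t carry = 0;
    for (double x : c) {
        double r = std::floor(x + 0.5);                    // round to integer
        max_error = std::max(max_error, std::fabs(x - r)); // round-off check
        carry += uint64_t(r);
        out.push_back(uint32_t(carry % 10000));            // release one word
        carry /= 10000;
    }
    while (carry) { out.push_back(uint32_t(carry % 10000)); carry /= 10000; }
    return out;
}
```

An FFT-based version does the same rounding-and-checking step after the inverse transform; conservative transform sizes keep the measured error far below the 0.5 threshold.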
**The last two of these algorithms (Hybrid NTT and VST) are homebrew algorithms that were developed between 2008 - 2012. Both of them are currently unpublished.
The following popular algorithms are currently not used by y-cruncher:
- 3-way Toom-Cook
- Schönhage-Strassen Algorithm (SSA)
- Number-Theoretic Transform (NTT) over primes
With modern SIMD, the floating-point FFT asserts complete dominance over everything from 500 decimal digits up to the point where it no longer fits in CPU cache.
This completely displaces 3-way Toom-Cook as well as a good portion of the sizes that are normally covered by SSA.
Of these three, the NTT over primes is the only algorithm that y-cruncher may conceivably pick up in the near future.
Inverse and Roots:
Reciprocals, division, and square roots are all done using 1st order Newton's Method.
- Prior to v0.6.1, their implementations were very basic and unoptimized.
- Starting v0.6.1, they take advantage of middle-product and the reuse of redundant FFTs.
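The reciprocal iteration has the same self-correcting shape as the square-root one (a double-precision sketch; the function name and seed handling are invented): x ← x·(2 − a·x) converges quadratically to 1/a, and division b/a is then a single multiplication b·(1/a).

```cpp
// Division-free Newton iteration for 1/a:  x <- x * (2 - a*x).
// The relative error squares on every step, so each iteration doubles the
// number of correct digits.  In the bignum version only the last iterations
// run at full precision, which is where the middle-product and the reuse of
// redundant FFTs pay off.
double newton_recip(double a) {
    double x = 1.0 / a;   // 53-bit hardware seed
    x *= 0.9;             // deliberately damage the seed to show convergence
    for (int i = 0; i < 6; i++)
        x = x * (2.0 - a * x);
    return x;
}
```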
Prior to v0.6.1, only log(2) and log(10) were supported using hard-coded machin-like formulas. A generic log(n) was needed for Ramanujan's formula for Catalan's constant. That was implemented using Arithmetic-Geometric Mean (AGM).
In v0.6.1, Ramanujan's formula for Catalan's constant was removed - thereby removing the need for a generic log(n). Instead, v0.6.1 supports the computation of log(n) for any small integer n. This is done using a formula generator that generates (at run-time) machin-like formulas for arbitrary small integer n.
Generation of machin-like formulas for log(n) is done using table-lookup along with a branch-and-bound search on several argument reduction formulas.
- Prior to v0.5.3, conversions were done using the standard Divide and Conquer algorithm.
- Starting from v0.5.3, they are done using Scaled Remainder Trees. This is the same approach that was used in the December 2009 world record for Pi.
- Starting from v0.6.1, the Scaled Remainder Tree algorithm is combined with the middle-product and reuse of redundant FFTs.
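For illustration, here is the basic Divide and Conquer conversion on a single 64-bit word (names invented; the real thing operates on bignums, and the Scaled Remainder Tree removes redundant work from exactly this recursion): split at a power of ten near the middle, then recurse on quotient and remainder.

```cpp
#include <cstdint>
#include <string>

// Divide-and-conquer binary-to-decimal conversion.  `digits` is the number
// of decimal digits to emit (including leading zeros, so recursive calls on
// the low half stay aligned).  On bignums, with FFT-based division, this
// recursion gives the O(n log(n)^2) conversion mentioned above.
std::string to_decimal(uint64_t n, int digits) {
    if (digits <= 4) {                       // base case: peel digits directly
        std::string s(digits, '0');
        for (int i = digits - 1; i >= 0; i--) { s[i] = char('0' + n % 10); n /= 10; }
        return s;
    }
    int half = digits / 2;
    uint64_t p = 1;
    for (int i = 0; i < half; i++) p *= 10;  // p = 10^half
    return to_decimal(n / p, digits - half) + to_decimal(n % p, half);
}
```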
Series summation is done using standard Binary Splitting techniques with the following catches:
- Precision is tightly controlled to keep operands from becoming unnecessarily large.
- The cutting points for the binary splitting recursions are usually significantly skewed in order to reduce memory usage and improve load-balance.
- Series summation is partially done backwards. The lowest terms (the terms with the largest index) are summed first. The motivation and details for this are quite intricate and go hand-in-hand with the skewing, so I won't go into that here.
This series summation scheme (including the skewed splitting and backwards summing) has been the same in all versions of y-cruncher to date. All of this is expected to change when GCD factorization is incorporated.
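As a small self-contained illustration of Binary Splitting (a 64-bit toy version with midpoint cuts; in y-cruncher the recursion runs on bignums with the skewed cutting points and backwards summing described above): the partial sum of 1/k! is carried as an exact fraction P/Q, and each range is assembled from its two halves with a few multiplications, keeping operands balanced.

```cpp
#include <cstdint>
#include <utility>

// Binary splitting for S(0, N) = sum over k in [0, N) of 1/k!  (-> e).
// For a range [a, b):  P(a,b) = sum_{k=a}^{b-1} b!/k!   and   Q(a,b) = b!/a!,
// so that S(a,b) = P(a,b)/Q(a,b).  Merge rule at midpoint m:
//     P(a,b) = P(a,m)*Q(m,b) + P(m,b)
//     Q(a,b) = Q(a,m)*Q(m,b)
// Toy 64-bit version -- overflows past roughly N = 20.
std::pair<uint64_t, uint64_t> split(uint64_t a, uint64_t b) {
    if (b - a == 1) return std::pair<uint64_t, uint64_t>(b, b);  // P = Q = a+1
    uint64_t m = (a + b) / 2;
    std::pair<uint64_t, uint64_t> L = split(a, m);
    std::pair<uint64_t, uint64_t> R = split(m, b);
    return std::pair<uint64_t, uint64_t>(L.first * R.second + R.first,
                                         L.second * R.second);
}
```

The single final division P/Q is what makes this efficient: all intermediate work is exact integer multiplication, which is where fast multiplication (and, eventually, GCD factorization of P and Q) matters.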