y-cruncher - Language and Algorithms

By Alexander J. Yee

 

(Last updated: April 4, 2010)

 

 

Back To:

Language:

 

 

As of v0.5.2, the breakdown for lines of code is as follows:

Category Lines of Code Bytes of Code
Active Code
166,938
5,495,464
Under Development
< 10,000
Deactivated Code*
173,537
7,444,695
Category Lines of Code Bytes of Code
APIs
5,032
165,645
Interface + Minor Features
3,391
109,911
Digit Writers + Readers
4,760
149,974
Profiling + Tuning
5,842
195,006
Testers
7,300
232,967
Computation Infrastructure
3,478
118,252
Objects
4,422
143,966
Functions + Constants
41,937
1,202,895
Basic Arithmetic
1,990
62,397
Basic Multiplication
2,644
87,271
FFT (Original) - x87 + SSE3 + SSE4.1
19,788
725,617
FFT v2 (New) - x64 Vector Scalable
24,900
847,367
Hybrid NTT
35,334
1,202,216
Multiplication Wrappers
837
30,535
Other
5,283
221,445
Total (All Active Code)
166,938
5,495,464
Functions + Constants Lines of Code Bytes of Code
Radix Conversion
2,704
80,625
Arithmetic-Geometric Mean
157
4,616
Division/Reciprocation
450
11,464
Inverse Square Root
545
13,054
Integer ArcCoth()
3,092
84,838
Square Root of n
374
11,537
Golden Ratio
369
12,617
e
4,426
112,888
Pi
6,842
198,465
Log(2)
1,275
37,910
Log(10)
1,262
37,599
Zeta(3) - Apery's Constant
6,197
174,191
Catalan's Constant
6,082
176,129
Euler-Mascheroni Constant
7,211
212,686
Other
951
34,276
Total (Functions + Constants)
41,937
1,202,895

Formulas:

y-cruncher has two algorithms for each major constant that it can compute - one for computation, and one for verification.

 

All complexities shown assume multiplication to be O(n log(n)). It is slightly higher than that, but for all practical purposes, O(n log(n)) close enough.

 

Square Root of n and Golden Ratio

1st order Newton's Method - Runtime Complexity: O(n log(n))
Note that the final radix conversion from binary to decimal has a complexity of O(n log(n)2).
e

Taylor Series of exp(1): O(n log(n)2)


Taylor Series of exp(-1): O(n log(n)2)


 
Pi

Chudnovsky Formula: O(n log(n)3)


Ramanujan's Formula: O(n log(n)3)


 
Log(2)

Machin-Like Formulas: O(n log(n)3)




Despite the fact that the formulas are so similar, they are sufficient to qualify as computation and verification.
As long as two machin-like formula for the same constant have different coefficients for all shared terms, they are sufficent for computation and verification.

 
Log(10)

Machin-Like Formulas: O(n log(n)3)



 
Zeta(3) - Apery's Constant

Amdeberhan-Zeilberger Formula 2: O(n log(n)3)



Amdeberhan-Zeilberger Formula 1: O(n log(n)3)



Catalan's Constant

Lupas Formula: O(n log(n)3)


Ramanujan's Formula: O(n log(n)3)



Euler-Mascheroni Constant

Brent-McMillan with Refinement: O(n log(n)3)



                          


Brent-McMillan (alone): O(n log(n)3)



                 

Note that both of these formulas are essentially the same.
Therefore, in order for two computations to be independent enough to qualify as a verified record, they MUST be done using different n.

For simplicity of implementation, y-cruncher only uses n that are powers of two - which serves a dual purpose of allowing the use of (the easily computed) Log(2), as well as lending itself to shift optimizations.

Arithmetic Algorithms:

 

Multiplication:

y-cruncher uses a combination of grade-school, Karatsuba, Fast Fourier Transform, and Hybridized Number-Theoretic Transform multiplication algorithms.

This rather unique combination of integer and floating-point based algorithms has proven itself to be extremely useful. Not only is it fast and memory efficient, but it is also well suited for parallelism and vectorization.


Even without the use of assembly language optimizations, the current multiplication scheme used in y-cruncher is faster than that of GMP (as of the version used in Mathematica 6).

(GMP is an extremely fast arbitrary precison arithmetic library that is used by virtually every major math program including Mathematica, and Matlab. It is fast due in part to its use of highly optimized and specialized assembly code for different processors.)


The table below summarizes the approximate ranges where each algorithm is used: (for all versions prior to v0.4.3)

(v0.4.3 has a completely different tuning and is a lot more complicated than this.)

Algorithm Complexity Approximate Range (decimal digits)
x86 x64
Basecase (grade-school multiplication) 1 - 130 1 - 170
Karatsuba 130 - 1000 170 - 560
3-Way Toom-Cook Implemented - But Never Optimal to Use
Floating-Point Fast Fourier Transform (FFT)
w = # bits in mantissa
1000 - 4,800,000 560 - 120,000
Hybridized Number-Theoretic Transform (Hybrid NTT) Better than FFT, but worse than SSA
Pending Further Analysis...
> 4,800,000* > 120,000
Schönhage-Strassen Algorithm (SSA) Implemented - But Never Optimal to Use**

The current method of having a single threshold between each algorithm is not optimal as it does not take into account for multiple crossover points between adjacent algorithms.

 

*From my experience, it is never optimal to use Hybrid NTT on x86. FFT is faster for anything under the 32-bit limit.
But in the interest of reducing memory usage for large computations, I've forced y-cruncher to use Hybrid NTT for anything above 4.8 million digits - even if it means significantly slowing down a computation.

 

**The difference between the run-time complexities of Hybrid NTT and SSA is on the order of double-logarithms. However, SSA has a much larger Big-O constant - so large that Hybrid NTT is a lot faster for any attainable size despite having a slightly inferior complexity.
(Of course, this could change if there are any new ground-breaking optimizations that apply to SSA but not to Hybrid NTT.)

 

 

Other:

Nothing special here. These are all standard algorithms and approaches.

As far as optimizations go, Division and Square Roots have been given very little attention and are therefore inefficient. (The current implementations aren't even fully paralleled.) Natural Log via AGM suffers as a result. Division and square roots account for such a tiny portion of the total run-time for the majority of constants that it has not been worth the effort to improve it. Most of the optimizations (as of version 0.5.3) have been concentrated in multiplication and Binary Splitting because they clearly dominate total run-time.