y-cruncher - Language and Algorithms

By Alexander J. Yee

 

(Last updated: February 13, 2010)

 

 

Back To:

Language:

 

 

As of v0.4.4, the breakdown for lines of code is as follows:

Category Lines of Code Bytes of Code
Active Code
116,379
3,849,389
Under Development
20,398
660,019
Deactivated Code*
157,619
6,920,428
Category Lines of Code Bytes of Code
APIs
4,904
163,343
Interface + Minor Features
7,430
231,674
Digit Writers + Readers
3,603
115,422
Settings
133
5,858
Profiling + Tuning
4,581
154,342
Testers
2,058
62,265
Computation Infrastructure
1,702
58,544
Objects
3,167
105,639
Functions + Constants
33,808
990,921
Basic Arithmetic
675
20,927
Basic Multiplication
2,644
87,271
FFT
19,562
719,504
Hybrid NTT
26,568
904,565
Multiplication Wrappers
783
27,948
Other
4,761
201,166
Total (All Active Code)
116,379
3,849,389
Functions + Constants Lines of Code Bytes of Code
Radix Conversion
1,883
57,304
Arithmetic-Geometric Mean
165
4,958
Division/Reciprocation
268
6,687
Inverse Square Root
421
10,118
Integer ArcCoth()
2,351
66,687
Square Root of n
360
11,625
Golden Ratio
312
10,948
e
3,194
83,319
Pi
5,284
158,935
Log(2)
966
29,334
Log(10)
958
29,110
Zeta(3) - Apery's Constant
4,765
139,273
Catalan's Constant
5,242
154,465
Euler-Mascheroni Constant
7,102
209,035
Other
537
19,123
Total (Functions + Constants)
33,808
990,921

Formulas:

y-cruncher has two algorithms for each major constant that it can compute - one for computation, and one for verification.

 

All complexities shown assume multiplication to be O(n log(n)). It is slightly higher than that, but for all practical purposes, O(n log(n)) close enough.

 

Square Root of n and Golden Ratio

1st order Newton's Method - Runtime Complexity: O(n log(n))
Note that the final radix conversion from binary to decimal has a complexity of O(n log(n)2).
e

Taylor Series of exp(1): O(n log(n)2)


Taylor Series of exp(-1): O(n log(n)2)


 
Pi

Chudnovsky Formula: O(n log(n)3)


Ramanujan's Formula: O(n log(n)3)


 
Log(2)

Machin-Like Formulas: O(n log(n)3)




Despite the fact that the formulas are so similar, they are sufficient to qualify as computation and verification.
As long as two machin-like formula for the same constant have different coefficients for all shared terms, they are sufficent for computation and verification.

 
Log(10)

Machin-Like Formulas: O(n log(n)3)



 
Zeta(3) - Apery's Constant

Amdeberhan-Zeilberger Formula 2: O(n log(n)3)



Amdeberhan-Zeilberger Formula 1: O(n log(n)3)



Catalan's Constant

Lupas Formula: O(n log(n)3)


Ramanujan's Formula: O(n log(n)3)



Euler-Mascheroni Constant

Brent-McMillan with Refinement: O(n log(n)3)



                          


Brent-McMillan (alone): O(n log(n)3)



                 

Note that both of these formulas are essentially the same.
Therefore, in order for two computations to be independent enough to qualify as a verified record, they MUST be done using different n.

For simplicity of implementation, y-cruncher only uses n that are powers of two - which serves a dual purpose of allowing the use of (the easily computed) Log(2), as well as lending itself to shift optimizations.

Arithmetic Algorithms:

 

Multiplication:

y-cruncher uses a combination of grade-school, Karatsuba, Fast Fourier Transform, and Hybridized Number-Theoretic Transform multiplication algorithms.

This rather unique combination of integer and floating-point based algorithms has proven itself to be extremely useful. Not only is it fast and memory efficient, but it is also well suited for parallelism and vectorization.


Even without the use of assembly language optimizations, the current multiplication scheme used in y-cruncher is faster than that of GMP (as of the version used in Mathematica 6).

(GMP is an extremely fast arbitrary precison arithmetic library that is used by virtually every major math program including Mathematica, and Matlab. It is fast due in part to its use of highly optimized and specialized assembly code for different processors.)


The table below summarizes the approximate ranges where each algorithm is used: (for all versions prior to v0.4.3)

(v0.4.3 has a completely different tuning and is a lot more complicated than this.)

Algorithm Complexity Approximate Range (decimal digits)
x86 x64
Basecase (grade-school multiplication) 1 - 130 1 - 170
Karatsuba 130 - 1000 170 - 560
3-Way Toom-Cook Implemented - But Never Optimal to Use
Floating-Point Fast Fourier Transform (FFT)
w = # bits in mantissa
1000 - 4,800,000 560 - 120,000
Hybridized Number-Theoretic Transform (Hybrid NTT) Better than FFT, but worse than SSA
Pending Further Analysis...
> 4,800,000* > 120,000
Schönhage-Strassen Algorithm (SSA) Implemented - But Never Optimal to Use**

The current method of having a single threshold between each algorithm is not optimal as it does not take into account for multiple crossover points between adjacent algorithms.

 

*From my experience, it is never optimal to use Hybrid NTT on x86. FFT is faster for anything under the 32-bit limit.
But in the interest of reducing memory usage for large computations, I've forced y-cruncher to use Hybrid NTT for anything above 4.8 million digits - even if it means significantly slowing down a computation.

 

**The difference between the run-time complexities of Hybrid NTT and SSA is on the order of double-logarithms. However, SSA has a much larger Big-O constant - so large that Hybrid NTT is a lot faster for any attainable size despite having a slightly inferior complexity.
(Of course, this could change if there are any new ground-breaking optimizations that apply to SSA but not to Hybrid NTT.)

 

 

Other:

Nothing special here. These are all standard algorithms and approaches.

As far as optimizations go, Division and Square Roots have been given very little attention and are therefore inefficient. (The current implementations aren't even fully paralleled.) Natural Log via AGM suffers as a result. Division and square roots account for such a tiny portion of the total run-time for the majority of constants that it has not been worth the effort to improve it. Most of the optimizations (as of version 0.4.3) have been concentrated in multiplication and Binary Splitting because they clearly dominate total run-time.