(Last updated: August 6, 2011)
Back To:
Language:
Code-Size: Lines of Code in y-cruncher:
Although y-cruncher is completely close-sourced (and will likely remain that way for a long time), I'm fine with revealing some of the structure to satisfy the curious.
Below is the approximate code-size distribution of the y-cruncher project (July 2011).
Some of the numbers are more up-to-date than others. But for the most part, they are relatively accurate.
| Category | Lines of Code | Bytes of Code | Files | Comments |
| General | ||||
Interface + Minor Features |
3,497 |
110,905 |
6 |
Bench/Batch/Stress/Custom Compute |
Computation Infrastructure |
2,894 |
97,126 |
4 |
Internal interface for launching computations. |
Digit Conversion |
6,912 |
245,588 |
38 |
Digit format conversions + compression/decompression |
Digit Readers/Writers |
7,569 |
236,013 |
27 |
Digit Viewer + Internal Digit Writer |
Architecture-Specific Profiles |
13,687 |
438,335 |
25 |
Settings for: x86, x86 SSE3, x64 SSE3 (Kasumi), x64 SSE4.1 (Nagisa/Ushio), x64 AVX (Hina) About 11,000 lines of this code is generated by the superoptimizer. |
APIs |
8,019 |
270,650 |
44 |
Home-made Libaries: Memory Allocation, Threads, Disk I/O, Time, Strings, etc... |
Internal Object Interface |
5,182 |
170,883 |
22 |
Objects for: Large Integers, Floating-Point |
| Functions | ||||
Division/Reciprocation |
594 |
17,110 |
1 |
|
Integer ArcCoth() |
3,174 |
87,369 |
9 |
Used by Log(2), Log(10), and the Euler-Mascheroni Constant |
Radix Conversion |
1,977 |
61,607 |
7 |
|
Other Minor Functions |
253 |
8,624 |
3 |
AGM for Natural Log, Integer Powers, etc... |
| Constants | ||||
Square Root + Golden Ratio |
1,427 |
42,623 |
7 |
Includes generic square roots. |
e |
4,642 |
121,095 |
10 |
exp(1) and exp(-1) are each about 1,100 lines. Fused summation is 1,700 lines. Rest is shared. |
Pi - Chudnovsky |
3,362 |
98,342 |
8 |
Uses square roots. |
Pi - Ramanujan |
3,153 |
91,090 |
8 |
Uses square roots. |
Pi - (shared) |
725 |
22,550 |
4 |
|
Log(2) |
1,366 |
41,743 |
5 |
Uses Integer ArcCoth(). |
Log(10) |
1,359 |
41,661 |
5 |
Uses Integer ArcCoth(). |
Zeta(3) - AZ1 |
2,851 |
79,610 |
8 |
|
Zeta(3) - AZ2 |
2,923 |
82,472 |
8 |
|
Zeta(3) - (shared) |
725 |
22,839 |
4 |
|
Catalan - Lupas |
3,127 |
88,531 |
10 |
|
Catalan - Ramanujan |
2,752 |
80,567 |
10 |
No Advanced Swap Implementation yet. Uses Pi, AGM natural Log, and square roots. |
Catalan - (shared) |
439 |
14,994 |
3 |
|
Euler-Mascheroni Constant |
7,247 |
214,528 |
25 |
No Advanced Swap Implementation yet. Both algorithms use the same code. Uses Log(2). |
Other |
3,276 |
150,197 |
12 |
|
| Internal Modules | ||||
Basic Arithmetic Unit |
1,996 |
65,007 |
14 |
Addition, Subtraction, Shift, Checksum-Hashing, Random Number Generation, etc... |
Basic Multiplication Unit |
3,551 |
119,519 |
4 |
32-bit implementations of: N x 1 Multiply, Basecase, Karatsuba, and Toom-3. |
FFT (original) |
16,347 |
611,337 |
65 |
Original FFT from the first release. Generic, SSE3, and SSE4.1. (depreciated) |
FFT (v2) |
28,498 |
961,047 |
75 |
Vector-Scalable FFT: SSE3, SSE4.1, AVX, and FMA (planned). |
FFT (shared) |
2,968 |
101,042 |
7 |
Shared by both FFTs. |
Hybrid NTT (core only) |
22,194 |
790,463 |
57 |
Core components only. Total code-size for this algorithm is about 35,000 lines. An additional 13,000 lines belonging to this algorithm are scattered across other categories. |
Multiplication Wrappers |
6,198 |
206,845 |
12 |
Builds a multiplication interface around the FFT/FFTv2 and the Hybrid NTT. All three algorithms (Basic, FFT, and Hybrid NTT) are wrapped together by an algorithm dispatcher. |
Other |
2,791 |
121,163 |
14 |
|
| Total Active Code: | 177,675 |
5,913,475 |
561 |
Includes all code that is currently active in the released version of y-cruncher v0.5.5. |
| Development Code (Disabled in Public Releases) |
||||
Auther's Version Features |
6,538 |
220,070 |
8 |
Benchmarking features for each individual multiplication algorithm. |
Performance Tuning |
5,174 |
165,853 |
14 |
Includes the internal superoptimizer. |
Testers/Debuggers |
2,529 |
77,959 |
12 |
|
| Total Development Code: | 14,241 |
463,882 |
34 |
|
| Unfinished Components | ||||
YMP Front-End Library |
10,456 |
358,459 |
44 |
100% complete (Alpha Testing) - ETA: sometime after v0.6.1 Dynamic-Link Library for y-cruncher's arithmetic library. All essential features have been implemented. All new features are extraneous. |
y-cruncher Modules Wrapper |
5,748 |
228,044 |
26 |
100% complete - ETA: v0.6.1 Compile-time code dispatcher for all of y-cruncher's low-level functions. |
Basic Arithmetic Unit (v2) |
14,546 |
483,045 |
78 |
100% complete - ETA: v0.6.1 Native 64-bit/vector arithmetic: Shift, N x 1/Basecase/Karatsuba Multiply Partial support for Delayed-Carry Arithmetic. |
Multiplication Wrappers (v2) |
6,508 |
212,278 |
26 |
100% complete - ETA: v0.6.1 Native 64-bit. Flexible memory management. Double-ended Partial products. |
Internal Data Conversion |
4,757 |
169,304 |
42 |
100% complete - ETA: v0.6.1 A dedicated unit for converting between different internal representations. This module is currently only used by the YMP Front-End Library Interface and Hybrid NTT (v2). Neither of these are in production yet. So there's no point in enabling this module yet. |
Multi-Layer Raid-File |
4,368 |
139,828 |
21 |
70% complete - ETA: v0.6.1? - Usable. Recovery features not implemented yet. A new swap-file interface that supports hybrid RAID 0/3. |
VST Multiplication Algorithm |
57,708 |
1,901,957 |
183 |
95% complete - ETA: v0.6.1? - Complete. Not integrated yet. Undergoing intensive testing. Another experimental home-made algorithm. Faster than Hybrid NTT above 50 billion digits. More extensions are planned for the future. |
Integer NTT (x64) |
720 |
23,438 |
5 |
Not Started - Still in experimental phase.
Classic Number-Theoretic Transform over multiple primes. |
| Total Unfinished Code: | 104,811 |
3,516,353 |
425 |
|
| Canceled/On Hold | ||||
Hybrid NTT (v2) |
41,667 |
1,469,808 |
119 |
30% complete - On Hold Indefinitely. (The newly completed VST Algorithm is good enough.) Hybrid NTT version 2 |
| Total Canceled/On Hold Code: | 41,667 |
1,469,808 |
119 |
|
| Grand Total: | 316,622 |
10,680,436 |
1,081 |
|
Notes:
The implementation for each of these is about 3000 lines - including multi-threading and swap. For each algorithm listed here, these 3000 lines are mostly copy/paste duplicates of each other with the necessary changes made for each algorithm.
Formulas:
y-cruncher has two algorithms for each major constant that it can compute - one for computation, and one for verification.
All complexities shown assume multiplication to be O(n log(n)). It is slightly higher than that, but for all practical purposes, O(n log(n)) is close enough.


/zeta(3)_amdeberhan-zeiberger-2.jpg)
/zeta(3)_amdeberhan-zeiberger-1.jpg)



Arithmetic Algorithms:
Multiplication:
y-cruncher uses a combination of grade-school, Karatsuba, Fast Fourier Transform, and Hybridized Number-Theoretic Transform multiplication algorithms.
This rather unique combination of integer and floating-point based algorithms has proven itself to be extremely useful. Not only is it fast and memory efficient, but it is also well suited for parallelism and vectorization.
Even without the use of assembly language optimizations, the current multiplication scheme used in y-cruncher is faster than that of GMP (as of the version used in Mathematica 6).
(GMP is an extremely fast arbitrary precison arithmetic library that is used by virtually every major math program including Mathematica, and Matlab. It is fast due in part to its use of highly optimized and specialized assembly code for different processors.)
The table below summarizes the approximate ranges where each algorithm is used: (for all versions prior to v0.4.3)
(v0.4.3 has a completely different tuning and is a lot more complicated than this.)
| Algorithm | Complexity | Approximate Range (decimal digits) | |
| x86 | x64 | ||
| Basecase (grade-school multiplication) | 1 - 130 | 1 - 170 | |
| Karatsuba | 130 - 1000 | 170 - 560 | |
| 3-Way Toom-Cook | Implemented - But Never Optimal to Use | ||
| Floating-Point Fast Fourier Transform (FFT) | w = # bits in mantissa |
1000 - 4,800,000 | 560 - 120,000 |
| Hybridized Number-Theoretic Transform (Hybrid NTT) | Better than FFT, but worse than SSA Pending Further Analysis... |
> 4,800,000* | > 120,000 |
| Schönhage-Strassen Algorithm (SSA) | Implemented - But Never Optimal to Use** | ||
The current method of having a single threshold between each algorithm is not optimal as it does not take into account for multiple crossover points between adjacent algorithms.
*From my experience, it is never optimal to use Hybrid NTT on x86. FFT is faster for anything under the 32-bit limit.
But in the interest of reducing memory usage for large computations, I've forced y-cruncher to use Hybrid NTT for anything above 4.8 million digits - even if it means significantly slowing down a computation.
**The difference between the run-time complexities of Hybrid NTT and SSA is on the order of double-logarithms. However, SSA has a much larger Big-O constant - so large that Hybrid NTT is a lot faster for any attainable size despite having a slightly inferior complexity.
(Of course, this could change if there are any new ground-breaking optimizations that apply to SSA but not to Hybrid NTT.)
Other:
Nothing special here. These are all standard algorithms and approaches.
As far as optimizations go, Division and Square Roots have been given very little attention and are therefore inefficient. (The current implementations aren't even fully paralleled.) Natural Log via AGM suffers as a result. Division and square roots account for such a tiny portion of the total run-time for the majority of constants that it has not been worth the effort to improve it. Most of the optimizations (as of version 0.5.3) have been concentrated in multiplication and Binary Splitting because they clearly dominate total run-time.