(Last updated: January 27, 2015)
Libraries and Dependencies:
Other Internal Requirements:
y-cruncher's design is split into 3 major layers. As of v0.6.7:
At one point, y-cruncher had almost 300,000 lines of total code. But it has steadily gotten smaller as more and more of the program is migrated to C++.
y-cruncher makes fairly heavy use of processor-specific optimizations. These optimizations are largely done manually since modern compilers still cannot optimize as well as a programmer with domain-specific knowledge.
The listed instruction set requirements are intentionally vague. There are so many of them now and it's difficult to determine which are actually needed since compilers are finally smart enough to use them behind my back.
|Binary||Target Processor(s)||Instruction Set Requirements||Notes|
|x64 AVX2 ~ Airi||Intel Haswell||
All Haswell instructions - including AVX.
|Not all Haswell processors support AVX.*|
|x64 XOP ~ Miyu||AMD Bulldozer||
All Bulldozer instructions.
|x64 AVX ~ Hina||Intel Sandy Bridge||
All Sandy Bridge instructions - including AVX.
|Not all Sandy Bridge processors support AVX.*|
|x64 SSE4.1 ~ Ushio||Intel Nehalem||
x64, SSE, SSE2, SSE3, SSE4.1
|x64 SSE4.1 ~ Nagisa||Intel Core 2 Penryn||
x64, SSE, SSE2, SSE3, SSE4.1
|Discontinued as of v0.6.5.|
|x64 SSE3 ~ Kasumi||AMD K10||
x64, SSE, SSE2, SSE3
SSE, SSE2, SSE3
|Discontinued as of v0.6.1.|
*These "crippled" processors that lack their architecture's full ISA will fall back to a slower version of the program. This is the case for all the "Pentium" branded processors starting from Sandy Bridge which will fall back to SSE4.1. Likewise, the non-Xeon variants of the upcoming Skylake processors will likely be without AVX512 and will fall back to AVX2 instead.
y-cruncher has two algorithms for each major constant that it can compute - one for computation, and one for verification.
All complexities shown assume multiplication to be O(n log(n)). It is slightly higher than that, but for all practical purposes, O(n log(n)) is close enough.
Both of these formulas were pulled from: http://numbers.computation.free.fr/Constants/PiProgram/userconstants.html
Note that the AGM-based algorithms are probably faster. But y-cruncher currently uses these series-based formulas because:
Prior to v0.6.1, only log(2) and log(10) were supported using hard-coded machin-like formulas. A generic log(n) was needed for Ramanujan's formula for Catalan's constant. That was implemented using Arithmetic-Geometric Mean (AGM).
In v0.6.1, Ramanujan's formula for Catalan's constant was removed - thereby removing the need for a generic log(n). Instead, v0.6.1 supports the computation of log(n) for any small integer n. This is done using a formula generator that generates (at run-time) machin-like formulas for arbitrary small integer n.
Generation of machin-like formulas for log(n) is done using table-lookup along with a branch-and-bound search on several argument reduction formulas.
Series summation is done using standard Binary Splitting techniques with the following catches:
This series summation scheme (including the skewed splitting and backwards summing) has been the same in all versions of y-cruncher to date. All of this is expected to change when GCD factorization is to be incorporated.
A few years ago, y-cruncher was pretty heavily developed. But I've since left school and found a full-time job. While y-cruncher will probably remain a side-hobby for the near future, it won't get as much attention as it used to get. At the very least, I'll continue to fix whatever bugs that are discovered. And I'll make an effort to keep the program up-to-date with the latest instruction set extensions and development toolchains. But for the most part, the project is done.
Most of the work right now is in cleaning up the code and refactoring it into idiomatic C++ for long-term maintainability. The program was (and still largely is) a fragile mess of unreadable C hackery with little to no documentation. So at the very least, I'd like to get it into a state where someone other than myself can read it.
Nevertheless, it's an never-ending project. So there are things on the to-do list. But it can be a long time before anything gets done.
AVX2 support for Haswell processors.
|Command Line Options||
Run the program directly from the command line without needing to enter input.
All the major features in y-cruncher should be accessible via command line.
The target will be Skylake with AVX512. So we're looking at AVX512-F/CD/BW/DQ/VL.
All of these except for AVX512-CD will be useful for y-cruncher.
Waiting for the hardware...
In both the last two record computations of Pi (Shigeru Kondo - 12.1 trillion, "houkouonchi" - 13.3 trillion), the computation failed during the final base conversion. Since there are no checkpoints in this conversion, more than 10 days were lost in each computation.
The radix conversion needs checkpoints. But it will take some work to do since the code predates the checkpointing framework and is fundamentally incompatible with it.
|Reduced Memory Mode||
For Pi Chudnovsky and Ramanujan, add a mode that will allow a computation to be done using less memory/disk at the cost of slower performance.
For a typical computation, most of the work requires very little memory. It's the occasional memory spike that causes y-cruncher to have such a high memory requirement.
There are about 4 large memory spikes in a Pi computation. In approximate descending order of size, they are:
These spikes can be flattened via space-time tradeoffs in the respective algorithms. Since the the trade-off only needs to be done at the spikes, the overall performance hit should be reasonably small.
The feature is partially done. But not done enough to be worth enabling publicly.
Neverthess, this feature was used in the 12.1 trillion digit computation of Pi.
Implement the optimization described here. At minimum do it for Pi Chudnovsky+Ramanujan. See if it can also be done to Zeta(3) and Catalan.
Because of the way that y-cruncher micro-manages its memory, this isn't going to be easy to do. Furthermore, y-cruncher lacks frameworks for integer division and segmented sieve.
|Small Primes NTT||
It seems that y-cruncher is the only serious Pi-program that doesn't use this algorithm. So it's got some catching up to do...
Victor Shoup's butterfly with 63-bit primes seems to strike the right balance between performance and maximum transform size.