(Last updated: March 13, 2015)
Implementation (as of v0.6.8):
General Information:
Libraries and Dependencies:
y-cruncher has no other non-system dependencies. No Boost. No GMP. Pretty much everything that isn't provided by C++ is built from the ground up.
Compilers:
Other Internal Requirements:
Code Organization:
y-cruncher used to have a layered design in which each layer was built upon the layer below it. Now it's mostly a collection of modules:
| Module | Files | Lines of Code | Description |
|---|---|---|---|
| Public | 48 | 4,872 | The common support library. It provides a common interface for stuff like time, file I/O, and colored console output. |
| Digit Viewer | 71 | 9,811 | The bundled Digit Viewer. |
| BBPv2 | 33 | 4,255 | The bundled BBP digit extraction app for Pi. |
| Launcher | 8 | 564 | The CPU dispatcher that picks the optimal binary to run. It's the module that builds the y-cruncher.exe and y-cruncher.out files. |
| y-cruncher | 197 | 36,571 | y-cruncher itself. This has most of the console UI, the implementations for all the constants, and the supporting math functions. |
| Objects | 39 | 6,891 | The bignum library. It provides the large number objects that are used by y-cruncher. |
| Modules | 1,037 | 180,205 | All the low-level arbitrary-precision arithmetic. It also has all the y-cruncher-specific support libraries such as the parallel computing framework and the multi-layer RAID file. |
| Misc. | 7 | 998 | Stupid stuff like global settings and version numbers/names. |
| Total: | 1,401 | 244,167 | |
At one point, y-cruncher had almost 300,000 lines of code. But it has steadily gotten smaller as more and more of the program is migrated to C++.
Processor-Specific Optimizations:
y-cruncher makes fairly heavy use of processor-specific optimizations. These optimizations are largely done manually since modern compilers still cannot optimize as well as a programmer with domain-specific knowledge.
| Binary | Target Processor(s) | Instruction Set Requirements | Notes |
|---|---|---|---|
| x64 AVX512DQ ~ ?? | Intel Skylake Xeon | x64, ABM, BMI1, BMI2, ADX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, FMA3, AVX2, AVX512(F/CD/VL/BW/DQ) | Planned without an ETA... |
| x64 AVX2 ~ Airi | Intel Haswell | x64, ABM, BMI1, BMI2, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, FMA3, AVX2 | Not all Haswell processors support AVX.* |
| x64 XOP ~ Miyu | AMD Bulldozer | x64, ABM, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, FMA4, XOP | Not all Sandy Bridge processors support AVX.* |
| x64 AVX ~ Hina | Intel Sandy Bridge | x64, SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX | |
| x64 SSE4.1 ~ Ushio | Intel Nehalem | x64, SSE, SSE2, SSE3, SSSE3, SSE4.1 | |
| x64 SSE4.1 ~ Nagisa | Intel Core 2 Penryn | x64, SSE, SSE2, SSE3, SSSE3, SSE4.1 | Discontinued as of v0.6.5. |
| x64 SSE3 ~ Kasumi | AMD K10 | x64, SSE, SSE2, SSE3 | |
| x86 SSE3 | | SSE, SSE2, SSE3 | |
| x86 | | none | Discontinued as of v0.6.1. |
*Some processors do not support all the instructions in their processor line. These will simply fall back to a slower binary.
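The fallback behavior described above can be illustrated with a minimal sketch. This is not the actual Launcher code: the feature constants, the `Binary` struct, the candidate ordering, and the `pick_binary` function are all hypothetical names invented for illustration, and the feature mask is passed in by the caller rather than read from the CPU. Only a subset of the instruction sets in the table is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical feature bits. The bit layout is an assumption of this sketch.
constexpr std::uint32_t kSSE3  = 1u << 0;
constexpr std::uint32_t kSSE41 = 1u << 1;
constexpr std::uint32_t kAVX   = 1u << 2;
constexpr std::uint32_t kXOP   = 1u << 3;
constexpr std::uint32_t kAVX2  = 1u << 4;

struct Binary {
    std::string name;        // label taken from the table above
    std::uint32_t required;  // every bit here must be supported by the CPU
};

// Candidates listed from most to least demanding (ordering is assumed).
inline const std::vector<Binary>& candidates() {
    static const std::vector<Binary> list = {
        {"x64 AVX2 ~ Airi",    kSSE3 | kSSE41 | kAVX | kAVX2},
        {"x64 XOP ~ Miyu",     kSSE3 | kSSE41 | kAVX | kXOP},
        {"x64 AVX ~ Hina",     kSSE3 | kSSE41 | kAVX},
        {"x64 SSE4.1 ~ Ushio", kSSE3 | kSSE41},
        {"x64 SSE3 ~ Kasumi",  kSSE3},
    };
    return list;
}

// Return the first binary whose requirements are all present in `cpu`,
// or an empty string if none of the candidates can run.
inline std::string pick_binary(std::uint32_t cpu) {
    for (const Binary& b : candidates()) {
        if ((cpu & b.required) == b.required) return b.name;
    }
    return "";
}

// Usage example: a processor without AVX falls back to the SSE4.1 binary.
inline void demo() {
    assert(pick_binary(kSSE3 | kSSE41 | kAVX | kAVX2) == "x64 AVX2 ~ Airi");
    assert(pick_binary(kSSE3 | kSSE41) == "x64 SSE4.1 ~ Ushio");
}
```

A real dispatcher would fill the mask from the processor's feature flags. The subset test `(cpu & required) == required` is what makes a processor that is missing part of its line's instruction set drop to a slower binary.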
y-cruncher has two algorithms for each major constant that it can compute: one for computation and one for verification.
All complexities shown assume multiplication to be O(n log(n)). It is slightly higher than that, but for all practical purposes, O(n log(n)) is close enough.
Both of these formulas were pulled from: http://numbers.computation.free.fr/Constants/PiProgram/userconstants.html
Note that the AGM-based algorithms are probably faster. But y-cruncher currently uses these series-based formulas because:
Expanded Articles:
Log(n):
Prior to v0.6.1, only log(2) and log(10) were supported using hard-coded Machin-like formulas. A generic log(n) was needed for Ramanujan's formula for Catalan's constant. That was implemented using the Arithmetic-Geometric Mean (AGM).
In v0.6.1, Ramanujan's formula for Catalan's constant was removed, thereby removing the need for a generic log(n). Instead, v0.6.1 supports the computation of log(n) for any small integer n. This is done using a formula generator that generates (at runtime) Machin-like formulas for arbitrary small integer n.
Generation of Machin-like formulas for log(n) is done using table lookup along with a branch-and-bound search on several argument reduction formulas.
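To make "Machin-like formula" concrete, here is a small sketch that evaluates one well-known formula of this type for log(2): log(2) = 18 atanh(1/26) - 2 atanh(1/4801) + 8 atanh(1/8749). It is not necessarily a formula that y-cruncher's generator produces, and the function names `atanh_inv` and `log2_machin` are made up for this example. The formula works because 2 atanh(1/x) = log((x+1)/(x-1)), so the right-hand side equals 9 log(27/25) - log(4802/4800) + 4 log(8750/8748), and every prime factor except a single 2 cancels. The sketch uses double precision only; a real implementation would sum each atanh series with arbitrary-precision arithmetic.

```cpp
#include <cassert>

// atanh(1/x) = sum over k >= 0 of 1 / ((2k+1) * x^(2k+1)), valid for x > 1.
// The larger x is, the faster the series converges.
inline double atanh_inv(double x) {
    assert(x > 1.0);
    const double inv_x2 = 1.0 / (x * x);
    double power = 1.0 / x;  // holds 1 / x^(2k+1) for the current k
    double sum = 0.0;
    // Stop once terms are far below double precision.
    for (int k = 0; power / (2 * k + 1) > 1e-20; ++k) {
        sum += power / (2 * k + 1);
        power *= inv_x2;
    }
    return sum;
}

// Three-term Machin-like formula for log(2).
inline double log2_machin() {
    return 18.0 * atanh_inv(26.0)
         -  2.0 * atanh_inv(4801.0)
         +  8.0 * atanh_inv(8749.0);
}
```

The appeal of such formulas is that each atanh(1/x) term has an integer argument x, which keeps the terms of its series small and cheap to sum.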
Series Summation:
Series summation is done using standard Binary Splitting techniques with the following catches:
This series summation scheme (including the skewed splitting and backwards summing) has been the same in all versions of y-cruncher to date. All of this is expected to change when GCD factorization is incorporated.
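For readers unfamiliar with the technique, the following is a minimal sketch of plain binary splitting, applied to the series e = 1 + 1/1! + 1/2! + ... as a simple stand-in. It is an illustration under stated assumptions, not y-cruncher's code: it splits at the midpoint (no skewed splitting), sums forwards, and uses 64-bit integers instead of a bignum type, which limits it to 20 terms. The names `PQ`, `split`, and `e_approx` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Numerator and denominator of a partial sum, kept as exact integers.
struct PQ {
    std::uint64_t p;
    std::uint64_t q;
};

// Returns P, Q such that P/Q = sum for k = a+1 .. b of 1 / ((a+1)(a+2)...k).
// Two halves are merged with:
//   P(a,b) = P(a,m) * Q(m,b) + P(m,b),   Q(a,b) = Q(a,m) * Q(m,b)
inline PQ split(std::uint64_t a, std::uint64_t b) {
    assert(a < b);
    if (b - a == 1) return {1, b};  // single term: 1 / (a+1)
    const std::uint64_t m = a + (b - a) / 2;
    const PQ left = split(a, m);
    const PQ right = split(m, b);
    return {left.p * right.q + right.p, left.q * right.q};
}

// Approximates e with `terms` terms after the leading 1.
inline double e_approx(std::uint64_t terms) {
    assert(terms >= 1 && terms <= 20);  // 20! still fits in 64 bits
    const PQ r = split(0, terms);
    // One division at the very end; everything before it is exact.
    return 1.0 + static_cast<double>(r.p) / static_cast<double>(r.q);
}
```

The point of the recursion is that the expensive operations are multiplications of similarly sized integers, with a single division at the end. With a bignum type in place of `std::uint64_t`, those multiplications are where fast large-number multiplication pays off.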
A few years ago, y-cruncher was pretty heavily developed. But I've since left school and found a full-time job. While y-cruncher will probably remain a side-hobby for the near future, it won't get as much attention as it used to get. At the very least, I'll continue to fix whatever bugs are discovered. And I'll make an effort to keep the program up-to-date with the latest instruction set extensions and development toolchains. But for the most part, the project is done.
Most of the work right now is in cleaning up the code and refactoring it into idiomatic C++ for long-term maintainability. The program was (and still largely is) a fragile mess of unreadable C hackery with little to no documentation. So at the very least, I'd like to get it into a state where someone other than myself can read it.
Nevertheless, it's a never-ending project. So there are things on the to-do list. But it can be a long time before anything gets done.
Feature: AVX512
Description: The target will be Skylake with AVX512. So we're looking at AVX512F/CD/VL/BW/DQ. All of these except for AVX512CD will be useful for y-cruncher.
Status: Watching a pot boil...

Feature: Checkpointing for Radix Conversion
Description: In both of the last two record computations of Pi (Shigeru Kondo at 12.1 trillion digits, "houkouonchi" at 13.3 trillion digits), the computation failed during the final base conversion. Since there are no checkpoints in this conversion, more than 10 days were lost in each computation.
The radix conversion needs checkpoints. But it will take some work to do since the code predates the checkpointing framework and is fundamentally incompatible with it.

Feature: Reduced Memory Mode
Description: For Pi Chudnovsky and Ramanujan, add a mode that will allow a computation to be done using less memory/disk at the cost of slower performance.
For a typical computation, most of the work requires very little memory. It's the occasional memory spike that causes y-cruncher to have such a high memory requirement.
There are about 4 large memory spikes in a Pi computation. In approximate descending order of size, they are:
These spikes can be flattened via space-time tradeoffs in the respective algorithms. Since the tradeoff only needs to be done at the spikes, the overall performance hit should be reasonably small.
Status: The feature is partially done. But not done enough to be worth enabling publicly.
Nevertheless, this feature was used in the 12.1 trillion digit computation of Pi.

Feature: GCD Factorization
Description: Implement the optimization described here. At minimum do it for Pi Chudnovsky+Ramanujan. See if it can also be done to Zeta(3) and Catalan.
Because of the way that y-cruncher micromanages its memory, this isn't going to be easy to do. Furthermore, y-cruncher lacks frameworks for integer division and segmented sieve.