y-cruncher - Language and Algorithms

By Alexander J. Yee

 

(Last updated: March 13, 2015)

 

 


Implementation (as of v0.6.8):

 

General Information:

 

Libraries and Dependencies:

y-cruncher has no non-system dependencies. No Boost. No GMP. Pretty much everything that isn't provided by C++ is built from the ground up.

 

 

Compilers:

 

Other Internal Requirements:

 

Code Organization:

 

y-cruncher used to have a layered design where each layer was built upon the layer below it. Now it is mostly a collection of modules:

Module         Files   Lines of Code   Description
Public            48           4,872   The common support library. It provides a common interface for stuff like time, file I/O, and colored console output.
Digit Viewer      71           9,811   The bundled Digit Viewer.
BBPv2             33           4,255   The bundled BBP digit extraction app for Pi.
Launcher           8             564   The CPU dispatcher that picks the optimal binary to run. This is the module that builds the y-cruncher.exe and y-cruncher.out files.
y-cruncher       197          36,571   y-cruncher itself. This has most of the console UI, the implementations for all the constants, and the supporting math functions.
Objects           39           6,891   The bignum library. It provides the large number objects that are used by y-cruncher.
Modules        1,037         180,205   All the low-level arbitrary-precision arithmetic. It also has all the y-cruncher-specific support libraries such as the parallel computing framework and the multi-layer RAID file.
Misc.              7             998   Stupid stuff like global settings and version numbers/names.
Total:         1,401         244,167

At one point, y-cruncher had almost 300,000 lines of code. But it has steadily gotten smaller as more and more of the program has been migrated to C++.

 

 

Processor-Specific Optimizations:

 

y-cruncher makes fairly heavy use of processor-specific optimizations. These optimizations are largely done manually since modern compilers still cannot optimize as well as a programmer with domain-specific knowledge.

Binary: x64 AVX512-DQ ~ ??
Target Processor(s): Intel Skylake Xeon
Instruction Set Requirements: x64, ABM, BMI1, BMI2, ADX, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, FMA3, AVX2, AVX512-(F/CD/VL/BW/DQ)
Notes: Planned without an ETA...

Binary: x64 AVX2 ~ Airi
Target Processor(s): Intel Haswell
Instruction Set Requirements: x64, ABM, BMI1, BMI2, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, FMA3, AVX2
Notes: Not all Haswell processors support AVX.*

Binary: x64 XOP ~ Miyu
Target Processor(s): AMD Bulldozer
Instruction Set Requirements: x64, ABM, SSE, SSE2, SSE3, SSSE3, SSE4.1, SSE4.2, AVX, FMA4, XOP

Binary: x64 AVX ~ Hina
Target Processor(s): Intel Sandy Bridge
Instruction Set Requirements: x64, SSE, SSE2, SSE3, SSSE3, SSE4.1, AVX
Notes: Not all Sandy Bridge processors support AVX.*

Binary: x64 SSE4.1 ~ Ushio
Target Processor(s): Intel Nehalem
Instruction Set Requirements: x64, SSE, SSE2, SSE3, SSSE3, SSE4.1

Binary: x64 SSE4.1 ~ Nagisa
Target Processor(s): Intel Core 2 Penryn
Instruction Set Requirements: x64, SSE, SSE2, SSE3, SSSE3, SSE4.1
Notes: Discontinued as of v0.6.5.

Binary: x64 SSE3 ~ Kasumi
Target Processor(s): AMD K10
Instruction Set Requirements: x64, SSE, SSE2, SSE3

Binary: x86 SSE3
Target Processor(s): -
Instruction Set Requirements: SSE, SSE2, SSE3

Binary: x86
Target Processor(s): -
Instruction Set Requirements: none
Notes: Discontinued as of v0.6.1.

*Some processors do not support all the instructions in their processor line. These will simply fall back to a slower binary.
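As a rough illustration of what the Launcher's CPU dispatch does (this is not y-cruncher's actual code; the check order and the use of GCC's __builtin_cpu_supports() are assumptions for the sketch), picking the best matching binary from the table above might look like this:

    #include <cstdio>

    int main() {
        // Toy CPU dispatcher: walk up the feature ladder and remember the best
        // matching binary name. Ignores the 32- vs. 64-bit OS check for brevity.
        const char* binary = "x86";                         // safest fallback
    #if defined(__GNUC__) || defined(__clang__)
        __builtin_cpu_init();
        if (__builtin_cpu_supports("sse3"))   binary = "x86 SSE3";
        if (__builtin_cpu_supports("sse4.1")) binary = "x64 SSE4.1 ~ Ushio";
        if (__builtin_cpu_supports("avx"))    binary = "x64 AVX ~ Hina";
        if (__builtin_cpu_supports("xop"))    binary = "x64 XOP ~ Miyu";
        if (__builtin_cpu_supports("avx2"))   binary = "x64 AVX2 ~ Airi";
    #endif
        std::printf("Best matching binary: %s\n", binary);
        return 0;
    }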

 

Formulas:

y-cruncher has two algorithms for each major constant that it can compute - one for computation, and one for verification.

 

All complexities shown assume multiplication to be O(n log(n)). It is slightly higher than that, but for all practical purposes, O(n log(n)) is close enough.

 

Square Root of n and Golden Ratio

1st order Newton's Method - Runtime Complexity: O(n log(n))
Note that the final radix conversion from binary to decimal has a complexity of O(n log(n)^2).
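For reference, one common division-free arrangement of the first-order Newton iteration (the exact variant y-cruncher uses is not documented here) iterates on the reciprocal square root and multiplies only at the end:

    x_{k+1} = \frac{x_k \left(3 - N x_k^2\right)}{2} \;\longrightarrow\; \frac{1}{\sqrt{N}},
    \qquad \sqrt{N} = N \cdot \frac{1}{\sqrt{N}},
    \qquad \phi = \frac{1 + \sqrt{5}}{2}

Each iteration roughly doubles the number of correct digits, and the cost is dominated by the last full-precision iterations, which is why the overall cost stays at O(n log(n)).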
e

Taylor Series of exp(1): O(n log(n)^2)

Taylor Series of exp(-1): O(n log(n)^2)
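These are simply the Taylor expansions of the exponential at +1 and -1; presumably the second run recovers e as the reciprocal of exp(-1):

    \exp(1) = \sum_{k=0}^{\infty} \frac{1}{k!}, \qquad
    \exp(-1) = \sum_{k=0}^{\infty} \frac{(-1)^k}{k!}, \qquad
    e = \frac{1}{\exp(-1)}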


Pi

Chudnovsky Formula: O(n log(n)^3)


Ramanujan's Formula: O(n log(n)^3)
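For reference, the published forms of the two series are (any y-cruncher-specific rearrangements are not shown here):

    \frac{1}{\pi} = 12 \sum_{k=0}^{\infty} \frac{(-1)^k\,(6k)!\,(13591409 + 545140134\,k)}{(3k)!\,(k!)^3\,640320^{3k+3/2}}

    \frac{1}{\pi} = \frac{2\sqrt{2}}{9801} \sum_{k=0}^{\infty} \frac{(4k)!\,(1103 + 26390\,k)}{(k!)^4\,396^{4k}}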

Log(n)

Machin-Like Formulas: O(n log(n)^3)



Starting from v0.6.1, y-cruncher is able to generate and utilize machin-like formulas for the log of any small integer.
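For illustration, a Machin-like log formula is an integer combination of fast-converging arccoth series. One classical example for log(2) is shown below (it is not necessarily the formula the generator emits):

    \operatorname{arccoth}(x) = \frac{1}{2}\ln\frac{x+1}{x-1} = \sum_{k=0}^{\infty} \frac{1}{(2k+1)\,x^{2k+1}}

    \ln(2) = 18\operatorname{arccoth}(26) - 2\operatorname{arccoth}(4801) + 8\operatorname{arccoth}(8749)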
 
Zeta(3) - Apery's Constant

Amdeberhan-Zeilberger Formula 2: O(n log(n)^3)



Amdeberhan-Zeilberger Formula 1: O(n log(n)^3)
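For reference, the well-known Amdeberhan-Zeilberger acceleration of the Zeta(3) series has the following shape (the page does not say which of the two entries above it corresponds to in y-cruncher's numbering):

    \zeta(3) = \sum_{k=0}^{\infty} \frac{(-1)^k\,(205k^2 + 250k + 77)}{64} \cdot \frac{(k!)^{10}}{\left((2k+1)!\right)^{5}}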


Lemniscate

Gauss Formula: O(n log(n)^3)


Sebah's Formula: O(n log(n)^3)


Both of these formulas were pulled from: http://numbers.computation.free.fr/Constants/PiProgram/userconstants.html

 

Note that the AGM-based algorithms are probably faster. But y-cruncher currently uses these series-based formulas because:

  1. It lacks an implementation of sqrt(N) for arbitrary N. It only supports N being a small integer.
  2. There already is a series summation framework that can be easily applied to the ArcSinlemn() function.

 

Catalan's Constant

Lupas Formula: O(n log(n)^3)


Huvent's Formula: O(n log(n)^3)



y-cruncher uses the following rearrangement of Huvent's formula:

 
Euler-Mascheroni Constant

Brent-McMillan with Refinement: O(n log(n)^3)


Brent-McMillan (alone): O(n log(n)^3)

Note that both of these formulas are essentially the same.
Therefore, in order for two computations to be independent enough to qualify as a verified record, they MUST be done using different n.

For simplicity of implementation, y-cruncher only uses n that are powers of two - which serves the dual purpose of allowing the use of the easily computed log(2) and lending itself to shift optimizations.
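For reference, the basic (unrefined) Brent-McMillan approximation is built from two Bessel-type series, with H_k denoting the k-th harmonic number:

    A(n) = \sum_{k=0}^{\infty} H_k \left(\frac{n^k}{k!}\right)^2, \qquad
    B(n) = \sum_{k=0}^{\infty} \left(\frac{n^k}{k!}\right)^2, \qquad
    \gamma = \frac{A(n)}{B(n)} - \ln(n) + O\!\left(e^{-4n}\right)

The refined version subtracts an additional asymptotic correction term that improves the error bound for the same n.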

 

Arithmetic Algorithms:

 

Expanded Articles:


 

 

Log(n):

Prior to v0.6.1, only log(2) and log(10) were supported using hard-coded machin-like formulas. A generic log(n) was needed for Ramanujan's formula for Catalan's constant. That was implemented using Arithmetic-Geometric Mean (AGM).

 

In v0.6.1, Ramanujan's formula for Catalan's constant was removed - thereby removing the need for a generic log(n). Instead, v0.6.1 supports the computation of log(n) for any small integer n. This is done with a formula generator that produces (at run-time) Machin-like formulas for arbitrary small integers n.

 

Machin-like formulas for log(n) are generated using table lookup along with a branch-and-bound search over several argument-reduction formulas.
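As an example of the kind of argument-reduction identities such a search can work with (the exact set y-cruncher uses isn't spelled out here): logs of composite arguments split multiplicatively, and neighboring integers are linked by a single fast-converging arccoth term:

    \ln(ab) = \ln(a) + \ln(b), \qquad \ln(x+1) = \ln(x-1) + 2\operatorname{arccoth}(x)

For instance, ln(3) = 2 arccoth(2) and ln(5) = ln(3) + 2 arccoth(4).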

 

 

 

Series Summation:

Series summation is done using standard Binary Splitting techniques, with a few catches - most notably skewed splitting and backwards summing.

This series summation scheme has been the same in all versions of y-cruncher to date. All of this is expected to change when GCD factorization is incorporated.
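As a minimal illustration of the underlying Binary Splitting recursion (a toy with machine integers - nothing like y-cruncher's actual framework, which works with big integers and the catches mentioned above), here is the classic P/Q scheme applied to the series for e:

    #include <cstdint>
    #include <cstdio>

    // Toy binary splitting for sum_{k=1}^{n} 1/k!, so e ~= 1 + p/q.
    // p(a,b)/q(a,b) = sum_{k=a+1}^{b} 1/((a+1)(a+2)...k),  with q(a,b) = b!/a!.
    // Real implementations use arbitrary-precision integers; 64-bit q overflows past n ~ 20.
    struct PQ { uint64_t p, q; };

    static PQ split(uint64_t a, uint64_t b) {
        if (b - a == 1) return {1, b};                 // single term
        uint64_t m = (a + b) / 2;
        PQ L = split(a, m), R = split(m, b);
        // Merge halves: L.p/L.q + (1/L.q)*(R.p/R.q) = (L.p*R.q + R.p) / (L.q*R.q)
        return {L.p * R.q + R.p, L.q * R.q};
    }

    int main() {
        PQ s = split(0, 20);
        std::printf("e ~= %.15f\n", 1.0 + (double)s.p / (double)s.q);
        return 0;
    }

The point of the scheme is that the expensive multiplications happen on large, balanced operands near the top of the recursion, where fast multiplication pays off.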

 

 

 

Developments:

 

A few years ago, y-cruncher was under pretty heavy development. But I've since left school and found a full-time job. While y-cruncher will probably remain a side-hobby for the near future, it won't get as much attention as it used to. At the very least, I'll continue to fix whatever bugs are discovered. And I'll make an effort to keep the program up-to-date with the latest instruction set extensions and development toolchains. But for the most part, the project is done.

 

Most of the work right now is in cleaning up the code and refactoring it into idiomatic C++ for long-term maintainability. The program was (and still largely is) a fragile mess of unreadable C hackery with little to no documentation. So at the very least, I'd like to get it into a state where someone other than myself can read it.

 

Nevertheless, it's a never-ending project. So there are things on the to-do list. But it may be a long time before anything gets done.

 

Feature: AVX512
Status: Watching a pot boil...

The target will be Skylake with AVX512. So we're looking at AVX512-F/CD/VL/BW/DQ. All of these except for AVX512-CD will be useful for y-cruncher.


Feature: Checkpointing for Radix Conversion

In both of the last two record computations of Pi (Shigeru Kondo - 12.1 trillion, "houkouonchi" - 13.3 trillion), the computation failed during the final base conversion. Since there are no checkpoints in this conversion, more than 10 days were lost in each computation.

The radix conversion needs checkpoints. But it will take some work since the code predates the checkpointing framework and is fundamentally incompatible with it.


Feature: Reduced Memory Mode
Status: Partially done, but not done enough to be worth enabling publicly. Nevertheless, this feature was used in the 12.1 trillion digit computation of Pi.

For Pi Chudnovsky and Ramanujan, add a mode that allows a computation to be done using less memory/disk at the cost of slower performance.

For a typical computation, most of the work requires very little memory. It's the occasional memory spike that causes y-cruncher to have such a high memory requirement.

There are about 4 large memory spikes in a Pi computation. In approximate descending order of size, they are:

  1. The final Binary Splitting merge in the series summation.

  2. The Final Multiply.

  3. The radix conversion verification.

  4. The first split of the radix conversion.

These spikes can be flattened via space-time tradeoffs in the respective algorithms. Since the trade-off only needs to be done at the spikes, the overall performance hit should be reasonably small.


Feature: GCD Factorization

Implement the optimization described here. At minimum, do it for Pi Chudnovsky + Ramanujan. See if it can also be done for Zeta(3) and Catalan.

Because of the way that y-cruncher micro-manages its memory, this isn't going to be easy to do. Furthermore, y-cruncher lacks frameworks for integer division and segmented sieves.