y-cruncher - A Multi-Threaded Pi-Program
From a high-school project that went a little too far...
By Alexander J. Yee
(Last updated: February 23, 2018)
The first scalable multi-threaded Pi-benchmark for multi-core systems...
How fast can your computer compute Pi?
y-cruncher is a program that can compute Pi and other constants to trillions of digits.
It is the first of its kind that is multi-threaded and scalable to multi-core systems. Ever since its launch in 2009, it has become a common benchmarking and stress-testing application for overclockers and hardware enthusiasts.
y-cruncher has been used to set several world records for the most digits of Pi ever computed.
Windows: Version 0.7.5 Build 9481 (Released: February 24, 2018)
Linux : Version 0.7.5 Build 9481 (Released: February 24, 2018)
Official HWBOT thread.
Official XtremeSystems Forums thread.
Version 0.7.5: (January 21, 2018)
This release finishes up a set of optimizations that were motivated by the scalability problems on Skylake X. Other than that, there are no major new features.
Most of the optimizations in this version fall into two categories:
The memory optimizations have been ongoing since July of last year. Some of them made it into v0.7.4. This release eliminates the remaining low-hanging fruit. At this point, much of the code in y-cruncher is close to "memory-optimal", so there is little room left for improvement with the current set of algorithms.
Nevertheless, y-cruncher remains memory-bound on the high-core-count Skylake X systems - including those with extreme memory overclocks. And unfortunately, this is likely to get worse in the future as the CPU/memory gap continues to widen, as it has for the past several decades.
Moving on. Amdahl's Law has been an increasing nuisance in recent years as core counts continue to rise. Much of the offending code that is not parallelized involves linear operations like large-number addition and 1 x N-word multiplication. These operations have never been parallelized since they have historically been:
Amdahl's Law has reached the point where 1 and 3 are not that true anymore. And 2 is just an engineering problem. So the optimizations in this release include an all-out assault on Amdahl's Law to search and destroy everything that is not parallelized. (For the most part, that is.)
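To make the Amdahl's Law argument concrete, here is a minimal sketch of the standard formula: if a fraction p of the work is parallelized across n cores, the speedup is at most 1 / ((1 - p) + p / n). The function name and the parallel fractions used below are illustrative assumptions, not measurements taken from y-cruncher.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Upper bound on speedup when only `parallel_fraction` of the work scales."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical parallel fractions, chosen only to show the trend.
    for cores in (4, 8, 16, 64):
        row = [f"{amdahl_speedup(p, cores):6.2f}x" for p in (0.90, 0.95, 0.99)]
        print(f"{cores:3d} cores:", " ".join(row))
```

With these assumed numbers, code that is 95% parallel cannot exceed roughly a 9.1x speedup on 16 cores. This is why small serial pieces that were harmless on a quad-core become significant as core counts rise.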
y-cruncher has been used to set a number of world-record-size computations.
Blue: Current World Record
Green: Former World Record
Red: Unverified computation. Does not qualify as a world record until verified using an alternate formula.
|Date Announced||Date Completed:||Source:||Who:||Constant:||Decimal Digits:||Time:||Computer:|
|August 24, 2017||August 23, 2017||Ron Watkins||Euler-Mascheroni Constant||477,511,832,674||4 x Xeon E5-4660 v3 @ 2.1 GHz - 1 TB
2 x Xeon X5690 @ 3.47 GHz - 128 GB
|August 14, 2017||August 13, 2017||Ron Watkins||Zeta(3) - Apery's Constant||500,000,000,000||
8 x Xeon 6550 @ 2.0 GHz - 512 GB
2 x Xeon X5690 @ 3.46 GHz - 142 GB
|November 15, 2016||November 11, 2016||Blog
|Peter Trueb||Pi||22,459,157,718,361||Compute: 105 days||4 x Xeon E7-8890 v3 @ 2.50 GHz
1.25 TB DDR4
20 x 6 TB 7200 RPM Seagate
|September 3, 2016||August 29, 2016||Ron Watkins||e||5,000,000,000,000||2 x Xeon X5690 @ 3.47 GHz
|July 11, 2016||July 5, 2016||"yoyo"||Golden Ratio||10,000,000,000,000||
|2 x Intel Xeon E5-2696 v4 @ 2.2 GHz
|June 28, 2016||June 19, 2016||Ron Watkins||Square Root of 2||10,000,000,000,000||2 x Xeon X5690 @ 3.47 GHz
|June 4, 2016||May 29, 2016||Ron Watkins||Lemniscate||250,000,000,000||4 x Xeon E5-4660 v3 @ 2.1 GHz - 1TB
4 x Xeon X6550 @ 2 GHz - 512 GB
|June 4, 2016||June 2, 2016||"yoyo"||Golden Ratio||5,000,000,000,000||
|2 x Intel Xeon E5-2696 v4 @ 2.2 GHz
|April 24, 2016||April 18, 2016||Ron Watkins||Log(2)||500,000,000,000||4 x Xeon X5690 @ 3.47 GHz - 141 GB|
|April 17, 2016||April 12, 2016||Ron Watkins||Catalan's Constant||250,000,000,000||4 x Xeon E5-4660 v3 @ 2.1 GHz
|April 9, 2016||April 3, 2016||Ron Watkins||Log(10)||500,000,000,000||2 x Xeon X5690 @ 3.47 GHz
|February 8, 2016||February 6, 2016||Mike A||Catalan's Constant||500,000,000,000||
|2 x Intel Xeon E5-2697 v3 @ 2.6 GHz
|July 24, 2015||July 22, 2015
July 23, 2015
|Golden Ratio||2,000,000,000,000||4 x Xeon X6550 @ 2 GHz - 512 GB
Xeon E5-2676 v3 @ 2.4 GHz - 64 GB
|October 8, 2014||October 7, 2014||"houkouonchi"||Pi||13,300,000,000,000||2 x Xeon E5-4650L @ 2.6 GHz
192 GB DDR3 @ 1333 MHz
24 x 4 TB + 30 x 3 TB
|December 28, 2013||December 28, 2013||Source||Shigeru Kondo||Pi||12,100,000,000,050||2 x Xeon E5-2690 @ 2.9 GHz
128 GB DDR3 @ 1600 MHz
24 x 3 TB
See the complete list including other notably large computations.
If you wish to set a record, you must:
*The validation files are protected with a checksum to prevent tampering/cheating. Yes, people have tried to cheat before.
An exception to the "two computations rule" can be made for Pi since it can be verified using BBP formulas.
Note for anyone attempting to set a Pi world record: Should the attempt succeed, I kindly ask that you make yourself sufficiently available for external requests to access or download the digits in their entirety (at least until the record is broken again by someone else). Pi is popular enough that people do actually want to see the digits.
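As background on why Pi gets an exception: a BBP-type formula can produce hexadecimal digits of Pi at an arbitrary position without computing the digits before it, so the tail of a large computation can be spot-checked independently. The sketch below is a toy illustration of the classic Bailey-Borwein-Plouffe formula using double-precision arithmetic. It is not y-cruncher's verification code; the function names are hypothetical, and only the first handful of digits returned per call should be trusted.

```python
def _series(j: int, n: int) -> float:
    """Fractional part of sum over k >= 0 of 16^(n-k) / (8k + j)."""
    total = 0.0
    # Terms with a non-negative exponent: modular exponentiation keeps them small.
    for k in range(n + 1):
        denom = 8 * k + j
        total = (total + pow(16, n - k, denom) / denom) % 1.0
    # Remaining terms shrink by a factor of 16 each; stop once they are negligible.
    k = n + 1
    while True:
        term = 16.0 ** (n - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0


def pi_hex_digits(position: int, count: int = 6) -> str:
    """Hex digits of Pi starting `position` places after the point (0-based).

    Double precision limits this to a few reliable digits per call.
    """
    x = (4 * _series(1, position) - 2 * _series(4, position)
         - _series(5, position) - _series(6, position)) % 1.0
    digits = []
    for _ in range(count):
        x *= 16
        d = int(x)
        digits.append("0123456789ABCDEF"[d])
        x -= d
    return "".join(digits)


if __name__ == "__main__":
    print(pi_hex_digits(0))  # Pi in hex begins 3.243F6A88...
```

A real record verification would use far higher precision and compare the extracted hexadecimal digits against the end of the main computation's output before the conversion to decimal.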
The main computational features of y-cruncher are:
Sample Screenshot: 100 billion digits of Pi
|Core i7 5960X @ 4.0 GHz - 128GB DDR4 @ 2666 MHz - 16 HDs|
Latest Releases: (February 24, 2018)
OS Download Link Size
The Linux version comes in both statically and dynamically linked versions. The static version should work on most Linux distributions, but lacks Cilk Plus and NUMA binding. The dynamic version supports all features, but is less portable due to the DLL dependency hell.
The Windows download comes bundled with the HWBOT submitter which allows benchmarks to be submitted to HWBOT.
- Windows Vista or later.
- The HWBOT submitter requires the Java 8 Runtime.
- 64-bit Linux is required. There is no support for 32-bit.
- The dynamic version has been tested on Ubuntu 17.04.
- An x86 or x64 processor.
Very old systems that don't meet these requirements may be able to run older versions of y-cruncher. Support goes all the way back to even before Windows XP.
Other Downloads (for C++ programmers):
Comparison Chart: (Last updated: January 20, 2018)
Computations of Pi to various sizes. All times in seconds. All computations done entirely in RAM.
The timings include the time needed to convert the digits to decimal representation, but not the time needed to write out the digits to disk.
Laptops + Low-Power:
|Processor(s):||Core i7 3630QM||VIA C4650||Pentium N42001||Xeon E3-1535M v5||Core i7 6820HK|
|Generation:||Intel Ivy Bridge||VIA Isaiah||Intel Apollo Lake||Intel Skylake||Intel Skylake|
|Processor Speed:||3.2 GHz||2.0 GHz||1.1 - 2.5 GHz||2.9 GHz||3.2 GHz|
|Memory:||8 GB - 1600 MT/s||16 GB||4 GB||16 GB||48 GB - 2133 MT/s|
|Version:||v0.7.2 ~ Hina||v0.7.2 ~ Hina||v0.7.2 ~ Ushio||v0.7.1 ~ Kurumi||v0.7.5 ~ Kurumi|
|Instruction Set:||x64 AVX||x64 AVX||x64 SSE4.1||x64 AVX2 + ADX||x64 AVX2 + ADX|
|Processor(s):||Core 2 Quad Q6600||Core i7 920||FX-8350||Core i7 4770K||Core i7 5775C||Core i7 7700K||Ryzen 7 1800X|
|Generation:||Intel Core||Intel Nehalem||AMD Piledriver||Intel Haswell||Intel Broadwell||Intel Kaby Lake||AMD Zen|
|Processor Speed:||2.4 GHz||3.5 GHz (OC)||4.0 GHz||4.0 GHz (OC)||3.8 GHz (OC)||4.8 GHz (OC)||3.8 GHz|
|Memory:||6 GB - 800 MT/s||12 GB - 1333 MT/s||32 GB - 1600 MT/s||32 GB - 2133 MT/s||16 GB - 2400 MT/s||64 GB - 3000 MT/s||64 GB - 2666 MT/s|
|Program Version:||v0.7.2 ~ Kasumi||v0.7.5 ~ Ushio||v0.7.5 ~ Miyu||v0.7.5 ~ Airi||v0.7.1 ~ Kurumi||v0.7.1 ~ Kurumi||v0.7.5 ~ Yukina|
|Instruction Set:||x64 SSE3||x64 SSE4.1||x64 AVX + XOP||x64 AVX2||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX|
|Credit:||André Bachmann||Oliver Kruse|
|Processor(s):||Core i7 5820K||Core i7 5960X||Threadripper 1950X||Core i9 7900X||Core i9 7940X|
|Generation:||Intel Haswell||Intel Haswell||AMD Threadripper||Intel Skylake X||Intel Skylake X|
|Processor Speed:||4.5 GHz (OC)||4.0 GHz (OC)||4.0 GHz (OC)||
|3.0 GHz cache||2.8 GHz cache|
|Memory:||32 GB - 2400 MT/s||64 GB - 2400 MT/s||128 GB - 2800-3200 MT/s||128 GB - 3200 MT/s||128 GB - 3400 MT/s|
|Program Version:||v0.7.3 ~ Airi||v0.7.4 ~ Airi||v0.7.3 ~ Yukina||v0.7.3 ~ Kotori||v0.7.5 ~ Kotori||v0.7.5 ~ Kotori|
|Instruction Set:||x64 AVX2||x64 AVX2||x64 AVX2 + ADX||x64 AVX512-DQ||x64 AVX512-DQ|
|Credit:||Sean Heneghan||Oliver Kruse|
*All-core non-AVX/AVX/AVX512 CPU frequency.
Due to high core counts and the effect of NUMA (Non-Uniform Memory Access), performance on multi-processor systems is extremely sensitive to various settings. Therefore, these benchmarks may not be entirely representative of what the hardware is capable of.
|Processor(s):||Xeon E5-2683 v3||Xeon E5-2687W v4||Xeon E5-2696 v4||Xeon E7-8880 v3||Epyc 7601||Xeon Gold 6130F|
|Generation:||Intel Haswell||Intel Broadwell||Intel Broadwell||Intel Haswell||AMD Naples||Intel Skylake Purley|
|Processor Speed:||2.03 GHz||3.0 GHz||2.2 GHz||2.3 GHz||2.2 GHz||2.1 GHz|
|Memory:||128 GB - ???||64 GB||768 GB - ???||2 TB - ???||256 GB - ??||256 GB - ??|
|Program Version:||v0.6.9 ~ Airi||v0.7.4 ~ Kurumi||v0.7.1 ~ Kurumi||v0.7.1 ~ Airi||v0.7.3 ~ Yukina||v0.7.3 ~ Kotori|
|Instruction Set:||x64 AVX2||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2||x64 AVX2 + ADX||x64 AVX512-DQ|
|Credit:||Shigeru Kondo||Cameron Giesbrecht||"yoyo"||Jacob Coleman||Dave Graham|
|Processor(s):||Xeon X5482||Xeon E5-2690|
|Generation:||Intel Penryn||Intel Sandy Bridge|
|Processor Speed:||3.2 GHz||3.5 GHz|
|Memory:||64 GB - 800 MT/s||256 GB - ???|
|Program Version:||v0.7.2 ~ Ushio||v0.7.5 ~ Nagisa||v0.6.2/3 ~ Hina|
|Instruction Set:||x64 SSE4.1||x64 AVX|
The full chart of rankings for each size can be found here:
These fastest times may include unreleased betas.
Got a faster time? Let me know: email@example.com
Note that I usually don't respond to these emails. I simply put them into the charts which I update periodically.
Because of the memory-intensive nature of computing Pi and other constants, y-cruncher needs a lot of memory bandwidth to perform well. In fact, the program has been noticeably memory-bound on nearly all high-end desktops since 2012 as well as the majority of multi-socket systems since at least 2006.
Don't be surprised if y-cruncher exposes instabilities that other applications and stress-tests do not. y-cruncher is unusual in that it simultaneously places a heavy load on both the CPU and the entire memory subsystem.
y-cruncher has a lot of settings for tuning parallel performance. By default, it makes a best effort to analyze the hardware and pick the best settings. But because of the virtually unlimited combinations of processor topologies, it's difficult for y-cruncher to pick the best settings for everything. So sometimes the best performance can only be achieved with manual settings.
*These are advanced settings that cannot be changed if you're using the benchmark option in the console UI. To change them, you will need to either run benchmark mode from the command line or use the custom compute menu.
Load imbalance is a fairly common problem in y-cruncher. The usual causes are:
This is probably one of the most complicated features in y-cruncher.
Everything in this section is in the process of being re-verified and moved to: https://github.com/Mysticial/y-cruncher/issues
Pi and other Constants:
Hardware and Overclocking:
Here are some interesting sites dedicated to the computation of Pi and other constants:
Contact me via e-mail. I'm pretty good with responding unless it gets caught in my school's junk mail filter.