y-cruncher - A Multi-Threaded Pi-Program
From a high-school project that went a little too far...
By Alexander J. Yee
(Last updated: October 10, 2016)
The first scalable multi-threaded Pi-benchmark for multi-core systems...
How fast can your computer compute Pi?
y-cruncher is a program that can compute Pi and other constants to trillions of digits.
It is the first of its kind that is multi-threaded and scalable to multi-core systems. Ever since its launch in 2009, it has become a common benchmarking and stress-testing application for overclockers and hardware enthusiasts.
y-cruncher has been used to set several world records for the most digits of Pi ever computed.
Windows: Version 0.7.1 Build 9466 (Released: September 16, 2016)
Linux: Version 0.7.1 Build 9466 (Released: September 16, 2016)
Official HWBOT thread.
Official XtremeSystems Forums thread.
Knights Landing Xeon Phi with AVX512: (October 10, 2016)
After more than 2 years of waiting, y-cruncher with AVX512 has finally been tested on native hardware. David Carver was kind enough to test drive an internal version of y-cruncher v0.7.1 which has the AVX512-CD binary enabled. Here it is compared to some more conventional machines:
|Processors:||Core i7 5960X||2 x Xeon E5-2696 v4||Xeon Phi 7250|
|Processor Speed:||4.0 GHz (OC)||2.2 GHz||1.4 GHz|
|Binary:||AVX2||AVX2 + ADX||AVX2 + ADX||AVX512-CD|
The AVX512-CD binary uses AVX512 Foundation and Conflict-Detection instructions. It has been in development since early 2014, but had never been run on native hardware until now. It has now been confirmed to work well enough to do a Pi benchmark.
Performance-wise, Knights Landing falls short of the highest-end Haswell-E and Broadwell-E systems. Furthermore, the AVX2 -> AVX512 scaling is a lackluster 34%. For now, the reason remains unknown. But it's currently hypothesized to be either memory bandwidth or Amdahl's Law.
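To get a feel for what the Amdahl's Law hypothesis would imply, here is a minimal sketch. It assumes (this is an illustration, not a measurement) that going from 256-bit AVX2 to 512-bit AVX512 would ideally double the speed of the vectorized portion of the run and leave everything else unchanged. The function name is hypothetical.

```python
def vectorizable_fraction(observed_speedup, ideal_speedup):
    """Solve Amdahl's Law, S = 1 / ((1 - p) + p / s), for the fraction p.

    observed_speedup: overall speedup S that was measured (1.34 for a 34% gain).
    ideal_speedup:    speedup s of the accelerated portion (assumed 2.0 here).
    """
    return (1.0 - 1.0 / observed_speedup) / (1.0 - 1.0 / ideal_speedup)


# Assumption: AVX2 -> AVX512 ideally doubles throughput of the vectorized code.
p = vectorizable_fraction(1.34, 2.0)
print(f"Implied vectorized fraction of run time: {p:.2f}")
```

Under that assumption, a 34% gain corresponds to only about half of the run time benefiting from the wider vectors. This says nothing about the memory bandwidth hypothesis, which would need actual profiling to check.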
It's worth noting that y-cruncher is completely untuned for the Knights Landing architecture. Nearly all optimizations and tuning settings are the same as the desktop chips. So there's likely more performance left to be squeezed out. But due to the cost of Xeon Phi systems along with the general inaccessibility to consumers, it will be a while before y-cruncher has any properly tuned binaries for Knights Landing (if ever).
The AVX512-CD binary (for both Windows and Linux) is available upon request to anyone who sends me a Knights Landing benchmark. But for now, I'm hesitant to formally release it since it hasn't been sufficiently tested. (A pi benchmark has very poor test-coverage of the entire program.)
In addition to the AVX512-CD binary, y-cruncher also has AVX512-DQ and AVX512-IFMA binaries for Skylake Purley and Cannonlake. But assuming Intel sticks with its policy of massive delays, it will be quite a while before either of them sees the light of day.
y-cruncher has been used to set a number of world-record-size computations.
Blue: Current World Record
Green: Former World Record
Red: Unverified computation. Does not qualify as a world record until verified using an alternate formula.
|Date Announced:||Date Completed:||Source:||Who:||Constant:||Decimal Digits:||Time:||Computer:|
|September 3, 2016||August 29, 2016|| ||Ron Watkins||e||5,000,000,000,000|| ||2 x Xeon X5690 @ 3.47 GHz|
|August 14, 2016||June 26, 2016|| ||Ron Watkins||Euler-Mascheroni Constant||477,511,832,674|| ||4 x Xeon E5-4660 v3 @ 2.1 GHz|
|July 11, 2016||July 5, 2016|| ||"yoyo"||Golden Ratio||10,000,000,000,000|| ||2 x Intel Xeon E5-2696 v4 @ 2.2 GHz|
|June 28, 2016||June 19, 2016|| ||Ron Watkins||Square Root of 2||10,000,000,000,000|| ||2 x Xeon X5690 @ 3.47 GHz|
|June 4, 2016||May 29, 2016|| ||Ron Watkins||Lemniscate||250,000,000,000|| ||4 x Xeon E5-4660 v3 @ 2.1 GHz - 1 TB; 4 x Xeon X6550 @ 2 GHz - 512 GB|
|June 4, 2016||June 2, 2016|| ||"yoyo"||Golden Ratio||5,000,000,000,000|| ||2 x Intel Xeon E5-2696 v4 @ 2.2 GHz|
|May 25, 2016||May 18, 2016|| ||Ron Watkins||Euler-Mascheroni Constant||250,000,000,000|| ||2 x Xeon E5-4660 v3 @ 2.1 GHz - 1 TB; 4 x Xeon X6550 @ 2.0 GHz - 512 GB|
|April 24, 2016||April 18, 2016|| ||Ron Watkins||Log(2)||500,000,000,000|| ||4 x Xeon X5690 @ 3.47 GHz - 141 GB|
|April 17, 2016||April 12, 2016|| ||Ron Watkins||Catalan's Constant||250,000,000,000|| ||4 x Xeon E5-4660 v3 @ 2.1 GHz|
|April 9, 2016||April 3, 2016|| ||Ron Watkins||Log(10)||500,000,000,000|| ||2 x Xeon X5690 @ 3.47 GHz|
|February 8, 2016||February 6, 2016|| ||Mike A||Catalan's Constant||500,000,000,000|| ||2 x Intel Xeon E5-2697 v3 @ 2.6 GHz|
|December 21, 2015||December 21, 2015|| ||Dipanjan Nag||Zeta(3) - Apery's Constant||400,000,000,000|| ||Xeon E5-2698B @ 2.0 GHz - 224 GB|
|July 24, 2015||July 22, 2015; July 23, 2015|| || ||Golden Ratio||2,000,000,000,000|| ||4 x Xeon X6550 @ 2 GHz - 512 GB; Xeon E5-2676 v3 @ 2.4 GHz - 64 GB|
|October 8, 2014||October 7, 2014|| ||"houkouonchi"||Pi||13,300,000,000,000|| ||2 x Xeon E5-4650L @ 2.6 GHz; 192 GB DDR3 @ 1333 MHz; 24 x 4 TB + 30 x 3 TB|
|December 28, 2013||December 28, 2013||Source||Shigeru Kondo||Pi||12,100,000,000,050|| ||2 x Xeon E5-2690 @ 2.9 GHz; 128 GB DDR3 @ 1600 MHz; 24 x 3 TB|
See the complete list including other notably large computations.
If you wish to set a record, you must run two computations using different formulas (one to compute, the other to verify). Then send me the validation files, but do not make any attempt to modify them. The validation files are protected with a checksum to prevent tampering/cheating. Yes, people have tried to cheat before.
An exception to the "two computations rule" can be made for Pi since it can be verified using BBP formulas.
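For readers unfamiliar with BBP formulas, here is a minimal sketch of the textbook Bailey-Borwein-Plouffe digit extraction, which produces hexadecimal digits of Pi at a given position without computing the digits before it. It is not y-cruncher's verification code. The function names are made up for this illustration, and the double-precision arithmetic limits it to a handful of digits at modest positions.

```python
def _bbp_series(j, start):
    """Fractional part of sum over k of 16^(start - k) / (8k + j)."""
    total = 0.0
    # Terms with a non-negative exponent: reduce 16^(start-k) modulo the
    # denominator so that only the fractional contribution is kept.
    for k in range(start + 1):
        denom = 8 * k + j
        total = (total + pow(16, start - k, denom) / denom) % 1.0
    # Terms with a negative exponent shrink quickly; stop once negligible.
    k = start + 1
    while True:
        term = 16.0 ** (start - k) / (8 * k + j)
        if term < 1e-17:
            break
        total += term
        k += 1
    return total % 1.0


def pi_hex_digits(start, count=8):
    """Hex digits of Pi after skipping `start` digits past the hexadecimal point."""
    x = (4 * _bbp_series(1, start) - 2 * _bbp_series(4, start)
         - _bbp_series(5, start) - _bbp_series(6, start)) % 1.0
    digits = []
    for _ in range(count):
        x *= 16
        d = int(x)
        digits.append("0123456789ABCDEF"[d])
        x -= d
    return "".join(digits)


print(pi_hex_digits(0))  # Pi = 3.243F6A88... in hexadecimal, so this prints 243F6A88
```

The idea behind using this for verification is that a few hexadecimal digits near the end of a large computation can be checked independently, which is far cheaper than repeating the whole computation with a second formula.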
Note for anyone attempting to set a Pi world record: Should the attempt succeed, I kindly ask that you make yourself sufficiently available for external requests to access or download the digits in their entirety (at least until it is broken again by someone else). Pi is popular enough that people do actually want to see the digits.
Aside from computing Pi and other constants, y-cruncher is great for stress testing 64-bit systems with lots of RAM.
Latest Releases: (September 16, 2016)
OS Programs Download Link Size
y-cruncher + HWBOT Submitter
HWBOT Submitter Only
The Linux version comes in both statically and dynamically linked versions. The static version should work on most Linux distributions, but lacks Cilk Plus. The dynamic version supports Cilk Plus, but is less portable due to the DLL dependency hell.
The HWBOT submitter allows y-cruncher benchmarks to be submitted to HWBOT - which is a competitive overclocking site. It is currently only available for Windows.
- Windows Vista or later.
- The HWBOT submitter requires the Java 8 Runtime.
- 64-bit Linux is required. There is no support for 32-bit.
- The dynamic version has been tested on Ubuntu 15.10 and 16.04.
- An x86 or x64 processor.
Very old systems that don't meet these requirements may be able to run older versions of y-cruncher. Support goes all the way back to even before Windows XP.
Other Downloads (for C++ programmers):
So while it may be difficult to believe, Windows is currently the more suitable OS for running y-cruncher.
Comparison Chart: (Last updated: May 14, 2016)
Computations of Pi to various sizes. All times in seconds. All computations done entirely in RAM.
The timings include the time needed to convert the digits to decimal representation, but not the time needed to write out the digits to disk.
|Processor(s):||Core i7 3630QM||Core i7 6820HK|
|Generation:||Intel Ivy Bridge||Intel Skylake|
|Processor Speed:||3.2 GHz||3.2 GHz|
|Memory:||8 GB - 1600 MHz||48 GB - 2133 MHz|
|Version:||v0.7.1 - AVX||v0.7.1 - ADX|
|Processor(s):||Core 2 Quad Q6600||Core i7 920||FX-8350||Core i7 4770K||Core i7 5960X||Core i7 5775C*||Core i7 6700K**|
|Generation:||Intel Core||Intel Nehalem||AMD Piledriver||Intel Haswell||Intel Haswell||Intel Broadwell||Intel Skylake|
|Processor Speed:||2.4 GHz||3.5 GHz (OC)||4.0 GHz||4.0 GHz (OC)||4.0 GHz (OC)||3.8 GHz (OC)||4.6 GHz (OC)|
|Memory:||6 GB - 800 MHz||12 GB - 1333 MHz||32 GB - 1333 MHz||32 GB - 2400 MHz||64 GB - 2400 MHz||16 GB - 2400 MHz||64 GB - 2800 MHz|
|Version:||v0.7.1 - SSE3||v0.7.1 - SSE4.1||v0.7.1 - XOP||v0.7.1 - AVX2||v0.7.1 - AVX2||v0.7.1 - ADX||v0.6.9 - AVX2|
*Credit to André Bachmann.
**Credit to Oliver Kruse.
|Processor(s):||2 x Xeon X5482||2 x Xeon E5-2690¹||2 x Xeon E5-2683 v3¹||2 x Xeon E5-2696 v4²||4 x Xeon E7-8880 v3³|
|Generation:||Intel Penryn||Intel Sandy Bridge||Intel Haswell||Intel Broadwell||Intel Haswell|
|Processor Speed:||3.2 GHz||3.5 GHz||2.03 GHz||2.2 GHz||2.3 GHz|
|Memory:||64 GB - 800 MHz||256 GB - ???||128 GB - ???||768 GB - ???||2 TB - ???|
|Version:||v0.6.9 - SSE4.1||v0.6.2/3 - AVX||v0.6.9 - AVX2||v0.7.1 - ADX||v0.7.1 - AVX2|
¹Credit to Shigeru Kondo.
²Credit to "yoyo".
³Credit to Jacob Coleman.
The full chart of rankings for each size can be found here:
These fastest times may include unreleased betas.
Got a faster time? Let me know: firstname.lastname@example.org
Note that I usually don't respond to these emails. I simply put them into the charts which I update periodically.
Pi and other Constants:
Hardware and Overclocking:
Here are some interesting sites dedicated to the computation of Pi and other constants:
Contact me via e-mail. I'm pretty good with responding unless it gets caught in my school's junk mail filter.