y-cruncher - A Multi-Threaded Pi-Program
From a high-school project that went a little too far...
By Alexander J. Yee
(Last updated: August 14, 2019)
The first scalable multi-threaded Pi-benchmark for multi-core systems...
How fast can your computer compute Pi?
y-cruncher is a program that can compute Pi and other constants to trillions of digits.
It is the first of its kind that is multi-threaded and scalable to multi-core systems. Since its launch in 2009, it has become a common benchmarking and stress-testing application for overclockers and hardware enthusiasts.
y-cruncher has been used to set several world records for the most digits of Pi ever computed.
Windows: Version 0.7.7 Build 9501 (Released: April 22, 2019)
Linux : Version 0.7.7 Build 9501 (Released: April 22, 2019)
Official Mersenneforum Subforum (new).
Official HWBOT forum thread.
Technology Wishlist: (June 24, 2019)
For a number of years, I have kept a mental list of hardware improvements that could benefit the project in some way. This list includes pretty much anything that's hardware-related since those are generally beyond my control. So stuff like faster hard drives, high-endurance SSDs, and instruction set extensions...
The full page is here. And I will probably keep it updated as things change.
To be clear, these are feature requests. But I have no expectation that any of them will ever happen.
Commercial Use Reminders: (April 22, 2019)
Due to the amount of publicity that Google's latest computation has generated, there are multiple parties who have expressed interest in doing the same thing - that is to advertise a product or service by setting a world record for Pi.
As unpopular as this will sound, I must remind everyone (including those who have not reached out to me) that y-cruncher is not free for such commercial use. So if you wish to repeat what Google did, you too will need to acquire a commercial use license.
To put it frankly, corporations are not allowed to profit off my work for free unless it falls under one of the exceptions. In the past, I've typically allowed minor violations to slip since I usually don't care and I'm not exactly the kind of person to chase people down. But things are starting to get out of hand now.
y-cruncher has been used to perform a number of world-record-sized computations.
Blue: Current World Record
Green: Former World Record
Red: Unverified computation. Does not qualify as a world record until verified using an alternate formula.
|Date Announced:||Date Completed:||Source:||Who:||Constant:||Decimal Digits:||Time:||Computer:|
|July 21, 2019||July 16, 2019|| || || || || ||2 x Intel Xeon Gold 6140 @ 2.3 GHz|
|June 6, 2019||May 26, 2019|| ||Ian Cutress||Zeta(3) - Apery's Constant||1,000,000,000,000|| ||2 x Intel Xeon 8260L @ 2.4 GHz; 768 TB + 6 TB Optane; 2 x Xeon 8280 @ 2.7 GHz|
|May 22, 2019||May 21, 2019||Screen|| || || || ||2 x Intel Xeon Gold 6140 @ 2.3 GHz; 2 x Intel Xeon 8260L @ 2.4 GHz; 768 TB + 6 TB Optane|
|April 29, 2019||April 26, 2019|| ||Jacob Riffee||Log(2)||1,000,000,000,000||Compute: 30.9 days||Intel Xeon E5-2660 @ 2.2 GHz; 32 GB + 6 x 4 TB HD|
|April 22, 2019||April 20, 2019|| ||Ian Cutress||Gamma(1/4)||362,560,990,822||Compute: 3.01 days|| |
|April 15, 2019||April 14, 2019|| ||Ian Cutress||Log(10)||600,000,000,000|| ||2 x Intel Xeon 8170M @ 2.1 GHz; 384 GB + 2 x 2 TB SSD|
|March 25, 2019||March 24, 2019|| || || || || ||AMD Threadripper 1950X @ 3.4 GHz|
|March 14, 2019||January 21, 2019|| ||Emma Haruka Iwao||Pi||31,415,926,535,897||Compute: 121 days||2 x Undisclosed Intel Xeon @ 2.00 GHz; > 1.40 TB DDR4; > 240 TB SSD|
|March 3, 2019||March 3, 2019||Screen||Alexander Yee||Gamma(1/3)||200,000,000,000||Compute: 42.5 hours||Intel Core i7 5960X @ 4.0 GHz; 64 GB - 16 x 2TB 7200 RPM|
|January 6, 2019||January 6, 2019|| ||Tizian Hanselmann||Golden Ratio||3,000,000,000,100|| ||Intel Xeon X5650 @ 2.67 GHz|
|January 3, 2019||January 3, 2019|| ||Gerald Hofmann||e||8,000,000,000,000||Compute: 28.5 days||2 x AMD Epyc 7551 @ 2.0 GHz|
|November 30, 2018||November 27, 2018|| ||Kevin Humphreys||Golden Ratio||3,000,000,000,000||Compute: 10.3 days||Intel Xeon E5-2640 v2 @ 2.0 GHz|
|August 24, 2017||August 23, 2017|| ||Ron Watkins||Euler-Mascheroni Constant||477,511,832,674|| ||4 x Xeon E5-4660 v3 @ 2.1 GHz - 1 TB; 2 x Xeon X5690 @ 3.47 GHz - 128 GB|
|January 21, 2019||January 17, 2019|| ||Gerald Hofmann||Golden Ratio||16,180,339,887,498|| ||2 x AMD Epyc 7551 @ 2.0 GHz|
|November 15, 2016||November 11, 2016||Blog||Peter Trueb||Pi||22,459,157,718,361||Compute: 105 days||4 x Xeon E7-8890 v3 @ 2.50 GHz; 1.25 TB DDR4; 20 x 6 TB 7200 RPM Seagate|
|June 28, 2016||June 19, 2016|| ||Ron Watkins||Square Root of 2||10,000,000,000,000|| ||2 x Xeon X5690 @ 3.47 GHz|
|October 8, 2014||October 7, 2014|| ||Sandon Van Ness||Pi||13,300,000,000,000|| ||2 x Xeon E5-4650L @ 2.6 GHz; 192 GB DDR3 @ 1333 MHz; 24 x 4 TB + 30 x 3 TB|
|December 28, 2013||December 28, 2013||Source||Shigeru Kondo||Pi||12,100,000,000,050|| ||2 x Xeon E5-2690 @ 2.9 GHz; 128 GB DDR3 @ 1600 MHz; 24 x 3 TB|
See the complete list including other notably large computations. If you want to set a record yourself, the rules are in that link.
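The record list treats a computation as unverified until it has been checked with an alternate formula. As a toy illustration of that idea, here is a minimal Python sketch that computes Pi with two independent Machin-like arctangent formulas and accepts the digits only if both agree. This is an assumption-laden example and not how y-cruncher works: the function names (`arctan_inv`, `pi_machin`, `pi_gauss`) are made up for illustration, and these slow series were picked for brevity only.

```python
def arctan_inv(x: int, unity: int) -> int:
    """Return arctan(1/x) scaled by `unity`, via the Taylor series in integer arithmetic."""
    total = term = unity // x          # first term: 1/x
    x2 = x * x
    n = 1
    sign = -1
    while term:
        term //= x2                    # next odd power of 1/x
        n += 2
        total += sign * (term // n)    # alternating series: -1/(3x^3) + 1/(5x^5) - ...
        sign = -sign
    return total


def pi_machin(digits: int, guard: int = 10) -> int:
    """floor(Pi * 10**digits) via Machin: Pi/4 = 4*atan(1/5) - atan(1/239)."""
    unity = 10 ** (digits + guard)     # guard digits absorb truncation error
    pi = 4 * (4 * arctan_inv(5, unity) - arctan_inv(239, unity))
    return pi // 10 ** guard


def pi_gauss(digits: int, guard: int = 10) -> int:
    """Same value via Gauss: Pi/4 = 12*atan(1/18) + 8*atan(1/57) - 5*atan(1/239)."""
    unity = 10 ** (digits + guard)
    pi = 4 * (12 * arctan_inv(18, unity)
              + 8 * arctan_inv(57, unity)
              - 5 * arctan_inv(239, unity))
    return pi // 10 ** guard


def verified_pi(digits: int) -> int:
    """Return the digits only if both formulas agree; otherwise raise."""
    a, b = pi_machin(digits), pi_gauss(digits)
    if a != b:
        raise ValueError("formulas disagree: computation is not verified")
    return a


if __name__ == "__main__":
    print(verified_pi(50))
```

The point of the second formula is that a hardware or software error is very unlikely to corrupt both runs in exactly the same way, so agreement between the two is strong evidence that the digits are correct.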
The main computational features of y-cruncher are:
Latest Releases: (April 22, 2019)
Downloading any of these files constitutes acceptance of the license agreement.
OS Download Link Size
The Linux version comes in both statically and dynamically linked versions. The static version should work on most Linux distributions, but lacks Cilk Plus and NUMA binding. The dynamic version supports all features, but is less portable due to the DLL dependency hell.
The Windows download comes bundled with the HWBOT submitter which allows benchmarks to be submitted to HWBOT.
Windows:
- Windows Vista or later.
- The HWBOT submitter requires the Java 8 Runtime.
Linux:
- 64-bit Linux is required. There is no support for 32-bit.
- The dynamic version has been tested on Ubuntu 18.04.
All Systems:
- An x86 or x64 processor.
Very old systems that don't meet these requirements may be able to run older versions of y-cruncher. Support goes all the way back to even before Windows XP.
Other Downloads (for C++ programmers):
Comparison Chart: (Last updated: April 6, 2019)
Computations of Pi to various sizes. All times are in seconds. All computations were done entirely in RAM.
The timings include the time needed to convert the digits to decimal representation, but not the time needed to write out the digits to disk.
Blue: Benchmarks are up-to-date with the latest version of y-cruncher.
Green: Benchmarks were done with an old version of y-cruncher that is comparable in performance with the current release.
Red: Benchmarks are significantly out-of-date due to being run with an old version of y-cruncher that is no longer comparable with the current release.
Purple: Benchmarks are from unreleased internal builds that are not speed comparable with the current release.
Laptops + Low-Power:
|Processor(s):||Core i7 3630QM||VIA C4650||Pentium N4200||Xeon E3-1535M v5||Core i7 8565U||Core i7 6820HK||Core i7 8850H|
|Generation:||Intel Ivy Bridge||VIA Isaiah||Intel Apollo Lake||Intel Skylake||Intel Kaby Lake R||Intel Skylake||Intel Coffee Lake|
|Processor Speed:||3.2 GHz||2.0 GHz||1.1 - 2.5 GHz||2.9 GHz||2.3 - 4.6 GHz||3.2 GHz||?? GHz|
|Memory:||16 GB - 1600 MT/s||16 GB||4 GB||16 GB||8 GB||48 GB - 2133 MT/s||16 GB|
|Version:||v0.7.6 (11-SNB)||v0.7.2 (11-SNB)||v0.7.2 (08-NHM)||v0.7.1 (14-BDW)||v0.7.7 (14-BDW)||v0.7.8 (14-BDW)||v0.7.6 (14-BDW)||v0.7.7 (14-BDW)|
|Instruction Set:||x64 AVX||x64 AVX||x64 SSE4.1||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX|
|Credit:||Oliver Kruse||Tralalak||Kaupo Karuse||yoyo|
|Processor(s):||Core i3 8121U (Linux*)|
|Generation:||Intel Cannon Lake|
|Processor Speed:||3.1 GHz||3.0 - 3.1 GHz||2.5 - 3.0 GHz||2.6 - 3.1 GHz||2.5 - 3.0 GHz||2.3 - 2.9 GHz|
|Version:||v0.7.7 (05-A64)||v0.7.7 (08-NHM)||v0.7.7 (11-SNB)||v0.7.7 (13-HSW)||v0.7.7 (14-BDW)||v0.7.7 (17-ZN1)||v0.7.7 (16-KNL)||v0.7.7 (17-SKX)||v0.7.7 (18-CNL)|
|Instruction Set:||x64 SSE3||x64 SSE4.1||x64 AVX||x64 AVX2||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX512-F||x64 AVX512-DQ||x64 AVX512-VBMI|
|Processor(s):||Core i3 8121U (Windows*)|
|Generation:||Intel Cannon Lake|
|Processor Speed:||2.6 - 3.1 GHz||2.6 - 3.0 GHz||2.4 - 2.9 GHz||2.6 - 3.0 GHz||2.6 - 3.0 GHz||2.4 - 2.9 GHz|
|Version:||v0.7.7 (14-BDW)||v0.7.7 (17-SKX)||v0.7.7 (18-CNL)||v0.7.8 (14-BDW)||v0.7.8 (17-SKX)||v0.7.8 (18-CNL)|
|Instruction Set:||x64 AVX2 + ADX||x64 AVX512-DQ||x64 AVX512-VBMI||x64 AVX2 + ADX||x64 AVX512-DQ||x64 AVX512-VBMI|
*The clock speeds on this Cannon Lake CPU were extremely dynamic due to power throttling. Furthermore, the more optimized the binary is, the harder the clock speeds were throttled, with 18-CNL being the worst as it is specifically optimized for this CPU. The side effect of the throttling is that it made it very difficult to investigate why Linux is ~10% slower. As this is an off-the-shelf system (an Intel NUC), there are no overclocking options to lock down the clock speed. Single-threaded tests were more stable and had Windows and Linux within 2% of each other. Perhaps Linux has different CPU power management that makes it throttle harder than Windows does.
|Processor(s):||Core i7 7700K||Ryzen 7 1800X||Ryzen 7 2700||Ryzen 7 3700X||Core i7 8700K||Core i7 9700K||Core i9 9900K|
|Generation:||Intel Kaby Lake||AMD Zen||AMD Zen+||AMD Zen 2||Intel Coffee Lake||Intel Coffee Lake||Intel Coffee Lake|
|Processor Speed:||4.9 GHz (OC)||3.7 GHz||3.2 GHz||3.6 GHz||4.9 - 5.0 GHz (OC)||4.6 GHz||4.7 GHz|
|Memory:||64 GB - 3200 MT/s||64 GB - 3000 MT/s||64 GB - 2866 MT/s||64 GB - 2400 MT/s||64 GB - 2400 MT/s||16 GB - 3600 MT/s||16 GB - 3600 MT/s||32 GB - 3600 MT/s|
|Program Version:||v0.7.6 (14-BDW)||v0.7.6 (17-ZN1)||v0.7.8 (17-ZN1)||v0.7.7 (17-ZN1)||v0.7.7 (17-ZN1)||v0.7.6 (14-BDW)||v0.7.6 (14-BDW)||v0.7.6 (14-BDW)|
|Instruction Set:||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX|
|Credit:||Oliver Kruse||Hiroyuki Oodaira (大平 寛之)||Nehal Prasad||ji lcpd|
|Processor(s):||Phenom II X3 720||Core i7 920||FX-8350||Core i7 4770K||Core i7 5775C|
|Generation:||AMD K10||Intel Nehalem||AMD Piledriver||Intel Haswell||Intel Broadwell|
|Cores/Threads:||4/4 (unlock from 3/3)||4/8||8/8||4/8||4/8|
|Processor Speed:||2.8 GHz||3.5 GHz (OC)||4.0 GHz||4.0 GHz (OC)||3.8 GHz (OC)|
|Memory:||12 GB - 1333 MT/s||12 GB - 1333 MT/s||32 GB - 1600 MT/s||32 GB - 2133 MT/s||16 GB - 2400 MT/s|
|Program Version:||v0.7.6 (05-A64)||v0.7.6 (08-NHM)||v0.7.6 (11-BD1)||v0.7.8 (11-BD1)||v0.7.6 (13-HSW)||v0.7.8 (13-HSW)||v0.7.1 (14-BDW)|
|Instruction Set:||x64 SSE3||x64 SSE4.1||x64 AVX + XOP||x64 AVX + XOP||x64 AVX2||x64 AVX2||x64 AVX2 + ADX|
|Processor(s):||Core i7 5820K||Core i7 5960X||Threadripper 1950X||Core i9 7900X||Core i9 7940X|
|Generation:||Intel Haswell||Intel Haswell||AMD Threadripper||Intel Skylake X||Intel Skylake X|
|Processor Speed:||4.5 GHz (OC)||4.0 GHz (OC)||3.5 - 3.7 GHz||
|4.6/4.0/3.6 GHz*||3.8 GHz||3.6 GHz|
|3.0 GHz cache||2.8 GHz cache|
|Memory:||32 GB - 2400 MT/s||64 GB - 2133 MT/s||128 GB - 3000 MT/s||128 GB - 3600 MT/s||128 GB - 3466 MT/s|
|Program Version:||v0.7.3 (13-HSW)||v0.7.6 (13-HSW)||v0.7.8 (13-HSW)||v0.7.6 (17-ZN1)||v0.7.6 (17-SKX)||v0.7.6 (17-SKX)||v0.7.8 (14-BDW)||v0.7.8 (17-SKX)|
|Instruction Set:||x64 AVX2||x64 AVX2||x64 AVX2||x64 AVX2 + ADX||x64 AVX512-DQ||x64 AVX512-DQ||x64 AVX2 + ADX||x64 AVX512-DQ|
|Credit:||Sean Heneghan||Oliver Kruse|
*All-core non-AVX/AVX/AVX512 CPU frequency.
Due to high core counts and the effect of NUMA (Non-Uniform Memory Access), performance on multi-processor systems is extremely sensitive to various settings. Therefore, these benchmarks may not be entirely representative of what the hardware is capable of.
|Processor(s):||Xeon E5-2696 v4||Epyc 7601||Xeon Gold 6130F||Xeon Platinum 8124M||Xeon Gold 6148||Xeon Platinum 8175M||Epyc 7742|
|Generation:||Intel Broadwell||AMD Naples||Intel Skylake Purley||Intel Skylake Purley||Intel Skylake Purley||Intel Skylake Purley||AMD Rome|
|Processor Speed:||2.2 GHz||2.2 GHz||2.1 GHz||3.0 GHz||2.4 GHz||2.5 GHz|
|Memory:||768 GB - ???||256 GB - ??||256 GB - ??||137 GB - ??||188 GB - ??||~756 GB - ??||~504 GB|
|Program Version:||v0.7.1 (14-BDW)||v0.7.3 (17-ZN1)||v0.7.3 (17-SKX)||v0.7.5 (17-SKX)||v0.7.6 (17-SKX)||v0.7.6 (17-SKX)||v0.7.7 (17-ZN1)|
|Instruction Set:||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX512-DQ||x64 AVX512-DQ||x64 AVX512-DQ||x64 AVX512-DQ||x64 AVX2 + ADX|
|Credit:||"yoyo"||Dave Graham||Jacob Coleman||Oliver Kruse||newalex||Carsten Spille|
|Processor(s):||Xeon X5482||Xeon E5-2690||Xeon E5-2683 v3||Xeon E7-8880 v3||Xeon E5-2687W v4||Xeon E5-2686 v4|
|Generation:||Intel Penryn||Intel Sandy Bridge||Intel Haswell||Intel Haswell||Intel Broadwell||Intel Broadwell|
|Processor Speed:||3.2 GHz||3.5 GHz||2.03 GHz||2.3 GHz||3.0 GHz||2.3 GHz|
|Memory:||64 GB - 800 MT/s||256 GB - ???||128 GB - ???||2 TB - ???||64 GB||504 GB - ???|
|Program Version:||v0.7.2 (08-NHM)||v0.7.5 (07-PNR)||v0.6.2/3 (11-SNB)||v0.6.9 (13-HSW)||v0.7.1 (13-HSW)||v0.7.6 (14-BDW)||v0.7.7 (14-BDW)|
|Instruction Set:||x64 SSE4.1||x64 AVX||x64 AVX2||x64 AVX2||x64 AVX2 + ADX||x64 AVX2 + ADX|
|Credit:||Shigeru Kondo||Shigeru Kondo||Jacob Coleman||Cameron Giesbrecht||newalex|
The full chart of rankings for each size can be found here:
These fastest times may include unreleased betas.
Got a faster time? Let me know: email@example.com
Note that I usually don't respond to these emails. I simply put them into the charts which I update periodically (typically within 2 weeks).
Decimal Digits of Pi - Times in Seconds
Core i9 7940X @ 3.7 GHz AVX512
|Memory Frequency:||2666 MT/s||3466 MT/s|
High core count Skylake X processors are known to be heavily bottlenecked by memory bandwidth.
Because of the memory-intensive nature of computing Pi and other constants, y-cruncher needs a lot of memory bandwidth to perform well. In fact, the program has been noticeably memory bound on nearly all high-end desktops since 2012 as well as the majority of multi-socket systems since at least 2006.
Don't be surprised if y-cruncher exposes instabilities that other applications and stress-tests do not. y-cruncher is unusual in that it simultaneously places a heavy load on both the CPU and the entire memory subsystem.
y-cruncher has a lot of settings for tuning parallel performance. By default, it makes a best effort to analyze the hardware and pick the best settings. But because of the virtually unlimited combinations of processor topologies, it's difficult for y-cruncher to pick the best settings for everything. So sometimes the best performance can only be achieved with manual settings.
*These are advanced settings that cannot be changed if you're using the benchmark option in the console UI. To change them, you will need to either run benchmark mode from the command line or use the custom compute menu.
Load imbalance is a fairly common problem in y-cruncher. The usual causes are:
Large pages used to not matter, but they do now in the post-Spectre/Meltdown world. Mitigations for the Meltdown vulnerability can cause a noticeable performance drop in y-cruncher (up to 5% has been observed). It turns out that turning on large pages can mitigate the penalty for this mitigation. (pun intended)
Refer to the memory allocation guide on how to turn on large pages.
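Before changing anything, it can help to see what the operating system is currently doing with large pages. The sketch below is an illustration under assumptions, not part of y-cruncher: it only covers Linux transparent huge pages, which the kernel reports in `/sys/kernel/mm/transparent_hugepage/enabled` with the active mode in square brackets (for example `always [madvise] never`). The helper names `parse_thp_mode` and `current_thp_mode` are hypothetical, and the memory allocation guide may describe a different mechanism (especially on Windows).

```python
from pathlib import Path
from typing import Optional

# Standard sysfs location on Linux kernels built with transparent huge page support.
THP_ENABLED = Path("/sys/kernel/mm/transparent_hugepage/enabled")


def parse_thp_mode(text: str) -> Optional[str]:
    """Return the bracketed (active) mode, e.g. 'madvise' from 'always [madvise] never'."""
    for token in text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    return None


def current_thp_mode(path: Path = THP_ENABLED) -> Optional[str]:
    """Return the active mode, or None if the file is missing or unreadable."""
    try:
        return parse_thp_mode(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("Transparent huge pages:", current_thp_mode() or "not available")
```

A result of `never` would suggest that the system is not handing out large pages transparently, in which case the steps in the memory allocation guide are the place to look next.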
This is probably one of the most complicated features in y-cruncher.
Everything in this section is in the process of being re-verified and moved to: https://github.com/Mysticial/y-cruncher/issues
Pi and other Constants:
Hardware and Overclocking:
Here are some interesting sites dedicated to the computation of Pi and other constants:
Contact me via e-mail. I'm pretty good with responding unless it gets caught in my school's junk mail filter.
You can also find me on Twitter as @Mysticial.