y-cruncher - A Multi-Threaded Pi-Program
From a high-school project that went a little too far...
By Alexander J. Yee
(Last updated: September 26, 2022)
The first scalable multi-threaded Pi-benchmark for multi-core systems...
How fast can your computer compute Pi?
y-cruncher is a program that can compute Pi and other constants to trillions of digits.
It is the first of its kind that is multi-threaded and scalable to multi-core systems. Since its launch in 2009, it has become a common benchmarking and stress-testing application for overclockers and hardware enthusiasts.
y-cruncher has been used to set several world records for the most digits of Pi ever computed.
Windows: Version 0.7.10 Build 9513 (Released: August 31, 2022)
Linux : Version 0.7.10 Build 9513 (Released: August 31, 2022)
Official Mersenneforum Subforum (new).
Official HWBOT forum thread.
Zen4's AVX512: (September 26, 2022) - permalink
Now that the embargoes have lifted, I have published my breakdown of Zen4's AVX512 over on Mersenneforum.
So if you're a SIMD programmer or just curious about architecture in general, this might be worth a read.
Version 0.7.10 and AMD Zen4: (August 31, 2022) - permalink
Zen4 is set to be AMD's first processor to support AVX512. You know what that means - a new y-cruncher binary for it!
AMD has graciously provided me a pre-release sample of their Ryzen 9 7950X. And using that, I'm able to produce a Zen4-optimized binary - well ahead of launch and in time for the hardware reviewers to pick up.
Since most information about Zen4 is still under embargo, I cannot say anything about it at this time. If you happen to have access to a Zen4 system, feel free to try out this new release.
If you are a hardware reviewer who uses y-cruncher as one of your benchmarks, you will need to grab this latest version of y-cruncher to get the best results on Zen4.
The existing Intel-optimized AVX512 binaries for Skylake and Tiger Lake do not run optimally on Zen4, so you will need the new binary. Fortunately, the performance of the other binaries remains unchanged in v0.7.10. So Zen4 benchmarks on v0.7.10 can be directly compared with those of other processors using y-cruncher v0.7.9. Thus you do not need to redo your benchmarks for competing processors if they are already done with v0.7.9.
Overall, this was a very fun project which I enjoyed. Being pre-release meant that all the usual optimization and architectural resources that I usually rely on do not exist yet. So I had to do all the reverse engineering myself to figure out enough of the architecture to optimize for it. Unless someone beats me to it (via leaks), I intend to publish my findings as soon as is allowed.
AMD's support for AVX512 may be the trigger that finally breaks AVX512's chicken-egg problem. For the better part of the last decade, nobody used AVX512 because of poor support. And since nobody used it, it received poor support. Now with AMD's backing, adoption of AVX512 may finally start to increase and perhaps put Intel at a competitive disadvantage until they bring it back to the consumer market.
I mentioned earlier this year that Zen4 and Sapphire Rapids X were the other two chips I wanted to test and optimize for. Now with Zen4 fulfilled (for now), that leaves Sapphire Rapids - which looks like it's having its fair share of delays. So I obviously have no timeline for that and I may end up skipping it if it ends up being cost prohibitive. Just in case anyone from Intel is paying attention...
100 Trillion Digits of Pi: (June 8, 2022) - permalink
I'm glad to announce that Google has reclaimed the Pi world record by computing 100 trillion digits of Pi!
This computation took 158 days from October 14 to March 21. Like last time, it was run on the Google Cloud platform, but with newer and improved hardware for both compute and storage.
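As a quick sanity check on the figures above, the elapsed time and the average digit rate can be worked out directly. This is only a sketch: the start year (2021) is an assumption inferred from the March 21, 2022 completion date, and the rate is a simple wall-clock average, not a measure of how the run actually progressed.

```python
from datetime import date

# Assumption: the run started in 2021. The announcement gives only month and day;
# the year is inferred from the March 21, 2022 completion date.
start = date(2021, 10, 14)
end = date(2022, 3, 21)

elapsed_days = (end - start).days
digits = 100_000_000_000_000  # 100 trillion decimal digits

# Average over the whole run, in decimal digits per second of wall-clock time.
digits_per_second = digits / (elapsed_days * 86_400)

print(elapsed_days)
print(round(digits_per_second))
```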
y-cruncher has been used for a number of world-record-sized computations.
Blue: Current World Record
Green: Former World Record
Red: Unverified computation. Does not qualify as a world record until verified using an alternate formula.
|Date Announced:||Date Completed:||Source:||Who:||Constant:||Decimal Digits:||Time:||Computer:|
|July 17, 2022||July 15, 2022||Seungmin Kim||Lemniscate||1,200,000,000,100||
2 x Intel Xeon Gold 6140 @ 2.30 GHz
|June 8, 2022||March 21, 2022||Emma Haruka Iwao||Pi||100,000,000,000,000||
128 vCPU Intel Ice Lake (GCP)
|March 14, 2022||March 9, 2022||Seungmin Kim||Catalan's Constant||1,200,000,000,100||Compute: 48.6 days||
2 x Intel Xeon Gold 6140 @ 2.30 GHz
|January 5, 2022||November 12, 2021||Tizian Hanselmann||Square Root of 2||10,000,000,001,000||Intel Xeon E7-4870 @ 2.4 GHz
|October 4, 2021||September 30, 2021||
Intel Xeon E5-268v4 @ 2.1 GHz
|October 4, 2021||September 9, 2021||William Echols||Log(2)||1,500,000,000,000||2 x Intel Xeon E5-2690 v3 @ 2.6 GHz
|August 17, 2021||August 14, 2021||Source||UAS Grisons||Pi||62,831,853,071,796||Compute: 108 days
Verify: 34.4 hours
|AMD Epyc 7542 @ 2.9 GHz
34 + 4 Hard Drives
|February 14, 2021||February 12, 2021||Clifford Spielman||Golden Ratio||10,000,000,000,000||
AMD Threadripper 3995WX @ 2.7 GHz
|December 5, 2020||November 22, 2020||David Christle||e||31,415,926,535,897||
2 x Intel Xeon E5-2680 v2 @ 2.8 GHz
|September 13, 2020||September 6, 2020||Seungmin Kim||Log(10)||1,200,000,000,100||2 x Intel Xeon E5-2699 v3 @ 2.3 GHz
2 x Intel Xeon Gold 5220 @ 2.2 GHz
|August 9, 2020||July 26, 2020||Seungmin Kim||Zeta(3) - Apery's Constant||1,200,000,000,100||Compute: 31.7 days||2 x Intel Xeon E5-2670 v3 @ 2.3 GHz
2 x Intel Xeon Gold 5220 @ 2.2 GHz
|August 9, 2020||July 23, 2020||Andrew Sun||Gamma(1/3)||500,000,001,337||Compute: 17.3 days||
2 x Intel Xeon E5-2690 v4 @ 2.6 GHz
|June 28, 2020||June 22, 2020||Seungmin Kim||Zeta(3) - Apery's Constant||1,200,000,000,000||
|2 x Xeon E5-2670 v3 @ 2.3 GHz
|June 28, 2020||May 27, 2020||Andrew Sun||Gamma(1/4)||500,000,000,000||2 x Intel Xeon E5-2690 v4 @ 2.6 GHz
|May 28, 2020||May 26, 2020||Euler-Mascheroni Constant||600,000,000,100||
2 x Intel Xeon Gold 6140 @ 2.3 GHz
Intel Xeon 8280 @ 2.7 GHz
|January 29, 2020||January 29, 2020||Blog||Timothy Mullican||Pi||50,000,000,000,000||
4 x Intel Xeon E7-4880 v2 @ 2.5 GHz
48 Hard Drives
|December 4, 2019||November 13, 2019||
Christophe Patris de Broe
& Alexandre Gouy
& Cyril Hsu
2 x Intel Xeon Platinum 8268 @ 2.9 GHz
|October 21, 2019||October 17, 2019||Marco Julian Hummel||Gamma(1/3)||274,877,906,944||Compute: 11.2 days||
2 x Intel Xeon E5-2651 v2 @ 1.8 GHz
|March 14, 2019||January 21, 2019||
|Emma Haruka Iwao||Pi||31,415,926,535,897||Compute: 121 days||2 x Undisclosed Intel Xeon @ 2.00 GHz
> 1.40 TB DDR4
> 240 TB SSD
|August 24, 2017||August 23, 2017||Ron Watkins||Euler-Mascheroni Constant||477,511,832,674||4 x Xeon E5-4660 v3 @ 2.1 GHz - 1 TB
2 x Xeon X5690 @ 3.47 GHz - 128 GB
|November 15, 2016||November 11, 2016||Blog
|Peter Trueb||Pi||22,459,157,718,361||Compute: 105 days||4 x Xeon E7-8890 v3 @ 2.50 GHz
1.25 TB DDR4
20 x 6 TB 7200 RPM Seagate
|June 28, 2016||June 19, 2016||Ron Watkins||Square Root of 2||10,000,000,000,000||2 x Xeon X5690 @ 3.47 GHz
|October 8, 2014||October 7, 2014||
Sandon Van Ness
|Pi||13,300,000,000,000||2 x Xeon E5-4650L @ 2.6 GHz
192 GB DDR3 @ 1333 MHz
24 x 4 TB + 30 x 3 TB
|December 28, 2013||December 28, 2013||Source||Shigeru Kondo||Pi||12,100,000,000,050||2 x Xeon E5-2690 @ 2.9 GHz
128 GB DDR3 @ 1600 MHz
24 x 3 TB
See the complete list including other notably large computations. If you want to set a record yourself, the rules are in that link.
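The legend above notes that a computation only counts as a record once it has been verified using an alternate formula. The idea can be illustrated at toy scale: compute the same constant with two independent formulas and check that the digits agree. The sketch below uses two Machin-like arctangent formulas for Pi (Machin's and Gauss's). It is not the algorithm y-cruncher uses and is far too slow for large sizes; the function names and the `guard` parameter are hypothetical and exist only for this example.

```python
def arctan_inv(x: int, scale: int) -> int:
    """Approximate scale * arctan(1/x) with the alternating Taylor series,
    using integer arithmetic only."""
    total = 0
    term = scale // x          # scale / x^(2k+1), starting at k = 0
    x_squared = x * x
    k = 0
    while term:
        contribution = term // (2 * k + 1)
        total += contribution if k % 2 == 0 else -contribution
        term //= x_squared
        k += 1
    return total


def pi_machin(digits: int, guard: int = 10) -> int:
    """Pi * 10^digits via Machin: pi/4 = 4*arctan(1/5) - arctan(1/239)."""
    scale = 10 ** (digits + guard)  # extra guard digits absorb truncation error
    pi_scaled = 4 * (4 * arctan_inv(5, scale) - arctan_inv(239, scale))
    return pi_scaled // 10 ** guard


def pi_gauss(digits: int, guard: int = 10) -> int:
    """Pi * 10^digits via Gauss:
    pi/4 = 12*arctan(1/18) + 8*arctan(1/57) - 5*arctan(1/239)."""
    scale = 10 ** (digits + guard)
    pi_scaled = 4 * (
        12 * arctan_inv(18, scale)
        + 8 * arctan_inv(57, scale)
        - 5 * arctan_inv(239, scale)
    )
    return pi_scaled // 10 ** guard


primary = pi_machin(50)
check = pi_gauss(50)
print("verified" if primary == check else "MISMATCH")
```

Because the two formulas share no intermediate results, a hardware fault or software bug in one run is very unlikely to produce the same wrong digits in the other. A truncated result can still differ in its last digit if the true value sits right at a rounding boundary, so a real check would compare with a small margin.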
The main computational features of y-cruncher are:
Latest Releases: (August 31, 2022)
Downloading any of these files constitutes acceptance of the license agreement.
OS Download Link Size
The Linux version comes in both statically and dynamically linked versions. The static version should work on most Linux distributions, but lacks Cilk Plus and NUMA binding. The dynamic version supports all features, but is less portable due to the DLL dependency hell.
The Windows download comes bundled with the HWBOT submitter which allows benchmarks to be submitted to HWBOT.
- Windows 7 or later.
- The HWBOT submitter requires the Java 8 Runtime.
- 64-bit Linux is required. There is no support for 32-bit.
- The dynamic version has been tested on Ubuntu 22.04.
- An x86 or x64 processor.
Very old systems that don't meet these requirements may be able to run older versions of y-cruncher. Support goes all the way back to even before Windows XP.
Other Downloads (for C++ programmers):
Comparison Chart: (Last updated: May 31, 2021)
Computations of Pi to various sizes. All times in seconds. All computations done entirely in RAM.
The timings include the time needed to convert the digits to decimal representation, but not the time needed to write out the digits to disk.
Blue: Benchmarks are up-to-date with the latest version of y-cruncher.
Green: Benchmarks were done with an old version of y-cruncher that is comparable in performance with the current release.
Red: Benchmarks are significantly out-of-date due to being run with an old version of y-cruncher that is no longer comparable with the current release.
Purple: Benchmarks are from unreleased internal builds that are not speed comparable with the current release.
Laptops + Low-Power:
|Processor(s):||Core i7 8565U||Core i7 9750H||Core i7 1065G7||Core i7 1065G7||Core i7 1165G7||Core i9 11900KB|
|Generation:||Intel Kaby Lake R||Intel Coffee Lake||Intel Ice Lake||Intel Ice Lake||Intel Tiger Lake||Intel Tiger Lake|
|Processor Speed:||2.3 - 4.6 GHz||3.1 - 3.9 GHz||2.1 - 3.0 GHz (25W)||???||3.6 - 4.0 GHz (45W)||3.3 - 4.5 GHz|
|Memory:||8 GB||16 GB - 2666 MT/s||16 GB @ 3200 MT/s||16 GB @ 3733 MT/s||32 GB @ 3200 MT/s||32 GB @ 3200 MT/s|
|Version:||v0.7.8 (14-BDW)||v0.7.8 (14-BDW)||v0.7.7 (18-CNL)||v0.7.8 (18-CNL)||v0.7.8 (18-CNL)||v0.7.8 (18-CNL)|
|Instruction Set:||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX512-DQ||x64 AVX512-VBMI||x64 AVX512-VBMI||x64 AVX512-VBMI||x64 AVX512-VBMI|
|Credit:||ji lcpd||ji lcpd||Gnyueh||ji lcpd||ji lcpd|
|Processor(s):||Core i7 3630QM||Core i7 4610M||Core i3 8121U (Windows*)||Core i7 6560U||Core i7 6700HQ|
|Generation:||Intel Ivy Bridge||Intel Haswell||Intel Cannon Lake||Intel Skylake||Intel Skylake|
|Processor Speed:||3.2 GHz||3.0 GHz||2.6 - 3.0 GHz||2.6 - 3.0 GHz||2.4 - 2.9 GHz||2.21 GHz||2.6 GHz ?|
|Memory:||16 GB - 1600 MT/s||8 GB||8 GB||8 GB||16 GB|
|Version:||v0.7.8 (11-SNB)||v0.7.8 (13-HSW)||v0.7.8 (14-BDW)||v0.7.8 (17-SKX)||v0.7.8 (18-CNL)||v0.7.8 (14-BDW)||v0.7.8 (14-BDW)|
|Instruction Set:||x64 AVX||x64 AVX2||x64 AVX2 + ADX||x64 AVX512-DQ||x64 AVX512-VBMI||x64 AVX2 + ADX||x64 AVX2 + ADX|
|Credit:||Oliver Kruse||Marco Julian Hummel||Sebastien Davies||Marco Julian Hummel|
|Processor(s):||Ryzen 7 1800X||Ryzen 7 3700X||Ryzen 7 5800X3D||Core i9 11900K||Ryzen 9 3950X||Ryzen 9 5950X||Core i9 12900KF|
|Generation:||AMD Zen||AMD Zen 2||AMD Zen 3||Intel Rocket Lake||AMD Zen 2||AMD Zen 3||Intel Alder Lake|
|Cores/Threads:||8/16||8/16||8/16||8/16||16/32||16/32||8/16 + 8/8|
|Processor Speed:||3.7 GHz||4.3 GHz||5.3 GHz||5.1 GHz|
|Memory:||64 GB - 2866 MT/s||64 GB - 3600 MT/s||32 GB||64 GB - 3733 MT/s||16 GB - 3200 MT/s||64 GB||32 GB - 6200 MT/s|
|Program Version:||v0.7.8 (17-ZN1)||v0.7.8 (17-ZN1)||v0.7.9 (20-ZN3)||v0.7.8 (18-CNL)||v0.7.8 (17-ZN1)||v0.7.8 (19-ZN2)||v0.7.8 (14-BDW)|
|Instruction Set:||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX512-VBMI||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX|
|Credit:||Sebastien Davies||Marc Beste||O-EtaIXVII||
|Processor(s):||FX-8350||Core i7 4770K||Core i7 7700K|
|Generation:||AMD Piledriver||Intel Haswell||Intel Kaby Lake|
|Processor Speed:||4.0 GHz||4.0 GHz (OC)||4.9 GHz (OC)|
|Memory:||32 GB - 1600 MT/s||32 GB - 2133 MT/s||64 GB - 3200 MT/s|
|Program Version:||v0.7.8 (11-BD1)||v0.7.8 (13-HSW)||v0.7.8 (14-BDW)|
|Instruction Set:||x64 AVX + XOP||x64 AVX2||x64 AVX2 + ADX|
|Processor(s):||Core i9 9980XE||Core i9 10980XE||Threadripper 3955WX||Threadripper 3970X||Threadripper 3990X|
|Generation:||Intel Skylake X||Intel Cascade Lake X||AMD Zen 2||AMD Zen 2||AMD Zen 2|
|Processor Speed:||2.8 GHz||3.9 GHz||3.7 GHz||4.0 GHz (OC)|
|Memory:||128 GB - 3600 MT/s||128 GB - 3600 MT/s||512 GB - 3200 MT/s||64 GB||256 GB - 3200 MT/s|
|Program Version:||v0.7.8 (17-SKX)||v0.7.8 (17-SKX)||v0.7.8 (19-ZN2)||v0.7.8 (17-ZN1)||v0.7.8 (19-ZN2)||v0.7.8 (19-ZN2)|
|Instruction Set:||x64 AVX512-DQ||x64 AVX512-DQ||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX|
|Credit:||Shigeru Kondo||ji lcpd||Michael Makovi||Tainus||Bennet Huch|
|Processor(s):||Core i7 5960X||Threadripper 1950X||Core i9 7940X|
|Generation:||Intel Haswell||AMD Threadripper||Intel Skylake X|
|Processor Speed:||4.0 GHz (OC)||3.5 - 3.7 GHz||3.8 GHz||3.6 GHz|
|2.8 GHz cache|
|Memory:||64 GB - 2133 MT/s||128 GB - 2933 MT/s||128 GB - 3466 MT/s|
|Program Version:||v0.7.8 (13-HSW)||v0.7.8 (17-ZN1)||v0.7.8 (14-BDW)||v0.7.8 (17-SKX)|
|Instruction Set:||x64 AVX2||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX512-DQ|
*All-core non-AVX/AVX/AVX512 CPU frequency.
Due to high core counts and the effect of NUMA (Non-Uniform Memory Access), performance on multi-processor systems is extremely sensitive to various settings. Therefore, these benchmarks may not be entirely representative of what the hardware is capable of.
|Processor(s):||Xeon Platinum 8124M||Xeon Gold 6148||Xeon Platinum 8175M||Xeon Platinum 8275CL||Epyc 7742||Epyc 7B12||Epyc 7742|
|Generation:||Intel Skylake Purley||Intel Skylake Purley||Intel Skylake Purley||Intel Cascade Lake||AMD Rome||AMD Rome||AMD Rome|
|Processor Speed:||3.0 GHz||2.4 GHz||2.5 GHz||3.0 GHz||2.25 GHz||2.25 GHz|
|Memory:||137 GB - ??||188 GB - ??||~756 GB - ??||192 GB||~504 GB||~882 GB||2 TB|
|Program Version:||v0.7.5 (17-SKX)||v0.7.6 (17-SKX)||v0.7.6 (17-SKX)||v0.7.8 (17-SKX)||v0.7.7 (17-ZN1)||v0.7.8 (19-ZN2)||v0.7.8 (19-ZN2)|
|Instruction Set:||x64 AVX512-DQ||x64 AVX512-DQ||x64 AVX512-DQ||x64 AVX512-DQ||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX|
|Credit:||Jacob Coleman||Oliver Kruse||newalex||Xinyu Miao||Carsten Spille||Greg Hogan||Song Pengei|
|Processor(s):||Xeon E5-2683 v3||Xeon E7-8880 v3||Xeon E5-2687W v4||Xeon E5-2686 v4||Xeon E5-2696 v4||Epyc 7601||Xeon Gold 6130F|
|Generation:||Intel Haswell||Intel Haswell||Intel Broadwell||Intel Broadwell||Intel Broadwell||AMD Naples||Intel Skylake Purley|
|Processor Speed:||2.03 GHz||2.3 GHz||3.0 GHz||2.3 GHz||2.2 GHz||2.2 GHz||2.1 GHz|
|Memory:||128 GB - ???||2 TB - ???||64 GB||504 GB - ???||768 GB - ???||256 GB - ??||256 GB - ??|
|Program Version:||v0.6.9 (13-HSW)||v0.7.1 (13-HSW)||v0.7.6 (14-BDW)||v0.7.7 (14-BDW)||v0.7.1 (14-BDW)||v0.7.3 (17-ZN1)||v0.7.3 (17-SKX)|
|Instruction Set:||x64 AVX2||x64 AVX2||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX2 + ADX||x64 AVX512-DQ|
|Credit:||Shigeru Kondo||Jacob Coleman||Cameron Giesbrecht||newalex||"yoyo"||Dave Graham|
The full chart of rankings for each size can be found here:
These fastest times may include unreleased betas.
Got a faster time? Let me know: firstname.lastname@example.org
Note that I usually do not respond to these emails. I simply put them into the charts which I update periodically (typically within 2 weeks).
Decimal Digits of Pi - Times in Seconds
Core i9 7940X @ 3.7 GHz AVX512
|Memory Frequency:||2666 MT/s||3466 MT/s|
High core count Skylake X processors are known to be heavily bottlenecked by memory bandwidth.
Because of the memory-intensive nature of computing Pi and other constants, y-cruncher needs a lot of memory bandwidth to perform well. In fact, the program has been noticeably memory bound on nearly all high-end desktops since 2012 as well as the majority of multi-socket systems since at least 2006.
Don't be surprised if y-cruncher exposes instabilities that other applications and stress-tests do not. y-cruncher is unusual in that it simultaneously places a heavy load on both the CPU and the entire memory subsystem.
y-cruncher has a lot of settings for tuning parallel performance. By default, it makes a best effort to analyze the hardware and pick suitable settings. But because of the virtually unlimited combinations of processor topologies, it cannot always choose optimally. So sometimes the best performance can only be achieved with manual settings.
*These are advanced settings that cannot be changed if you're using the benchmark option in the console UI. To change them, you will need to either run benchmark mode from the command line or use the custom compute menu.
Load imbalance is a fairly common problem in y-cruncher. The usual causes are:
Large pages did not matter in the past, but they do now in the post-Spectre/Meltdown world. Mitigations for the Meltdown vulnerability can cause a noticeable performance drop in y-cruncher (up to 5% has been observed). It turns out that turning on large pages can mitigate the penalty for this mitigation. (pun intended)
Refer to the memory allocation guide on how to turn on large pages.
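On Linux, one rough way to see whether large (huge) pages are configured at all is to look at the huge-page fields in `/proc/meminfo`. The sketch below parses those fields from text. It says nothing about how y-cruncher itself detects or uses large pages; the function name and the sample values are made up for illustration.

```python
def parse_hugepage_info(meminfo_text: str) -> dict:
    """Extract huge-page related fields from /proc/meminfo-style text.

    Values are returned as integers; a trailing unit such as "kB" is ignored.
    """
    wanted = ("Hugepagesize", "AnonHugePages")
    fields = {}
    for line in meminfo_text.splitlines():
        key, sep, rest = line.partition(":")
        tokens = rest.split()
        if not sep or not tokens:
            continue
        key = key.strip()
        if key.startswith("HugePages_") or key in wanted:
            fields[key] = int(tokens[0])
    return fields


# Illustrative sample only; these numbers are not from a real system.
SAMPLE = """MemTotal:       32768000 kB
AnonHugePages:    204800 kB
HugePages_Total:     512
HugePages_Free:      500
Hugepagesize:       2048 kB
"""

info = parse_hugepage_info(SAMPLE)
print(info)
```

On a real system the text would come from reading `/proc/meminfo`. A `HugePages_Total` of zero means no explicit huge pages have been reserved, although transparent huge pages (reflected in `AnonHugePages`) may still be in use.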
This is probably one of the most complicated features in y-cruncher.
Everything in this section is in the process of being re-verified and moved to: https://github.com/Mysticial/y-cruncher/issues
Pi and other Constants:
Hardware and Overclocking:
Here are some interesting sites dedicated to the computation of Pi and other constants:
Contact me via e-mail. I'm pretty good with responding unless it gets caught in my school's junk mail filter.
You can also find me on Twitter as @Mysticial.