y-cruncher - A Multi-Threaded Pi-Program
From a high-school project that went a little too far... By Alexander J. Yee
(Last updated: July 25, 2023)
The first scalable multi-threaded Pi-benchmark for multi-core systems...
How fast can your computer compute Pi?
y-cruncher is a program that can compute Pi and other constants to trillions of digits.
It is the first of its kind that is multi-threaded and scalable to multi-core systems. Ever since its launch in 2009, it has become a common benchmarking and stress-testing application for overclockers and hardware enthusiasts.
y-cruncher has been used to set several world records for the most digits of Pi ever computed.
Current Release:
Windows: Version 0.8.1 Build 9517 (Released: July 12, 2023)
Linux: Version 0.8.1 Build 9517 (Released: July 12, 2023)
Official Mersenneforum Subforum (new).
Official HWBOT forum thread.
Version 0.8.1 Released: (July 11, 2023)
And it's finally here! Part one of the revamp is now complete. This release brings the newly rewritten algorithms, which have the biggest performance impact on in-memory computations.
Here are some benchmarks showing the improvements brought by v0.8.1 and AVX512. Because of the large performance swings, HWBOT integration will be withheld until the HWBOT community decides what to do.
Last year, when I did the Zen 4 optimizations, I was disappointed (but not surprised) that I was only able to gain a 1-2% speedup with AVX512. In fact, this was so embarrassingly bad that I couldn't publish any numbers. Sure, Zen 4's AVX512 is "double-pumped" and doesn't have wider units. But there's a lot more to AVX512 than just the 512-bit width.
In reality, I was able to achieve around a 10% speedup for AVX512 on Zen 4, but only within cache. Once the computation was scaled up, that gain was completely wiped out by the memory inefficiencies of the old algorithm. And it certainly didn't help that Zen 4 set a new record for insufficient memory bandwidth.
I suspect this memory bottleneck is the primary reason why the overall benefit of AVX512 remains higher on Intel than on AMD, even in v0.8.1. y-cruncher has been memory-bound on every high-end chip since 2017, with AMD faring worse because it has twice as many cores and lower memory speeds. While it's also tempting to blame Zen 4's "double-pumped" AVX512, in reality it isn't much worse than Intel chips that lack the second 512-bit FMA unit.
Memory bandwidth as a whole has become a problem that is completely out of control. Since 2015, computational power has increased by more than 5x while memory bandwidth has improved by barely 50%. Needless to say, this trend is completely unsustainable, at least for this field of high-performance computing.
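As a back-of-the-envelope check using only the round figures above (compute up by more than 5x, bandwidth up by about 1.5x), the amount of arithmetic a chip can perform per byte it fetches from memory has grown by a factor of roughly 3 or more since 2015. Here C stands for compute throughput and B for memory bandwidth; both symbols are introduced only for this estimate:

```latex
\frac{C_{\text{now}} / C_{2015}}{B_{\text{now}} / B_{2015}} \gtrsim \frac{5}{1.5} \approx 3.3
```

In other words, a computation that was balanced between compute and memory in 2015 would need about 3.3x more arithmetic per byte of memory traffic today just to stay balanced.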
Stress Testing:
Testing and validation of v0.8.1 were done on 8 computers that had long been believed to be stable (most aren't even overclocked). All 8 of these machines held up against older versions of y-cruncher during past releases. But for this release, 2 of them were found to be unstable. Neither was overclocked, and both were running completely within spec.
Neither machine could be fixed by downclocking or overvolting. One of them (an Intel laptop) had to be retired. The other (a custom-built AMD desktop) was eventually stabilized by changing the motherboard. (Yes, this was a huge headache and a massive distraction from the software development.)
What does this mean for stress-testing? While it's tempting to conclude that v0.8.1 is more stressful than older versions, a sample size of 8 really isn't enough. So I'll leave it to the rest of the overclocking community to decide. The specific stress-test you want to run is called "VT3", which is the newly rewritten version of the "VST" test that everyone seems to love. Likewise, any large in-memory computation will be running the new code.
The Hybrid NTT Algorithm:
As promised in the previous announcement, y-cruncher's good old Hybrid NTT algorithm has now been published here. Despite its importance to y-cruncher's early days, it is not as conceptually spectacular as one would assume by modern (adult) standards. But as a kid when I first wrote it, it was amazing.
Anyways, I hope everyone enjoys this new version. As mentioned, this is just part one of the ongoing rewrite of the internal algorithms. While there's still a lot of work to do (including optimizations), development will now shift to swap mode. So in the short term, I don't expect any more performance swings beyond compiler changes and new optimizations for new processors.
Upcoming Changes for v0.8.x: (June 7, 2023)
In an effort to clean up and modernize the project, most of the large multiply algorithms are getting either refreshed or removed. Algorithms that are useful on modern processors are getting redesigned and rewritten from scratch while the rest will be completely removed from the codebase.
The result will be performance gains on newer processors and regressions on older ones.
If this sounds big, it is. More than 400,000 lines of code will be touched. Work actually began more than 3 years ago, but very little progress was made until this year, when I've been on garden leave and therefore not working.
As of today, enough has been done to get some preliminary in-memory benchmarks:
| Processor | Architecture | Year | Clock Speeds + Memory | Binary | ISA | Pi computation Speedup vs. v0.7.10 |
| Core i7 920 | Intel Nehalem | 2008 | 3.5 GHz + 3 x 1333 MT/s | 08-NHM ~ Ushio | x64 SSE4.1 | -27% |
| Core i7 3630QM | Intel Ivy Bridge | 2012 | stock + 2 x 1600 MT/s | 11-SNB ~ Hina | x64 AVX | -10% |
| FX-8350 | AMD Piledriver | 2012 | stock + 2 x 1600 MT/s | 11-BD1 ~ Miyu | x64 FMA4 | -1% |
| Core i7 5960X | Intel Haswell | 2013 | 4.0 GHz + 4 x 2400 MT/s | 13-HSW ~ Airi | x64 AVX2 | 3 - 4% |
| Core i7 6820HK | Intel Skylake | 2015 | stock + 2 x 2133 MT/s | 14-BDW ~ Kurumi | x64 AVX2 + ADX | 4 - 7% |
| Ryzen 7 1800X | AMD Zen 1 | 2017 | stock + 2 x 2866 MT/s | 17-ZN1 ~ Yukina | x64 AVX2 + ADX | ~1% |
| Core i9 7900X | Intel Skylake X | 2017 | 3.6 GHz (AVX512) + 4 x 3000 MT/s | 17-SKX ~ Kotori | x64 AVX512-DQ | 6 - 9% |
| Core i9 7940X | Intel Skylake X | 2017 | 3.6 GHz (AVX512) + 4 x 3466 MT/s | 17-SKX ~ Kotori | x64 AVX512-DQ | 10 - 13% |
| Ryzen 9 3950X | AMD Zen 2 | 2019 | stock + 2 x 3000 MT/s | 19-ZN2 ~ Kagari | x64 AVX2 + ADX | 13 - 14% |
| Core i3 8121U | Intel Cannon Lake | 2018 | stock + 2 x 2400 MT/s | 18-CNL ~ Shinoa | x64 AVX512-VBMI | 16 - 17% |
| Core i7 1165G7 | Intel Tiger Lake | 2020 | stock + 2 x 2666 MT/s | 18-CNL ~ Shinoa | x64 AVX512-VBMI | 12 - 22% |
| Core i7 11800H | Intel Tiger Lake | 2020 | stock + 2 x 3200 MT/s | 18-CNL ~ Shinoa | x64 AVX512-VBMI | 23 - 27% |
| Ryzen 9 7950X | AMD Zen 4 | 2022 | stock + 2 x 4400 MT/s | 22-ZN4 ~ Kizuna | x64 AVX512-GFNI | 23 - 31% |
The loss of performance for the oldest processors is primarily due to the removal of the Hybrid NTT. Yes, the Hybrid NTT that started the entire y-cruncher project is now gone. While it was the fastest thing in 2008, it unfortunately did not age very well. Stay tuned for a future blog about the algorithm. It will no longer be a secret.
Overall, there is still a lot of work to do. For example, swap-mode is still using the old implementations and will need to be revamped as well. But since the new code has reached or exceeded performance parity for the chips I care about, this is a good stopping point for v0.8.1 pending testing and validation.
Nevertheless, the benchmarks above are not final and are subject to change. Specifically, there are unresolved toolchain issues: Intel is removing its old compiler while the replacement is still significantly worse, and it's unclear whether this can be fixed before it is no longer possible to keep using the old compiler.
A big unknown is how stress-testing will be affected. Despite not being designed for this purpose, y-cruncher's stress-test is notorious for its ability to expose memory instabilities that other (even dedicated) memory testing applications cannot. In other words, it is one of the best memory testers out there. But with so much being rewritten, there's no telling how this will change. Even so, it doesn't make a whole lot of sense to keep hundreds of thousands of lines of old code around just because it turns out to be the better stress test.
So yeah... Out with the old and in with the new. Expect to see Zen 4 gaining up to a 20% speedup with AVX512 vs. just AVX2 - no wider execution units needed.
y-cruncher has been used to set a number of world-record-sized computations.
Blue: Current World Record
Green: Former World Record
Red: Unverified computation. Does not qualify as a world record until verified using an alternate formula.
| Date Announced | Date Completed | Source | Who | Constant | Decimal Digits | Time | Computer |
| May 13, 2023 | May 13, 2023 | | Jordan Ranous & Kevin O'Brien | Euler-Mascheroni Constant | 700,000,000,000 | | 2 x AMD Epyc 9654 @ 2.4 GHz, 1.5 TB |
| July 17, 2022 | July 15, 2022 | | Seungmin Kim | Lemniscate | 1,200,000,000,100 | | 2 x Intel Xeon Gold 6140 @ 2.30 GHz |
| June 8, 2022 | March 21, 2022 | | Emma Haruka Iwao | Pi | 100,000,000,000,000 | | 128 vCPU Intel Ice Lake (GCP) |
| March 14, 2022 | March 9, 2022 | | Seungmin Kim | Catalan's Constant | 1,200,000,000,100 | Compute: 48.6 days | 2 x Intel Xeon Gold 6140 @ 2.30 GHz |
| January 5, 2022 | November 12, 2021 | | Tizian Hanselmann | Square Root of 2 | 10,000,000,001,000 | | Intel Xeon E7-4870 @ 2.4 GHz, 896 GB |
| October 4, 2021 | September 30, 2021 | | Chris Danneil | Zeta(5) | 200,000,000,000 | | Intel Xeon E5-268v4 @ 2.1 GHz |
| October 4, 2021 | September 9, 2021 | | William Echols | Log(2) | 1,500,000,000,000 | | 2 x Intel Xeon E5-2690 v3 @ 2.6 GHz, 256 GB |
| August 17, 2021 | August 14, 2021 | Source | UAS Grisons | Pi | 62,831,853,071,796 | Compute: 108 days, Verify: 34.4 hours | AMD Epyc 7542 @ 2.9 GHz, 1 TB, 34 + 4 Hard Drives |
| February 14, 2021 | February 12, 2021 | | Clifford Spielman | Golden Ratio | 10,000,000,000,000 | | AMD Threadripper 3995WX @ 2.7 GHz, 512 GB |
| December 5, 2020 | November 22, 2020 | | David Christle | e | 31,415,926,535,897 | | 2 x Intel Xeon E5-2680 v2 @ 2.8 GHz, 252 GB |
| September 13, 2020 | September 6, 2020 | | Seungmin Kim | Log(10) | 1,200,000,000,100 | | 2 x Intel Xeon E5-2699 v3 @ 2.3 GHz, 756 GB; 2 x Intel Xeon Gold 5220 @ 2.2 GHz, 754 GB |
| August 9, 2020 | July 26, 2020 | | Seungmin Kim | Zeta(3) - Apery's Constant | 1,200,000,000,100 | Compute: 31.7 days | 2 x Intel Xeon E5-2670 v3 @ 2.3 GHz, 503 GB; 2 x Intel Xeon Gold 5220 @ 2.2 GHz, 754 GB |
| August 9, 2020 | July 23, 2020 | | Andrew Sun | Gamma(1/3) | 500,000,001,337 | Compute: 17.3 days | 2 x Intel Xeon E5-2690 v4 @ 2.6 GHz, 315 GB |
| June 28, 2020 | June 22, 2020 | | Seungmin Kim | Zeta(3) - Apery's Constant | 1,200,000,000,000 | Not Verified | 2 x Xeon E5-2670 v3 @ 2.3 GHz, 503 GB |
| June 28, 2020 | May 27, 2020 | | Andrew Sun | Gamma(1/4) | 500,000,000,000 | | 2 x Intel Xeon E5-2690 v4 @ 2.6 GHz, 315 GB |
| January 29, 2020 | January 29, 2020 | Blog | Timothy Mullican | Pi | 50,000,000,000,000 | | 4 x Intel Xeon E7-4880 v2 @ 2.5 GHz, 315 GB, 48 Hard Drives |
| December 4, 2019 | November 13, 2019 | | Christophe Patris de Broe & Alexandre Gouy & Cyril Hsu | Golden Ratio | 20,000,000,000,000 | Not Verified | 2 x Intel Xeon Platinum 8268 @ 2.9 GHz, 768 GB |
| October 21, 2019 | October 17, 2019 | | Marco Julian Hummel | Gamma(1/3) | 274,877,906,944 | Compute: 11.2 days | 2 x Intel Xeon E5-2651 v2 @ 1.8 GHz, 192 GB |
| March 14, 2019 | January 21, 2019 | Blogs | Emma Haruka Iwao | Pi | 31,415,926,535,897 | Compute: 121 days | 2 x Undisclosed Intel Xeon @ 2.00 GHz, > 1.40 TB DDR4, > 240 TB SSD |
| November 15, 2016 | November 11, 2016 | Blog, Sponsor | Peter Trueb | Pi | 22,459,157,718,361 | Compute: 105 days | 4 x Xeon E7-8890 v3 @ 2.50 GHz, 1.25 TB DDR4, 20 x 6 TB 7200 RPM Seagate |
| June 28, 2016 | June 19, 2016 | | Ron Watkins | Square Root of 2 | 10,000,000,000,000 | | 2 x Xeon X5690 @ 3.47 GHz, 141 GB |
| October 8, 2014 | October 7, 2014 | | Sandon Van Ness (houkouonchi) | Pi | 13,300,000,000,000 | | 2 x Xeon E5-4650L @ 2.6 GHz, 192 GB DDR3 @ 1333 MHz, 24 x 4 TB + 30 x 3 TB |
| December 28, 2013 | December 28, 2013 | Source | Shigeru Kondo | Pi | 12,100,000,000,050 | | 2 x Xeon E5-2690 @ 2.9 GHz, 128 GB DDR3 @ 1600 MHz, 24 x 3 TB |
See the complete list including other notably large computations. If you want to set a record yourself, the rules are in that link.
The main computational features of y-cruncher are:
Latest Releases: (July 11, 2023)
Downloading any of these files constitutes acceptance of the license agreement.
| OS | Download Link | Size |
| Windows | | 44.1 MB |
| Linux (Static) | | 35.2 MB |
| Linux (Dynamic) | | 28.9 MB |
The Linux version comes in both statically and dynamically linked versions. The static version should work on most Linux distributions, but lacks TBB and NUMA binding. The dynamic version supports all features, but is less portable due to shared-library dependency hell.
The Windows download comes bundled with the HWBOT submitter which allows benchmarks to be submitted to HWBOT.
System Requirements:
Windows:
- Windows 7 or later.
- The HWBOT submitter requires the Java 8 Runtime.
Linux:
- 64-bit Linux is required. There is no support for 32-bit.
- The dynamic version has been tested on Ubuntu 22.04.
All Systems:
- An x86 or x64 processor.
Very old systems that don't meet these requirements may be able to run older versions of y-cruncher. Support goes all the way back to even before Windows XP.
Version History:
Other Downloads (for C++ programmers):
Advanced Documentation:
Comparison Chart: (Last updated: July 11, 2023)
Computations of Pi to various sizes. All times are in seconds. All computations were done entirely in RAM.
The timings include the time needed to convert the digits to decimal representation, but not the time needed to write out the digits to disk.
Blue: Benchmarks are up-to-date with the latest version of y-cruncher.
Green: Benchmarks were done with an old version of y-cruncher that is comparable in performance with the current release.
Red: Benchmarks are significantly out-of-date due to being run with an old version of y-cruncher that is no longer comparable with the current release.
Purple: Benchmarks are from unreleased internal builds that are not speed comparable with the current release.
Laptops + Low-Power:
| Processor(s): | Core i7 6820HK | Core i7 11800H | Core i7 11800H |
| Generation: | Intel Skylake | Intel Tiger Lake | Intel Tiger Lake |
| Cores/Threads: | 4/8 | 8/16 | 8/16 |
| Processor Speed: | 3.2 GHz (stock) | ~2.5 GHz (45W PL) | ~3.0 GHz (60W PL) |
| Memory: | 64 GB @ 2133 MT/s | 64 GB @ 3200 MT/s | 64 GB @ 3200 MT/s |
| Version: | v0.8.1 (14-BDW) | v0.8.1 (18-CNL) | v0.8.1 (18-CNL) |
| Instruction Set: | x64 AVX2 + ADX | x64 AVX512-VBMI | x64 AVX512-VBMI |
| 25,000,000 | 1.500 | 0.655 | 0.530 |
| 50,000,000 | 3.307 | 1.406 | 1.125 |
| 100,000,000 | 7.238 | 3.005 | 2.447 |
| 250,000,000 | 20.596 | 8.576 | 6.855 |
| 500,000,000 | 45.967 | 19.747 | 15.356 |
| 1,000,000,000 | 102.885 | 42.727 | 34.308 |
| 2,500,000,000 | 290.824 | 123.523 | 96.918 |
| 5,000,000,000 | 640.506 | 247.705 | 218.782 |
| 10,000,000,000 | 1,391.204 | 526.212 | 480.197 |
| Credit: |
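| Processor(s): | Core i3 8121U | Core i3 8121U | Core i3 8121U | Core i7 11800H | Core i7 11800H | Core i7 11800H |
| Generation: | Intel Cannon Lake | Intel Cannon Lake | Intel Cannon Lake | Intel Tiger Lake | Intel Tiger Lake | Intel Tiger Lake |
| Cores/Threads: | 2/4 | 2/4 | 2/4 | 8/16 | 8/16 | 8/16 |
| Processor Speed: | ~2.5 - 3.2 GHz (stock) | ~2.5 - 3.2 GHz (stock) | ~2.5 - 3.2 GHz (stock) | ~2.5 - 2.8 GHz (45W PL) | ~2.5 - 2.8 GHz (45W PL) | ~2.5 - 2.8 GHz (45W PL) |
| Memory: | 8 GB @ 2400 MT/s | 8 GB @ 2400 MT/s | 8 GB @ 2400 MT/s | 64 GB @ 3200 MT/s | 64 GB @ 3200 MT/s | 64 GB @ 3200 MT/s |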
| Version: | v0.8.1 (14-BDW) | v0.8.1 (17-SKX) | v0.8.1 (18-CNL) | v0.8.1 (14-BDW) | v0.8.1 (17-SKX) | v0.8.1 (18-CNL) |
| Instruction Set: | x64 AVX2 + ADX | x64 AVX512-DQ | x64 AVX512-VBMI | x64 AVX2 + ADX | x64 AVX512-DQ | x64 AVX512-VBMI |
| 25,000,000 | 2.857 | 2.467 | 1.988 | 0.907 | 0.853 | 0.655 |
| 50,000,000 | 6.446 | 5.501 | 4.392 | 2.075 | 1.862 | 1.406 |
| 100,000,000 | 14.335 | 12.257 | 9.490 | 4.176 | 3.749 | 3.005 |
| 250,000,000 | 42.566 | 36.204 | 27.137 | 12.014 | 10.705 | 8.576 |
| 500,000,000 | 99.040 | 85.443 | 64.359 | 28.805 | 24.123 | 19.747 |
| 1,000,000,000 | 228.863 | 198.405 | 151.605 | 63.898 | 55.264 | 42.727 |
| 2,500,000,000 | | | | 187.882 | 148.423 | 123.523 |
| 5,000,000,000 | | | | 375.130 | 327.776 | 247.705 |
| 10,000,000,000 | | | | 794.573 | 709.606 | 526.212 |
| Credit: | | | | | | |
Mainstream Desktops:
| Processor(s): | Ryzen 7 1800X | Ryzen 7 3800X | Core i9 11700K | Ryzen 9 3950X | Ryzen 9 5950X | Ryzen 9 7950X |
| Generation: | AMD Zen 1 | AMD Zen 2 | Intel Rocket Lake | AMD Zen 2 | AMD Zen 3 | AMD Zen 4 |
| Cores/Threads: | 8/16 | 8/16 | 8/16 | 16/32 | 16/32 | 16/32 |
| Processor Speed: | stock | stock | stock | stock | stock | stock |
| Memory: | 64 GB - 2866 MT/s | 32 GB - 3600 MT/s | 32 GB - 3200 MT/s | 128 GB - 2666 MT/s | 64 GB - 3200 MT/s | 128 GB - 4400 MT/s |
| Program Version: | v0.8.1 (17-ZN1) | v0.8.1 (19-ZN2) | v0.8.1 (18-CNL) | v0.8.1 (19-ZN2) | v0.8.1 (19-ZN2) | v0.8.1 (22-ZN4) |
| Instruction Set: | x64 AVX2 + ADX | x64 AVX2 + ADX | x64 AVX512-VBMI | x64 AVX2 + ADX | x64 AVX2 + ADX | x64 AVX512-GFNI |
| 25,000,000 | 1.150 | 0.654 | 0.501 | 0.588 | 0.490 | 0.312 |
| 50,000,000 | 2.527 | 1.415 | 1.114 | 1.257 | 1.090 | 0.679 |
| 100,000,000 | 5.555 | 3.028 | 2.223 | 2.685 | 2.345 | 1.517 |
| 250,000,000 | 15.760 | 8.404 | 6.220 | 7.251 | 6.371 | 4.157 |
| 500,000,000 | 34.659 | 18.440 | 13.573 | 15.556 | 13.395 | 8.883 |
| 1,000,000,000 | 78.690 | 41.097 | 30.415 | 33.925 | 29.301 | 18.542 |
| 2,500,000,000 | 220.278 | 117.788 | 86.119 | 96.695 | 82.204 | 50.743 |
| 5,000,000,000 | 493.388 | 266.719 | 193.718 | 215.333 | 181.355 | 110.379 |
| 10,000,000,000 | 1,078.187 | | | 473.958 | 399.012 | 241.162 |
| 25,000,000,000 | | | | 1,361.732 | | 680.344 |
| Credit: | Oliver Kruse | Oliver Kruse |
|
Oliver Kruse |
| Processor(s): | Core i7 920 | FX-8350 | Core i7 4770K |
| Generation: | Intel Nehalem | AMD Piledriver | Intel Haswell |
| Cores/Threads: | 4/8 | 8/8 | 4/8 |
| Processor Speed: | 3.5 GHz | stock | 4.0 GHz |
| Memory: | 12 GB - 1333 MT/s | 32 GB - 1600 MT/s | 32 GB - 2133 MT/s |
| Program Version: | v0.8.1 (08-NHM) | v0.8.1 (11-BD1) | v0.8.1 (13-HSW) |
| Instruction Set: | x64 SSE4.1 | x64 FMA4 | x64 AVX2 |
| 25,000,000 | 7.032 | 3.677 | 1.546 |
| 50,000,000 | 17.174 | 7.703 | 3.259 |
| 100,000,000 | 36.164 | 16.576 | 6.987 |
| 250,000,000 | 105.789 | 46.597 | 19.588 |
| 500,000,000 | 236.096 | 103.165 | 43.197 |
| 1,000,000,000 | 531.676 | 230.780 | 96.845 |
| 2,500,000,000 | | 669.594 | 274.336 |
| 5,000,000,000 | | 1,460.714 | 606.605 |
| 10,000,000,000 | | | |
| 25,000,000,000 | | | |
| Credit: |
High-End Desktops:
| Processor(s): | Core i7 5960X | Threadripper 1950X | Core i9 7900X | Core i9 7940X | Threadripper 3990X |
| Generation: | Intel Haswell | AMD Zen 1 | Intel Skylake X | Intel Skylake X | AMD Zen 2 |
| Cores/Threads: | 8/16 | 16/32 | 10/20 | 14/28 | 64/128 |
| Processor Speed: | 4.0 GHz | stock | ~3.6 GHz (200W PL) | 3.6 GHz (AVX512) | 2.9 GHz |
| Memory: | 64 GB - 2400 MT/s | 64 GB - 2800 MT/s | 128 GB - 3000 MT/s | 128 GB - 3466 MT/s | ~141 GB |
| Program Version: | v0.8.1 (13-HSW) | v0.8.1 (17-ZN1) | v0.8.1 (17-SKX) | v0.8.1 (17-SKX) | v0.8.1 (19-ZN2) |
| Instruction Set: | x64 AVX2 | x64 AVX2 + ADX | x64 AVX512-DQ | x64 AVX512-DQ | x64 AVX2 + ADX |
| 25,000,000 | 0.807 | 0.756 | 0.522 | 0.404 | 0.584 |
| 50,000,000 | 1.743 | 1.579 | 1.028 | 0.721 | 1.181 |
| 100,000,000 | 3.647 | 3.273 | 2.048 | 1.451 | 2.409 |
| 250,000,000 | 10.088 | 8.990 | 5.752 | 4.056 | 5.724 |
| 500,000,000 | 22.075 | 19.604 | 12.830 | 9.017 | 10.881 |
| 1,000,000,000 | 49.232 | 43.014 | 28.906 | 20.518 | 21.496 |
| 2,500,000,000 | 139.404 | 121.645 | 82.764 | 60.636 | 58.009 |
| 5,000,000,000 | 311.388 | 271.983 | 186.233 | 137.906 | 126.513 |
| 10,000,000,000 | 669.736 | 613.450 | 401.820 | 302.121 | 274.050 |
| 25,000,000,000 | | | 1,125.775 | 843.498 | 768.212 |
| Credit: | Oliver Kruse | Paul Underwood |
Multi-Processor Workstation/Servers:
Due to high core counts and the effect of NUMA (Non-Uniform Memory Access), performance on multi-processor systems is extremely sensitive to various settings. Therefore, these benchmarks may not be entirely representative of what the hardware is capable of.
| Processor(s): | Xeon Platinum 8124M | Xeon Gold 6148 | Xeon Platinum 8175M | Xeon Platinum 8275CL | Epyc 7742 | Epyc 7B12 | Epyc 7742 |
| Generation: | Intel Skylake Purley | Intel Skylake Purley | Intel Skylake Purley | Intel Cascade Lake | AMD Rome | AMD Rome | AMD Rome |
| Sockets/Cores/Threads: | 2/36/72 | 2/40/40 | 2/48/96 | 2/48/96 | 2/128/256 | 2/112/224 | 2/128/256 |
| Processor Speed: | 3.0 GHz | 2.4 GHz | 2.5 GHz | 3.0 GHz | 2.25 GHz | 2.25 GHz | |
| Memory: | 137 GB - ?? | 188 GB - ?? | ~756 GB - ?? | 192 GB | ~504 GB | ~882 GB | 2 TB |
| Program Version: | v0.7.5 (17-SKX) | v0.7.6 (17-SKX) | v0.7.6 (17-SKX) | v0.7.8 (17-SKX) | v0.7.7 (17-ZN1) | v0.7.8 (19-ZN2) | v0.7.8 (19-ZN2) |
| Instruction Set: | x64 AVX512-DQ | x64 AVX512-DQ | x64 AVX512-DQ | x64 AVX512-DQ | x64 AVX2 + ADX | x64 AVX2 + ADX | x64 AVX2 + ADX |
| 25,000,000 | 0.540 | 0.329 | 0.294 | 0.283 | 0.534 | 0.439 | 0.513 |
| 50,000,000 | 0.981 | 0.683 | 0.617 | 0.544 | 1.027 | 0.838 | 0.920 |
| 100,000,000 | 1.905 | 1.456 | 1.305 | 1.169 | 2.298 | 1.796 | 1.887 |
| 250,000,000 | 5.085 | 3.737 | 3.591 | 3.125 | 5.854 | 4.509 | 4.650 |
| 500,000,000 | 10.372 | 7.750 | 7.293 | 6.309 | 10.502 | 8.196 | 8.066 |
| 1,000,000,000 | 21.217 | 16.550 | 15.041 | 13.042 | 17.836 | 14.252 | 13.246 |
| 2,500,000,000 | 55.701 | 45.693 | 39.329 | 34.028 | 35.485 | 30.592 | 27.011 |
| 5,000,000,000 | 118.151 | 99.078 | 83.601 | 71.777 | 62.432 | 58.405 | 49.940 |
| 10,000,000,000 | 247.928 | 212.984 | 176.695 | 153.169 | 115.543 | 116.900 | 98.156 |
| 25,000,000,000 | 599.653 | 491.988 | 425.442 | 307.995 | 314.907 | 258.081 | |
| 50,000,000,000 | | | 1,081.181 | | 690.662 | 741.633 | 598.716 |
| 100,000,000,000 | 1715.123 | 1,370.714 | |||||
| 250,000,000,000 | | | | | | | 3,872.397 |
| Credit: | Jacob Coleman | Oliver Kruse | newalex | Xinyu Miao | Carsten Spille | Greg Hogan | Song Pengei |
| Processor(s): | Xeon E5-2683 v3 | Xeon E7-8880 v3 | Xeon E5-2687W v4 | Xeon E5-2686 v4 | Xeon E5-2696 v4 | Epyc 7601 | Xeon Gold 6130F |
| Generation: | Intel Haswell | Intel Haswell | Intel Broadwell | Intel Broadwell | Intel Broadwell | AMD Naples | Intel Skylake Purley |
| Sockets/Cores/Threads: | 2/28/56 | 4/64/128 | 2/24/48 | 2/36/72 | 2/44/88 | 2/64/128 | 2/32/64 |
| Processor Speed: | 2.03 GHz | 2.3 GHz | 3.0 GHz | 2.3 GHz | 2.2 GHz | 2.2 GHz | 2.1 GHz |
| Memory: | 128 GB - ??? | 2 TB - ??? | 64 GB | 504 GB - ??? | 768 GB - ??? | 256 GB - ?? | 256 GB - ?? |
| Program Version: | v0.6.9 (13-HSW) | v0.7.1 (13-HSW) | v0.7.6 (14-BDW) | v0.7.7 (14-BDW) | v0.7.1 (14-BDW) | v0.7.3 (17-ZN1) | v0.7.3 (17-SKX) |
| Instruction Set: | x64 AVX2 | x64 AVX2 | x64 AVX2 + ADX | x64 AVX2 + ADX | x64 AVX2 + ADX | x64 AVX2 + ADX | x64 AVX512-DQ |
| 25,000,000 | 0.907 | 1.176 | 0.490 | 0.494 | 0.715 | 2.459 | 1.150 |
| 50,000,000 | 1.745 | 2.321 | 1.072 | 0.982 | 1.344 | 4.347 | 1.883 |
| 100,000,000 | 3.317 | 4.217 | 2.303 | 2.193 | 2.673 | 6.996 | 3.341 |
| 250,000,000 | 8.339 | 8.781 | 6.196 | 6.044 | 6.853 | 14.258 | 7.731 |
| 500,000,000 | 17.708 | 15.879 | 13.046 | 12.582 | 14.538 | 24.930 | 15.346 |
| 1,000,000,000 | 37.311 | 32.078 | 27.763 | 26.852 | 31.260 | 47.837 | 31.301 |
| 2,500,000,000 | 102.131 | 78.251 | 76.202 | 73.596 | 84.271 | 111.139 | 82.871 |
| 5,000,000,000 | 218.917 | 164.157 | 165.046 | 160.094 | 192.889 | 228.252 | 179.488 |
| 10,000,000,000 | 471.802 | 346.307 | 356.487 | 346.305 | 417.322 | 482.777 | 387.530 |
| 25,000,000,000 | 1,511.852 | 957.966 | 1,006.131 | 980.784 | 1,186.881 | 1,184.144 | 1,063.850 |
| 50,000,000,000 | 2,096.169 | 2,202.558 | 2,156.854 | 2,601.476 | |||
| 100,000,000,000 | 4,442.742 | 6,037.704 | |||||
| 250,000,000,000 | | 17,428.450 | | | | | |
| Credit: | Shigeru Kondo | Jacob Coleman | Cameron Giesbrecht | newalex | "yoyo" | Dave Graham | |
The full chart of rankings for each size can be found here:
These fastest times may include unreleased betas.
Got a faster time? Let me know: a-yee@u.northwestern.edu
Note that I usually do not respond to these emails. I simply put them into the charts which I update periodically (typically within 2 weeks).
Decimal Digits of Pi - Times in Seconds (Core i9 7940X @ 3.7 GHz, AVX512):
| Memory Frequency: | 2666 MT/s | 3466 MT/s |
| 25,000,000 | 0.839 | 0.758 |
| 50,000,000 | 1.424 | 1.338 |
| 100,000,000 | 2.701 | 2.425 |
| 250,000,000 | 6.489 | 5.877 |
| 500,000,000 | 13.307 | 11.917 |
| 1,000,000,000 | 27.913 | 24.915 |
| 2,500,000,000 | 76.837 | 68.322 |
| 5,000,000,000 | 168.058 | 148.737 |
| 10,000,000,000 | 365.047 | 322.115 |
| 25,000,000,000 | 1,037.527 | 916.039 |
High core count Skylake X processors are known to be heavily bottlenecked by memory bandwidth.
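To put a number on that bottleneck, the sketch below computes the speedup from faster memory using a few rows of the 7940X table above (times in seconds). The helper names `Row`, `speedup`, and `print_memory_scaling` are illustrative only and are not part of y-cruncher; it also assumes peak bandwidth scales with the memory transfer rate, since the channel count is unchanged.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// One row of the 7940X table: digit count and the measured times (seconds)
// at 2666 MT/s and 3466 MT/s memory.
struct Row {
    const char* digits;
    double t_2666;
    double t_3466;
};

// Speedup = slower time / faster time (1.10 means 10% faster).
double speedup(double t_slow, double t_fast) {
    assert(t_slow > 0.0 && t_fast > 0.0);
    return t_slow / t_fast;
}

// Prints the memory transfer-rate increase next to the observed speedups.
void print_memory_scaling() {
    const std::vector<Row> rows = {
        {"25,000,000", 0.839, 0.758},
        {"1,000,000,000", 27.913, 24.915},
        {"10,000,000,000", 365.047, 322.115},
        {"25,000,000,000", 1037.527, 916.039},
    };
    // Assumption: peak bandwidth scales with transfer rate (same channel count).
    std::printf("memory transfer rate: %.3fx\n", 3466.0 / 2666.0);
    for (const Row& r : rows) {
        std::printf("%16s digits: %.3fx faster\n", r.digits, speedup(r.t_2666, r.t_3466));
    }
}
```

Called from `main`, this shows a 1.300x increase in transfer rate against speedups of roughly 1.107x to 1.133x: about 30% more memory speed buys about 11-13% more computation speed with the CPU left untouched.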
Memory Bandwidth:
Because of the memory-intensive nature of computing Pi and other constants, y-cruncher needs a lot of memory bandwidth to perform well. In fact, the program has been noticeably memory-bound on nearly all high-end desktops since 2012 as well as the majority of multi-socket systems since at least 2006.
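For a rough feel of what "memory bandwidth" means here, the usual yardstick is a STREAM-style triad kernel, a[i] = b[i] + s * c[i], run over arrays much larger than the caches. This is a single-threaded sketch and not y-cruncher's own code or benchmark: one thread typically cannot saturate a modern memory controller, and the byte count ignores write-allocate traffic, so treat the result as a lower bound. The names `TriadResult` and `triad_bandwidth` are hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

struct TriadResult {
    double gb_per_s;  // estimated bandwidth in GB/s (10^9 bytes per second)
    double checksum;  // sum of a[], read back so the stores are actually used
};

// Runs the triad a[i] = b[i] + scalar * c[i] `reps` times over n doubles.
// Counts 3 * 8 bytes per element per repetition (two loads, one store).
TriadResult triad_bandwidth(std::size_t n, int reps) {
    assert(n > 0 && reps > 0);
    std::vector<double> a(n, 0.0), b(n, 1.0), c(n, 2.0);
    const double scalar = 3.0;

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; ++r) {
        for (std::size_t i = 0; i < n; ++i) {
            a[i] = b[i] + scalar * c[i];
        }
    }
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();

    double checksum = 0.0;
    for (double x : a) checksum += x;

    const double bytes = 3.0 * sizeof(double) * static_cast<double>(n) * reps;
    return {bytes / seconds / 1e9, checksum};
}
```

To measure main memory rather than cache, pick n so that the three arrays are several times larger than the last-level cache; for example, `triad_bandwidth(std::size_t(1) << 25, 5)` uses three 256 MiB arrays.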
Recommendations:
Don't be surprised if y-cruncher exposes instabilities that other applications and stress-tests do not. y-cruncher is unusual in that it simultaneously places a heavy load on both the CPU and the entire memory subsystem.
Parallel Performance:
y-cruncher has a lot of settings for tuning parallel performance. By default, it makes a best effort to analyze the hardware and pick the best settings. But because there are virtually unlimited combinations of processor topologies, it's difficult for y-cruncher to pick the optimal settings for every system. So sometimes the best performance can only be achieved with manual settings.
*These are advanced settings that cannot be changed if you're using the benchmark option in the console UI. To change them, you will need to either run benchmark mode from the command line or use the custom compute menu.
Load imbalance is a fairly common problem in y-cruncher. The usual causes are:
Large Pages:
Large pages didn't use to matter, but they do now in the post-Spectre/Meltdown world. Mitigations for the Meltdown vulnerability can cause a noticeable performance drop in y-cruncher (up to 5% has been observed). It turns out that turning on large pages can mitigate the penalty of this mitigation. (pun intended)
Refer to the memory allocation guide on how to turn on large pages.
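Purely as an illustration of what "large pages" means at the API level, here is a Linux-only sketch. It first asks for explicit huge pages with `mmap(MAP_HUGETLB)`. If none are reserved, it falls back to a normal mapping with a transparent-huge-page hint via `madvise(MADV_HUGEPAGE)`. This is an assumption-laden example, not how y-cruncher allocates memory. The 2 MiB page size is assumed (the common x86-64 default), and the names `Region`, `alloc_maybe_huge`, and `free_region` are hypothetical. On Windows the equivalent is `VirtualAlloc` with `MEM_LARGE_PAGES`, which requires the "Lock pages in memory" privilege.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cstddef>

// Result of an allocation attempt.
struct Region {
    void* ptr;           // start of the mapping, or nullptr on failure
    std::size_t size;    // mapped size in bytes (rounded up)
    bool explicit_huge;  // true if the MAP_HUGETLB mapping succeeded
};

// Assumption: 2 MiB huge pages (the usual x86-64 default).
constexpr std::size_t kHugePage = std::size_t(2) << 20;

Region alloc_maybe_huge(std::size_t bytes) {
    // Round up to whole huge pages; huge-page mappings are unmapped in whole pages.
    const std::size_t size = (bytes + kHugePage - 1) / kHugePage * kHugePage;
    const int prot = PROT_READ | PROT_WRITE;
    const int flags = MAP_PRIVATE | MAP_ANONYMOUS;

#ifdef MAP_HUGETLB
    // Explicit huge pages: fails unless some have been reserved
    // (e.g. via /proc/sys/vm/nr_hugepages).
    void* p = mmap(nullptr, size, prot, flags | MAP_HUGETLB, -1, 0);
    if (p != MAP_FAILED) return {p, size, true};
#endif

    // Fallback: ordinary pages, plus a hint asking the kernel to back the
    // range with transparent huge pages where possible.
    void* q = mmap(nullptr, size, prot, flags, -1, 0);
    if (q == MAP_FAILED) return {nullptr, 0, false};
#ifdef MADV_HUGEPAGE
    madvise(q, size, MADV_HUGEPAGE);  // advisory only; failure is harmless
#endif
    return {q, size, false};
}

void free_region(const Region& r) {
    if (r.ptr != nullptr) munmap(r.ptr, r.size);
}
```

The fallback path means the sketch still works on machines with no reserved huge pages. In that case it simply loses the guarantee that the memory is actually backed by large pages.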
Swap Mode:
This is probably one of the most complicated features in y-cruncher.
Everything in this section is in the process of being re-verified and moved to: https://github.com/Mysticial/y-cruncher/issues
Performance Issues:
Pi and other Constants:
Program Usage:
Hardware and Overclocking:
Academia:
Programming:
Other:
Here are some interesting sites dedicated to the computation of Pi and other constants:
Contact me via e-mail. I'm pretty good about responding unless it gets caught in my school's junk mail filter.
You can also find me on Twitter as @Mysticial.