y-cruncher - A Multi-Threaded Pi-Program

From a high-school project that went a little too far...

By Alexander J. Yee

(Last updated: July 26, 2017)

 

The first scalable multi-threaded Pi-benchmark for multi-core systems...

 

How fast can your computer compute Pi?

 

y-cruncher is a program that can compute Pi and other constants to trillions of digits.

It is the first of its kind that is multi-threaded and scalable to multi-core systems. Ever since its launch in 2009, it has become a common benchmarking and stress-testing application for overclockers and hardware enthusiasts.

 

y-cruncher has been used to set several world records for the most digits of Pi ever computed.

 

Current Release:

Windows: Version 0.7.3 Build 9472 (Released: July 12, 2017)

Linux: Version 0.7.3 Build 9472 (Released: July 12, 2017)

 

Official HWBOT thread.

Official XtremeSystems Forums thread.

 

News:

 

Skylake X and AVX512: (July 6, 2017)

 

Let's talk about Skylake X and AVX512, because everyone's been waiting for this. There's currently a lack of AVX512 benchmarks and stress tests, and because of that, at least half a dozen people and organizations have contacted me about y-cruncher's AVX512.

 

Okay... some AVX512 benchmarks already existed. SiSoftware Sandra had some support. And my little-known FLOPs benchmark did too. But people either weren't aware of them, or wanted more. And by advertising y-cruncher's internal AVX512 support for at least a year now, I basically brought this on myself.

 

So let's get to the point. Unfortunately, AVX512 will not bring the "instant massive performance gain" that a lot of people were expecting. Realistically speaking, the speedups over AVX2 seem to range from about 10% to 50%, usually toward the lower end of that range. While the investigation is ongoing, there are some known factors:

  1. Not all Skylake X and Skylake Purley processors will have the full AVX512 capability.
  2. "Phantom throttling" of performance when certain thermal limits are exceeded.
  3. Memory bandwidth is a significant bottleneck.
  4. Amdahl's law and other unknown scalability issues.

 

Not all Skylake X and Skylake Purley processors will have the full AVX512 capability:

 

While this reason doesn't apply to my system, it's worth mentioning it anyway.

 

Architecturally, Skylake X retains Skylake desktop's architecture with 2 x 256-bit FMA units. In Skylake X, those two 256-bit FMA units can merge to form a single 512-bit FMA. On the processors with full-throughput AVX512, there is also a dedicated 512-bit FMA - thereby providing 2 x 512-bit FMA capability.

 

However, that dedicated 512-bit FMA is only enabled on the Core i9 parts. The 6-core and 8-core Core i7 parts are supposed to have it disabled. Therefore they only have half the AVX512 performance.
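As a back-of-the-envelope illustration of what "half the AVX512 performance" means, the peak FMA throughput per core can be tallied from the unit counts described above. This is only a sketch: the function name is hypothetical, it assumes double-precision (64-bit) lanes, and it counts one fused multiply-add as two floating-point operations.

```python
# Peak double-precision FLOPs per cycle per core, derived from FMA unit counts.
# Assumptions (not from the original text): 64-bit lanes, 1 FMA = 2 FLOPs.

def peak_dp_flops_per_cycle(units: int, width_bits: int) -> int:
    lanes = width_bits // 64      # doubles that fit in one vector
    return units * lanes * 2      # one multiply + one add per lane per unit

avx2 = peak_dp_flops_per_cycle(units=2, width_bits=256)          # 2 x 256-bit FMA
avx512_fused = peak_dp_flops_per_cycle(units=1, width_bits=512)  # merged 512-bit FMA only
avx512_full = peak_dp_flops_per_cycle(units=2, width_bits=512)   # merged + dedicated FMA

print(avx2, avx512_fused, avx512_full)  # prints: 16 16 32
```

Under these assumptions, a part without the dedicated 512-bit FMA has the same peak FMA rate with AVX512 as it does with AVX2. Any gain on such a part would have to come from other AVX512 features rather than raw FMA throughput.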

 

It's worth mentioning that there is a benchmark on an engineering-sample 6-core Core i7 that shows full-throughput AVX512 anyway. However, engineering sample processors are not always representative of the retail parts.

 

So as of this writing, I still don't know if the 6-core and 8-core Skylake X Core i7s have the full AVX512. The only Skylake X processor I have at this time is the Core i9 7900X, which is supposed to have the full AVX512 anyway (and indeed it does, based on my tests).

 

 

Update (July 14, 2017):

 

Carsten Spille from www.pcgameshardware.de has notified me that the retail Core i7 7800X does in fact have full-throughput AVX512. This goes against all the reviews that have repeatedly stated that only the Core i9s have the full AVX512. So far, Intel has not commented on this.

 

 

 

"Phantom throttling" of performance when certain thermal limits are exceeded:

 

Within minutes of getting my system set up, I started noticing inconsistencies in performance. And after spending a long Friday night investigating the issue, I determined that there was a sort of "Phantom throttling" of AVX512 code when certain thermal limits are exceeded.

 

"Phantom throttling" is the term that I used to describe the problem in my emails with the Silicon Lottery vendor. And it looks like I'm not the only one using that term anymore. Phantom throttling is when the processor gets throttled without a change in clock frequency. For many years, processors have throttled down for many reasons to protect themselves from damage. But when throttling happened, it was always done by lowering the clock frequency, which is visible in a monitor like CPUz. Skylake X is the first line of processors to break from this, which makes the throttling more difficult to detect.

 

Right now, the phantom throttling phenomenon is still not well understood. Overclocker der8auer has mentioned that it could be caused by CPUz not reacting fast enough to actual clock frequency changes. On the other hand, the tests that Silicon Lottery and I have done seem to show that there really is no drop in clock frequency at all.

 

Initially, I observed this effect only with AVX512 code and thus hypothesized that the mechanism behind the throttling is the shutdown of the dedicated 512-bit FMA. But others have found that phantom throttling also occurs on AVX and scalar code. In short, much more investigation is needed. The lack of AVX512 programs out there certainly doesn't help and is partially why I'm rushing this release of y-cruncher v0.7.3.

 

Currently, there are no known reliable ways of stopping the throttling and results vary heavily by motherboard manufacturer. But maxing out thermal limits and disabling all thermal protections seems to help. (Don't try this at home if you don't know what you're doing or you aren't at least moderately experienced in overclocking. You can destroy your processor and/or motherboard if you aren't careful.)

 

 

Update (July 9, 2017):

 

I got asked about this, so here's some data showing the phantom throttling at stock settings. The pink entries are the ones with phantom throttling.

10 billion Hex-Digit of Pi - Plouffe's 4-term BBP Formula (y-cruncher v0.7.3)

Core i9 7900X - Gigabyte AORUS Gaming 7 (BIOS F7a)

All Stock Settings

Binaries: AVX2 (14-BDW) and AVX512 (17-SKX)

Threads/Cores | AVX2 Time (secs) | AVX2 Clock Speed | AVX2 Power | AVX2 Max Temperature | AVX512 Time (secs) | AVX512 Clock Speed | AVX512 Power | AVX512 Max Temperature
1 thread/1 core | 408.118 | 4.5 GHz | 58 W | 70°C | 215.399 | 4.5 GHz | 62 W | 71°C
2 threads/2 cores | 211.103 | 4.0 - 4.1 GHz | 77 W | 72°C | 110.990 | 4.1 GHz | 87 W | 74°C
4 threads/4 cores | 111.948 | 4.0 GHz | 99 W | 61°C | 58.836 | 4.0 GHz | 136 W | 74°C
8 threads/8 cores | 57.189 | 4.0 GHz | 160 W | 67°C | 30.145 | 4.0 GHz | 244 W | 94°C
10 threads/10 cores | 45.957 | 4.0 GHz | 194 W | 69°C | 51.879 | 4.0 GHz | 188 W | 68°C
20 threads/10 cores | 41.669 | 4.0 GHz | 217 W | 74°C | 72.242 | 4.0 GHz | 160 W | 68°C
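Since the reported clock speed does not change, one way to spot phantom throttling in the stock-settings numbers above is to compare each run against ideal scaling from the single-threaded run. The sketch below does that. The 1.5x cutoff and the function name are hypothetical, and the "4.0 - 4.1 GHz" entry is entered as its midpoint; only the timings and clocks come from the table.

```python
# Stock-settings results copied from the table above:
# (threads, cores, time in seconds, reported clock in GHz).
# Assumption: the "4.0 - 4.1 GHz" AVX2 entry is entered as 4.05 GHz.
STOCK = {
    "AVX2": [
        (1, 1, 408.118, 4.5), (2, 2, 211.103, 4.05), (4, 4, 111.948, 4.0),
        (8, 8, 57.189, 4.0), (10, 10, 45.957, 4.0), (20, 10, 41.669, 4.0),
    ],
    "AVX512": [
        (1, 1, 215.399, 4.5), (2, 2, 110.990, 4.1), (4, 4, 58.836, 4.0),
        (8, 8, 30.145, 4.0), (10, 10, 51.879, 4.0), (20, 10, 72.242, 4.0),
    ],
}

THRESHOLD = 1.5  # hypothetical cutoff: flag runs 50% slower than ideal scaling


def flag_phantom_throttling(results, threshold=THRESHOLD):
    """Return (binary, threads, slowdown) for runs far slower than ideal scaling."""
    flagged = []
    for binary, rows in results.items():
        _, _, t1, f1 = rows[0]  # the single-threaded run is the baseline
        for threads, cores, seconds, ghz in rows[1:]:
            # Ideal time: scale the baseline by the clock ratio and the core count.
            expected = t1 * (f1 / ghz) / cores
            slowdown = seconds / expected
            if slowdown > threshold:
                flagged.append((binary, threads, round(slowdown, 2)))
    return flagged


for binary, threads, slowdown in flag_phantom_throttling(STOCK):
    print(f"{binary} with {threads} threads is {slowdown}x slower than ideal scaling")
```

With this data, only the AVX512 runs at 10 and 20 threads are flagged. Every other run lands within a few percent of ideal scaling, even though the reported clock is 4.0 GHz in both the flagged and unflagged cases.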

 

And here's the same set of benchmarks with the throttling eliminated with the appropriate BIOS settings. (Thanks to the guys on Overclock.net for helping me here.) The two benchmarks which phantom throttled before are no longer phantom throttled. But instead, they run hot enough to hit temperature throttling which has a visible drop in frequency.

10 billion Hex-Digit of Pi - Plouffe's 4-term BBP Formula (y-cruncher v0.7.3)

Core i9 7900X - Gigabyte AORUS Gaming 7 (BIOS F7a)

Package Power Limit1/2 = 400 W

CPU VCore Loadline Calibration = Medium

CPU VCore Current Protection = High

AVX and AVX512 capped at 4.0 GHz (turbo set to flat 41x, AVX + AVX512 offsets set to 1x)

All other settings left at default.

Binaries: AVX2 (14-BDW) and AVX512 (17-SKX)

Threads/Cores | AVX2 Time (secs) | AVX2 Clock Speed | AVX2 Power | AVX2 Max Temperature | AVX512 Time (secs) | AVX512 Clock Speed | AVX512 Power | AVX512 Max Temperature
1 thread/1 core | 454.325 | 4.0 GHz | 48 W | 53°C | 239.082 | 4.0 GHz | 58 W | 70°C
2 threads/2 cores | 228.641 | 4.0 GHz | 62 W | 55°C | 119.740 | 4.0 GHz | 80 W | 74°C
4 threads/4 cores | 113.700 | 4.0 GHz | 94 W | 59°C | 59.900 | 4.0 GHz | 134 W | 74°C
8 threads/8 cores | 57.146 | 4.0 GHz | 159 W | 67°C | 30.061 | 4.0 GHz | 239 W | 95°C
10 threads/10 cores | 46.033 | 4.0 GHz | 191 W | 68°C | 24.340 | 3.8 - 4.0 GHz | 283 W | 95°C
20 threads/10 cores | 42.143 | 4.0 GHz | 209 W | 73°C | 24.972 | 3.7 - 4.0 GHz | 294 W | 95°C
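To put the tuned-BIOS numbers above in perspective, the AVX512-over-AVX2 speedup for each row can be computed directly from the two time columns. This is just arithmetic on the table; the variable names are illustrative.

```python
# Times in seconds from the tuned-BIOS table above: label -> (AVX2, AVX512).
TUNED = {
    "1 thread/1 core": (454.325, 239.082),
    "2 threads/2 cores": (228.641, 119.740),
    "4 threads/4 cores": (113.700, 59.900),
    "8 threads/8 cores": (57.146, 30.061),
    "10 threads/10 cores": (46.033, 24.340),
    "20 threads/10 cores": (42.143, 24.972),
}

# Speedup = AVX2 time divided by AVX512 time for the same thread count.
speedups = {label: avx2 / avx512 for label, (avx2, avx512) in TUNED.items()}

for label, s in speedups.items():
    print(f"{label:>20}: {s:.2f}x")
```

For this BBP workload the speedup stays at roughly 1.9x up to 10 threads, and drops to about 1.7x in the 20-thread run, which is one of the two runs that hit visible temperature throttling.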

It's worth noting that there is something wrong here. At stock settings, the motherboard/BIOS is failing to apply the AVX/AVX512 offsets in most of the tests here. This allows all cores to run at 4.0 GHz under AVX512 which is causing the throttling. Furthermore, it allows individual cores to turbo up to 4.5 GHz under AVX512. In other words, the motherboard is overclocking the processor by default.

 

The problem with my chip is that the "weakest" core cannot run AVX512 @ 4.5 GHz at default voltages. Doing so will crash (BSOD) the system. Therefore, I had to manually cap the AVX and AVX512 clocks to 4.0 GHz.

 

While I've fixed this by manually setting the AVX/AVX512 offsets, I hope that a BIOS update will fix this for everyone else who hasn't done (or doesn't know to do) this. Dropping the all-core AVX512 clock speed down to 3.6 GHz was enough to avoid all throttling with the default thermal limits.

 

 

Memory bandwidth is a significant bottleneck:

 

y-cruncher was already slightly memory-bound on Haswell-E. Now on Skylake X, it is much worse. While I had anticipated a memory bottleneck on Skylake X with AVX512, it seems that I underestimated the severity of it:

 

(The CPU frequencies in this benchmark were chosen to be low enough to avoid any throttling or phantom throttling.)

1 billion digits of Pi - Core i9 7900X @ 3.8 GHz

Times in Seconds

Threads | Memory Frequency | AVX2 | AVX512
1 thread | 2133 MHz | 444.434 | 325.543
1 thread | 3200 MHz | 438.432 | 319.737
20 threads | 2133 MHz | 51.884 | 45.658
20 threads | 3200 MHz | 47.672 | 39.723

In the single threaded benchmarks, the memory frequency has less than 2% effect for both AVX2 and AVX512. Multi-threaded, that jumps to 9% and 15% respectively. This is much more than is expected for a program that used to be completely compute-bound just a few years ago.
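The percentages quoted above can be reproduced from the timing table by dividing the 2133 MHz time by the 3200 MHz time. The short sketch below does exactly that; the dictionary layout and function name are just for illustration.

```python
# Times in seconds from the table above:
# (threads, instruction set) -> (time at 2133 MHz, time at 3200 MHz)
TIMES = {
    (1, "AVX2"): (444.434, 438.432),
    (1, "AVX512"): (325.543, 319.737),
    (20, "AVX2"): (51.884, 47.672),
    (20, "AVX512"): (45.658, 39.723),
}


def gain_percent(slow: float, fast: float) -> float:
    """Percent speedup from moving to the faster memory."""
    return (slow / fast - 1.0) * 100.0


gains = {key: gain_percent(*pair) for key, pair in TIMES.items()}

for (threads, isa), g in gains.items():
    print(f"{threads:>2} thread(s), {isa:<6}: {g:.1f}% faster with 3200 MHz memory")
```

The single-threaded gains come out at about 1.4% and 1.8%, and the 20-thread gains at about 8.8% and 14.9%, which round to the 9% and 15% figures.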

 

 

Amdahl's law and other unknown scalability issues:

 

In a typical y-cruncher computation, only about 80% of the CPU time is spent running vectorized code when AVX2 is used. So by Amdahl's law, even if AVX512 doubles the speed of all the vectorized code, we can only cut 40% off the run-time. Right now, the single-threaded benchmarks (which are the least memory-bound) show only a 27% reduction in run-time with AVX512 over AVX2.

 

This remaining 13% discrepancy is currently unresolved. Microbenchmarks of y-cruncher's AVX512 code show near perfect 2x speedups over AVX2. (Some show >2x thanks to the increased register count.) But this speedup seems to drop off as the data sizes increase - even while still fitting in cache. This seems to hint at unknown bottlenecks within the L2 and L3 caches. The fact that cache sizes haven't increased along with the wider SIMD also doesn't help.
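The Amdahl's law arithmetic above can be written out as a small sketch. The 80% vectorized fraction and the single-threaded timings (2133 MHz row) come from this page. The function names are illustrative, and the "implied speedup" is only what the model suggests if Amdahl's law were the whole story.

```python
def amdahl_runtime_fraction(p: float, s: float) -> float:
    """Remaining run-time (as a fraction) when a fraction p of the work speeds up by s."""
    return (1.0 - p) + p / s


def implied_vector_speedup(p: float, runtime_fraction: float) -> float:
    """Solve (1 - p) + p / s = runtime_fraction for s."""
    return p / (runtime_fraction - (1.0 - p))


p = 0.80  # fraction of AVX2 run-time spent in vectorized code (from the text)

# Best case: the vectorized code runs exactly 2x faster with AVX512.
best_case_cut = 1.0 - amdahl_runtime_fraction(p, 2.0)

# Observed: single-threaded, 2133 MHz memory, AVX2 vs AVX512 times in seconds.
observed_cut = 1.0 - 325.543 / 444.434

# Effective speedup of the vectorized portion that would explain the observation.
effective = implied_vector_speedup(p, 1.0 - observed_cut)

print(f"best-case run-time cut : {best_case_cut:.0%}")
print(f"observed run-time cut  : {observed_cut:.0%}")
print(f"implied vector speedup : {effective:.2f}x")
```

Under this model, the observed 27% reduction corresponds to the vectorized portion running only about 1.5x faster instead of 2x.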

 

For now, investigation is difficult because none of my performance profilers support Skylake X yet.

 

 

Implications for Stress-Testing:

 

y-cruncher's failure to achieve a decent speedup for AVX512 also means that it is unable to put a heavy load on the AVX512 computation units. Therefore it is not a great stress-test for Skylake X with full AVX512.

 

But there is one y-cruncher feature which seems to be unaffected - the BBP benchmark.

 

The BBP benchmark feature is contained entirely in cache and is thus free of the memory bottleneck. It is able to apply much more stress than the stress-tester and the computations. So if you run the BBP benchmark (option 4) and set the offset to 100 billion, you can still put a pretty heavy load on your AVX512-capable processor.

 

A future version of y-cruncher will revamp the stress-tester to incorporate the BBP benchmark as well as other possible improvements.

 

 

 

 

Older News

 

Records Set by y-cruncher:

y-cruncher has been used to set a number of world-record-size computations.

 

Blue: Current World Record

Green: Former World Record

Red: Unverified computation. Does not qualify as a world record until verified using an alternate formula.

Date Announced | Date Completed | Source | Who | Constant | Decimal Digits | Time | Computer
November 15, 2016 | November 11, 2016 | Blog, Sponsor | Peter Trueb | Pi | 22,459,157,718,361 | Compute: 105 days, Verify: 28 hours, Validation File | 4 x Xeon E7-8890 v3 @ 2.50 GHz, 1.25 TB DDR4, 20 x 6 TB 7200 RPM Seagate
September 3, 2016 | August 29, 2016 | | Ron Watkins | e | 5,000,000,000,000 | Compute: 48.6 days, Verify: 48.7 days | 2 x Xeon X5690 @ 3.47 GHz, 141 GB
August 14, 2016 | June 26, 2016 | | Ron Watkins | Euler-Mascheroni Constant | 477,511,832,674 | Compute: 34.4 days, Not Verified | 4 x Xeon E5-4660 v3 @ 2.1 GHz, 1 TB
July 11, 2016 | July 5, 2016 | | "yoyo" | Golden Ratio | 10,000,000,000,000 | Compute: 6.2 days, Not Verified | 2 x Intel Xeon E5-2696 v4 @ 2.2 GHz, 768 GB
June 28, 2016 | June 19, 2016 | | Ron Watkins | Square Root of 2 | 10,000,000,000,000 | Compute: 18.8 days, Verify: 25.2 days | 2 x Xeon X5690 @ 3.47 GHz, 141 GB
June 4, 2016 | May 29, 2016 | | Ron Watkins | Lemniscate | 250,000,000,000 | Compute: 91.7 hours, Verify: 270 hours | 4 x Xeon E5-4660 v3 @ 2.1 GHz - 1TB / 4 x Xeon X6550 @ 2 GHz - 512 GB
June 4, 2016 | June 2, 2016 | | "yoyo" | Golden Ratio | 5,000,000,000,000 | Compute: 67.9 hours, Not Verified | 2 x Intel Xeon E5-2696 v4 @ 2.2 GHz, 768 GB
May 25, 2016 | May 18, 2016 | | Ron Watkins | Euler-Mascheroni Constant | 250,000,000,000 | Compute: 35.9 days, Verify: 30.65 days | 2 x Xeon E5-4660 v3 @ 2.1 GHz - 1 TB / 4 x Xeon X6550 @ 2.0 GHz - 512 GB
April 24, 2016 | April 18, 2016 | | Ron Watkins | Log(2) | 500,000,000,000 | Compute: 12.8 days, Verify: 14.4 days | 4 x Xeon X5690 @ 3.47 GHz - 141 GB
April 17, 2016 | April 12, 2016 | | Ron Watkins | Catalan's Constant | 250,000,000,000 | Compute: 204 hours, Verify: 207 hours | 4 x Xeon E5-4660 v3 @ 2.1 GHz, 1 TB
April 9, 2016 | April 3, 2016 | | Ron Watkins | Log(10) | 500,000,000,000 | Compute: 14.4 days, Verify: 15.2 days | 2 x Xeon X5690 @ 3.47 GHz, 141 GB
February 8, 2016 | February 6, 2016 | | Mike A | Catalan's Constant | 500,000,000,000 | Compute: 26.1 days, Not Verified | 2 x Intel Xeon E5-2697 v3 @ 2.6 GHz, 128 GB
December 21, 2015 | December 21, 2015 | | Dipanjan Nag | Zeta(3) - Apery's Constant | 400,000,000,000 | Compute: 22 days, Verify: 24 days | Xeon E5-2698B @ 2.0 GHz - 224 GB
July 24, 2015 | July 22, 2015 / July 23, 2015 | Source | Ron Watkins / Dustin Kirkland | Golden Ratio | 2,000,000,000,000 | Compute: 77.3 hours, Verify: 76.33 hours / Compute: 79.3 hours, Verify: 80.8 hours | 4 x Xeon X6550 @ 2 GHz - 512 GB / Xeon E5-2676 v3 @ 2.4 GHz - 64 GB
October 8, 2014 | October 7, 2014 | | "houkouonchi" | Pi | 13,300,000,000,000 | Compute: 208 days, Verify: 182 hours, Validation File | 2 x Xeon E5-4650L @ 2.6 GHz, 192 GB DDR3 @ 1333 MHz, 24 x 4 TB + 30 x 3 TB
December 28, 2013 | December 28, 2013 | Source | Shigeru Kondo | Pi | 12,100,000,000,050 | Compute: 94 days, Verify: 46 hours | 2 x Xeon E5-2690 @ 2.9 GHz, 128 GB DDR3 @ 1600 MHz, 24 x 3 TB

See the complete list including other notably large computations.

 

If you wish to set a record, you must run two computations using different formulas (one to compute, the other to verify). Then send me the validation files, but do not make any attempt to modify them. The validation files are protected with a checksum to prevent tampering/cheating. Yes, people have tried to cheat before.

 

An exception to the "two computations rule" can be made for Pi since it can be verified using BBP formulas.

 

Note for anyone attempting to set a Pi world record: Should the attempt succeed, I kindly ask that you make yourself sufficiently available for external requests to access or download the digits in their entirety (at least until the record is broken again by someone else). Pi is popular enough that people do actually want to see the digits.

 

Features:

Aside from computing Pi and other constants, y-cruncher is great for stress testing 64-bit systems with lots of RAM.

 

 

Download:

Sample Screenshot: 100 billion digits of Pi

Core i7 5960X @ 4.0 GHz - 128GB DDR4 @ 2666 MHz - 16 HDs

 

Latest Releases: (July 12, 2017)

OS | Programs | Download Link | Size
Windows | y-cruncher + HWBOT Submitter | y-cruncher v0.7.3.9472.zip | 25.0 MB
Linux (Static) | y-cruncher Only | y-cruncher v0.7.3.9471-static.tar.gz | 23.9 MB
Linux (Dynamic) | y-cruncher Only | y-cruncher v0.7.3.9471-dynamic.tar.gz | 16.5 MB
Windows | HWBOT Submitter Only | HWBOT Submitter v0.9.7.116.jar | 2.53 MB

The Linux version comes in both statically and dynamically linked versions. The static version should work on most Linux distributions, but lacks Cilk Plus and NUMA binding. The dynamic version supports all features, but is less portable due to the DLL dependency hell.

 

The HWBOT submitter allows y-cruncher benchmarks to be submitted to HWBOT - which is a competitive overclocking site. It is currently only available for Windows.

 

System Requirements:

Windows:

Linux:

All Systems:

Very old systems that don't meet these requirements may be able to run older versions of y-cruncher. Support goes all the way back to even before Windows XP.

 

Version History:

 

Other Downloads (for C++ programmers):

 

Advanced Documentation:

 

 

 

 

 

Known Issues:

 

Functionality Issues:

 

Performance Issues:

So while it may be difficult to believe, Windows is currently the more suitable OS for running y-cruncher.

 

 

 

Benchmarks:

Comparison Chart: (Last updated: April 14, 2017)

 

Computations of Pi to various sizes. All times in seconds. All computations done entirely in RAM.

The timings include the time needed to convert the digits to decimal representation, but not the time needed to write out the digits to disk.

 

 

Laptops + Low-Power:

Processor(s): | Core i7 3630QM | VIA C4650 [1] | Xeon E3-1535M v5 [2] | Core i7 6820HK | Pentium N4200 [1]
Generation: | Intel Ivy Bridge | VIA Isaiah | Intel Skylake | Intel Skylake | Intel Apollo Lake
Cores/Threads: | 4/8 | 4/4 | 4/8 | 4/8 | 4/4
Processor Speed: | 3.2 GHz | 2.0 GHz | 2.9 GHz | 3.2 GHz | 1.1 - 2.5 GHz
Memory: | 8 GB - 1600 MHz | 16 GB | 16 GB | 48 GB - 2133 MHz | 4 GB
Version: | v0.7.2 - AVX | v0.7.2 - AVX | v0.7.1 - ADX | v0.7.2 - ADX | v0.7.2 - SSE4.1
25,000,000 | 3.767 | 17.207 | 1.865 | 1.745 | 11.739
50,000,000 | 8.496 | 39.049 | 4.102 | 3.833 | 26.289
100,000,000 | 19.056 | 87.626 | 9.007 | 8.376 | 65.147
250,000,000 | 55.089 | 277.711 | 25.444 | 23.577 | 192.473
500,000,000 | 128.311 | 587.516 | 56.566 | 52.134 | 493.551
1,000,000,000 | 299.217 | 1,350.868 | 130.055 | 115.661 |
2,500,000,000 | | 3,884.838 | | 327.784 |
5,000,000,000 | | | | 727.042 |
10,000,000,000 | | | | 1,602.565 |

[1] Credit to Tralalak.

[2] Credit to Kaupo Karuse.

 

 

Mainstream Desktops:

Processor(s): | Core 2 Quad Q6600 | Core i7 920 | FX-8350 | Core i7 4770K | Core i7 5775C [1] | Core i7 7700K [2] | Ryzen 7 1800X
Generation: | Intel Core | Intel Nehalem | AMD Piledriver | Intel Haswell | Intel Broadwell | Intel Kaby Lake | AMD Zen
Cores/Threads: | 4/4 | 4/8 | 8/8 | 4/8 | 4/8 | 4/8 | 8/16
Processor Speed: | 2.4 GHz | 3.5 GHz (OC) | 4.0 GHz | 4.0 GHz (OC) | 3.8 GHz (OC) | 4.8 GHz (OC) | 3.7 GHz
Memory: | 6 GB - 800 MHz | 12 GB - 1333 MHz | 32 GB - 1333 MHz | 32 GB - 2133 MHz | 16 GB - 2400 MHz | 64 GB - 3000 MHz | 64 GB - 2133 MHz
Version: | v0.7.2 - SSE3 | v0.7.2 - SSE4.1 | v0.7.2 - XOP | v0.7.2 - AVX2 | v0.7.1 - ADX | v0.7.1 - ADX | v0.7.2 - ADX
25,000,000 | 10.591 | 4.998 | 3.598 | 1.678 | 1.730 | 1.271 | 1.566
50,000,000 | 23.698 | 11.310 | 8.070 | 3.767 | 3.940 | 2.817 | 3.291
100,000,000 | 53.502 | 25.268 | 17.675 | 8.207 | 8.739 | 6.198 | 7.279
250,000,000 | 157.269 | 74.230 | 50.004 | 22.695 | 25.073 | 17.384 | 20.124
500,000,000 | 351.470 | 166.724 | 112.364 | 50.442 | 56.343 | 38.176 | 44.189
1,000,000,000 | 801.731 | 381.903 | 249.087 | 111.593 | 125.967 | 84.432 | 96.368
2,500,000,000 | | 1,119.114 | 729.652 | 316.052 | 369.738 | 238.194 | 273.675
5,000,000,000 | | | 1,636.260 | 700.029 | | 527.186 | 605.845
10,000,000,000 | | | | | | 1,151.396 | 1,327.901

[1] Credit to André Bachmann.

[2] Credit to Oliver Kruse.

 

 

High-End Desktops:

Processor(s): | Core i7 5820K [1] | Core i7 5960X | Core i9 7900X
Generation: | Intel Haswell | Intel Haswell | Intel Skylake Purley
Cores/Threads: | 6/12 | 8/16 | 10/20
Processor Speed: | 4.5 GHz (OC) | 4.0 GHz (OC) | 3.8 GHz (all-core AVX512)
Memory: | 32 GB - 2400 MHz | 128 GB - 2666 MHz | 128 GB - 3200 MHz
Version: | v0.7.3 - AVX2 | v0.7.2 - AVX2 | v0.7.3 - AVX512-DQ
25,000,000 | 1.287 | 1.044 | 0.695
50,000,000 | 2.499 | 2.067 | 1.475
100,000,000 | 5.401 | 4.329 | 3.110
250,000,000 | 14.732 | 12.145 | 8.408
500,000,000 | 32.294 | 26.060 | 18.326
1,000,000,000 | 71.225 | 58.598 | 39.589
2,500,000,000 | 200.323 | 160.576 | 111.995
5,000,000,000 | 443.543 | 354.845 | 247.849
10,000,000,000 | | 771.584 | 547.678
25,000,000,000 | | 2,156.038 | 1,607.553

[1] Credit to Sean Heneghan.

 

 

Multi-Processor Workstation/Servers:

 

Due to high core counts and the effect of NUMA (Non-Uniform Memory Access), performance on multi-processor systems is extremely sensitive to various settings. Therefore, these benchmarks may not be entirely representative of what the hardware is capable of.

 

For example, enabling node-interleaving in the BIOS can improve performance by around 2x. But tweaks like these are often not possible as many of these systems are corporate or university machines that are heavily locked down and do not provide the user with sufficient access privileges. Furthermore, due to the exponentially large space of settings and configurations, it's often difficult to find the optimal set of settings.

Processor(s): | Xeon X5482 | Xeon E5-2690 [1] | Xeon E5-2683 v3 [1] | Xeon E5-2696 v4 [2] | Xeon E7-8880 v3 [3] | Epyc 7601 [4] | Xeon Gold 6130F [4]
Generation: | Intel Penryn | Intel Sandy Bridge | Intel Haswell | Intel Broadwell | Intel Haswell | AMD Naples | Intel Skylake Purley
Sockets/Cores/Threads: | 2/8/8 | 2/16/32 | 2/28/56 | 2/44/88 | 4/64/128 | 2/64/128 | 2/32/64
Processor Speed: | 3.2 GHz | 3.5 GHz | 2.03 GHz | 2.2 GHz | 2.3 GHz | 2.2 GHz | 2.1 GHz
Memory: | 64 GB - 800 MHz | 256 GB - ??? | 128 GB - ??? | 768 GB - ??? | 2 TB - ??? | 256 GB - ?? | 256 GB - ??
Version: | v0.7.2 - SSE4.1 | v0.6.2/3 - AVX | v0.6.9 - AVX2 | v0.7.1 - ADX | v0.7.1 - AVX2 | v0.7.3 - ADX | v0.7.3 - AVX512-DQ
25,000,000 | 4.548 | 2.283 | 0.907 | 0.715 | 1.176 | 2.459 | 1.150
50,000,000 | 9.779 | 4.295 | 1.745 | 1.344 | 2.321 | 4.347 | 1.883
100,000,000 | 20.834 | 8.167 | 3.317 | 2.673 | 4.217 | 6.996 | 3.341
250,000,000 | 60.049 | 20.765 | 8.339 | 6.853 | 8.781 | 14.258 | 7.731
500,000,000 | 134.978 | 42.394 | 17.708 | 14.538 | 15.879 | 24.930 | 15.346
1,000,000,000 | 308.679 | 89.920 | 37.311 | 31.260 | 32.078 | 47.837 | 31.301
2,500,000,000 | 874.588 | 239.154 | 102.131 | 84.271 | 78.251 | 111.139 | 82.871
5,000,000,000 | 1,946.683 | 520.977 | 218.917 | 192.889 | 164.157 | 228.252 | 179.488
10,000,000,000 | 4,317.677 | 1,131.809 | 471.802 | 417.322 | 346.307 | system instability | 387.530
25,000,000,000 | | 3,341.281 | 1,511.852 | 1,186.881 | 957.966 | system instability | 1,063.850
50,000,000,000 | | 7,355.076 | | 2,601.476 | 2,096.169 | |
100,000,000,000 | | | | 6,037.704 | 4,442.742 | |
250,000,000,000 | | | | | 17,428.450 | |

[1] Credit to Shigeru Kondo.

[2] Credit to "yoyo".

[3] Credit to Jacob Coleman.

[4] Credit to Dave Graham.

 

 

I've been asked a few times what benchmarks qualify for these tables. But there aren't any specific rules. For the most part, I try to maximize the variety of processors on the list. So I won't put more than one system in each processor line unless they have drastically different capabilities such as core count. I also have a strong preference for systems that are at the top of their line and have as much memory as possible.

 

Perhaps the most important part is that the benchmarks are representative of the hardware. If there is any evidence of interference that may cause the hardware to perform suboptimally, the benchmark will be excluded. Examples of this include (but are not limited to) underclocking, disabled cores, disabled hyperthreading, disabled AVX, fewer than all memory channels, background programs, thermal throttling, using an outdated version of y-cruncher, etc. Some leeway is given to multi-processor servers since they are so sensitive to numerous factors.

 

Likewise, absurdly high overclocks will be excluded. These tables are meant to compare systems running at real life speeds. Benchmarks done with extreme overclocks (especially with liquid nitrogen) should go on HWBOT. Just be aware that HWBOT has stringent rules on submissions since it's competitive.

 

 

Fastest Times:

The full chart of rankings for each size can be found here:

These fastest times may include unreleased betas.


Got a faster time? Let me know: a-yee@u.northwestern.edu

Note that I usually don't respond to these emails. I simply put them into the charts which I update periodically.


Algorithms and Developments:

 

FAQ:

 

Pi and other Constants:

 

Hardware and Overclocking:

 

Academia:

 

Programming:

 

Program Usage:

 

Other:

 

Links:

Here are some interesting sites dedicated to the computation of Pi and other constants:

 

Questions or Comments

Contact me via e-mail. I'm pretty good with responding unless it gets caught in my school's junk mail filter.