y-cruncher - Versions and Developments
(Last updated: April 6, 2011)
All Versions:
Bold indicates a theoretical change in the speed of the program (due to optimizations or other changes in the code).
| Date | OS | Version | Download | Relative Speed: 1 billion digits of Pi, Core i7 2600K @ 4.4 GHz* |
|---|---|---|---|---|
| April 6, 2011 | Windows | v0.5.5.9180 (fix 2) | Download | |
| February 20, 2011 | Windows | v0.5.5.9179 (fix 1) | Download | |
| February 20, 2011 | Linux | v0.5.5.9179 (fix 1) | Download | |
| February 3, 2011 | Linux | v0.5.5.9178 Alpha | Download | 333.574 |
| February 1, 2011 | Windows | v0.5.5.9178 Alpha | Download | 337.825 |
| August 28, 2010 | Linux | v0.5.4.9157 (fix 1) | Download | 386.433 |
| August 16, 2010 | Linux | v0.5.4.9150 (fix 1) | Download | 384.391 |
| August 5, 2010 | Windows | 0.5.4.9148 (fix 1) | Download | 369.568 |
| August 2, 2010 | Windows | 0.5.4.9146 Alpha | Download | 369.208 |
| May 13, 2010 | Windows | 0.5.3.9134b (fix 2) | Download | 369.548 |
| April 26, 2010 | Windows | 0.5.3.9133b (fix 1) | Download | 369.927 |
| April 15, 2010 | Windows | 0.5.3.9132 Alpha | Download | 369.136 |
| March 10, 2010 | Windows | 0.5.2.9082 Alpha 3 | Download | 407.163 |
| February 26, 2010 | Windows | 0.5.2.9040 Alpha 2 | Download | 407.585 |
| February 23, 2010 | Windows | 0.5.2.9025 Alpha | Download | 406.157 |
| January 6, 2010 | Windows | 0.4.4.7762b (fix 2) | Download | 424.590 |
| December 2, 2009 | Windows | 0.4.4.7760 (fix 1) | Download | 425.659 |
| November 18, 2009 | Windows | 0.4.4.7748 | Download | 424.804 |
| September 29, 2009 | Windows | 0.4.3.7681 | Download | 424.162 |
| August 10, 2009 | Windows | 0.4.2.7438 | Download | 493.460 |
| July 24, 2009 | Windows | 0.4.1.7412 (fix 1) | Download | 494.714 |
| July 22, 2009 | Windows | 0.4.1.7409 | Download | 495.984 |
| July 20, 2009 | Windows | 0.4.1.7408 | Download | 495.934 |
| May 14, 2009 | Windows | 0.3.2.6953 Alpha (fix 1) | Download | 517.452 |
| April 30, 2009 | Windows | 0.3.2.6945 Alpha | Download | 518.321 |
| April 17, 2009 | Windows | 0.3.1.6897 Alpha | Download | 524.479 |
| April 10, 2009 | Windows | 0.2.1.6841 Alpha | Download | 525.814 |
| January 19, 2009 | Windows | 0.1.0.6013 Alpha | Download | 554.815 |
Click here for full version history.
*These timings are subject to many factors, including background programs.
Therefore, they may or may not be truly representative of the actual relative speeds of each version.
I make no guarantees that new versions will always be equal to or faster than all previous versions. Changes such as bug fixes and major code rewrites may result in slower code.
However, I will make my best effort to ensure this. Starting with version v0.4.3, new optimizations are pooled so that they can be enabled at a later time to "compensate" for any theoretical or unexpected slowdowns due to development.
Newly Completed Developments:
These developments are either complete or are near completion. They will likely be included in the next release.
- Support for Advanced Vector Extensions (AVX) - Completed - v0.5.5
  - These are the new 256-bit floating-point vector instructions that will be featured in upcoming Intel and AMD processors.
- Improved Digit Viewer - Completed - v0.5.5
  - A near-complete rewrite of the Digit Viewer. It is now much faster than before.
  - It was originally intended to be multi-threaded, but it turns out to be fast enough as it is (and is disk-bound).
  - Partial source code may be released later.
Current Developments:
These developments are currently in progress.
- y-cruncher Multi-Precision Front-End Library (YMP) - Very Slow Progress - Lack of time...
  - A publicly available dynamic-link library that exposes the internals of y-cruncher for use outside of y-cruncher.
  - The target platform will be 64-bit Windows with SSE3.
  - Will use an experimental "partial word" representation for large numbers to allow vectorization.
  - Native support for multi-threading.
  - y-cruncher v0.6.x and beyond will be based on this.
Future Developments:
These are confirmed developments that are not currently in progress.
- Support for FMA3/4 and XOP Instruction Sets - When I get my hands on the hardware...
- Processor-Specific Tuning for Linux - Postponed until v0.6.x or later.
  - The current Linux version only supports the generic and untuned x64 SSE3 architecture.
  - All processor-specific binaries that are currently in the Windows version will be compiled for Linux as well.
  - The difference in performance for the specialized binaries is very small, so there is no point in this for now.
- Hybridized Number-Theoretic Transform (version 2) - On Hold... Lack of free time...
  - This is the implementation of the full Hybrid NTT algorithm as developed back in 2008.
  - It has several theoretical advantages over the current Hybrid NTT:
    - Better run-time complexity.
    - More cache-friendly.
    - More vectorizable.
  - Canceled due to its sheer complexity and difficulty of implementation.
  - This is a very enticing algorithm, so I may try again to implement it in the future.
- Number-Theoretic Transform over several primes - On Hold... Lack of free time...
  - A very popular algorithm that is used for the largest products, where floating-point FFT becomes impractical.
  - Although this algorithm has been known to be horrifically slow in the past, there is reason to believe that it may be efficient on modern hardware.
  - Everyone uses this algorithm... So I might as well try it too...
    And when I say "everyone", I mean that every single major Pi program except for y-cruncher uses this algorithm...
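
For readers unfamiliar with the idea, the following is a minimal Python sketch of integer multiplication using a number-theoretic transform over several primes. It is not y-cruncher's code and says nothing about how other Pi programs implement it. The two primes, the base-10^4 limbs, and the names `ntt`, `convolve_mod`, and `multiply` are all assumptions chosen purely for illustration. The product's coefficients are computed modulo each prime separately and then recombined with the Chinese Remainder Theorem.

```python
# Illustrative sketch only (not y-cruncher's implementation).
# Assumed parameters: two well-known NTT-friendly primes, both with primitive root 3.
P1, G1 = 998244353, 3   # 119 * 2**23 + 1
P2, G2 = 469762049, 3   # 7 * 2**26 + 1
BASE = 10**4            # assumed limb size; keeps coefficients far below P1 * P2


def ntt(a, p, g, invert=False):
    """In-place radix-2 NTT modulo p. len(a) must be a power of two."""
    n = len(a)
    j = 0
    for i in range(1, n):            # bit-reversal permutation
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    length = 2
    while length <= n:               # butterfly passes
        w_len = pow(g, (p - 1) // length, p)
        if invert:
            w_len = pow(w_len, p - 2, p)
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(start, start + half):
                u, v = a[k], a[k + half] * w % p
                a[k] = (u + v) % p
                a[k + half] = (u - v) % p
                w = w * w_len % p
        length <<= 1
    if invert:                       # scale by 1/n for the inverse transform
        n_inv = pow(n, p - 2, p)
        for i in range(n):
            a[i] = a[i] * n_inv % p


def convolve_mod(x, y, p, g):
    """Linear convolution of x and y with coefficients reduced modulo p."""
    out_len = len(x) + len(y) - 1
    n = 1
    while n < out_len:
        n <<= 1
    fx = list(x) + [0] * (n - len(x))
    fy = list(y) + [0] * (n - len(y))
    ntt(fx, p, g)
    ntt(fy, p, g)
    fz = [s * t % p for s, t in zip(fx, fy)]
    ntt(fz, p, g, invert=True)
    return fz[:out_len]


def to_limbs(n):
    """Split a non-negative integer into little-endian base-BASE limbs."""
    limbs = []
    while n:
        n, r = divmod(n, BASE)
        limbs.append(r)
    return limbs or [0]


def multiply(a, b):
    """Multiply two non-negative integers via NTTs over two primes plus CRT."""
    x, y = to_limbs(a), to_limbs(b)
    c1 = convolve_mod(x, y, P1, G1)
    c2 = convolve_mod(x, y, P2, G2)
    p1_inv = pow(P1, P2 - 2, P2)     # inverse of P1 modulo P2
    result = 0
    for r1, r2 in zip(reversed(c1), reversed(c2)):
        # CRT: the unique value below P1 * P2 matching both residues.
        coeff = r1 + P1 * ((r2 - r1) * p1_inv % P2)
        result = result * BASE + coeff   # Horner evaluation also handles carries
    return result
```

With these assumed parameters, `multiply(a, b)` should agree with Python's built-in `a * b` for any operand size that is practical in pure Python. A real implementation would use more primes or larger words and would be far more careful about memory access patterns.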
Possible Future Developments:
These are some developments that I intend to "eventually" add to y-cruncher. Whether and when I do them will depend heavily on my workload from school.
- Constants:
- Features:
  - A small GUI launcher program for selecting the options. (The main program will still output to the command line.)
    - All options can be modified through a simple, easy-to-use GUI.
    - Only the option selection will have a GUI. The computation itself will still be in the console for the sake of efficiency.
- Optimizations:
  - Better Binary Splitting performance for all applicable constants.
    - Re-use redundant transforms for FFTs and Hybrid NTTs.
    - GCD-extraction.
  - Better performance for non-power-of-two thread counts.
  - NUMA-friendly implementations for all multi-threaded code.