y-cruncher Multi-Precision Library

A Multi-threaded Multi-Precision Arithmetic Library

(Powered by y-cruncher)

(Last updated: September 24, 2016)

 


Low-Level Programming

 

The standard functions for BigIntO and BigFloatO abstract away a lot of complexity. While they are easy to use, this abstraction comes with significant overhead - sometimes more than 50%.

 

This overhead comes largely from memory allocation and page commits. While this isn't a problem for small allocations, normal memory allocators do not attempt to pool memory when the sizes are on the order of gigabytes. Custom allocators can go a long way toward solving this problem, but they still suffer from bookkeeping overhead and heap fragmentation.

 

To get around these problems, y-cruncher uses a form of "bare-metal" programming where there is no concept of RAII and all memory usage is micro-managed. This is the reason why y-cruncher's memory usage is flat throughout a computation. It computes exactly how much memory is needed, allocates it at the start of the computation, and frees it at the end.
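The allocate-once pattern described above can be sketched roughly as follows. This is only an illustration: the Arena struct and the arena_carve() function are hypothetical names invented for this example and are not part of YMP or y-cruncher. The idea is that one block is obtained up front and every later "allocation" is just pointer arithmetic inside it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical illustration only: not a YMP or y-cruncher type.
struct Arena{
    char*       base;   // start of the single up-front allocation
    std::size_t size;   // total bytes available in that allocation
    std::size_t used;   // bytes handed out so far
};

// Hands out the next `bytes` bytes at an address that is a multiple of
// `align` (which must be a power of two). Returns nullptr if the request
// does not fit. Never calls the system allocator.
inline void* arena_carve(Arena& arena, std::size_t bytes, std::size_t align){
    assert(align != 0 && (align & (align - 1)) == 0);

    std::uintptr_t base  = reinterpret_cast<std::uintptr_t>(arena.base);
    std::uintptr_t start = base + arena.used;

    // Round the next free address up to the requested alignment.
    std::uintptr_t aligned =
        (start + (align - 1)) & ~static_cast<std::uintptr_t>(align - 1);
    std::size_t offset = static_cast<std::size_t>(aligned - base);

    if (offset > arena.size || bytes > arena.size - offset){
        return nullptr;
    }
    arena.used = offset + bytes;
    return arena.base + offset;
}
```

Because the total is computed before anything is carved, the memory footprint of a computation built this way stays fixed from start to finish.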

 

Not surprisingly, this sort of bare-metal programming is very difficult and error-prone. But it is by far the most efficient approach, as it bypasses all overhead from the operating system and the memory allocator.

 

YMP exposes this bare-metal interface for users who wish to squeeze out the last 2x in performance.

 

 

Large Multiply Parameters:

 

As with nearly everything related to bignums, it all boils down to large multiplication...

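A large multiply in y-cruncher/YMP currently requires 4 resources in addition to the input/output operands: a twiddle factor/lookup table, a parallel framework, a task decomposition level, and scratch memory.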

This list of resources/parameters is not static and may change in the future. So to keep things manageable in the long run, these parameters are put together into a struct called BasicParameters.

 

Why this weird name? Because internally, there's a different one for swap mode/out-of-core operations called SwapParameters, which currently has 7 parameters. Swap Mode probably isn't coming to the public interface any time soon. So we won't open that can of worms.

 

 

What can you do with the Parameters?

 

Any function in y-cruncher/YMP that takes these parameters promises to do the entire operation without any resource acquisition except for those triggered indirectly by the parallel framework or from exception handling. Such functions are paired with "sizing" functions that give the minimum sizes of the scratch buffer(s).

 

This gives the user complete and absolute control over all memory usage. There are no uncertainties involving size and fragmentation.

 

The sizing functions can be used for pre-planning computations and estimating resource requirements with provable upper-bounds. This is how y-cruncher does all its memory calculations in the Custom Compute menu.
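To make the pairing of an operation with its sizing function concrete, here is a minimal sketch. It uses a toy merge of two sorted halves rather than a large multiply, and the names merge_halves() and merge_halves_sizing() are hypothetical: they are not YMP functions, and the real YMP signatures are not shown here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical sizing function: the number of scratch bytes that
// merge_halves() needs for an input of n elements.
inline std::size_t merge_halves_sizing(std::size_t n){
    return n * sizeof(std::uint32_t);
}

// Hypothetical operation: merges the sorted halves [0, n/2) and [n/2, n)
// of `data` into one sorted range. It acquires no memory of its own.
// All temporary storage comes from the caller-supplied buffer {M, ML},
// which must be suitably aligned for std::uint32_t.
inline void merge_halves(
    std::uint32_t* data, std::size_t n,
    void* M, std::size_t ML
){
    if (ML < merge_halves_sizing(n)){
        throw std::invalid_argument("scratch buffer too small");
    }
    std::uint32_t* tmp = static_cast<std::uint32_t*>(M);
    std::size_t mid = n / 2;
    std::merge(data, data + mid, data + mid, data + n, tmp);
    std::copy(tmp, tmp + n, data);
}
```

A caller would first ask merge_halves_sizing() for the requirement, reserve that much scratch memory once, and then call merge_halves() as often as needed without any further allocation.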

 


Basic Parameters Object:

struct BasicParameters{
    const LookupTable& m_tw;
    Parallelizer& m_parallelizer;
    upL_t m_tds;
    void* m_M;
    upL_t m_ML;
};

Name: tw
Parameter: Twiddle Factor/Lookup Table

If the operation needs a lookup table, it will use this one rather than the global one.

Name: parallelizer
Parameter: Parallel Framework

The framework that should be used for parallelizing tasks.

The Parallelizer object is a superclass of the ParallelFramework object. So you can directly cast a ParallelFramework object into a Parallelizer object.

This conversion cannot be made implicit since both objects are exposed as incomplete types.

Name: tds
Parameter: Task Decomposition

The desired level of task decomposition for the purpose of parallelism.

How it is actually parallelized is dependent on the underlying framework.

In most cases, set this to the number of logical cores in the system unless you plan on parallelizing at a higher level. Setting it to 0 results in undefined behavior.

The task decomposition parameter is a suggestion rather than a requirement. The actual level of decomposition may vary depending on numerous factors.

Name: {M, ML}
Parameter: Scratch Memory

M is a pointer to a scratch memory buffer.

The buffer must be at least ML bytes large.

Operations that need scratch memory will use this buffer. This buffer must be sufficiently large and be aligned to at least ALIGNMENT bytes. Failing either condition will result in undefined behavior.

Most operations will check their size requirement against ML and throw an exception if it is too small. But this behavior should not be relied upon. Some performance-critical operations will skip the check if it is too expensive to compute the size that is actually needed.

All functions that need scratch memory are paired with a sizing function that gives how large the buffer must be.

The ML parameter currently serves no other purpose than for optional buffer overrun detection. Setting it to -1 (largest unsigned integer) will have no effect on a correct program.

Due to the nature of this object, it will likely change between versions of YMP. Neither binary nor source compatibility is guaranteed.
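The following sketch shows one way the tds and {M, ML} values described above might be prepared before they are handed to a BasicParameters object. Several things here are assumptions made for illustration: kAssumedAlignment stands in for YMP's ALIGNMENT constant (whose actual value is not given here), upL_sketch_t stands in for upL_t and is assumed to be an unsigned pointer-sized integer, and the helper function names are hypothetical. Only standard C++17 facilities are used.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <new>
#include <thread>

// Assumption: stand-in for YMP's ALIGNMENT constant. The real value may differ.
constexpr std::size_t kAssumedAlignment = 64;

// Assumption: upL_t behaves like an unsigned, pointer-sized integer.
using upL_sketch_t = std::size_t;

// "-1" converted to an unsigned type is that type's largest value,
// which is why it effectively disables the optional ML size check.
constexpr upL_sketch_t kSkipSizeCheck = static_cast<upL_sketch_t>(-1);
static_assert(
    kSkipSizeCheck == std::numeric_limits<upL_sketch_t>::max(),
    "-1 must convert to the largest unsigned value"
);

// Suggested tds: the number of logical cores, but never 0.
// hardware_concurrency() may return 0 when the count is unknown.
inline upL_sketch_t default_task_decomposition(){
    unsigned cores = std::thread::hardware_concurrency();
    return cores == 0 ? 1u : cores;
}

// Allocates `bytes` of scratch memory aligned to kAssumedAlignment.
// Throws std::bad_alloc on failure. Release with free_scratch().
inline void* allocate_scratch(upL_sketch_t bytes){
    void* M = ::operator new(
        bytes, static_cast<std::align_val_t>(kAssumedAlignment)
    );
    assert(reinterpret_cast<std::uintptr_t>(M) % kAssumedAlignment == 0);
    return M;
}

inline void free_scratch(void* M){
    ::operator delete(M, static_cast<std::align_val_t>(kAssumedAlignment));
}
```

Guarding against a core count of 0 matters here because a tds of 0 is documented as undefined behavior.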

 

 

Constructors:

BasicParameters::BasicParameters(
    const LookupTable& tw,
    Parallelizer& parallelizer,
    upL_t tds
);

BasicParameters::BasicParameters(
    const LookupTable& tw,
    Parallelizer& parallelizer,
    upL_t tds,
    void* M, upL_t ML
);

 


 

Basic Parameters Object (Owner):

For convenience, BasicParametersO is an RAII wrapper for BasicParameters that allocates and owns the scratch memory. It can also be constructed without providing a lookup table or a parallelizer, in which case it uses the global table and parallelizer.

 

The BasicParametersO object is used by both the BigIntO and BigFloatO classes to abstract away all the low-level stuff.
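The owning-wrapper idea can be sketched as below. This is not the YMP implementation: ToyParameters and ToyParametersO are hypothetical stand-ins that omit the lookup table and parallelizer, kToyAlignment is an assumed alignment value, and composition is used here only because the relationship between the two real classes is not specified in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <new>

// Assumption: stand-in alignment. The real ALIGNMENT value may differ.
constexpr std::size_t kToyAlignment = 64;

// Hypothetical non-owning parameter block (lookup table and
// parallelizer omitted to keep the sketch self-contained).
struct ToyParameters{
    std::size_t m_tds;  // task decomposition
    void*       m_M;    // scratch buffer
    std::size_t m_ML;   // scratch buffer size in bytes
};

// Hypothetical RAII owner: allocates the scratch buffer on construction
// and releases it on destruction.
class ToyParametersO{
    ToyParameters m_params;

public:
    ToyParametersO(std::size_t tds, std::size_t ML)
        : m_params{
            tds,
            ::operator new(ML, static_cast<std::align_val_t>(kToyAlignment)),
            ML
        }
    {}

    ~ToyParametersO(){
        ::operator delete(
            m_params.m_M, static_cast<std::align_val_t>(kToyAlignment)
        );
    }

    // The buffer has a single owner, so copying is disabled.
    ToyParametersO(const ToyParametersO&) = delete;
    ToyParametersO& operator=(const ToyParametersO&) = delete;

    // Non-owning view to pass to low-level functions.
    const ToyParameters& get() const{ return m_params; }
};
```

The view returned by get() is only valid while the owner is alive, which is the usual trade-off of wrapping a bare-metal interface in RAII.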

 

 

Constructors:

BasicParametersO::BasicParametersO(
    const LookupTable& tw,
    Parallelizer& parallelizer,
    upL_t tds,
    upL_t ML
);

BasicParametersO::BasicParametersO(
    const LookupTable& tw,
    ParallelFramework& framework,
    upL_t tds,
    upL_t ML
);

BasicParametersO::BasicParametersO(
    upL_t tds,
    upL_t ML
);