Minor change about check.
Extend granularity of size and Marsaglia RNG selection. Add both asynchronous and synchronous MPI calls.
Add finer-grained choice of counter type and Marsaglia generator type.
Support for Intel Xeon Phi
Replace synchronous MPI calls with asynchronous ones, as in the Hybrid version.
Replace synchronous MPI Send/Receive with asynchronous ones. Initially done only to avoid distribution of tasks, but it was a problem on OpenIB (mlx4_core.log_mtts_per_seg=5 must be added in GRUB).
Modify output to provide rates.
Convert CUDA implementation as OpenCL one.
Split MainLoop* by calls on one MainLoop
Add different Marsaglia RNGs.
Minor modifications.
Add vendor print and strip output on device name.
Add print of Platform vendor on startup
Remove extra explorations, keeping only the atomic ones.
Remove unused function.
Add Distributed Splutter version with atomic increment.
Minor changes
Add Hybrid MPI/OpenMP version
Add CUDA version for Sparse and Dense exploration.
Add Sparse mode.
Correct size of allocation.
Add boundaries.
Add comment.
Remove log files.
Add Splutter version to test memory access and efficiency of RNG.
Add hostname print. Correct bugs.
Add exception.
Add selection of platform/device under OpenCL
Clean up clBLAS portion of the code.
Add clBLAS for OpenCL use.
Add single/double precision in kernels. Modify output to simplify import into CSV and gnuplot.
Change Pi estimation from global division to atomic division.
Add Xeon Phi support for ACCELERATOR type
Improved version.
Changes on Device selection and metrology statements.
Minor changes on output filename.
Add date and hostname in bench output folder.
Minor changes on bench.
Minor changes.
Change size of iterations variables (in functions)...
Change size of Iterations variable.
Minor changes on default values for bench.
Fix minor bug.
Change optimization to O3.
Change to O3 optimization.
Change name of executable.
Change atol to atoll to support long long
Add architecture information. Improve support for 32-bit architectures.
Minor changes on output information about architecture.
Change unsigned long to long long for 32-bit compatibility.
Add support for 32-bit architectures and large numbers of iterations (using long long int).
Delete executable
Changes to support long (over 2^32 iterations)
Add script to reduce GProf output to show source code lines.
Merge INT and LOG elements. Add sqrt and classical test on quadrant.
Add makefile for Debian default Nvidia package
Add Ising model.
Improve choice of GPU/CPU.
Modify randint call domain for 32-bit machines.
Add jobs mention in output process.
Delete executables.
Modify bench to specify OMP_NUM_THREADS, needed on ARM architecture
Add Pthreads implementation.
Add script for bench on each implementation.
Tiny modifications for input/output
Add MPI simple version
Add OpenMP version.
Add Pi test for Metrology tests.
Add OpenBLAS support
Change to all versions.
Add Double support for CuFFT
Add FFT in 2 dimensions with FFTW and CuFFT libraries.
Import of the original version.