Add dynamic allocation of each inside exploration.
Add metrology inside code. Add dynamic allocation of inside array.
Include metrology inside code. Correct mrproper in Makefile
Add OpenACC with PGI compiler for Nvidia GPUs
Correct so that it compiles.
Add -k option to perform classical test based on an if statement.
Correct CUDA call with direct ccbin compiler call (clang compiles but gcc above version 5 does not...)
Add comments on top of source files.
Tiny changes.
Wrong cast in FP64 subroutine.
Replace *PU with xPU to avoid problems with file names.
Add INT32, INT64, FP64 possibilities.
Add granularity on variables and Marsaglia RNG.
Correct tiny bug in metrology estimation.
Add C/OpenCL implementation.
Cleaning process.
Modify the "device.type" handling to avoid a crash with the POCL implementation.
Change default name of output.
Add Pi estimation.
Correct some errors on CUDA implementation.
Port from Python 2 to Python 3
Suppress stdout messages and retrieve computing time on master.
Suppress stdout messages on slaves and import elapsed on master.
Add asynchronous MPI communications to the hybrid MPI/OpenMP implementation.
Indentation for XPU parsing.
Correct CUDA implementation.
Improve MPI distribution.
Add MPI and Pthreads versions to use multiple OpenCL devices.
Add MPI version which supports multiple GPUs.
Change structure of call for OpenCL Metropolis Routine
Internal modifications on statistics.
Changes to micmac global variables.
Minor change to doc.
Major evolution (statistics, etc).
First revision. Crash on hybrid.
Add hybrid version.
Correct bug with NP=1
Licence definition. Change default type and RNG.
Minor changes to maximum thread values.
Add granularity on variable types and Marsaglia RNG versions. Add licence.
Add CeCILL v2 licence to source code
Add granularity on variable types and Marsaglia RNG generators.
Minor change about check.
Extend granularity on size and Marsaglia RNG generators. Add both asynchronous and synchronous MPI calls.
Add granularity choice on type of counters and type of Marsaglia generator.
Support for Intel Xeon Phi
Replace synchronous with asynchronous MPI calls, as in the hybrid version.
Replace synchronous with asynchronous MPI Send/Receive. At first this was only to avoid the distribution of tasks, but it was a problem on OpenIB (add mlx4_core.log_mtts_per_seg=5 to GRUB).
Modify output to provide rates.
Convert CUDA implementation to match the OpenCL one.
Replace MainLoop* variants with calls to a single MainLoop
Add different Marsaglia RNGs.
Minor modifications.
Add vendor print and strip output on device name.
Minor changes
Add Hybrid MPI/OpenMP version
Add comment.
Add hostname print. Correct bugs.
Add exception.
Add single/double precision in kernels. Modify output to simplify import into CSV and gnuplot
Change Pi estimation from global division to atomic division.
Add Xeon Phi support for ACCELERATOR type
Improved version.
Changes to device selection and metrology statements.
Minor changes on output filename.
Minor changes.
Change size of iteration variables (in functions)...
Change size of Iterations variable.
Minor changes on default values for bench.
Fix minor bug.
Change optimization to O3.
Change to O3 optimization.
Change name of executable.
Change atol to atoll to support long long
Add architecture information. Improve support for 32-bit architectures.
Minor changes on output information about architecture.
Change unsigned long to long long for 32-bit compatibility.
Add support for 32-bit architectures and large numbers of iterations (use of long long int)
Delete executable
Changes to support long (over 2^32 iterations)
Add script to reduce GProf output to show source code lines.
Merge INT and LOG elements. Add sqrt and classical test on quadrant.
Improve choice of GPU/CPU.
Modify randint call domain for 32-bit machines.
Add jobs mention in output process.
Delete executables.
Modify bench to specify OMP_NUM_THREADS, needed on ARM architecture
Add Pthreads implementation.
Add script for bench on each implementation.
Tiny modifications for input/output
Add MPI simple version
Add OpenMP version.
Add Pi test for Metrology tests.