Fix minor bug.
Change to O3 optimization.
Change name of executable.
Add support for 32-bit systems and large iteration counts (using long long int)
Delete executable
Change to support long runs (over 2^32 iterations)
Add script to benchmark each implementation.
Tiny modifications for input/output
Add simple MPI version