Change optimization level to -O3.
Change atol to atoll to support long long.
Add architecture information. Improve support for 32-bit architectures.
Delete executables.
Modify bench to specify OMP_NUM_THREADS, which is needed on ARM architectures.
Add a script to run bench on each implementation.
Tiny modifications to input/output.
Add OpenMP version.
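
Note on the atol to atoll change: a minimal sketch in C of why it matters, assuming the program parses a numeric command-line argument. parse_count is a hypothetical helper used only for illustration and is not taken from this project. On 32-bit targets long is typically 32 bits wide, so a value such as 5000000000 does not fit in the return type of atol, while atoll returns a long long, which is at least 64 bits wide.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not from this project): parse a decimal
 * count from a string. atoll returns long long, which is at least
 * 64 bits wide, so large values survive even where long is 32 bits. */
long long parse_count(const char *text)
{
    assert(text != NULL); /* atoll requires a valid string */
    return atoll(text);
}
```

Like atol, atoll reports no errors for malformed or out-of-range input; strtoll is the usual choice when that needs to be detected.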