Keep old version for PGI compiler.
Modify for gcc offload implementation.
Add dynamic allocation inside each process.
Add OpenACC support with PGI compiler for Nvidia GPUs.