Modify the code for the GCC offloading implementation.
Add dynamic allocation inside each process.
Add OpenACC support with the PGI compiler for NVIDIA GPUs.