The VTune and Advisor analyses run extremely fast compared to running my non-parallelized raw C++ code in VS2013.
Do VTune and Advisor use their own switch settings? Do the Intel analyses actually run the code, or do they simulate it?
The code is an electromagnetic simulation that only generates matrices; it has nothing to do with graphics. I have an 8-core AMD CPU and two Radeon 6990 boards (4 GPUs total, since each board has 2 GPUs). I haven't parallelized anything in the code yet. I need to understand the fundamentals of what I'm seeing first.
I'm using VS2013 with Intel Composer XE SP1.