Hello!
I am trying to profile a C++ application that uses OpenMP with Intel VTune Profiler, and I've run Hotspots, Threading user-mode and hardware-based analyses (see the "hotspots", "user" and "hardware" pictures, plus the "threads" picture from the hardware-based analysis).
I have several questions about the results of these analyses and would appreciate your help.
1) What do these results mean overall? If I'm not mistaken, the Hotspots analysis shows that most of the time was spent usefully, while the Threading analyses show the opposite.
2) What is the Semaphore object in the Threading user-mode analysis?
3) Why does one thread carry so much of the load? ("threads" picture) Most of the work is done in the parallel region.
What should I do to increase the parallelism of this application?
I've read the documentation: https://software.intel.com/en-us/vtune-help-windows-targets but I still can't understand what's happening in my case.
The application's algorithm is simple:
#pragma omp parallel num_threads(8)
{
    if (myID == 0) {
        <master thread job>
    }
    #pragma omp for schedule(static)
    <parallel loop>
    if (myID == 0) {
        <master thread job>
    }
}
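For reference, the pattern above can be written as a minimal compilable sketch. It is only an illustration under assumptions: it assumes myID comes from omp_get_thread_num(), the names run_pattern, thread_id and master_steps are hypothetical, and the real master-thread jobs and loop body are replaced by trivial placeholders.

```cpp
#include <vector>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: ID of the calling thread, or 0 when built without OpenMP.
static int thread_id() {
#ifdef _OPENMP
    return omp_get_thread_num();
#else
    return 0;
#endif
}

// Hypothetical stand-in for the real application: squares every element of
// `data` in the parallel loop and returns the sum of the squares.
// `master_steps` counts how many master-only sections were executed.
long long run_pattern(std::vector<long long>& data, int& master_steps) {
    long long sum = 0;
    const int n = static_cast<int>(data.size());

    #pragma omp parallel num_threads(8)
    {
        const int myID = thread_id();  // assumption: this is how myID is obtained

        if (myID == 0) {
            ++master_steps;  // placeholder for the first <master thread job>
        }
        // There is no barrier here, so threads 1..7 start their loop chunks
        // immediately while thread 0 is still busy with the job above.

        #pragma omp for schedule(static) reduction(+ : sum)
        for (int i = 0; i < n; ++i) {
            data[i] *= data[i];  // placeholder for the real loop body
            sum += data[i];
        }
        // Implicit barrier at the end of the loop: every thread waits here
        // until thread 0 has also finished its chunk.

        if (myID == 0) {
            ++master_steps;  // placeholder for the second <master thread job>
        }
        // Threads 1..7 wait at the region's closing barrier during this job.
    }
    return sum;
}
```

Calling run_pattern on a vector holding 1, 2, 3, 4 with master_steps initialised to 0 returns 30 and leaves master_steps at 2. With MinGW the sketch needs the -fopenmp flag, otherwise the pragmas are ignored and everything runs on one thread.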
Many thanks! :)
P.S. I'm on Windows 10, using NetBeans with the MinGW compiler.