Channel: Intel® VTune™ Profiler (Intel® VTune™ Amplifier)

Effective time in VTune increases with OMP_NUM_THREADS


Hello,

I am new to VTune and am trying to interpret the times it reports. I have a Fortran application that I am using to understand the effect of OpenMP threading on performance, and I use VTune to identify the hotspots. For collection, I use the basic command for OpenMP hotspot analysis: amplxe-cl -collect hotspots -knob analyze-openmp=true <target application>

I am seeing some strange behavior (or probably normal behavior that only appears strange to me because of my lack of experience with VTune). With OMP_NUM_THREADS=1, I get a lower effective time than with OMP_NUM_THREADS=4. I have attached a couple of images for reference. There is a large spin and overhead time associated with OMP_NUM_THREADS=4. However, my understanding is that the effective time does not include spin or overhead time, so I fail to understand why the effective time is so much higher for OMP_NUM_THREADS=4. Any insight into this behavior would help me better understand how to interpret the times reported by VTune.

Please note that when I run the application without VTune and measure it with the Linux time command, the reported wall time for OMP_NUM_THREADS=1 matches the effective time from the VTune run. This is not the case for OMP_NUM_THREADS=4, where VTune reports a higher effective time than the wall time reported by Linux.
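
The comparison above could be made more concrete by recording both the wall time and the total CPU time (user plus system, summed over all threads of the process) for each thread count. The following is only a minimal sketch under stated assumptions: the helper name measure is hypothetical, the target command is whatever is passed on the command line, and it relies on the Unix-only resource module. Whether VTune's effective time corresponds more closely to summed CPU time than to wall time is an assumption to check against the two numbers, not something this post confirms.

```python
import os
import resource
import subprocess
import sys
import time


def measure(cmd, num_threads):
    """Run cmd with OMP_NUM_THREADS=num_threads; return wall and CPU seconds.

    'measure' is a hypothetical helper for illustration, not a VTune API.
    CPU time is user + system time accumulated by the child process,
    summed over all of its threads.
    """
    env = dict(os.environ, OMP_NUM_THREADS=str(num_threads))
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)  # blocks until the child exits
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return {"threads": num_threads, "wall_s": wall, "cpu_s": cpu}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (script and target names are placeholders):
    #   python measure_times.py <target application>
    for n in (1, 4):
        print(measure(sys.argv[1:], n))
```

For a well-parallelized run, cpu_s at 4 threads would be expected to be several times wall_s, while at 1 thread the two should be close. Time spent by threads spinning at barriers is also counted as CPU time by the operating system, so cpu_s alone does not separate useful work from spin.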

Thank you,
Ashesh Sharma

