Channel: Intel® VTune™ Profiler (Intel® VTune™ Amplifier)

Question about performance

I'm writing to see if someone could help me understand an issue in our solver that recently came up while using VTune Amplifier. I'll try to describe it here.

Using VTune Amplifier, we see that the time spent in a function "mucal" goes up as the number of threads increases. On 8 threads, mucal is at the top of the list.

mucal is a function that calculates viscosity. It is called in the following manner:

    do ijk=1,iend
      mu(ijk)=mucal(ijk,iopt)
    end do

CFD mesh first cell index: 1
CFD mesh last cell index: iend

The OpenMP threads split the ijk index range.

Inside the mucal function we use 2 modules and include 6 common blocks. The modules have arrays of size (1:iend). These are mostly 1D arrays that store velocity, pressure, etc. The common blocks hold mostly scalar variables, but a lot of them.

To fix this, we tried the following:

(1) Instead of using the module arrays inside mucal, pass the value at ijk to the mucal function (e.g. mu(ijk)=mucal(ijk,iopt,u(ijk))). This did not help.
(2) Instead of including the common blocks, again pass those variables to the mucal function. This also did not help.
(3) Calculate and store mucal(ijk) in a separate new array and then reuse that array, thereby reducing the number of calls to mucal. This helped, and for 8 threads mucal was no longer at the top of the list.

My question is: why does the time spent in mucal increase with the number of threads? Is it a combination of using common blocks and modules, or something else? What's the best approach to prevent issues like this?

Thanks!
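
For reference, approach (3) can be sketched roughly as below. This is only a minimal illustration and not our solver code: the module name viscosity_cache, the routines mucal_sketch and fill_viscosity, and the viscosity formula itself are all hypothetical stand-ins, and the real mucal reads module arrays and common blocks rather than taking a single velocity argument.

```fortran
module viscosity_cache
  implicit none
  private
  public :: mucal_sketch, fill_viscosity

contains

  ! Hypothetical stand-in for mucal. The formula is made up purely so the
  ! sketch compiles and runs; it is not the solver's viscosity model.
  pure function mucal_sketch(u_cell, iopt) result(mu_cell)
    real(kind=8), intent(in) :: u_cell
    integer, intent(in) :: iopt
    real(kind=8) :: mu_cell
    if (iopt == 1) then
      mu_cell = 1.0d-3                          ! constant viscosity
    else
      mu_cell = 1.0d-3 * (1.0d0 + abs(u_cell))  ! made-up velocity dependence
    end if
  end function mucal_sketch

  ! Evaluate the viscosity once per cell and store it in mu, so that later
  ! loops can read mu(ijk) instead of calling the function again.
  ! Assumes mu has the same size as u.
  subroutine fill_viscosity(u, iopt, mu)
    real(kind=8), intent(in) :: u(:)
    integer, intent(in) :: iopt
    real(kind=8), intent(out) :: mu(:)
    integer :: ijk
    !$omp parallel do default(none) shared(u, mu, iopt) private(ijk)
    do ijk = 1, size(u)
      mu(ijk) = mucal_sketch(u(ijk), iopt)
    end do
    !$omp end parallel do
  end subroutine fill_viscosity

end module viscosity_cache
```

After one call to fill_viscosity, the other loops in the time step would read mu(ijk) directly, which is what reduced the number of mucal calls in our case.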
