I'm writing to see if someone could help me understand an issue in our solver that recently came up while using VTune Amplifier. I'll try to describe it here:
Using VTune Amplifier we see that the time spent in a function "mucal" goes up as the number of threads increases. On 8 threads, mucal is at the top of the list.
mucal is a function that calculates viscosity. This is called in the following manner.
    do ijk=1,iend
      mu(ijk)=mucal(ijk,iopt)
    end do

CFD mesh first cell index: 1
CFD mesh last cell index: iend

The OpenMP threads split the ijk index range. Inside the mucal function we use 2 modules and include 6 common blocks. The modules have arrays of size (1:iend). These are mostly 1D arrays that store velocity, pressure etc. The common blocks hold mostly scalar variables, but a lot of them. To fix this, we tried the following:
- Instead of using the module arrays inside mucal, pass the value at ijk to the mucal function (e.g. mu(ijk)=mucal(ijk,iopt,u(ijk))). This did not help.
- Instead of including the common blocks, pass those variables to the mucal function as arguments as well. This also did not help.
- Calculate and store mucal(ijk) in a separate new array and then re-use that array, thereby reducing the number of calls to the function mucal. This helped, and for 8 threads mucal was no longer at the top of the list.
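
To make the third attempt concrete, here is a minimal sketch of what we mean by storing the result once and re-using it. It is not our actual solver code: the module name flow_data, the mesh size, the body of mucal and the second loop are all made up for illustration. Only mucal, mu, u, ijk, iopt and iend correspond to names used above.

```fortran
! Sketch only: flow_data, the value of iend and the viscosity law are hypothetical.
module flow_data
  implicit none
  integer, parameter :: dp = kind(1.0d0)
  integer, parameter :: iend = 100000        ! last cell index (made-up size)
  real(dp), allocatable :: u(:), mu(:)       ! 1D arrays of size (1:iend)
end module flow_data

program mucal_cache_sketch
  use flow_data
  implicit none
  integer :: ijk, iopt
  real(dp) :: total

  allocate(u(iend), mu(iend))
  iopt = 1
  do ijk = 1, iend
     u(ijk) = 1.0e-3_dp * real(ijk, dp)      ! dummy velocity field
  end do

  ! Call mucal exactly once per cell and keep the result in mu.
  !$omp parallel do default(shared) private(ijk) schedule(static)
  do ijk = 1, iend
     mu(ijk) = mucal(ijk, iopt)
  end do
  !$omp end parallel do

  ! Later loops read the stored mu(ijk) instead of calling mucal again.
  total = 0.0_dp
  !$omp parallel do default(shared) private(ijk) reduction(+:total)
  do ijk = 1, iend
     total = total + mu(ijk) * u(ijk)
  end do
  !$omp end parallel do

  print *, 'sum of mu*u =', total

contains

  ! Hypothetical stand-in for the real mucal; it only reads shared data.
  pure function mucal(ijk, iopt) result(visc)
    integer, intent(in) :: ijk, iopt
    real(dp) :: visc
    if (iopt == 1) then
       visc = 1.0e-3_dp * (1.0_dp + u(ijk)**2)
    else
       visc = 1.0e-3_dp
    end if
  end function mucal

end program mucal_cache_sketch
```

This should build with OpenMP enabled (for example -qopenmp with ifort or -fopenmp with gfortran). In the real code the later loops are the other places in the solver that previously called mucal directly.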