Channel: Intel® VTune™ Profiler (Intel® VTune™ Amplifier)

4K aliasing - what causes it in this case?


I am using VTune on a numerically intensive Fortran code whose problem size is controlled by the input parameters JD and KD.  When I run with JD=41 and KD=41, VTune highlights "4K Aliasing".  This was new to me, so I read up on write-after-read hazards.  So far, so good.  In VTune, two subroutines show a 4K aliasing number of 1.000.  One of them is essentially this:

      SUBROUTINE DECJ  ( JPER,B,D,H,XSC,JD,KD )
      LOGICAL, INTENT (IN) :: JPER
      INTEGER, INTENT (IN) :: JD,KD
      REAL*8,  DIMENSION(JD,KD), INTENT (INOUT) :: B,D
      REAL*8,  DIMENSION(JD,KD), INTENT (IN) :: H,XSC
      INTEGER :: J,JP,JM,K
      DO K = 1,KD
      DO J = 2,JD-1
         JP          = J+1
         JM          = J-1
         B(JP,K)     = B(JP,K) - H(JP,K)*(0.5*XSC(J,K))
         D(JM,K)     = D(JM,K) + H(JM,K)*(0.5*XSC(J,K))
      ENDDO
      ENDDO
      END SUBROUTINE DECJ

This is called twice:
      CALL DECJ  ( JPER,B,D,H,XSCP,JD,KD )
      CALL DECJ  ( JPER,BT,DT,H,XSCM,JD,KD )

The arguments here are automatic arrays in the calling routine.  The calling routine has several automatic arrays, declared like this:

      REAL*8,  DIMENSION(JD,KD) :: A,B,C,D,E
      REAL*8,  DIMENSION(JD,KD) :: AT,BT,CT,DT,ET
      REAL*8,  DIMENSION(JD,KD,5) :: G
      REAL*8,  DIMENSION(JD,KD) :: H,UU,XSCP,XSCM 

My basic question is: what specifically triggers 4K aliasing in the case JD=41, KD=41 but not in the case JD=41, KD=40?  Experimentally, with JD=41 and KD=40, VTune shows minimal 4K aliasing in subroutine DECJ (the aliasing number is 0.109).
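
To make the comparison between the two cases concrete, here is a rough sketch of the size arithmetic. It assumes, purely for illustration, that the automatic arrays are packed back to back with no padding, so that two arrays N declarations apart start N*8*JD*KD bytes apart. Nothing above confirms that layout, and -pad or the compiler's stack allocation may change it. The program and variable names (ALIAS4K, NBYTES, N) are made up for this example.

```fortran
      PROGRAM ALIAS4K
      IMPLICIT NONE
      INTEGER, PARAMETER :: JD = 41
      INTEGER :: KD, N, NBYTES
      DO KD = 40,41
!        Size in bytes of one REAL*8 array of shape (JD,KD)
         NBYTES = 8*JD*KD
         PRINT *, 'KD =', KD, ' bytes per array =', NBYTES
         DO N = 1,4
!           Hypothetical distance between arrays N slots apart
!           (assumes back-to-back packing), reduced modulo 4096
            PRINT *, '  N =', N, ' offset =', MOD(N*NBYTES,4096)
         ENDDO
      ENDDO
      END PROGRAM ALIAS4K
```

One array occupies 13120 bytes for KD=40 and 13448 bytes for KD=41, and neither size is a multiple of 4096. Under the packing assumption the interesting quantity is how close the reduced offsets come to 0 or 4096 for the array pairs actually touched in the loop, but the real base addresses would need to be checked to draw any conclusion.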

Compilation was with ifort 2015.3.187 using the options
 -O3 -axCORE-AVX2,AVX -xSSE4.2 -g -ip -pad -align -auto -fpe0 -ftz -traceback

The loop in decj is unrolled 4 times by the compiler, so presumably after unrolling it looks something like this:

          B(J+1,K) = B(J+1,K) - H(J+1,K)*(0.5*XSC(J,  K))
          D(J-1,K) = D(J-1,K) + H(J-1,K)*(0.5*XSC(J,  K))
          B(J+2,K) = B(J+2,K) - H(J+2,K)*(0.5*XSC(J+1,K))
          D(J,  K) = D(J,  K) + H(J,  K)*(0.5*XSC(J+1,K))
          B(J+3,K) = B(J+3,K) - H(J+3,K)*(0.5*XSC(J+2,K))
          D(J+1,K) = D(J+1,K) + H(J+1,K)*(0.5*XSC(J+2,K))
          B(J+4,K) = B(J+4,K) - H(J+4,K)*(0.5*XSC(J+3,K))
          D(J+2,K) = D(J+2,K) + H(J+2,K)*(0.5*XSC(J+3,K))

I did some testing and couldn't find any addresses that differed by a multiple of 4096.  The worst I could find was some addresses that differed by a multiple of 256.
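
For reference, a check of this kind can be sketched as below. This is not the exact test I ran, just a minimal standalone illustration using the standard ISO_C_BINDING module to turn array base addresses into integers and print their low 12 bits. The program name LOW12 and the variables AB, AD, AH are hypothetical, and the arrays here are static ones in a main program rather than the automatic arrays of the real calling routine, so the printed values will not match the real code.

```fortran
      PROGRAM LOW12
      USE, INTRINSIC :: ISO_C_BINDING, ONLY: C_LOC, C_INTPTR_T
      IMPLICIT NONE
      INTEGER, PARAMETER :: JD = 41, KD = 41
!     TARGET is required so that C_LOC may take the addresses
      REAL*8, DIMENSION(JD,KD), TARGET :: B, D, H
      INTEGER(C_INTPTR_T) :: AB, AD, AH
!     Reinterpret each base address as an integer
      AB = TRANSFER(C_LOC(B), AB)
      AD = TRANSFER(C_LOC(D), AD)
      AH = TRANSFER(C_LOC(H), AH)
!     Low 12 bits = byte offset within a 4096-byte window
      PRINT *, 'B low 12 bits:', IAND(AB, 4095_C_INTPTR_T)
      PRINT *, 'D low 12 bits:', IAND(AD, 4095_C_INTPTR_T)
      PRINT *, 'H low 12 bits:', IAND(AH, 4095_C_INTPTR_T)
!     Base-address differences reduced modulo 4096
      PRINT *, 'D-B mod 4096:', MODULO(AD-AB, 4096_C_INTPTR_T)
      PRINT *, 'H-B mod 4096:', MODULO(AH-AB, 4096_C_INTPTR_T)
      END PROGRAM LOW12
```

To inspect the real layout, the same TRANSFER/C_LOC lines would have to go into the calling routine itself (with TARGET added to the automatic arrays), since that is where the addresses seen by DECJ are determined.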

 

