Channel: Intel® VTune™ Profiler (Intel® VTune™ Amplifier)

Packed non-vectorized FP operations


I am using VTune 2020 Update 0 on an Intel 8280 platform. I ran an HPC Performance Characterization analysis and was looking at the Vectorization section of the summary, which shows:

Vectorization:                      77.7% of Packed FP Operations
    Instruction Mix:
        SP FLOPs:                   15.4%
            Packed:                 79.8%
                128-bit:            0.0%
                256-bit:            0.1%
                512-bit:            79.8%
            Scalar:                 20.2%
        DP FLOPs:                   0.4%
        x87 FLOPs:                  0.0%
        Non-FP:                     84.2%
    FP Arith/Mem Rd Instr. Ratio:   0.462
    FP Arith/Mem Wr Instr. Ratio:   1.369

I checked for a detailed explanation here, but could not find a clear answer, so I am posting my questions on the forum.
From the report, it seems the code issued both packed and non-packed (scalar) instructions, and that out of all the packed FP instructions issued during execution, only 77.7% were vectorized. As far as I know, "vectorized" here means these instructions used the wide AVX/AVX2/AVX-512 vector registers.
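A quick consistency check on the figures: as I understand VTune's metric description (worth confirming for 2020 Update 0), Vectorization is the share of all FP arithmetic operations that are packed, not the share of packed operations that are vectorized. If that reading holds, the 77.7% should be recoverable from the Instruction Mix rows. The sketch below assumes the mix percentages share a common denominator and, because the report does not break DP FLOPs down, treats the 0.4% DP operations as all scalar. `packed_share_of_fp` is a hypothetical helper, not part of any VTune API.

```c
#include <assert.h>

/* Hypothetical helper (not a VTune API): percentage of FP operations that
 * are packed, computed from each precision's share of the instruction mix
 * and the packed percentage within that precision. x87 is 0.0% in this
 * report, so it is left out. All inputs are percentages. */
static double packed_share_of_fp(double sp_share, double sp_packed_pct,
                                 double dp_share, double dp_packed_pct)
{
    double fp_share = sp_share + dp_share;
    assert(fp_share > 0.0);
    double packed = sp_share * sp_packed_pct / 100.0
                  + dp_share * dp_packed_pct / 100.0;
    return 100.0 * packed / fp_share;
}
```

With `packed_share_of_fp(15.4, 79.8, 0.4, 0.0)` this returns about 77.8%, within rounding of the reported 77.7%. Treating the DP operations as all packed instead gives about 80.3%.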

Could you explain, or point me to an article that explains, the general reasons why packed instructions end up non-vectorized (22.3% of them, in my case)? And how would such packed instructions execute (using scalar registers?)?
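On the register question: on x86-64, scalar single-precision arithmetic is normally done with SSE/AVX scalar instructions such as ADDSS, which operate on the same XMM registers as packed instructions but compute only the lowest element. A minimal sketch contrasting a scalar add with a packed add using SSE intrinsics; `scalar_vs_packed` is a hypothetical helper written for illustration.

```c
#include <assert.h>
#include <immintrin.h>

/* Hypothetical helper: one scalar add (_mm_add_ss, ADDSS) and one packed add
 * (_mm_add_ps, ADDPS) on the same XMM register contents. The scalar form
 * computes only lane 0; lanes 1..3 are copied from the first operand. */
static void scalar_vs_packed(const float a[4], const float b[4],
                             float scalar_out[4], float packed_out[4])
{
    assert(a && b && scalar_out && packed_out);
    __m128 va = _mm_loadu_ps(a);
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(scalar_out, _mm_add_ss(va, vb)); /* 1 addition  */
    _mm_storeu_ps(packed_out, _mm_add_ps(va, vb)); /* 4 additions */
}
```

For a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, the scalar result is {11, 2, 3, 4} (upper lanes copied from a) and the packed result is {11, 22, 33, 44}.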

For example, _mm256_add_ps (VADDPS) is a packed instruction, so could you help me understand how the add operation could be non-vectorized in the following context:

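```c
#include <immintrin.h>
void example(void) {
    _Alignas(32) float f[8] = {1.0f, 2.0f, 1.2f, 2.1f, 5.2f, 5.3f, 10.1f, 11.0f};
    __m256 v = _mm256_load_ps(&f[0]);  /* aligned load: f must be 32-byte aligned */
    v = _mm256_add_ps(v, v);           /* one packed add: eight SP additions */
}
```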

This snippet is not taken from the code I profiled.
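In contrast to the snippet above, which issues only packed adds, one common way scalar FP instructions appear alongside packed ones in the same kernel is a remainder (tail) loop that handles the elements left over when the array length is not a multiple of the vector width. A minimal sketch, hand-written with intrinsics so the split is visible; `add_arrays` and its counters are hypothetical, and `__attribute__((target("avx")))` is a GCC/Clang extension that enables AVX for this one function without compiling the whole file with `-mavx` (the CPU must still support AVX).

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical kernel: a[i] += b[i]. The main loop issues one packed AVX add
 * (8 floats) per 8 elements; the n % 8 leftover elements go through a tail
 * loop written with scalar intrinsics (one element per add). The counters
 * only record how many times each loop body ran. */
__attribute__((target("avx")))
static void add_arrays(float *a, const float *b, size_t n,
                       size_t *packed_adds, size_t *scalar_adds)
{
    assert(packed_adds && scalar_adds);
    size_t i = 0;
    *packed_adds = 0;
    *scalar_adds = 0;
    for (; i + 8 <= n; i += 8) {                  /* packed main body */
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(a + i, _mm256_add_ps(va, vb));
        ++*packed_adds;
    }
    for (; i < n; ++i) {                          /* scalar tail */
        __m128 va = _mm_load_ss(a + i);
        __m128 vb = _mm_load_ss(b + i);
        _mm_store_ss(a + i, _mm_add_ss(va, vb));
        ++*scalar_adds;
    }
}
```

With n = 13, the main loop runs once (one 8-wide packed add) and the tail runs five times (five scalar adds). Compilers that vectorize such loops automatically may generate a similar scalar or masked remainder; which one depends on the compiler and flags.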

