I am new to VTune and am using it to analyse a simple algorithm: finding the min and max of a large array. I ran the Microarchitecture Exploration analysis, hoping to find out whether the code is memory bound or front-end bound. Instead, the result reports 100% core bound. What does it mean to be 100% core bound in this situation? Or, more likely, what am I doing wrong in the measurement? Sorry if this is a stupid question, and thank you for taking the time to read it.
Other info:
Linux kernel version: 4.14.81.bm.21-amd64
LSB: Debian GNU/Linux 9.12 (stretch)
Model name: Intel(R) Xeon(R) Gold 6130 CPU @ 2.10GHz (Skylake)
This is the code I am running. The source array can be as large as 1 GB, which is a src_size of 2^28 because each element is 4 bytes:
    __attribute__((noinline)) void findMinMax(int32_t *src, uint32_t src_size, int32_t &min, int32_t &max) {
        min = src[0];
        max = src[0];
        for (int i = 1; i < src_size; i++) {
            auto current = src[i];
            if (current < min) {
                min = current;
            }
            if (current > max) {
                max = current;
            }
        }
    }
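For context, here is a minimal sketch of how the function might be driven under the profiler. The actual harness is not shown above, so everything except findMinMax is an assumption: makeInput and runOnce are hypothetical helper names, and the seeded pseudo-random fill is only one possible way to populate the array.

```cpp
#include <cstdint>
#include <limits>
#include <random>
#include <utility>
#include <vector>

// findMinMax exactly as posted in the question; noinline keeps it visible
// as a separate function in the profile.
__attribute__((noinline)) void findMinMax(int32_t *src, uint32_t src_size, int32_t &min, int32_t &max) {
    min = src[0];
    max = src[0];
    for (int i = 1; i < src_size; i++) {
        auto current = src[i];
        if (current < min) {
            min = current;
        }
        if (current > max) {
            max = current;
        }
    }
}

// Hypothetical helper (not from the question): builds an array of n
// pseudo-random int32_t values from a fixed seed so runs are repeatable.
std::vector<int32_t> makeInput(uint32_t n, uint32_t seed = 42) {
    std::mt19937 gen(seed);
    std::uniform_int_distribution<int32_t> dist(
        std::numeric_limits<int32_t>::min(),
        std::numeric_limits<int32_t>::max());
    std::vector<int32_t> data(n);
    for (auto &value : data) {
        value = dist(gen);
    }
    return data;
}

// Hypothetical helper (not from the question): runs one scan over a
// non-empty array and returns {min, max} so the result is actually used.
std::pair<int32_t, int32_t> runOnce(std::vector<int32_t> &data) {
    int32_t lo = 0;
    int32_t hi = 0;
    findMinMax(data.data(), static_cast<uint32_t>(data.size()), lo, hi);
    return {lo, hi};
}
```

A main function would then call makeInput(1u << 28) once for the 1 GB case and pass the result to runOnce, printing the returned pair so the compiler cannot discard the work.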
The executable is compiled using GCC 9.3 with the -O3 flag. I have also tried other optimisation levels, including no optimisation flag at all, and every one of them yields 100% core bound.
I have also tried various other simple algorithms, and they all seem to show the same 100% core bound result.