Channel: Intel® VTune™ Profiler (Intel® VTune™ Amplifier)

Huge amount of data getting generated by Vtune

Hi,

We are trying to profile an application with VTune (Intel(R) VTune(TM) Amplifier XE 2013 (build 353306)). It is an MPI application, and for now we are running it as a single-process MPI job.

We tried the snb-access-contention analysis both with call stacks (11 GB of data) and without call stacks (18 GB of data).

I ran them as follows:

Without call stack: amplxe-cl -r snb-access-contention -collect snb-access-contention -data-limit=0

With call stack: amplxe-cl -r snb-access-contention_cs -collect snb-access-contention -knob enable-stack-collection=true -data-limit=0

The log shows that it uses the following performance counters, with the sampling rate in brackets:

CPU_CLK_UNHALTED.REF_TSC(2000003)
CPU_CLK_UNHALTED.THREAD(2000003)
INST_RETIRED.ANY(2000003)
MEM_UOPS_RETIRED.ALL_STORES_PS(2000003)
MEM_UOPS_RETIRED.LOCK_LOADS_PS(100007)
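
To put these numbers in perspective, here is a rough sketch of how many samples per second such a setting would produce. It assumes the bracketed values are sample-after values (one sample is taken every N events) and uses a hypothetical 2.6 GHz clock; the actual frequency of our machine and the event rates of the application are not included here, so the result is only an order-of-magnitude illustration.

```python
def samples_per_second(event_rate_hz: float, sample_after_value: int) -> float:
    """Samples per second if one sample is taken every `sample_after_value` events."""
    if sample_after_value <= 0:
        raise ValueError("sample_after_value must be positive")
    return event_rate_hz / sample_after_value


# ASSUMPTION: hypothetical core clock, not taken from the actual machine.
ASSUMED_CLOCK_HZ = 2.6e9

# Values copied from the collection log above.
CLOCK_SAV = 2000003       # CPU_CLK_UNHALTED.THREAD
LOCK_LOADS_SAV = 100007   # MEM_UOPS_RETIRED.LOCK_LOADS_PS

# A busy core generates roughly one clock event per cycle.
clock_rate = samples_per_second(ASSUMED_CLOCK_HZ, CLOCK_SAV)

# For an equal number of events, the lock-load counter is sampled this many times more often.
lock_vs_clock = CLOCK_SAV / LOCK_LOADS_SAV

print(f"clock samples per second per busy core: {clock_rate:.0f}")
print(f"lock-load sampling is {lock_vs_clock:.1f}x denser per event")
```

Under these assumptions the clock event alone gives on the order of a thousand samples per second per busy core, which matters when comparing against a tool using a similar period.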

 

I also used HPCToolkit (http://hpctoolkit.org/) with similar sampling rates, e.g. CPU_CLK_UNHALTED:REF_P(2000000).

With HPCToolkit, the data collected is only around 1.5 MB; if I enable tracing, which gives a timeline view, it grows to 15 MB.

I then need to create a program structure file of around 25 MB, which can be kept as a common file and reused for different counters.

Is there any explanation for why the data collected by VTune is so much larger by comparison? Ours is a Sandy Bridge machine.

I cannot use the pause/resume API because I cannot change the source code: https://software.intel.com/en-us/articles/how-to-call-resume-and-pause-api-from-fortran-code

Thank you

Sriraj

