Channel: Intel® VTune™ Profiler (Intel® VTune™ Amplifier)
Viewing all 1347 articles

Profile Failed

Hello, I profiled the sample project tachyon vc10 and ran into a problem. Below is the log:

Analyzing Debug configuration
    11/24/14 19:49:16  Profiling Debug configuration may provide misleading results. Change active configuration for performance measurements to Release to accurately reflect the behavior of your released product. 

Collection failed
    11/24/14 19:49:23  Collection failed. The data cannot be displayed. 
    [Instrumentation Engine]: SYSCALL_INSPECTOR: Too long trace in the NTDLL!NtSetContextThread function Incompatible operating system or incompatible software installed on the system Pin is exiting due to fatal error 

My environment is:

Intel Core i3

Windows 7 64-bit

Microsoft Visual Studio 2013

No anti-virus program is running

Thanks in advance; any advice would be appreciated.


Finalizing results hangs with large .pdb file

I am experiencing an issue with Intel VTune Amplifier.

My executable has a rather large .pdb file (~390MB). After collecting data, the resolve step hangs while resolving symbols. I can fix the problem by removing unreferenced functions from the .exe and .pdb files (Visual Studio linker option /OPT:REF). This shrinks the .pdb file to ~140MB and the .exe file from ~22MB to ~9MB.

Is this a known issue with VTune? Even with very small samples, VTune hangs if the symbol file is large.

 

Additional Details:

I have tried using the command-line client as well as the GUI.

Compiler: VS2013

VTune version: tried 2013u17 and 2015u1

VTune "named user license" on both Windows and Linux

Hello,

I've read online that it is possible to use VTune on multiple machines with a "named user license". Is this true for both Windows and Linux at the same time? Can I use my "named user license" on both operating systems?

When I went to purchase VTune, I noticed there are two versions (Windows or Linux). Both have the same price. So, will I be locked to either Windows or Linux depending on which I purchase? If a "named user license" can be used for both OSes, how do I get the application code for the OS that I didn't originally purchase?

Thanks for your help.

Stephen

Error: [Instrumentation Engine]: Function IMG_FindByUnloadAddress called without holding lock

Hi

I have been profiling the same FORTRAN OpenMP code on 4 threads on a node comprising 2x 6-core Westmere, under the SGE batch system where I have reserved all 12 cores exclusively for my use.

Out of my 20 runs, one failed with the following written to standard error:

amplxe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: amplxe-cl -r /mnt/iusers01/support/mccssmb2/ResearchIT/applications_support/Popelier/ferebus/vtune/various_tests/Schedule/r029hs -command stop.
amplxe: Error: [Instrumentation Engine]: Function IMG_FindByUnloadAddress called without holding lock. Call PIN_LockClient()/PIN_UnlockClient()
amplxe: Collection failed.
amplxe: Internal Error

I am running Amplifier XE version:

$ amplxe-cl --version
Intel(R) VTune(TM) Amplifier XE 2015 (build 367959) Command Line Tool
Copyright (C) 2009-2014 Intel Corporation. All rights reserved.

The code was compiled with ifort version:

$ ifort --version
ifort (IFORT) 14.0.3 20140422
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

and the Linux version on the compute nodes is

$ uname -a && cat /etc/*release
Linux int00 2.6.32-358.18.1.el6.x86_64 #1 SMP Tue Aug 27 14:23:09 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux
Alces Core HPC Configuration package release 3.0
Scientific Linux release 6.2 (Carbon)

All help appreciated. Yours, M

 

 

_init() instrumentation failed

I'm getting an instrumentation failure when I attempt a hotspots analysis. This is an evaluation copy, and the binary was generated by a Go compiler. Does anyone have a clue why VTune is looking for queens_init() and what it wants to do with it? The only other, possibly unrelated, issue was that I had to remove my /etc/fuse.conf to work around an install segfault, but it is hard to imagine that is the problem.

Thanks in advance...

/opt/intel/vtune_amplifier_xe_2015.1.0.367959/bin64/amplxe-cl -collect hotspots -target-duration-type=veryshort -app-working-dir mumble/work/code/queens/src --search-dir sym:p=/usr/local/google/home/rlh/work/code/queens/src -- ~mumble/work/code/queens/src/queens 10

Data collection is completed with warnings
    Mon 08 Dec 2014 11:10:44 AM EST  The result file 'mumble/intel/amplxe/projects/queens/r002hs/r002hs.amplxe' is created and added to the project queens. Please see warning messages for details. 
    [2014.12.08 11:10:35] mumble/work/code/queens/src/queens _init() instrumentation failed.

Finalization completed with warnings 
    Mon 08 Dec 2014 11:10:45 AM EST  Result finalization has completed with warnings that may affect the representation of the analysis data. Please see details below. 
    Cannot find data to precompute. Skipping the precomputation step.

Cannot locate file

Hi,

After I spent half an hour collecting a result, VTune tells me that it cannot find a set of files, including libraries from the Intel compiler.

Note that VTune has access to exactly the same libraries as those available during the run, since both collection and analysis are performed on the same cluster:

[Screenshot: list of the files not found]

 

 

Time spent in functions?

Hi,

I have an application built with the Intel compilers (version 15) using -g -O2 (optimization plus debug info).

The application was run through amplxe-cl -collect hotspots.

When launching VTune on the collected data, I would expect to find (as explained in a tutorial video) the time spent in the most time-consuming functions. Instead, I get:

[Screenshot: VTune hotspots, bottom-up view]

Which is not really helpful.

Also, VTune seems a little bit confused with the concept of function vs library (see image below).

Is there an alternative to VTune for profiling code compiled with the Intel compilers? I just need the usual information: time spent in functions, loops, cache misses, etc.

Regards

[Screenshot: function vs. library]

Status for perf events in perf stat

Hi all,

I am using an Intel i5-3337U processor. I want to collect several perf events for a process, selected by PID.

However, when I run the perf stat command with several events for a few seconds, some runs report statistics, but other runs display all counters as "<not counted>".

Could you please tell me the probable reason for this and how to get rid of the problem?

 


Where do context switches come from?

Hi, I'm trying to find out which lines of code the preemption context switches and synchronization context switches in my program come from, using VTune 2013 Update 17. I have no idea how to do it. Can anyone help me out? Many thanks.

VTune installation problem

Hi, when installing VTune 2013 Update 17, the following errors appeared at the last step:

Warning:  no sep3_15 driver was found loaded in the kernel.
Checking for PMU arbitration service (PAX) ... not detected.
Attempting to start PAX service ...
Executing: insmod ./pax/pax-x32_64-3.13.0-32-genericsmp.ko
insmod: error inserting './pax/pax-x32_64-3.13.0-32-genericsmp.ko': -1 Unknown symbol in module

Error:  pax driver failed to load!

You may need to build pax driver for your kernel.
Please see the pax driver README for instructions.

Error: failed to start or connect to required PAX service

 

Can anyone help me out? Many thanks.

 

amplxe-runss.py looking for 32bit collector; should use 64 bit instead

My host system is 64 bit, my target system (xeon phi) is also 64 bit. When I run

export AMPLXE_TARGET_PRODUCT_DIR="/amplxe"
amplxe-runss.py [...]

It gives me back "/amplxe/bin32/amplxe-runss: No such file or directory". This is expected, since /amplxe/ contains only bin64 (and lib64, message).

What am I doing wrong that makes amplxe-runss.py look for the 32-bit collector when it should be looking for the 64-bit one? NB: I tried running amplxe-runss.py from the bin64 folder manually instead of relying on the one set by "source /opt/intel/vtune_amplifier_xe/amplxe-vars.sh". It made no difference.

 

 

PMU resource(s) currently being used by another profiling tool or process

Hi,

 

I would appreciate your help with the following. We are using Intel VTune on a cluster. I submit two different jobs that collect hardware counters on two different nodes of our cluster. The first job runs fine; the second fails with the error "Error: PMU resource(s) currently being used by another profiling tool or process." These are two different nodes of the cluster, so I cannot see why hardware counters could not be used at the same time by two different jobs.

Is there a problem with the installation, or is this a VTune issue? Can anything be done to resolve this?

Thanks a lot,

M

 

How to identify the cause for the high CPI rate

I was trying to identify the reason for the slowness of my program. I noticed that one function has a high CPI value (4.5), and VTune says the reason may be:

  • Memory stalls 
  • Instruction starvation 
  • Branch misprediction 
  • Long latency instructions 

How can I explore these causes using VTune? Can anyone help me identify the specific reason for the high CPI?

I am using VTune 2015 U1 (trial version), and I am a Windows user.
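
For readers unfamiliar with the metric in this question: CPI (cycles per instruction) is the number of unhalted clockticks divided by the number of instructions retired over the same interval. The sketch below only illustrates that arithmetic. The function names and event counts are hypothetical and were not taken from the post, and the code does not call any VTune API.

```python
def cycles_per_instruction(clockticks: int, instructions_retired: int) -> float:
    """Return CPI = unhalted clockticks / instructions retired."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return clockticks / instructions_retired


# Hypothetical per-function event counts: (clockticks, instructions retired).
samples = {
    "hot_function": (4_500_000, 1_000_000),
    "other_function": (900_000, 1_000_000),
}

# Rank functions from highest to lowest CPI so the worst offender comes first.
ranked = sorted(
    ((name, cycles_per_instruction(c, i)) for name, (c, i) in samples.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, cpi in ranked:
    print(f"{name}: CPI = {cpi:.2f}")
```

A high CPI only says that each instruction took many cycles on average; it does not say which of the listed causes is responsible.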

Profiling an application which uses SIGNALS

Hello,
we are using Intel VTune 2015 to profile our application, which runs under CentOS 5.11.
Our application uses C++ signals for control flow. We tried a basic hotspots analysis using the amplxe-cl command-line tool with the following parameters:
-duration 20 --run-pass-thru=--profiling-signal=1
VTune yields the following error message when detaching after the 20-second duration. As an alternative to signal number 1, I also tried number 4, with no change in the result.

amplxe: Error: Assertion failed: handler_ex1445: obj->is_first_class_handler_set[signo] == 1 : BUG! : signo == 1. Please contact the technical support. 

Without using "--run-pass-thru=--profiling-signal=1" as parameter the signal handlers of our application do not work after profiling and the process is ended when receiving a signal.
Please provide guidance on how to use VTune in this scenario.

Best Regards,
Stephan

Performance Overhead Introduced By Tool Itself

Hi all,

We are now trying to evaluate this tool for our products.

Our main interest in VTune is whether it can profile apps without any overhead.

The problem with profiling is that the profiling code itself generally adds overhead:

  • Extra cycles are spent on accounting. In tight loops, it is not uncommon for the profiling code to take more time than the code being profiled. This seriously distorts measurements and can make results very confusing.
  • The profiling code may also disturb CPU pipelining/branch prediction, caching, context switches (between threads), and JIT compilation. Again, this can skew the profiling result significantly.
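
The first point in the list can be illustrated with a small sketch. It times the same tight loop twice, once plain and once with a per-iteration counter update standing in for instrumentation code. This is only an illustration in Python with hypothetical function names; it says nothing about how VTune itself collects data, and the measured ratio will vary by machine.

```python
import time


def sum_plain(n: int) -> int:
    """Tight loop with no profiling code in it."""
    total = 0
    for i in range(n):
        total += i
    return total


def sum_instrumented(n: int, counters: dict) -> int:
    """Same loop, plus per-iteration accounting (a stand-in for instrumentation)."""
    total = 0
    for i in range(n):
        counters["iterations"] = counters.get("iterations", 0) + 1
        total += i
    return total


def timed(func, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    n = 1_000_000
    counters = {}
    _, plain_time = timed(sum_plain, n)
    _, instrumented_time = timed(sum_instrumented, n, counters)
    print(f"plain:        {plain_time:.3f} s")
    print(f"instrumented: {instrumented_time:.3f} s")
```

Both functions return the same sum; only the accounting differs, so any gap between the two timings is the cost of the bookkeeping itself.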

We are hoping that VTune can help with some of these issues by relying more on CPU counters, something that hopefully allows the code to run at full speed without interruption.

Could you give us helpful comments on this concern?

Thanks in advance,

Kim.


Call stack mechanism implementation question

I am running a Go program with DWARF information, and VTune does a good job figuring out line numbers and so forth, but it struggles with stack walks. I am guessing this is because Go's stack conventions (how Go uses EBP, for example) differ from those supported by VTune. Is there a document or some sort of clue sheet about what VTune expects from the stack format? Also, can anyone think of a workaround that doesn't require Go to change its conventions?

insufficient virtual memory

I am getting an error while loading a result from a general exploration experiment.  This is an MPI executable, but collection was only done for one process on each node.  See attached PNG for the error message.  Any ideas?

limit
cputime      unlimited
filesize     unlimited
datasize     4096000 kbytes
stacksize    7340032 kbytes
coredumpsize unlimited
memoryuse    1024000 kbytes
vmemoryuse   unlimited
descriptors  65536
memorylocked unlimited
maxproc      600
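
The memory-related entries in a listing like the one above can also be read from inside a process. The sketch below uses Python's standard resource module (Unix only). The mapping from the csh limit names to RLIMIT_* constants is an assumption made for illustration, and the helper names are hypothetical.

```python
import resource

# Assumed mapping from csh "limit" names to RLIMIT_* constants.
LIMITS = {
    "datasize": resource.RLIMIT_DATA,
    "stacksize": resource.RLIMIT_STACK,
    "memoryuse": resource.RLIMIT_RSS,
    "vmemoryuse": resource.RLIMIT_AS,
}


def describe(value: int) -> str:
    """Format a byte limit in kbytes, or 'unlimited' for RLIM_INFINITY."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} kbytes"


def current_limits() -> dict:
    """Return {name: (soft, hard)} with each limit formatted as a string."""
    return {
        name: tuple(describe(v) for v in resource.getrlimit(res))
        for name, res in LIMITS.items()
    }


if __name__ == "__main__":
    for name, (soft, hard) in current_limits().items():
        print(f"{name:12s} soft={soft:>20s} hard={hard:>20s}")
```

Run from the same shell that launches the GUI, this shows the limits the process actually inherits, which may differ from those of an interactive login.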

Intel(R) VTune(TM) Amplifier XE 2015 Update 1 (build 380310) Command Line Tool
Operating System          3.0.101-0.31.1.20140612-nasa SUSE Linux Enterprise Server 11 (x86_64)
VERSION = 11
PATCHLEVEL = 3

The result was collected using amplxe-cl -collect general-exploration -data-limit=100 ./3dh-r516+

I have tried running the GUI under gdb.  Here is a partial trace.  I don't have permission to install debug symbols on this machine.

Missing separate debuginfo for /nasa/intel/vtune_2015u1/intel/vtune_amplifier_xe_2015.1.1.380310/lib64/../lib64/../lib64/../lib64/libamplxe_file_finder_symbol_file_resolver_2.17.so
Try: zypper install -C "debuginfo(build-id)=4d9f614da242ad1479871d7dbfad96f2ea857b14"
Missing separate debuginfo for /usr/lib64/gio/modules/libgiogconf.so
Try: zypper install -C "debuginfo(build-id)=040d5845f253866ca3a688f112eb7695ed27d9da"
Missing separate debuginfo for /usr/lib64/libgconf-2.so.4
Try: zypper install -C "debuginfo(build-id)=4fa3827905f6332eb159e295a280bb1d3ad71c20"
Missing separate debuginfo for /usr/lib64/libORBit-2.so.0
Try: zypper install -C "debuginfo(build-id)=686f87bf164f077e1671c53cf7a9c0cc8b0be2d5"
Missing separate debuginfo for /usr/lib64/libdbus-glib-1.so.2
Try: zypper install -C "debuginfo(build-id)=c122a3188160b6baafdbab4395c30f971313859c"
Missing separate debuginfo for /lib64/libnsl.so.1
Try: zypper install -C "debuginfo(build-id)=1ca235c2af788c5f2505a4d7f3502c050e1c8462"
Missing separate debuginfo for /lib64/libdbus-1.so.3
Try: zypper install -C "debuginfo(build-id)=1482da7004b3bfeab07ac2f7d9bd4c1ec0a70098"
Missing separate debuginfo for /usr/lib64/gio/modules/libgiofam.so
Try: zypper install -C "debuginfo(build-id)=0755b389e9b7fea346a93a49dbf93c696f739673"
Missing separate debuginfo for /usr/lib64/libfam.so.0
Try: zypper install -C "debuginfo(build-id)=d3687487c4923fce53ba11c78a8103c657935dab"
Missing separate debuginfo for /usr/lib64/gio/modules/libgioremote-volume-monitor.so
Try: zypper install -C "debuginfo(build-id)=faf97450f43cca2c44b6dc98155c22bd283862ca"
Missing separate debuginfo for /usr/lib64/libgvfscommon.so.0
Try: zypper install -C "debuginfo(build-id)=5581c980efc0d98b142b17aad2f94b70f6b6ee35"
Missing separate debuginfo for /usr/lib64/gio/modules/libgvfsdbus.so
Try: zypper install -C "debuginfo(build-id)=3ad48cca78be85e98678248d144faa6345b669ef"
Missing separate debuginfo for /usr/lib64/libbeagle.so.1
Try: zypper install -C "debuginfo(build-id)=dd68a302550d54e69772fd36cae359f9a5ee139f"
[New Thread 0x7ff8dd7c0700 (LWP 89043)]
[New Thread 0x7ff71d7bf700 (LWP 89044)]
[Thread 0x7ff71d7bf700 (LWP 89044) exited]
[New Thread 0x7ff71d7bf700 (LWP 89045)]
[Thread 0x7ff71d7bf700 (LWP 89045) exited]
Missing separate debuginfo for /usr/lib64/gtk-2.0/2.10.0/loaders/libpixbufloader-xpm.so
Try: zypper install -C "debuginfo(build-id)=9b054e0277a6e97ee7198379695e13fa79facf75"
[Thread 0x7ff8dd7c0700 (LWP 89043) exited]
[Thread 0x7ffaa7fff700 (LWP 88898) exited]
Detaching after fork from child process 89131.
Detaching after fork from child process 89132.
[Thread 0x7ffe2ef08700 (LWP 88892) exited]
Detaching after fork from child process 89133.
Detaching after fork from child process 89134.
Detaching after fork from child process 89135.
Detaching after fork from child process 89136.
Detaching after fork from child process 89137.
[Thread 0x7ffc6ef07700 (LWP 88893) exited]
[Inferior 1 (process 88839) exited normally]
(gdb) quit

sfdump5 tool in VTune

There seems to be some mention of a command-line sfdump5 tool that can be used to process/view the samples within a .tb6 file. There is also documentation of this tool in the 3.11 revision of the SEP User Guide.

However, I can't seem to find this tool in the latest VTune Amplifier XE installation - C:\Program Files\Intel\VTune Amplifier XE 2015\bin32.

Has this sfdump5 tool been deprecated?

CentOS 7 kernel oops when running

While evaluating vtune_amplifier_xe_2015.1.0.367959 on Linux, I experienced a kernel oops in the VTune kernel modules. I was trying to run the microarchitecture -> general exploration -> bandwidth test. The system is a default CentOS 7 x86 install updated with all patches. The code was running on an SNB machine with the VTune CLI_install installed as per the manual.

(CLI_install has other issues: the RHEL/CentOS kernel sources are not in /usr/src/linux, and the installer does not pick that up automatically.)
(The manual notes that the power sampler should be installed, but I read that it was removed earlier. Update the docs?)

Any ideas besides it's open source, please submit a patch? :)

code under test

compiled as user_loop (gcc 4.8.2  -g)

int main(void)
{
        volatile unsigned long i=0;
        while(i<1000000000)
        {
                ++i;
        }
        return 0;
}

crash summary

      KERNEL: /usr/lib/debug/lib/modules/3.10.0-123.el7.x86_64/vmlinux
    DUMPFILE: /var/crash/127.0.0.1-2015.01.04-12:44:17/vmcore  [PARTIAL DUMP]
        CPUS: 16
        DATE: Sun Jan  4 12:43:16 2015
      UPTIME: 01:49:45
LOAD AVERAGE: 0.10, 0.07, 0.06
       TASKS: 367
     RELEASE: 3.10.0-123.el7.x86_64
     VERSION: #1 SMP Mon Jun 30 12:09:22 UTC 2014
      MEMORY: 32 GB
       PANIC: "Oops: 0002 [#1] SMP " (check log for details)
         PID: 29144
     COMMAND: "user_loop"
        TASK: ffff8805fc9571c0  [THREAD_INFO: ffff8805fca2e000]
         CPU: 8
       STATE: TASK_RUNNING (PANIC)

 

log:

[ 4291.860357] PAX: PMU arbitration service v1.0.1 has been started.
[ 4292.902500] sep3_15: PMU collection driver v3.15.5 (EMON) has been loaded.
[ 4292.934677] sep3_15: Chipset support is enabled.
[ 4292.956584] sep3_15: IDT vector 0x21 will be used for handling PMU interrupts.
[ 4295.038257] vtss++ kernel module ("v1.4.4-367959 Intel(R) VTune(TM) Amplifier XE 2013") registered
[ 6584.773197] BUG: unable to handle kernel paging request at ffffc900183f2000
[ 6584.805419] IP: [<ffffffffa05adeab>] UNC_COMMON_PCI_Read_Counts+0x6b/0x1b0 [sep3_15]
[ 6584.841380] PGD 42f405067 PUD 83f403067 PMD 2aa331067 PTE 0
[ 6584.867465] Oops: 0002 [#1] SMP

 

bt
PID: 29144  TASK: ffff8805fc9571c0  CPU: 8   COMMAND: "user_loop"
 #0 [ffff8805fca2fa90] machine_kexec at ffffffff81041181
 #1 [ffff8805fca2fae8] crash_kexec at ffffffff810cf0e2
 #2 [ffff8805fca2fbb8] oops_end at ffffffff815ea548
 #3 [ffff8805fca2fbe0] no_context at ffffffff815daf63
 #4 [ffff8805fca2fc30] __bad_area_nosemaphore at ffffffff815daff9
 #5 [ffff8805fca2fc78] bad_area_nosemaphore at ffffffff815db163
 #6 [ffff8805fca2fc88] __do_page_fault at ffffffff815ed36e
 #7 [ffff8805fca2fd88] do_page_fault at ffffffff815ed58a
 #8 [ffff8805fca2fdb0] page_fault at ffffffff815e97c8
    [exception RIP: UNC_COMMON_PCI_Read_Counts+107]
    RIP: ffffffffa05adeab  RSP: ffff8805fca2fe60  RFLAGS: 00010002
    RAX: 0000000000000058  RBX: 0000000000000001  RCX: 0000000000000080
    RDX: 0000000000000001  RSI: ffffc900183f1f80  RDI: 0000000000000001
    RBP: ffff8805fca2fea8   R8: 0000000000000003   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000000  R12: 000000000000003f
    R13: 0000000000000040  R14: 0000000000000058  R15: ffffc900183f1f80
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
 #9 [ffff8805fca2feb0] PMI_Interrupt_Handler at ffffffffa05a3b14 [sep3_15]
#10 [ffff8805fca2ff50] SYS_Perfvec_Handler at ffffffffa05b0f85 [sep3_15]
    RIP: 000000000040050a  RSP: 00007fff61c5d0b0  RFLAGS: 00000206
    RAX: 0000000015d95a9f  RBX: 0000000000000000  RCX: 0000000000400520
    RDX: 00007fff61c5d1a8  RSI: 00007fff61c5d198  RDI: 0000000000000001
    RBP: 00007fff61c5d0b0   R8: 00007f15a1e68e80   R9: 0000000000000000
    R10: 00007fff61c5cf40  R11: 00007f15a1acea00  R12: 0000000000400400
    R13: 00007fff61c5d190  R14: 0000000000000000  R15: 0000000000000000
    ORIG_RAX: 0000000015d95a9f  CS: 0033  SS: 002b

 

A 'Failed to create sampling data base' Problem

I am new to VTune and am using VTune Performance Analyzer v9.1 on Windows XP (Intel Pentium G3420).

When I try to collect Clockticks information, VTune always shows the error "Failed to create sampling data base. probably .tb5 files are corrupted or don't exist".

When using the "Quick performance analysis wizard", the "Clockticks" column is always zero ('0'), while the other columns seem normal.

By the way, the program being analyzed was developed with Visual Studio 2008 and MFC.

Can anyone help me with this?

Thank you.

 

 

 

 

 
