Channel: Intel® VTune™ Profiler (Intel® VTune™ Amplifier)
Viewing all 1347 articles

Blue screen when using VTune after upgrading to Windows 10

Hi

In my company, I'm currently evaluating VTune. In the process, I upgraded my machine from Windows 8.1 to Windows 10. Since then, any kind of analysis that requires the kernel driver (Advanced Hotspots, for example) triggers a blue screen. To be more precise: I start my app in paused mode, and as soon as I press the "Resume" button in VTune, I end up with a blue screen. The error code is not always the same. I have seen IRQL_NOT_LESS_OR_EQUAL several times, but there have been others as well.

Is this a known issue or is there anything I can do to solve or assist? 

Thank you 

Benjamin Schindler


Blue screen and IRQL_NOT_LESS_OR_EQUAL error when starting Advanced Hotspots

I am using VTune Amplifier XE 2016 (build 424694) on Windows 8.1.

When I start an Advanced Hotspots analysis, my laptop ALWAYS crashes with a blue screen and the error above.

With the Basic analysis, everything works fine.

Any ideas on how to fully resolve this issue?

Thanks

Unable to collect OpenMP* performance data with VTune

Hi,

I want to profile OpenMP* in my programs, so I wrote a test program, compiled it with icpc, and ran it. Unfortunately, VTune does not collect any OpenMP information.

Here are my steps:

1. Write a sample test program.

2. Edit the build file (CMake):

ADD_DEFINITIONS(-g -O2 -openmp)

SET(CMAKE_C_COMPILER icc)
SET(CMAKE_CXX_COMPILER icpc)

TARGET_LINK_LIBRARIES(OMPMKL libiomp5.so mkl_rt)

3. Run make.

4. Run the program on Debian 6 (elapsed time: about 20 s) and attach VTune to it with the command below. The console output was:

amplxe-cl -collect advanced-hotspots -k collection-detail=stack-sampling -data-limit=0 --target-pid=20540
amplxe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: amplxe-cl -r /UIH/bin/r028ah -command stop.
amplxe: Collection detached.
amplxe: Collection stopped.
amplxe: Using result path `/UIH/bin/r028ah'
amplxe: Executing actions 16 % Resolving module symbols
amplxe: Warning: Cannot locate file `[vsyscall]'.
amplxe: Executing actions 16 % Resolving information for `libiomp5.so'
amplxe: Warning: Cannot locate debugging symbols for file `/usr/lib/libstdc++.so.6.0.13'.
amplxe: Executing actions 17 % Resolving information for `libdl-2.11.3.so'
amplxe: Warning: Cannot locate debugging symbols for file `/lib/libdl-2.11.3.so'.
amplxe: Executing actions 17 % Resolving information for `libpthread-2.11.3.so'
amplxe: Warning: Cannot locate file `vtsspp.ko'.
amplxe: Executing actions 17 % Resolving information for `vtsspp'
amplxe: Warning: Cannot locate debugging symbols for file `/lib/libpthread-2.11.3.so'.
amplxe: Executing actions 17 % Resolving information for `libmkl_intel_thread.s
amplxe: Warning: Cannot locate debugging symbols for file `/lib/libc-2.11.3.so'.
amplxe: Executing actions 18 % Resolving information for `ld-2.11.3.so'
amplxe: Warning: Cannot locate debugging symbols for file `/lib/ld-2.11.3.so'.
amplxe: Executing actions 50 % Generating a report

Collection and Platform Info
----------------------------
Parameter                 r028ah
------------------------  -----------------------
Application Command Line
Operating System          2.6.32-5-amd64 6.0.10
Computer Name             debian-irip-test1
Result Size               68809086
Collection start time     02:02:34 16/12/2015 UTC
Collection stop time      02:02:58 16/12/2015 UTC

CPU
---
Parameter          r028ah
-----------------  -----------------------------------
Name               Intel(R) Xeon(R) E5/E7 v2 processor
Frequency          2793150674
Logical CPU Count  20

Summary
-------
Elapsed Time:       23.295
CPU Time:           78.232
Average CPU Usage:  3.466
CPI Rate:           0.390

Event summary
-------------
Hardware Event Type       Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample
------------------------  -------------------------  --------------------------------  -----------------
INST_RETIRED.ANY                       559902227941                            109204  2800000
CPU_CLK_UNHALTED.THREAD                218545613987                            109312  2800000
CPU_CLK_UNHALTED.REF_TSC               218514693572                            109085  2000003
amplxe: Executing actions 100 % done

5. Because I don't have a Linux GUI, I copied the result to a Windows 7 machine and opened it in VTune (Windows version).

I did not find any OpenMP information.

Versions: icpc version 15.0.0 (gcc version 4.4.5 compatibility); VTune Amplifier XE 2015.

Intel(R) OMP Copyright (C) 1997-2014, Intel Corporation. All Rights Reserved.
Intel(R) OMP version: 5.0.20140611
Intel(R) OMP library type: performance
Intel(R) OMP link type: dynamic
Intel(R) OMP build time: 2014-06-13 19:14:45 UTC
Intel(R) OMP build compiler: Intel C++ Compiler 14.0
Intel(R) OMP alternative compiler support: yes
Intel(R) OMP API version: 4.0 (201307)
Intel(R) OMP dynamic error checking: no
Intel(R) OMP thread affinity support: not used
Intel(R) OMP debugger support version: 1.1

I can't figure out what I'm doing wrong. Can somebody help me?

Thank you very much.

Collect arguments for a certain function call

Hi,

I am analyzing a workload that has a very hot function. Can I use VTune to report which arguments are passed to this function? For example, if func() takes one character as input, I would like a report like this:

argument "a" : 10 times

argument "b" : 100 times

argument "c": 50 times

I tried googling the answer, but there seems to be no such solution, whether in VTune or in other tools. The catch is that I want to do this without changing the source code.

Thank you very much

VTune_Amplifier_XE_2016_update1 cannot profile GPU

Hello,

My previous VTune was VTune_Amplifier_XE_2015_update1, and everything was fine.

However, after I installed the latest driver and VTune_Amplifier_XE_2016_update1, I can no longer run Platform Analysis / CPU/GPU Concurrency.

I run VTune in Administrator mode, and every time I run it, the screen shown in the attachment appears.

It writes "BUG: . Please contact the technical support."

What should I do?

Attachment: error20151224.PNG (23.92 KB)

Generate a report via the command line for "Platform" information

Hi there,

I'm using Intel VTune Amplifier for performance measurements of Java applications. Is it possible to generate a command-line report like the "Platform" tab in the UI? I need to know how many threads my program uses and how much time each thread spends on computation (CPU), idling, and synchronization. The "Platform" tab in the UI shows most of this information, but I would like to generate the report from the command line.

Thanks

Sven

VTune command line in MIC native mode to analyze a file-I/O application

Hi friends,

The problem is as follows:

1) I have a large data file (at /home/datafile). My MIC native-mode application needs to read the data from disk, perform many operations on it, and then write it back.

2) I need to use Intel VTune 2016 to collect data on this application from the command line.

So: 1) how can I read data from disk on the MIC, and 2) how can I use VTune to collect this from the command line?

Thanks

Yongbei

Running remote target application with root permissions

Hi,

I'm using VTune Amplifier XE 2016 Update 2. I installed a collector on an Ubuntu 12.04 Server machine and a full installation on an Ubuntu 12.04 Desktop machine. I successfully set up password-less SSH from the host to the target.

The application I'm trying to run remotely requires root permission, so I run it using sudo. I can run the application locally on the target machine with:

sudo amplxe-cl <options> <target> <target options>

However, I can't seem to run this application remotely from amplxe-gui, probably because of the root permissions. How can I configure the project in amplxe-gui to do that?

Thanks,

  Oren


Unable to see function names in analysis windows

First some background information:

VTune version: Intel(R) VTune(TM) Amplifier XE 2016 (build 444464)

host and target systems:  Linux 3.16.0-41-generic #57~14.04.1-Ubuntu SMP Thu Jun 18 18:01:13 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

----

I'm running a process called /usr/bin/ccc_mgr on my target system. It runs as a service, and I compiled and built it with the -g -O3 options using this command:

/usr/bin/c++   -std=c++11 -Wall -Wextra -Wno-unused-parameter -fPIE -g -O3 -mtune=core-avx2 -march=corei7-avx

The application uses many C++ std::threads, if that matters.

I collect data on it using this command:

    amplxe-cl --target-install-dir=/opt/intel/vtune_amplifier_xe_2016.2.0.444464 -collect advanced-hotspots --target-pid <insert_pid_here>

Then I import the resulting data into the VTune GUI and look at the top down analysis, and instead of seeing my source code names for functions I instead get names like this:

    func@0x4f0c99

The collection log window complains about not being able to find symbols for things like libpthread, but gives no warnings for my application.

---

As a test, I compiled the tachyon C++ example on my target machine and followed the same process I used for my application. I can see all source code symbols in the tachyon example, so it's definitely an issue with my application, but I can't figure out what I'm missing.

Any thoughts or suggestions?

-Brian 

Is there a way to profile php interpreted code?

Hi,

I am looking for a way to profile interpreted PHP code. I suppose VTune doesn't offer native support for it, but I wonder if there are APIs that could be integrated into the Zend PHP engine for this purpose.

I looked at the JIT Profiling API because I was drawn in by its title: "Profiling Runtime Generated and Interpreted Code with Intel® VTune™ Amplifier". Unfortunately, it seems that the library works for natively JIT-compiled code but is useless when there is no JIT and the bytecode is purely interpreted (meaning that VTune sampling interrupts occur only in the PHP engine code, not in the interpreted bytecode).

Is there a way to configure interpreter-engine callbacks in VTune that provide bytecode or source-code information? Could the APIs used for managed-code profiling of .NET and Java VMs somehow be reused for other kinds of VMs? Are there other options?

Many Thanks,

Bogdan

I cannot see all functions in my C code when Advanced Hotspots are used

Hello,

I use Windows 8.1, Visual Studio 2012 and 2013, Intel Parallel Studio XE 2015, and Intel VTune Amplifier XE 2015. I created a C/C++ project and want to see the assembly code for the Release build. I use Advanced Hotspots and would like to see source/assembly code for some of my functions. However, when I double-click a function I need to inspect, I end up either at the place where this function is called (4.png) or at a place unrelated to this function (3.png). I expect to see the code as in 2.png, plus the information provided by Amplifier.

Debug/Linker settings are shown in the other PNG files.

I am not sure, but when I used Intel C++ 13.0 there were no such problems.

I guess there are some compiler or Amplifier settings I need to adjust in order to see the code in VTune Amplifier.

Options related to OpenMP and Intel processor-specific optimization do not change the behaviour.

Kind regards,

Sofya

Attachments:
1.png (147.05 KB)
2.png (100.58 KB)
3.png (41.85 KB)
4.png (42.25 KB)
5.png (54.74 KB)
6.png (55.52 KB)
7.png (62.09 KB)
8.png (61.06 KB)
9.png (56.46 KB)
10.png (54.14 KB)
11.png (54.13 KB)
12.png (56.38 KB)

Poor usage

Hi,

I am evaluating VTune for the first time. I am running Basic or Advanced Hotspots on Linux on an Intel(R) Xeon(R) CPU D-1540 @ 2.00GHz with 8 CPU cores. My application (written in C) is at an early stage and is currently pinned to specific cores, so the same code does not run on more than 2 cores. Only 4 cores are in use at the moment: a main loop runs on 2 cores, and the other 2 cores are dedicated to some other purpose.

The main loop is a simple while loop that processes incoming frames. If there is no traffic, it has nothing to process. I am currently running the VTune analysis without any frames, and it shows poor usage for all functions.

Questions:

1. How can I use VTune meaningfully in the scenario above? This experiment is meant to evaluate whether VTune will help in the long run.

2. Currently everything is red (poor usage). The help says "by default, poor usage is when the number of simultaneously running CPUs is less than or equal to 50% of the target CPU usage". Does that mean VTune will always display poor usage for single-threaded applications on a multi-core processor? For example, if only one core (out of 8) is running an application, will the analysis always show as poor (in red)?

Thanks in advance for your help!
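The help rule quoted in question 2 reduces to a single comparison. The sketch below is hypothetical (`is_poor_usage` is not a VTune API). It models only the Poor band as quoted, and it assumes that the target CPU usage equals the 8 CPU cores mentioned above, which may differ from the target VTune actually uses on this machine.

```python
def is_poor_usage(simultaneously_running_cpus, target_cpu_usage):
    """Return True if a sample falls in the 'Poor' band as quoted from the help:
    simultaneously running CPUs <= 50% of the target CPU usage.

    Assumption: the caller supplies the target (e.g. the number of cores);
    the other utilization bands VTune shows are not modelled here.
    """
    if target_cpu_usage <= 0:
        raise ValueError("target_cpu_usage must be positive")
    return simultaneously_running_cpus <= 0.5 * target_cpu_usage


if __name__ == "__main__":
    target = 8  # assumed target: the 8 CPU cores described in the post
    for running in (1, 2, 4, 5, 8):
        label = "poor" if is_poor_usage(running, target) else "not poor"
        print(f"{running} CPU(s) running, target {target}: {label}")
```

Under that assumption, anything up to 4 simultaneously running CPUs falls in the Poor band, so a single-threaded run would always be shown in red. The color reflects the distance from the target, not necessarily a defect in the code.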

4K aliasing - what causes it in this case?

I am using VTune on a numerically intensive Fortran code with input parameters JD and KD, which control the problem size. When I run with JD=41 and KD=41, VTune highlights "4K Aliasing". This was new to me, so I read up a bit on read-after-write (store-to-load) hazards. So far, so good. Inside VTune, two subroutines show a 4K Aliasing value of 1.000. One of them is essentially this:

      SUBROUTINE DECJ  ( JPER,B,D,H,XSC,JD,KD )
      LOGICAL, INTENT (IN) :: JPER
      INTEGER, INTENT (IN) :: JD,KD
      REAL*8,  DIMENSION(JD,KD), INTENT (INOUT) :: B,D
      REAL*8,  DIMENSION(JD,KD), INTENT (IN) :: H,XSC
      INTEGER :: J,JP,JM,K
      DO K = 1,KD
        DO J = 2,JD-1
          JP      = J+1
          JM      = J-1
          B(JP,K) = B(JP,K) - H(JP,K)*(0.5*XSC(J,K))
          D(JM,K) = D(JM,K) + H(JM,K)*(0.5*XSC(J,K))
        ENDDO
      ENDDO
      END SUBROUTINE DECJ

This is called twice:
      CALL DECJ  ( JPER,B,D,H,XSCP,JD,KD )
      CALL DECJ  ( JPER,BT,DT,H,XSCM,JD,KD )

The arguments here are automatic arrays in the calling routine, which declares several automatic arrays like this:

      REAL*8,  DIMENSION(JD,KD) :: A,B,C,D,E
      REAL*8,  DIMENSION(JD,KD) :: AT,BT,CT,DT,ET
      REAL*8,  DIMENSION(JD,KD,5) :: G
      REAL*8,  DIMENSION(JD,KD) :: H,UU,XSCP,XSCM 

My basic question is: what specifically triggers 4K aliasing with JD=41, KD=41 but not with JD=41, KD=40? (Experimentally, with JD=41 and KD=40, VTune shows minimal 4K aliasing in subroutine DECJ: the value is 0.109.)

Compilation was with ifort 2015.3.187 using the options
 -O3 -axCORE-AVX2,AVX -xSSE4.2 -g -ip -pad -align -auto -fpe0 -ftz -traceback

The loop in DECJ is unrolled 4 times by the compiler, so presumably after unrolling it looks something like this:

          B(J+1,K) = B(J+1,K) - H(J+1,K)*(0.5*XSC(J,  K))
          D(J-1,K) = D(J-1,K) + H(J-1,K)*(0.5*XSC(J,  K))
          B(J+2,K) = B(J+2,K) - H(J+2,K)*(0.5*XSC(J+1,K))
          D(J,  K) = D(J,  K) + H(J,  K)*(0.5*XSC(J+1,K))
          B(J+3,K) = B(J+3,K) - H(J+3,K)*(0.5*XSC(J+2,K))
          D(J+1,K) = D(J+1,K) + H(J+1,K)*(0.5*XSC(J+2,K))
          B(J+4,K) = B(J+4,K) - H(J+4,K)*(0.5*XSC(J+3,K))
          D(J+2,K) = D(J+2,K) + H(J+2,K)*(0.5*XSC(J+3,K))

I did some testing and couldn't find any addresses that differed by a multiple of 4096. The worst I could find were addresses that differed by a multiple of 256.
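The pattern behind 4K aliasing is a load that follows an in-flight store to a different address with the same bits 11:0. Because the unrolled loop in DECJ loads and stores at different indices (J-1 through J+4), the base addresses of two arrays do not have to differ by an exact multiple of 4096. A distance within a few elements of a multiple of 4096 is enough. The sketch below scripts that check. It is a hypothetical helper, not part of VTune or ifort, and it makes several assumptions:

- the elements are 8-byte REAL*8 values;
- the arrays all have the same (JD, KD) shape;
- the search window is ±4 elements;
- the base addresses are placeholders, which would have to be replaced with the real ones from a run. Both -pad/-align and the stack layout of the automatic arrays affect those addresses.

```python
from itertools import product

ELEM_BYTES = 8          # REAL*8 elements
LOW_BITS_MASK = 0xFFF   # 4K aliasing compares address bits 11:0


def array_bytes(jd, kd, elem_bytes=ELEM_BYTES):
    """Size in bytes of one array dimensioned (JD, KD)."""
    return jd * kd * elem_bytes


def same_low_12_bits(store_addr, load_addr):
    """True if a store and a load are different addresses that share bits 11:0,
    the pattern that can make a load look dependent on an unrelated store."""
    return store_addr != load_addr and ((store_addr ^ load_addr) & LOW_BITS_MASK) == 0


def find_4k_alias_pairs(bases, stores, loads, max_index_delta=4, elem_bytes=ELEM_BYTES):
    """List (store_array, load_array, delta) tuples where a store to element i of
    store_array and a load from element i + delta of load_array share bits 11:0.

    Assumes all arrays have the same (JD, KD) shape, so the column offset is
    identical for both accesses and cancels out; only the base addresses and
    the index delta matter. Program order and timing are ignored, so the
    result is a list of candidates, not a prediction of what VTune will count.
    """
    hits = []
    for store, load in product(stores, loads):
        for delta in range(-max_index_delta, max_index_delta + 1):
            store_addr = bases[store]
            load_addr = bases[load] + delta * elem_bytes
            if same_low_12_bits(store_addr, load_addr):
                hits.append((store, load, delta))
    return hits


if __name__ == "__main__":
    # Hypothetical base addresses for illustration only; replace them with the
    # real addresses of B, D, H and XSC from an instrumented run.
    bases = {"B": 0x1000, "D": 0x1000 + 3 * 4096 + 16, "H": 0x5000 + 8, "XSC": 0x9100}
    print("one (41,41) array:", array_bytes(41, 41), "bytes")
    print("one (41,40) array:", array_bytes(41, 40), "bytes")
    for store, load, delta in find_4k_alias_pairs(bases, ["B", "D"], ["B", "D", "H", "XSC"]):
        print(f"store {store}(i) and load {load}(i{delta:+d}) share bits 11:0")
```

A (41, 41) array is 13448 bytes and a (41, 40) array is 13120 bytes. The positions of consecutive automatic arrays modulo 4096 therefore shift with KD. Whether that shift explains the measurement depends on the actual layout, which is why the real addresses are needed.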

How does VTune collect HPC data?

Hi,

I am new to VTune and would like to know how VTune collects performance counter data. Does VTune capture counters while OS code is executing? Are the counters updated in user space or in kernel space?

Thanks.

VTSS.SYS error installing Parallel Studio XE

[Moved from Fortran forum]

I ran into a different problem while installing PSXE 2016 Update 1 Cluster Edition. I'm running Windows 8.1 Pro, and I have Visual Studio 2012 Ultimate installed. I also installed the Windows Software Development Kit for Windows 8.1, as the installation guide suggested. However, the installation crashed midway with the blue screen error "KMODE_EXCEPTION_NOT_HANDLED (VTSS.SYS)". Please help.

/lib32/ld-2.19.so _init() instrumentation failed

I tried to use VTune 2013/2015 to collect data on a Pin-based simulator on Ubuntu 12.03.

/opt/intel/vtune_amplifier_xe_2013/bin64/amplxe-cl -collect hotspots -app-working-dir /home/parsec -- /home/parsec/run.parsec.sh

The result:

amplxe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: amplxe-cl -r /home/zsim/r000hs -command stop.
amplxe: Warning: [2016.01.20 16:19:07] /lib32/ld-2.19.so _init() instrumentation failed.
amplxe: Warning: [2016.01.20 16:19:07] /lib32/ld-2.19.so _init() instrumentation failed.
amplxe: Warning: [2016.01.20 16:19:07] /lib/x86_64-linux-gnu/ld-2.19.so _init() instrumentation failed.
amplxe: Warning: [2016.01.20 16:19:07] /lib/x86_64-linux-gnu/ld-2.19.so _init() instrumentation failed.
amplxe: Warning: [2016.01.20 16:19:07] /lib/x86_64-linux-gnu/ld-2.19.so _init() instrumentation failed.

E:Could not instrument process 4703: need execute and read access to /proc/4703/exe
[SNIPER] End
[SNIPER] Elapsed time: 0.58 seconds

amplxe: Error: Failed to attach to the specified target process. Please make sure the process exists and VTune Amplifier process has enough permissions to attach to the target process. See the Troubleshooting help topic for more details.
amplxe: Error: [Instrumentation Engine]: Attach to pid 4960 failed: Operation not permitted 
amplxe: Collection stopped.
amplxe: Internal Error

What could cause this error? I run VTune as root, so how can I give it more permissions?

Capturing program stdout

Does the VTune GUI capture program stdout anywhere in its reports? I need to check that my optimisations are not changing the results, but I can't see the output from the GUI. At the moment it is printed to stdout in the terminal from which I ran amplxe-gui, which isn't ideal, so I was wondering whether there is another way to capture it.

Download missing dlls from symbol server

I'm trying to analyze a result collected on another PC, but VTune can't locate ntdll.dll and other Windows DLLs. I think the DLLs don't match because the PC that generated the result has fewer Windows updates installed than the PC that analyzes it. I tried setting _NT_SYMBOL_PATH and configuring Microsoft symbol servers in VTune itself, but I couldn't make VTune download the DLLs. Does VTune support downloading DLLs from symbol servers?

Examining the effect of serialized memory access in multi-threaded software

Hello everyone,

I am working on a multi-threaded video encoder application (x265).

I need to prove that, while increasing the number of threads can improve the total run time, beyond a certain number of threads (cores) the memory resources become insufficient. That is, concurrent memory accesses from different cores lead to a queue of requests at the DRAM, so memory delay can limit performance.

1- What do you think is the best method to obtain these results?

2- I ran my tests with 2, 4, and 8 threads (cores) on my machine (Intel Ivy Bridge i7) in Memory Access analysis mode. While the "Memory Latency" metric in VTune increases (2 threads: 0.048, 4 threads: 0.533, 8 threads: 0.735), the "Average Latency (cycles)" stays almost constant (around 11 or 12). Why do you think that happens? I would expect the average latency to increase due to longer DRAM access times. Can anyone tell me exactly what "Average Latency (cycles)" and "Memory Latency" are? Does the average latency take memory latency into account too?

Thanks in advance,

Farhad

Collecting Locks and Waits data fails

Hi,

I run VTune on Linux (Red Hat 7.1) using the GUI tool. When I run a Locks and Waits collection, the collection cannot continue:

Application sets its own handler for signal 38 that is used for internal needs of the tool.

Other analysis types work well.

What's the problem, and how can I fix it? I want to collect Locks and Waits data for my system.

Thank you.
