Problems using OSX VTune

March 28, 2019, 4:42 pm

Latest and popular articles on Intel Technologies

Just tried OSX VTune (to profile a remote Linux host) for the first time. The overall experience started out really nicely. Then it went completely off the rails. Here's what happened and some suggestions to improve things:

1) setup of project went well

2) clicked "start and pause", and after a couple of seconds it showed a message saying that something went wrong and I needed to run the 'status' command -- without saying how

3) no sign of any status command in the GUI, searched docs and found out that amplxe-cl is needed

4) tried to run amplxe-cl on command line (no luck)

5) searched Mac and remote machine for amplxe-cl, with no luck

6) past experience with Mac dev tools led me to peek into the VTune OSX package manually, discovering the bash script to set up the command line tools -- pretty terrible user experience, especially since the application name has all sorts of spaces in it and a .app extension that isn't obvious

7) tried status command with amplxe-cl, but it says it doesn't have such an option and sure enough its printed list of options doesn't include status like the docs imply it should -- why doesn't it have this option?

8) found dialog in GUI where the command being run is shown -- copy&pasted to terminal and ran it. This produced a bunch of output from the Intel tool, including a note that no data was collected. No clue as to why.

9) noticed that the remote command it was using was printed -- copy&pasted that to remote host ssh session and ran it. This (finally) showed me my own program's output, and instantly _why_ it was failing almost as soon as it started. Why wasn't my program's output captured and shown in the OSX GUI?! Would have solved the problem in seconds instead of this wild goose chase through multiple command lines.

10) The results of the command line run didn't appear in the GUI for some reason. Re-ran from the GUI now that the issue was resolved. This time it worked and said please wait "a few moments" (which turned out to be about 10 minutes).

The VTune user experience, especially for remote profiling, has come very far. Too bad it fell flat for me immediately upon trying to use it the first time from a Mac.

↧

Unable to create threads when running under vtune

April 2, 2019, 9:47 am

I have an MPI application that I am attempting to profile using VTune. No one at my company has experience with VTune, so I'm at a loss as to how to proceed, because I can't get it working.

My corporation has a cluster that we use for development purposes. VTune is installed, and has been used by others in the corporation (not my immediate company though) successfully.

When I run the application with any collection type other than hotspots, my application is unable to create threads using the pthread library. The pthread_create() call returns an error status, which translates to "Resource temporarily unavailable". This vague error message leaves us baffled.
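"Resource temporarily unavailable" is the text for EAGAIN, which pthread_create() returns when the system lacks the resources for another thread or a limit on the number of threads is reached. A minimal sketch for printing the limits a process actually sees is shown here; the choice of which limits to inspect is an assumption, and nothing in it is specific to VTune.

```python
import errno
import os
import resource


def thread_related_limits():
    """Return (soft, hard) pairs for limits that commonly lead to EAGAIN."""
    names = ["RLIMIT_NPROC", "RLIMIT_STACK", "RLIMIT_AS"]
    limits = {}
    for name in names:
        const = getattr(resource, name, None)  # not every platform defines all of them
        if const is not None:
            limits[name] = resource.getrlimit(const)
    return limits


def format_limit(value):
    """Render a limit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    print("EAGAIN means:", os.strerror(errno.EAGAIN))
    for name, (soft, hard) in thread_related_limits().items():
        print(f"{name}: soft={format_limit(soft)} hard={format_limit(hard)}")
```

Running it through the same job launcher as the profiled application, once with and once without the profiler, would show whether the limits differ between the two environments.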

I would appreciate it if anyone has any ideas on what could be going on. Thank you.

Dalon

↧

Questions about "context switches" in Hotspots analysis

April 3, 2019, 1:41 am

I have a bunch of questions about the context switch information in Hotspots analysis.

1. In Hotspots->Top-down Tree view:

A. What is the difference between "Context Switch Count: Total" and "Context Switch Count: Self"?

B. Is it counting switch-in, or switch-out, or both?

2. In the Hotspots->Bottom-up Tree view, context switch info is shown in the tooltip that appears near the cursor.

However, it is difficult to understand.

A. For blocks colored with CPU time, it does not tell which CPU it is running on.

How do I know the CPU it is running on? Should I refer to the previous block or next block for the CPU?

B. For blocks colored with synchronization, sometimes info for multiple context switches is shown.

What does it mean?

C. Are the context switches switch-in or switch-out?

D. For blocks colored with preemption, why is CPU time shown for the thread?

E. For a context switch, how do I know which thread was switched to?
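On question 1A: in profilers generally, a "Self" value counts only what is attributed to a function's own code, while "Total" adds everything attributed to the functions it calls. The sketch below illustrates that convention on a hypothetical call tree with made-up counts; whether the Context Switch Count columns follow exactly this convention is an assumption, not something confirmed in this thread.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    self_count: int  # events attributed to this function's own code
    children: List["Node"] = field(default_factory=list)

    def total(self) -> int:
        """Self count plus the totals of everything called from here."""
        return self.self_count + sum(child.total() for child in self.children)


# Hypothetical call tree; names and counts are invented for illustration.
leaf_a = Node("wait_on_lock", 7)
leaf_b = Node("read_socket", 3)
worker = Node("worker_loop", 2, [leaf_a, leaf_b])
root = Node("main", 0, [worker])

if __name__ == "__main__":
    for node in (root, worker, leaf_a, leaf_b):
        print(f"{node.name}: self={node.self_count} total={node.total()}")
```

Under this convention a function can show a Self value of zero and a large Total, which simply means the events happened in its callees.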

↧

checking whether Vtune has been attached to a process inside the code

April 4, 2019, 8:49 am

Hi all,

I am successfully using VTune Amplifier with its attach-to-process feature. I was just wondering if there is any way, either with ITT or with some flag, to pause the code until VTune is attached. I know it's doable by setting an environment variable and pausing the code until it gets set, but I was looking for a cleaner approach. Thank you.
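For reference, the "pause until signalled" workaround described above can be sketched as follows. This version waits for a flag file instead of an environment variable, since a file is easy to create from another shell; that substitution, the function name `wait_for_flag`, and the path are all assumptions made for illustration. It does not answer whether ITT offers a cleaner mechanism.

```python
import os
import time


def wait_for_flag(path, timeout_s=60.0, poll_s=0.1):
    """Block until `path` exists; return True if it appeared, False on timeout."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if os.path.exists(path):
            return True
        time.sleep(poll_s)
    return os.path.exists(path)


if __name__ == "__main__":
    flag = "/tmp/profiler_attached.flag"  # hypothetical path, created by hand after attaching
    print(f"PID {os.getpid()} waiting for {flag}")
    if wait_for_flag(flag):
        print("flag seen, starting the real work")
    else:
        print("timed out, starting anyway")
```

The timeout keeps the program from hanging forever when it is started without a profiler.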

Regards,

Hossein

↧

how to get the pt trace packet

April 9, 2019, 6:23 am

I learned that Intel® Processor Trace (Intel® PT) is supported in VTune.

Intel PT offers control flow tracing, whose data packets include timing and program flow information (e.g. branch targets, branch taken/not taken indications).

How can I get these packets by using this driver? I could not find this data.

Thanks ahead of time for any help.

↧

problem Installing vTune on win10/x64

April 14, 2019, 11:02 pm

Hi,

I have successfully downloaded VTune ("VTune_Amplifier_2019_update3_setup") for my Win10/x64 machine. However, when I double-click the installer icon, nothing happens. I have tried the following with no luck:

- executed through cmd with admin rights

- re-downloaded and tested

- tested different versions

Any help is much appreciated.

Regards

Hoss

↧

Analyse Memory Consumption of an application

April 24, 2019, 1:13 am

I'm trying to analyse the memory consumption of our application with VTune Amplifier 2019. I collected the data and, in the "Bottom-up" timeline, used "Filter In by Selection" on a peak to find out how much memory is allocated at that point and by which function.

Is this possible, or must I select every step in the timeline and sum the values up? Is the total memory consumption shown only in the timeline, or is it possible to get a more precise value?

thank you

Claudio

↧

Result path not found

April 25, 2019, 3:30 am

I want to create a hotspots report, following this example from the help information:

1) Generate the 'hotspots' report for the result directory 'r000hs'.
 
    amplxe-cl -report hotspots -r r000hs

However, I always get the error message

amplxe: Error: Result path `/<basename>/r000hs' does not exist

This does not change if I create the directory /<basename>/r000hs via mkdir beforehand.
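In the help example, `r000hs` is the name of a result directory produced by an earlier collection, so a directory created by hand with mkdir holds no collected data. A small sketch that checks for the directory and passes an absolute path to the same command is given below; the helper name `build_report_command` is hypothetical, and whether an absolute path resolves this particular error is an assumption.

```python
import os


def build_report_command(result_dir, report="hotspots"):
    """Return the amplxe-cl argument list for an existing result directory."""
    absolute = os.path.abspath(os.path.expanduser(result_dir))
    if not os.path.isdir(absolute):
        raise FileNotFoundError(f"no result directory at {absolute}")
    # Same options as the help example, but with an absolute path.
    return ["amplxe-cl", "-report", report, "-r", absolute]


if __name__ == "__main__":
    try:
        print(" ".join(build_report_command("r000hs")))
    except FileNotFoundError as err:
        print(err)
```

Printing the resolved path first makes it easy to see which directory the relative name actually points to from the current working directory.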

↧

Cannot see actual Fortran subroutine names when checking performance of optimized code

May 14, 2019, 12:08 pm

Hello,

I have a large Fortran program that does not run as fast as I had hoped, despite using the Intel compiler's options for maximizing speed.

I am trying to use VTune Amplifier to detect specific regions in specific routines which are responsible for performance bottlenecks. The report produced by Vtune Amplifier does not list the actual names of the subroutines that I have written. I think this has to do with using a Release configuration-build for my program. Is there a way to build a release configuration, using optimization, and still be able to see the actual subroutine names in the Vtune Amplifier reports?

I do not want to benchmark a debug configuration-build for my program, as this will not indicate which parts of my code I may need to manually modify/optimize.

Thank you,

Yannis

↧

VTune Hotspots messing with TCP Stack (Zookeeper)

May 23, 2019, 3:25 pm

I am using the VTune Amplifier Hotspots analysis, which worked the first two times I used it. However, it sometimes causes my TCP stack to fail. I am using Zookeeper, which does not run during a Hotspots collection. If I try another analysis, such as this one,

amplxe-cl -collect cpugpu-concurrency NoCameraApplicationStart.bat

then it works.

Does anyone know what's wrong?

Thanks

↧

Unable to download VTune Amplifier for linux

May 27, 2019, 5:46 pm

Hello

I am trying to download Intel® VTune™ Amplifier for Linux (VTune Amplifier only).

The homepage gives me a download link, but it ends in a 404 error.

Any help will be greatly appreciated.

Thanks and regards, Kim.

↧

Unable to install VTune Amplifier 2019 (any version) on Ubuntu 16.04

June 1, 2019, 2:06 am

Hello fellow developers,

I have been unable to install any version of Intel VTune Amplifier 2019 on a machine running Ubuntu 16.04.

Upon starting the installation process (./install.sh as root user / the same behaviour is observed when running with sudo) in cli-mode, the "Initializing, please wait..." is shown and the program starts allocating RAM until the latter is exhausted at which point the process is simply killed. The machine has 16GB of RAM + 32 GB of swap space.

Installation as a non-root user works properly; however, the installer does not detect that the user is able to elevate rights, hence drivers etc. are not installed. My user is part of the wheel group and has (password protected) access to sudo.

I did have Parallel Studio XE 2017 installed previously (to /opt/intel/..., i.e. the canonical installation path), but uninstalled it using the supplied uninstall script and removed /opt/intel afterwards.

I have a working installation of VTune on an Arch Linux machine that used the same installer.

I hope someone has experienced similar issues and may be able to assist me.

Best regards and thanks,

Patrick.

↧

Amplifier cannot detect remote machine configuration.

June 5, 2019, 2:14 am

When I use VTune on a Windows host to analyse a target Linux application over SSH, VTune shows the error "Amplifier cannot detect remote machine configuration." Can anybody tell me how to correct it? Thanks very much!

So far, I have installed the required drivers on the target Linux machine and copied the vtune_amplifier_2018.3.0.566015 folder to /opt/intel.

Then I linked /lib64/ld-linux-x86-64.so.2 into /lib/.

Running amplxe-runss --context-value-list shows the following log:

targetOS: Linux
OS: Linux
OSBuildNumber: 0
OSBitness: 64
RootPrivileges: true
isPtraceScopeLimited: false
isTSXAvailable: true
isHTEnabled: false
fpgaOnBoard: None
pciClassParts:
isSGXAvailable: false
LinuxRelease: 4.8.0-36-generic
IsNUMANodeWithoutCPUsPresent: false
Hypervisor: VMwareVMware
PerfmonVersion: 1
isPtraceAvailable: true
areGpuHardwareMetricsAvailable: UnsupportedInterfaceVersion
i915Status: MissingDriver
isFtraceAvailable: yes
isMdfEtwAvailable: false
isCSwitchAvailable: yes
isGpuBusynessAvailable: unsupportedHardware
isGpuWaitAvailable: no
isFunctionTracingAvailable: yes
isIowaitTracingAvailable: yes
isVSyncAvailable: yes
isPAXDriverLoaded: true
isHyperVEnabled: false
isDeviceOrCredentialGuardEnabled: false
isSEPDriverAvailable: true
platformType: 103
CPU_NAME: Intel(R) Processor code named Skylake
PMU: skylake
referenceFrequency: 3200000000
isVTSSPPDriverAvailable: true
isNMIWatchDogTimerRunning: false
LinuxPerfCredentials: Unlimited
LinuxPerfCapabilities: breakpoint:raw;cpu:raw,format,events,ldlat,frontend;msr:raw,format,events;power:raw,format,events;software:raw;tracepoint:raw
LinuxPerfStackCapabilities: fp,dwarf
isTPSSAvailable: true
isPytraceAvailable: true
isGENDebugInfoAvailable: false
isGTPinCollectionAvailable: ErrorUnsupportedHardware
isSTTAvailable: no
isCOHNPKCtrlLibAvailable: false
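A log of this shape is easier to scan once the negative-looking entries are pulled out. The sketch below parses "key: value" lines and lists the entries whose value looks negative; the set of values treated as negative is an assumption drawn from the log above, and not every such entry indicates a problem (for example, a watchdog timer that is not running).

```python
def parse_context_values(text):
    """Turn 'key: value' lines into a dict; lines without a colon are skipped."""
    values = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split on the first colon only
        if sep:
            values[key.strip()] = value.strip()
    return values


# Values treated as negative; this set is an assumption based on the log above.
NEGATIVE = {"false", "no", "None", "MissingDriver"}


def negative_entries(values):
    """Return the keys whose value is in NEGATIVE or mentions 'unsupported'."""
    return sorted(
        key for key, value in values.items()
        if value in NEGATIVE or "unsupported" in value.lower()
    )


if __name__ == "__main__":
    sample = """isSEPDriverAvailable: true
i915Status: MissingDriver
isGpuWaitAvailable: no
areGpuHardwareMetricsAvailable: UnsupportedInterfaceVersion
PMU: skylake"""
    print(negative_entries(parse_context_values(sample)))
```

Comparing the output from the remote target with that of a machine where collection works would narrow down which entries matter.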

↧

CPU/GPU concurrency analysis Error

June 12, 2019, 2:42 am

Hi,

I get an error when I launch a CPU/GPU concurrency analysis: Cannot finalize the result. Error 0x4000002a (Database interface error) -- Precompute error

Please help!

↧

vtune_amplifier_2019_update4's Prerequisite

June 13, 2019, 12:41 am

Hi,

I am installing vtune_amplifier_2019_update4 on CentOS 7.2. Below is the message about missing prerequisites:

Prerequisites > Missing Prerequisite(s)
--------------------------------------------------------------------------------
There are one or more unresolved issues based on your system configuration and
component selection.

You can resolve all the issues without exiting the installer and re-check, or
you can exit, resolve the issues, and then run the installation again.

--------------------------------------------------------------------------------
Missing optional prerequisites
-- The installed version of the Network Security Services library is not
supported
-- Kernel source directory is not found. Sampling driver cannot be built.
--------------------------------------------------------------------------------
1. Skip prerequisites [ default ]
2. Show the detailed info about issue(s)
3. Re-check the prerequisites

h. Help
b. Back
q. Quit installation

--------------------------------------------------------------------------------
Please type a selection or press "Enter" to accept default choice [ 1 ]:

What is the minimum required version of the Network Security Services library?

Thanks!

Andrew

↧

GPU in-kernel hardware not support but it should be

June 13, 2019, 1:46 am

Hi,

I can't run GPU In-kernel Profiling; it fails with this error: Current hardware does not support GPU in-kernel profiling. However, I have:

Intel I7-6700K with HD 530
Windows 7 / 64 bits
VTune 2018

and the Intel documentation says GPU In-kernel Profiling is available on processors based on Intel® microarchitecture code name Broadwell and later.

Broadwell is 5th Gen and I have 6th Gen (Skylake). I don't understand why I can't run this profiling.

Do you have any ideas?

Thanks

↧

How relevant are the line-by-line timings of a hotspot analysis

June 17, 2019, 2:08 am

Not 100% sure if this is the correct sub-forum, please be gentle...

I am currently trying to optimise a code for execution time. Hence the use of VTune Amplifier. I have some doubts/questions about the accuracy/relevance of the "cpu time" column. Here is a result from my code:

-See Attachments-

For example, I can not quite understand why line 34 "f_temp(11) = f_1(11,links(14,n))" takes so much longer than all of the similar lines.

So I tried swapping the order of lines, putting this one first. Surprisingly, the line that took its place as the 12th line of this block now took equally long to execute, according to VTune Amplifier.

Is this a real effect or am I falling prey to some kind of measurement error/aliasing effect?

For this analysis I used a single (pinned) thread on a system with 2x Xeon 2687W v3. The clock speed was locked to 3GHz. Compiler options were -O3 -ip -ipo -xHost -fopenmp -g. Ifort version is 18.0.03 on a Linux operating system. Advanced hotspot analysis with 1ms sampling interval.
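One part of the measurement-error question can be estimated directly: with a 1 ms sampling interval, a source line that accounts for little CPU time receives few samples, and few samples mean noisy numbers. The sketch below treats sample counts as Poisson-distributed, which is a simplifying assumption, and uses made-up per-line CPU times. It covers statistical noise only and does not model attribution effects such as samples landing on a neighbouring instruction.

```python
import math


def expected_samples(cpu_time_s, interval_s=0.001):
    """Number of samples a line would receive at the given sampling interval."""
    return cpu_time_s / interval_s


def relative_noise(cpu_time_s, interval_s=0.001):
    """Approximate 1-sigma relative error, treating sample counts as Poisson."""
    n = expected_samples(cpu_time_s, interval_s)
    if n <= 0:
        raise ValueError("cpu_time_s must be positive")
    return 1.0 / math.sqrt(n)


if __name__ == "__main__":
    for cpu_time in (0.010, 0.100, 1.0, 10.0):  # made-up per-line CPU times in seconds
        n = expected_samples(cpu_time)
        print(f"{cpu_time:6.3f} s -> {n:7.0f} samples, ~{relative_noise(cpu_time):.1%} noise")
```

If the slow line has accumulated thousands of samples, random noise alone is unlikely to explain a large gap to its neighbours, which points toward a real or attribution-related effect instead.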

And a follow-up: provided the line-by-line results are anything to go by, how would I optimise the code further for execution time?

I could imagine that the indirect memory access here causes a lot of LLC misses and makes the code memory (latency) bound. Additional analyses using VTune Amplifier seem to confirm this. But as far as I can tell there is not much I could do about it. The nodes are already reordered using a space-filling curve to increase data locality. And then why would only some of those lines with indirect memory access take up most of the time? Is there anything I am missing here? The problem size used in this example does not fit into cache, which will also be true of the later use case.

Another possibility -judging by publications in this area- could be streaming stores. This might be able to reduce the time taken by the last line of the code shown "f_2(i,n) = f_temp(i) - omega * (f_temp(i)-feq)". However, I have no idea how I would implement this. All the samples I could find used different languages.

Attachments:
amplifier_01.png (120.34 KB)
amplifier_02.png (97.26 KB)

↧

I can't have Estimated GPU cycles for kernel into In-Kernel Profiling

June 17, 2019, 4:33 am

Hi,

When I go to the source code of my kernel in GPU In-Kernel Profiling, I can't see the "Estimated GPU Cycles" column (https://software.intel.com/en-us/node/810015, picture 2). I have a "Computing Task" column with a time only for my kernel's first line. How can I get the Estimated GPU Cycles column?

I also get a warning during data collection: [Instrumentation Engine]: GTPin: FillMyFunctionMap() couldn't find GTPIN_IGC_OCL_GetSupportedFeatures

Thanks

↧

VTune on AMD cpus?

June 17, 2019, 1:38 am

Hello,

I have an AMD Ryzen 2700X. Can I use VTune to profile an application I'm developing?

Regards

↧

Application sets its own handler for signal 38 that is used for internal needs of the tool

June 17, 2019, 4:58 pm

We have a simple Go and DPDK application that listens only for interrupt signals. I'm trying to set up VTune analysis via CLI/remote on a target Linux machine running the app. I'm using vtune_amplifier_2019.4.0.597835.

For the command

./amplxe-cl -collect hotspots \
-result-dir ~/vtune_results/run1 \
-target-process nff-go-nat \
-knob sampling-mode=sw \
-search-dir ~/go/src/github.com/intel-go/nff-go-nat/ \
-run-pass-thru=--profiling-signal=40

I see the following error, despite using the workaround "-run-pass-thru=--profiling-signal=40":

amplxe: Error: Application sets its own handler for signal 38 that is used for internal needs of the tool. Collection cannot continue. Refer to the Troubleshooting section of the online help for possible workarounds.
amplxe: Collection failed.
amplxe: Internal Error
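It can help to know which signals the numbers 38 and 40 refer to on the target. On Linux they usually fall in the real-time signal range, which is conventionally written relative to SIGRTMIN. The sketch below names a signal number that way; the helper `describe_signal` is hypothetical, and the actual SIGRTMIN value depends on the platform, so it should be run on the target machine itself.

```python
import signal


def describe_signal(number, rtmin=None, rtmax=None):
    """Name a signal number, expressing real-time signals relative to SIGRTMIN."""
    if rtmin is None:
        rtmin = getattr(signal, "SIGRTMIN", None)  # absent on platforms without RT signals
    if rtmax is None:
        rtmax = getattr(signal, "SIGRTMAX", None)
    if rtmin is not None and rtmax is not None and rtmin <= number <= rtmax:
        offset = number - int(rtmin)
        return "SIGRTMIN" if offset == 0 else f"SIGRTMIN+{offset}"
    try:
        return signal.Signals(number).name
    except ValueError:
        return f"unknown signal {number}"


if __name__ == "__main__":
    for number in (38, 40):  # the two numbers that appear in the post
        print(number, describe_signal(number))
```

Knowing the symbolic names makes it easier to search the application and its runtime for code that installs handlers on the same signals.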

Any suggestions will be much appreciated.

↧