Channel: Intel® VTune™ Profiler (Intel® VTune™ Amplifier)
Viewing all 1347 articles

PS XE 2018 Update 4 & VTune

Colleagues,

One of our team members has installed Parallel Studio XE 2018 Professional Edition Update 4 (Compiler, VTune, Inspector, and Advisor), integrating it into Visual Studio 2017. After installing the update, VTune is no longer in Visual Studio. There is no VTune button in the ribbon bar, and VTune is not listed among the components in the VS Help drop-down.

Running a Repair did not change matters. Watching closely, we saw the installer list VTune Graphical Interface as one of the elements (151 or 152 of 179). We had previously (just days ago) received the stand-alone update for VTune: VTune Amplifier 2018 Update 4. Running that produces the same empty result: after every sign of a successful VTune integration into VS 2017, it is not actually in VS.

Similar experiences? Any suggestions?


VTune version and license info

Hi, I originally installed VTune Amplifier XE 2017 back in 2017 and I had an Educator/academic license.

This expired in early 2018, but I recently received a new one and installed it using the Intel Software Manager tool.

Then I opened up VTune and went to "About VTune / Details" tab and it now says Expiration Date: 9/18/2019, so I think that worked :-)

But my questions for this forum are:

1. Do I have the latest version of VTune?  On the same About VTune / Details tab it says: Product Version: Update 1 (build 486001).

When I use the Intel Software Manager / Downloads tab, it says "you currently have the latest available software installed." (which I doubt is correct)

2. Does the license/serial-number which I have active now allow me to install the 2018 VTune releases (up to Update 4)? (my current license is for Educator Intel® Parallel Studio XE Cluster Edition for Windows*)

My system is running Windows 10 RS4.

Please advise. Thank you,

Prof. Colin Reinhardt

University of Washington, Electrical Engineering Dept.

 

Why can VTune profile a stripped application?

AFAIK, an unstripped application is necessary for the profiler to display function names rather than func@address.

In my environment, I first installed an unstripped version of my application. Profiling worked well and the function names were displayed.

However, in the same environment, I then set up a stripped version and profiled it, and the function names are still displayed. Is my understanding of the profiling requirements wrong? How does VTune actually do this?

To make sure the application is stripped, I ran "objdump --syms" and confirmed that it does indeed report "no symbol".
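
One possible explanation, offered as a sketch under assumptions rather than as confirmed VTune behaviour: `strip` removes the static symbol table (`.symtab`), which is what `objdump --syms` prints, but a dynamically linked binary keeps its dynamic symbol table (`.dynsym`), which `objdump --dynamic-syms` prints. A profiler could resolve exported function names from that table. The helper below (the name `symbol_tables` is hypothetical) reads the section headers of a 64-bit little-endian ELF file and reports which of the two tables are present.

```python
import struct
import sys

SHT_SYMTAB = 2    # static symbol table, removed by strip
SHT_DYNSYM = 11   # dynamic symbol table, kept for dynamic linking


def symbol_tables(data):
    """Return the set of symbol-table sections found in an ELF64 LE image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF image")
    # ELF64 header: e_shoff at 0x28, e_shentsize at 0x3A, e_shnum at 0x3C.
    (e_shoff,) = struct.unpack_from("<Q", data, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", data, 0x3A)
    found = set()
    for index in range(e_shnum):
        # sh_type is the second 32-bit field of each section header.
        (sh_type,) = struct.unpack_from("<I", data, e_shoff + index * e_shentsize + 4)
        if sh_type == SHT_SYMTAB:
            found.add(".symtab")
        elif sh_type == SHT_DYNSYM:
            found.add(".dynsym")
    return found


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        print(sorted(symbol_tables(handle.read())))
```

If this prints only `.dynsym` for the stripped binary, the names shown in the profile may be coming from the dynamic symbol table. Separate debug files on the search path would be another thing to rule out.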

OpenCL NEO driver (24.20.100.6286) and GPU Hotspots error "Intel Graphics Driver is obsolete"

I have the Intel OpenCL 2.1 NEO driver (24.20.100.6286) and just installed VTune 2018 Update 4 (build 573462).

My system is Intel Core i7-8650U CPU with UHD Graphics 620.

When I try to create a new GPU Hotspots analysis I get error message "Your version of the Intel Graphics Driver is obsolete and needs to be updated before collection."  

What to do?

Thanks, Colin

 

License type to purchase

Hi,

I am evaluating which license is suitable for me to purchase.

Could you please explain the difference between the three license types?

  • Product with Priority Support
  • Priority Support Renewal
  • Product Upgrade with Priority Support

 

question related to kernel.kptr_restrict=0

Hi all,

I was trying to set up VTune to profile my simulation, written in Fortran with OpenMP, on my institute's HPC cluster. The goal is to see the actual performance and understand the parallelisation. I've been getting these warnings in the analysis result:

Data collection is completed with warnings
    Thu 18 Oct 2018 02:23:48 PM AEDT  The result file '/dir/VTuneResult/r007ah/r007ah.amplxe' is created and added to the project VTuneResult. Please see warning messages for details. 
    Access to /proc/kallsyms file is limited. Consider changing /proc/sys/kernel/kptr_restrict to 0 to enable resolution of OS kernel and kernel modules symbols.
    To profile kernel modules during the session, make sure they are available in the /lib/modules/kernel_version/ location.

Finalization completed successfully 
    Thu 18 Oct 2018 02:23:50 PM AEDT  Result is ready for analysis.

The whole analysis lasted only about 0.002s and gave me no performance information at all. It seems to me that /proc/sys/kernel/kptr_restrict needs to be set to 0 in order to collect any data. I've been reading this page:

https://software.intel.com/en-us/vtune-amplifier-help-enabling-linux-kernel-analysis

I'm aware that the modification requires sudo privileges, so I emailed the HPC admins. Their answer was no, on the reasoning that "the HPC is used for compute and analysis rather than kernel-level code optimisation, they cannot recompile the kernel or install arbitrary kernel modules in an ad-hoc fashion".

I'm a bit confused, as the VTune Amplifier user guide doesn't mention anything about recompiling the kernel or installing kernel modules. I'm quite new to this, so I may have missed something. Alternatively, for my purposes as stated at the beginning of the thread, does user-mode sampling suffice? Please let me know if any further information is needed.
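
The warning text itself only talks about resolving kernel and kernel-module symbols, so it may help to check the current value before asking for a change. The following is a minimal sketch, assuming a Linux system that exposes the setting at the path quoted in the warning. The helper names `read_int_setting` and `kernel_symbols_hint` are hypothetical, and the hint strings are my own wording, not VTune output.

```python
from pathlib import Path

# Path taken from the VTune warning quoted above.
KPTR_RESTRICT = "/proc/sys/kernel/kptr_restrict"


def read_int_setting(path):
    """Return the integer stored in a procfs-style file, or None if unreadable."""
    try:
        return int(Path(path).read_text().strip())
    except (OSError, ValueError):
        return None


def kernel_symbols_hint(value):
    """Turn the raw value into a short human-readable hint."""
    if value is None:
        return "setting not readable on this system"
    if value == 0:
        return "kernel pointers are not restricted by this setting"
    return "kernel pointers are restricted; kernel symbols may stay unresolved"


if __name__ == "__main__":
    print(kernel_symbols_hint(read_int_setting(KPTR_RESTRICT)))
```

Reading the file does not need root; only changing it does. The setting affects kernel addresses, not symbols in the user's own executable.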

Thanks!

 

Missing debug symbols with Intel Fortran and optimization

Hello everyone, 

I am trying to profile my application with VTune Amplifier 2018.

If I compile my application with GCC, everything works fine. However, with Intel Fortran and optimization levels of -O1 or higher, VTune fails to find the debug symbols, i.e. compiling with

-g -O2

fails to find the debug symbols, and so cannot display the source code of the functions, while using

-g -O0 

works fine. If I collect data with the optimized code, then recompile with -g -O0 and re-resolve the collected data, the debug symbols are found correctly, but this is quite inconvenient.

How can I get both an optimized executable and correct debug symbols when compiling with Intel Fortran?

.NET Core source code analysis with Intel® VTune™ Amplifier 2019

Dear VTune users,

Last year we enabled .NET Core performance profiling with Intel® VTune™ Amplifier 2018, including profiling of Just-In-Time (JIT) compiled .NET Core code on Microsoft Windows* and Linux* operating systems. I am excited to share the .NET Core-specific enhancements we added in Intel® VTune™ Amplifier 2019: improved source code analysis for .NET Core applications, and profiling of a remote Linux target with analysis of the results on a Windows host.

Read more details in the MSDN .NET blog: https://blogs.msdn.microsoft.com/dotnet/2018/10/23/net-core-source-code-analysis-with-intel-vtune-amplifier

Regards, Denis Pravdin.


VTune Profiling for User Image on ADB device

Hi,

I am using VTune 2018 Update 3.

I want to profile some apps on an ADB device (KBL-NUC) flashed with the Celadon User image.

I am profiling with Target system: Android device (ADB), Target type: Attach to Process, Analysis type: Basic Hotspots.

I get this error:

Cannot attach to the target application due to an error returned by the 'run-as' utility. Suggestion: Make sure your Android image is installed correctly.

Also, when I switch to Target type: Profile System and choose Analysis type: Advanced Hotspots,

I get this error:

This analysis type requires either an access to system-wide monitoring in the Linux perf subsystem or installation of the VTune Amplifier drivers (see the "Sampling Drivers" help topic for further details). Please set the /proc/sys/kernel/perf_event_paranoid value to 0 or less to continue without installing the drivers.

But when I flash the Celadon Userdebug image on the same KBL-NUC and follow the same steps, I see no errors and the result is produced successfully.

Can you please tell me how to fix these errors when profiling with the User image flashed on the device?

Thanks in advance!!

Problems getting remote sampling to work with VTune Amplifier 2019

I have recently tried using VTune Amplifier 2019 to perform analysis of an application using the remote Linux SSH capability. When trying to set up e.g. Hotspot analysis for the remote target, I get the following errors:

  • This analysis type is not applicable to the system because VTune Amplifier cannot recognize the processor. If this is a new Intel processor, please check for an updated version of VTune Amplifier. If this is an unreleased Intel processor, please contact Online Service Center for an NDA product package.
  • This analysis type is not applicable to the current machine microarchitecture.
  • Cannot enable Hardware Event-based Sampling due to a problem with the driver (sep*/sepdrv*). Check that the driver is running and the driver group is in the current user group list. See the "Sampling Drivers" help topic for further details.
  • Cannot enable advanced capabilities for Hardware Event-based Sampling due to a problem with the driver (vtss/vtssp). Check that the driver is running and the driver group is in the current user group list. See the "Sampling Drivers" help topic for further details.

The remote system was set up by following the instructions here https://software.intel.com/en-us/vtune-amplifier-install-guide-linux-ins... and adding my username to the 'vtune' group after having set up passwordless SSH access between the two. The remote system is an Intel(R) Xeon(R) CPU E5-2687W v4 @ 3.00GHz system running CentOS 7.5, and the output of

/opt/intel/vtune_amplifier_2019/bin64/amplxe-self-checker.sh

on both machines warns about some (to my mind) uninteresting issues:

amplxe: Warning: Cannot locate debugging information for file `/lib64/libc.so.6'.

amplxe: Warning: Cannot locate debugging information for file `/lib64/libpthread.so.0'.

amplxe: Warning: To enable hardware event-base sampling, VTune Amplifier has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes.

amplxe: Warning: Cannot locate debugging information for file `/usr/lib64/libc-2.17.so'.

amplxe: Warning: Function and source-level analysis for the Linux kernel will not be possible since neither debug version of the kernel nor kernel symbol tables are found. See the Enabling Linux Kernel Analysis topic in the product online help for instructions.

 

and finishes with the line

The system is ready to be used for performance analysis with Intel VTune Amplifier.

The output of lsmod | egrep 'vtss|pax|sep' is:

vtsspp                384857  0

sep5                  856640  0

socperf3               33124  1 sep5

pax                    13820  0

 

This is using the vtune_amplifier_2019.0.2.570779 package.

 

Does anybody know what I could be doing wrong here?
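
For anyone comparing driver state across the two machines, here is a small sketch that parses `lsmod`-style text and lists which of the expected driver prefixes are missing. The prefixes come from the `egrep` pattern above. The function names `loaded_modules` and `missing_drivers` are hypothetical, and the check says nothing about group membership or whether the loaded driver matches the running VTune build.

```python
# Sample copied from the lsmod output in this post.
SAMPLE = """\
vtsspp                384857  0
sep5                  856640  0
socperf3               33124  1 sep5
pax                    13820  0
"""


def loaded_modules(lsmod_text):
    """Map module name -> list of modules using it (the 'Used by' column)."""
    modules = {}
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the 'Module Size Used by' header row.
        if len(fields) < 3 or fields[0] == "Module":
            continue
        users = fields[3].split(",") if len(fields) > 3 else []
        modules[fields[0]] = users
    return modules


def missing_drivers(modules, prefixes=("sep", "vtss", "pax")):
    """Return the prefixes for which no loaded module name matches."""
    return [p for p in prefixes if not any(name.startswith(p) for name in modules)]


if __name__ == "__main__":
    print(missing_drivers(loaded_modules(SAMPLE)))
```

Running the same check on the output captured over the SSH session (rather than in a local shell on the target) would show whether the remote collector sees the same modules.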

VTune Enhancement Request: Make Source/Asm Browser Standalone

I'm sorry I could not find any place to file an official enhancement request, so I'm posting this to the forum.  If there is another place more appropriate to file this, please let me know and I'll refile it there.

Title: VTune Enhancement Request: Make Source/Asm Browser Standalone

Why I want this:
As I optimize the code I'm working on, I do a lot of analysis of the assembly generated by the C++ compiler when using highly optimized settings like -O3. With these settings the code has a tendency to spread out, making it difficult to know which asm lines belong to which source lines. The source browser in VTune is the only decent GUI I've seen that makes identifying the code for a particular line dead simple. Below is the kind of view I'm talking about. Notice how easy it is to see what happened to that particular line of code when the optimizing compiler was done with it.
(Image also posted below. I had trouble inlining the image properly, so it may not have worked.)

Why existing solutions are not good enough:
I've tried using many other tools, but nothing comes close to what this browser can do.  Tools like objdump and addr2line have the information, but are confusing and very difficult to look through.  It is difficult to really see the big picture like the image above shows.  And things like Godbolt can compile code and show you the asm, but don't really work with larger codebases and still won't help you with the association of line of source to line of asm like above. 

So far, the source code viewer in vtune is the only tool I've seen that has the ability to quickly show the connection between a line of source and the asm that it becomes.  It really is a fantastic tool for quickly viewing and understanding the assembly that is generated.

The Problem with vtune right now:
Unfortunately pulling up the vtune source browser has some limitations that are very bothersome:

  • You need to create a project and run a profiling run before you can view any code at all
  • When you do a profile, you can only call up the code for the functions and files that appear in the results tables for the run type you did.
  • There's no way to simply open a particular file or function that you want to see - you have to find the function in the gui list wherever it might be.

What I'm asking for:
I'd want to have some way to call up the source browser on arbitrary source files in my code base.  Right now I can only see the files for which the vtune gui lists a function that has been profiled and only after I've actually run some kind of profiled run.  Even then, I can only really open the browser for functions that are listed in the gui.  I want the ability to tell the browser to open up file xxx.cpp using compiled file xxx.o or libyyy.so, and I want to be able to do this even if I have not run any profiling runs.

How it might work:
I could see this working in a few different ways:

  • A standalone linux command line tool bundled with VTune (maybe called amplxe-srcview) where you can specify the file and/or the binary and it pulls up a new linux window with the clickable src/asm view in it.  If there is a way to auto-detect the source file, this would be a bonus.
  • A clickable button in the amplxe-gui window that could pull up a list of all the files associated with the currently loaded binary on the  current project.  Then I select the file I want to see and it opens the browser in another window or tab.  I want to be able to do this even if I have not run any code yet.  It already knows my binary and my source path, so I should not have to run.
  • A separate tool completely divorced from Vtune that can be downloaded and used separately.

 

Can VTune 2019 profile Python code with Numba decorators?

I am using VTune to profile some Python code and measure "GFLOPS".

The baseline code is written with NumPy and the optimized code uses Numba (with the @njit and @vectorize decorators). The Numba code is about 8 times faster than the NumPy baseline; however, VTune shows that NumPy and Numba achieve the same "GFLOPS".

I just want to confirm whether the latest VTune can report "GFLOPS" correctly for Numba Python code.

Is there any benchmark or example code for profiling Python Numba code with VTune?
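
To make the puzzle concrete, here is the arithmetic as a tiny sketch. The numbers are hypothetical placeholders, not measurements, and the helper names `gflops` and `implied_flops` are made up for illustration. If both versions performed the same number of floating-point operations, an 8x shorter run would give an 8x higher rate. Equal reported rates therefore imply that about 8x fewer operations were counted for the faster run.

```python
def gflops(flop_count, seconds):
    """Rate in GFLOPS for a given operation count and elapsed time."""
    return flop_count / seconds / 1e9


def implied_flops(gflops_rate, seconds):
    """Operation count implied by a reported rate and elapsed time."""
    return gflops_rate * 1e9 * seconds


if __name__ == "__main__":
    # Hypothetical: the same 8e9 operations in 8 s versus 1 s.
    print(gflops(8e9, 8.0), gflops(8e9, 1.0))
    # Hypothetical: the same reported rate for an 8 s run and a 1 s run.
    print(implied_flops(1.0, 8.0) / implied_flops(1.0, 1.0))
```

So one thing to compare between the two results is the counted floating-point operations, not only the rate.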

 

Thanks and regards

no call stack information

I created a new project, chose "Configure Analysis", selected the executable file, and after the run the Bottom-up tab showed [Unknown] -> [No call stack information]. I am using a debug build of the executable. How can I fix it?

Performance difference between Windows and Linux using the Intel compiler: looking at the assembly

I am running a program on both Windows and Linux (x86-64). It has been compiled with the same compiler (Intel Parallel Studio XE 2017) with the same options, and the Windows version is 3 times faster than the Linux one. The culprit is a call to std::erf which is resolved in the Intel math library for both cases (by default, it is linked dynamically on Windows and statically on Linux but using dynamic linking on Linux gives the same performance).

Here is a simple program to reproduce the problem.

#include <cmath>
#include <cstdio>

int main() {
  int n = 100000000;
  float sum = 1.0f;

  for (int k = 0; k < n; k++) {
    sum += std::erf(sum);
  }

  std::printf("%7.2f\n", sum);
}

When I profile this program using VTune, I find that the assembly differs a bit between the Windows and the Linux versions. Here is the call site (the loop) on Windows:

Block 3:
  vmovaps xmm0, xmm6
  call 0x1400023e0 <erff>
Block 4:
  inc ebx
  vaddss xmm6, xmm6, xmm0
  cmp ebx, 0x5f5e100
  jl 0x14000103f <Block 3>

And the beginning of the erf function called on Windows:

Block 1:
  push rbp
  sub rsp, 0x40
  lea rbp, ptr [rsp+0x20]
  lea rcx, ptr [rip-0xa6c81]
  movd edx, xmm0
  movups xmmword ptr [rbp+0x10], xmm6
  movss dword ptr [rbp+0x30], xmm0
  mov eax, edx
  and edx, 0x7fffffff
  and eax, 0x80000000
  add eax, 0x3f800000
  mov dword ptr [rbp], eax
  movss xmm6, dword ptr [rbp]
  cmp edx, 0x7f800000
  ...

On Linux, the code is a bit different. The call site is:

Block 3:
  vmovaps %xmm1, %xmm0
  vmovssl %xmm1, (%rsp)
  callq 0x400bc0 <erff>
Block 4:
  inc %r12d
  vmovssl (%rsp), %xmm1
  vaddss %xmm0, %xmm1, %xmm1    <-------- hotspot here
  cmp $0x5f5e100, %r12d
  jl 0x400b6b <Block 3>

and the beginning of the called function (erf) is:

  movd %xmm0, %edx
  movssl %xmm0, -0x10(%rsp)    <-------- hotspot here
  mov %edx, %eax
  and $0x7fffffff, %edx
  and $0x80000000, %eax
  add $0x3f800000, %eax
  movl %eax, -0x18(%rsp)
  movssl -0x18(%rsp), %xmm0
  cmp $0x7f800000, %edx
  jnl 0x400dac <Block 8>
  ...

I have marked the two points where the time is lost on Linux.

Does anyone understand assembly well enough to explain the difference between the two versions and why the Linux version is 3 times slower?
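
As a reading aid for both listings, the sketch below mirrors the integer operations at the start of `erff` and decodes the constants. It only illustrates what the shared prologue computes and does not by itself explain the 3x gap. The helper names are hypothetical.

```python
import struct


def f32_bits(x):
    """Bit pattern of x as an IEEE-754 single-precision float."""
    return struct.unpack("<I", struct.pack("<f", x))[0]


def bits_f32(bits):
    """Single-precision float with the given bit pattern."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def prologue(x):
    """Mirror the and/and/add/cmp sequence at the start of erff."""
    bits = f32_bits(x)
    magnitude = bits & 0x7FFFFFFF                  # clears the sign bit: |x|
    signed_one = (bits & 0x80000000) + 0x3F800000  # sign of x on 1.0f: +1.0 or -1.0
    inf_or_nan = magnitude >= 0x7F800000           # the cmp; jnl in the Linux listing
    return bits_f32(magnitude), bits_f32(signed_one), inf_or_nan


if __name__ == "__main__":
    print(prologue(-2.5))           # (2.5, -1.0, False)
    print(0x5F5E100 == 100000000)   # loop bound n from the source: True
```

Both platforms run the same bit manipulation. The visible differences are in how values move between registers and the stack around it, which is where the two marked hotspots sit.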

Recipe: How to profile the applications in Amazon Web Services* (AWS) EC2 Instances

Dear VTune users,

Cloud computing is becoming increasingly important. Learn more about setting up a VM instance in AWS for performance profiling with Intel® VTune™ Amplifier. Note that hardware-based analysis types are not available at the moment. We look forward to your feedback.

Regards, Denis Pravdin


Does remote analysis automatically install VTune on remote machine?

I've got VTune 2019 Update 1 installed on my Mac and would like to profile a program running on a remote Linux server via ssh.

Do I need to install VTune on the remote Linux machine myself? At first I thought I did, but page 4 of the post-installation steps mentions "the automatic installation on the remote Linux system ...". I'm thinking that it will copy over what it needs to the remote machine. Can someone confirm?

Thanks!

-Tony

 

Vtune 2019 update1 GPU OpenCL gpu-hotspots profiling failed: "GpuNewKernelLibInitError"

Hi team, I got an error when starting gpu-hotspots profiling with VTune 2019 Update 1: "GpuNewKernelLibInitError".

Can anyone help?

Here is my log:

root@cmpl:~/workspace/opencl/intel_ocl_gemm_linux/GEMM# amplxe-cl  -collect gpu-hotspots -- ./gemm
amplxe: Error: %GpuNewKernelLibInitError

 

===============

OS: ubuntu 16.04

And my hardware:

CPU:

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    2
Core(s) per socket:    4
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 158
Model name:            Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
Stepping:              9
CPU MHz:               3674.382
CPU max MHz:           4200.0000
CPU min MHz:           800.0000
BogoMIPS:              7200.00
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              8192K
NUMA node0 CPU(s):     0-7

GPU:Intel® HD Graphics 630

 

OpenCL driver: Intel OpenCL 2.0

 

 

 

KVM guest OS crash with DPDK workload

Hi,

I'm trying to use VTune Amplifier to profile an OVS-DPDK virtual switch that runs on a target Linux host along with a KVM guest running a DPDK L2 forwarding application (testpmd). Each time I start a Memory Access or Micro-architecture profiling session, the KVM guest OS crashes with a kernel panic: BUG: unable to handle kernel paging request at xxxx.
My setup is as follows:
-VTune Amplifier 2019 (build 570779)
-sampling drivers on target Linux host: sepdk_v5_575421
-target Linux OS: Ubuntu 16.04 64bit, kernel version: 4.15.0-39-generic
-guest Linux OS: Ubuntu 18.04 64bit, kernel version 4.15.0-34-generic
-OVS 2.8.1
-QEMU emulator version 2.9.1
-24-core server, all cores except core #0 are isolated
-VM backed by 1G huge pages, two vNICs connected to vhost-user ports.

It is worth noting that
-the crash occurs whatever process is profiled by VTune
-VTune works fine when profiling the DPDK application running on bare-metal (no OVS switch or VM)

Any ideas or suggestions?

 

 

 

Vtune cannot map collect source line numbers to assembly code.

line-by-line profiling of opencl code still does not work in VTune 2019

I just installed VTune 2019, but it appears that VTune Amplifier still cannot give me line-by-line timing information for OpenCL code like the 2016 version did.

This issue was previously reported for VTune 2018 in this thread:

https://software.intel.com/en-us/forums/intel-vtune-amplifier/topic/746813

Now, if I run my code and open the Bottom-up view, I can see my kernel, but it shows "Dynamic code" and "unknown source file" (see attachment 1). Double-clicking the kernel name gives me the assembly, which is not very helpful for optimizing the code (VTune 2016 showed the timing for each source line in the .cl file).

If I go to the Caller/Callee tab and double-click the kernel, it does open a tab with the host code (but not the .cl kernel source code); see the 2nd attachment.

Can anyone tell me how to point VTune at my kernel source code (mcx_core.cl) so that it can show me the timing info for each source line?

Thanks
