Channel: Intel® VTune™ Profiler (Intel® VTune™ Amplifier)

Vtune lost part of cpu data

I've run the SPEC CPU2006 benchmarks under VTune. Sometimes VTune lost part of the CPU event data, although the bandwidth data was fine.

I ran the application from the command line like this:

/opt/intel/vtune_amplifier_xe_2016.3.0.463186/bin64/amplxe-cl  -c memory-access -knob analyze-mem-objects=true -knob mem-object-size-min-thres=1024 -knob sampling-interval=1 -data-limit=0 /cpu2006/benchspec/CPU2006/471.omnetpp/run/run_base_ref_amd64-m64-gcc42-nn.0000/omnetpp_base.amd64-m64-gcc42-nn omnetpp.ini

The result was:

The CPU time was too small, and there was no CPU idle time, spin time, etc.

After about 200 s there is no core data, only bandwidth data and uncore events. I think VTune lost the data, because everything was fine when I retested the same benchmark.

I want to know why this problem occurs and how to prevent it.

Thanks!

Thread Topic: Question

VTune 2017: Option to "Expand All" in context menu seems to be missing

Hello, I started using VTune Amplifier XE 2017 now - downloaded and tried the Update 1.

One of the first things that I noticed:
The option "Expand All" seems to be gone in the context menu that appears when right-clicking into the function listing.

In my case I had done "Basic Hotspot" analysis and opened the "Top-down Tree" view.
The context menu only shows "Collapse All" and "Expand Selected Rows" now - previously there had always been an "Expand All".

This was very helpful for me to quickly get to the proper hotspot that I needed to investigate further.
Sometimes this is nested somewhere several levels deep (10+).
Now it is a very tedious task to open all branches until enough folds have been opened.

Am I missing something?
Is there any alternative now to open all folds at once?

Best Regards,
Michael

 

Thread Topic: Question

What should I check when Vtune reports zero CPU usage and time?

Hi,
I am profiling my application, which uses both MPI and OpenMP (both from Intel Parallel Studio).
I use the following command to run the profiler:

mpiexec.hydra -genvall -n 32 -machinefile ./machines -gtool "amplxe-cl -c hotspots -r {DirectoryPath}:0=node-wide" {BinaryFile} {Arguments ...}

The results show (The program ran for 2 minutes):

CPU Time:           0.000
Average CPU Usage:  0.0

What should I check first? Can anyone give me advice on this?

Thanks

KNL link line for itt_pause(), itt_resume()

The following link line fails to resolve references to __itt_pause and __itt_resume:

mpiicc -std=c99 -debug inline-debug-info -O3 -xMIC-AVX512 -fPIC -fno-alias -ansi_alias -fp-model fast=2 -qopenmp -mkl -qopt-report=4 -restrict -I/opt/intel/2017/vtune_amplifier_xe_2017.0.2.478468/include -L/opt/intel/2017/vtune_amplifier_xe_2017.0.2.478468/lib64/  -o inviscid_rk inviscid_v3_deleaved_c.o main.o  -littnotify

 

Does it mean there is no support for these old VTune API functions here?

I don't know the sysadmin on this remote system so don't know exactly how things were installed there.

The task at hand is to determine whether OpenMP functions are spending much time in a specific omp for loop in a parallel region which scales well with problem size but runs slowly with a problem size which keeps about 32 threads out of the 64 busy.  The original code timed the loop by restricting the timer to thread ID 0.  I suspect #omp pragma restrict (or maybe master) may give more meaningful timing; something seems strange about the timing where there isn't sufficient work to keep all threads active.  Still there seems to be too much time spent there, and -collect hpc-performance reports high serial time overall.

I could imagine that some more modern features of VTune might be more suitable, but I don't find enough detail in documentation.

 

Thread Topic: How-To

VTune's handling of in-use performance counters

Hi,

When running VTune Amplifier XE 2017 Beta Update 1 in a VMware Workstation x86 VM with virtual performance counters enabled, we sometimes see the following types of warnings in the log:

2016-10-10T15:22:46.434-04:00| vcpu-1| I125: VPMC: The guest wrote to the event selector of in-use virtual performance counter 0, which is disallowed.

2016-10-10T15:22:46.435-04:00| vcpu-1| I125: VPMC: The guest wrote to the event selector of in-use virtual performance counter 0, which is disallowed.

2016-10-10T15:22:46.436-04:00| vcpu-1| I125: VPMC: The guest wrote to the event selector of in-use virtual performance counter 0, which is disallowed.

VMware's virtual x86 performance counter implementation aims to expose virtual counters that aren't available by marking them "In-use" according to Intel's whitepaper on cooperative PMU sharing guidelines: https://software.intel.com/en-us/articles/performance-monitoring-unit-gu...

VMware's virtual x86 performance counter implementation drops writes from the guest operating system to virtual counters that are marked as in-use to not corrupt the real PMU HW, and issues these warnings. From these logs, it appears that VTune attempts to use virtual performance counters that are marked in-use. Does anyone have any specific knowledge about what VTune's policy is for counters that it finds as in-use (enabled) by other software?

Thanks,

Taylor

 

Thread Topic: Question

Roofline model within VTune Amplifier 2017 and SDE

Hello everyone,

I'm working on two machines: an Intel Core i5-6400 (2.7 GHz, Windows 8.1, 8 GB RAM) and an Intel Core i7-5930K (3.5 GHz, Windows 8.1, 32 GB RAM).

I need to build a roofline model for my application. For that purpose I use Intel Amplifier 2017 and Intel SDE.

With SDE I measure arithmetic intensity by counting mem-read and mem-write, and the total number of GFLOP by counting elements_fp_<...>

I use Intel Amplifier 2017 to measure the maximum bandwidth from the HPC Performance Characterization results (for my machine it reports 17 GB/s).

However, when I profile an MKL benchmark (matrix multiplication, the sgemm function), Intel Amplifier and Intel SDE report different FLOP values: the Intel Amplifier 2017 result is twice the SDE result. (This is not the case for the STREAM benchmark, where I get approximately the same values.) Moreover, the arithmetic intensity calculated with SDE, combined with the GFLOPS calculated by either SDE or Amplifier, gives a point that is well outside the roofline model limits.

Are there any particular issues when using Intel Amplifier/SDE with MKL library functions?

Could you please let me know if I'm using Intel SDE and Amplifier in the right way to estimate FLOPs, Arithmetic Intensity and max bandwidth?
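
For reference, the roofline bound this question relies on can be sketched as below. This is only a minimal sketch, not the method either tool uses internally: the 17 GB/s bandwidth is the figure quoted in the post, while the peak compute value and the sample FLOP and byte counts are hypothetical placeholders chosen for illustration.

```python
def arithmetic_intensity(flop_count, bytes_moved):
    """FLOP performed per byte of memory traffic."""
    return flop_count / bytes_moved


def attainable_gflops(intensity, bandwidth_gbs, peak_gflops):
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_gflops, intensity * bandwidth_gbs)


def is_under_roof(measured_gflops, intensity, bandwidth_gbs, peak_gflops):
    """True if a measured point lies on or below the roofline."""
    return measured_gflops <= attainable_gflops(intensity, bandwidth_gbs, peak_gflops)


BANDWIDTH_GBS = 17.0   # value reported in the post
PEAK_GFLOPS = 100.0    # hypothetical compute roof, not from the post

if __name__ == "__main__":
    # Hypothetical counts: 2e9 FLOP over 8e9 bytes of traffic.
    ai = arithmetic_intensity(flop_count=2.0e9, bytes_moved=8.0e9)
    print("arithmetic intensity:", ai)
    print("attainable GFLOPS:", attainable_gflops(ai, BANDWIDTH_GBS, PEAK_GFLOPS))
```

A measured point that lands above `attainable_gflops` suggests that the FLOP count, the byte count, or one of the two roofs does not describe the same thing the kernel actually did, so it is worth checking that all three come from the same run and the same level of the memory hierarchy.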

Kind regards,

Sofya

Thread Topic: Help Me

socwatch error (entry point not found)

Hi 

I got this error when running socwatch from the command prompt:

socwatch entry point not found

the procedure entry point getsystemtimepreciseasfiletime could not be located in the dynamic link library kernel32.dll.

See the attached screenshot.

I am using VTune 2017 on Windows 7.

Regards,

Naif

Attachment: Capture.jpg (image/jpeg, 23.38 KB)

Thread Topic: Bug Report

VTune 2015 (15.4.0) crashes profiled application when collection starts

When I try to start capturing data for an application that is already running (attach), the application I'm running crashes and core dumps as soon as I press the start button in the GUI.  I've also tried running the command shown if I press the [Command Line...] button -- the result is the same.  I'm just doing simple hotspot analysis without changing any of the default parameters.

The error printed for my application is:

ERROR: ld.so: object '/$LIB/bash_ld_preload.so' from /etc/ld.so.preload cannot be preloaded: ignored.

AMPLXE_TPSSCOLLECTOR[37143]: module_map_common988: (start != 0 && end != 0 && start < end) : BUG! :
Assertion failed: module_map_common988: (start != 0 && end != 0 && start < end) : BUG! : . Please contact the technical support. Segmentation fault (core dumped)

 

 

Thread Topic: Bug Report

How to measure a single core power consumption?

Hi

I am running some experiments to investigate the power consumption of an executable running on a single core of a multicore CPU (Haswell, Broadwell).

The problem is that I can only read the CPU package power consumption, while I need the power consumption of a single core. Can anyone give me an idea of how to solve this?

Thanks,

Naif

Results Understanding - Naive Question

The application I want to speed up performs element-wise processing of a large array (about 1e8 elements).
The processing for each element is very simple, and I suspect that the bottleneck may be DRAM bandwidth rather than the CPU.
So I decided to study the single-threaded version first.

I got the following results:

[Images: One Thread Summary; One Thread Platform]

As far as I understand the Summary page, the situation is not very good.
The article https://software.intel.com/en-us/articles/finding-your-memory-access-performance-bottlenecks says that the reason is so-called false sharing, but I do not use multithreading; all processing is performed by just one thread.
On the other hand, according to the Platform page, DRAM bandwidth is not the bottleneck.

So my question is: what is the reason for the bad memory metric values?

Thank you
 

Thread Topic: Help Me

New tutorial available on analyzing hybrid OpenMP+MPI applications

Discover how to use Intel® Parallel Studio to tune hybrid applications by reviewing MPI utilization inefficiencies and balancing thread load levels.

This tutorial uses the sample heart_demo and guides you through basic steps required to analyze hybrid OpenMP* and MPI code for inefficiencies using MPI Performance Snapshot, Intel® Trace Analyzer and Collector, and Intel® VTune™ Amplifier XE. You will learn how to:

  • Build an application using the MPI library and Intel® C++ compiler.
  • Run the MPI Performance Snapshot tool to get a high-level overview of performance optimization opportunities.
  • Run Intel Trace Analyzer and Collector to identify MPI-bound code.
  • Analyze the communication pattern of the source code.
  • Run the HPC Performance Characterization Analysis with Intel VTune Amplifier XE to locate vectorization and parallelism issues in the sample code.
  • Compare results before and after optimization.

Check out the tutorial here: Analyzing an OpenMP* and MPI Application.

Trigger-based event multiplexing in Vtune

Hi, 

I'm experimenting with the trigger-based event multiplexing capability in VTune. I want to use one event (e.g. instructions retired) as a trigger to sample another event (e.g. cycles). I'm using the command-line tool amplxe-runss that comes with the vtune_amplifier_xe package. Here is my command:

./amplxe-runss -event-config="INST_RETIRED.ANY:sa=100000000","CPU_CLK_UNHALTED.REF_TSC" -event-mux -event-mux-alg=trigger -event-mux-trigger="INST_RETIRED.ANY" -- ./myApplication

I was hoping that by specifying the sample-after value (e.g. 100000000) for the triggering event, I would get a sample of the other events every 100000000 instructions. But this is the summary reported at the end:

Event summary
-------------
Hardware Event Type       Hardware Event Count:Self  Hardware Event Sample Count:Self  Events Per Sample
------------------------  -------------------------  --------------------------------  -----------------
INST_RETIRED.ANY                       132000000000                              1320  100000000        
CPU_CLK_UNHALTED.REF_TSC               208908313362                            104454  2000003

This shows that the two events are sampled independently; the trigger does not work.

So what is the correct way to do this?
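
The arithmetic behind that conclusion can be checked directly from the summary table. The sketch below only recomputes the "Events Per Sample" column from the counts shown above; it makes no assumption about how the collector implements the trigger.

```python
# (event count, sample count) copied from the event summary above.
summary = {
    "INST_RETIRED.ANY": (132_000_000_000, 1_320),
    "CPU_CLK_UNHALTED.REF_TSC": (208_908_313_362, 104_454),
}


def events_per_sample(event_count, sample_count):
    """Average number of events between two consecutive samples."""
    return event_count / sample_count


if __name__ == "__main__":
    for name, (count, samples) in summary.items():
        print(name, events_per_sample(count, samples))

    inst_samples = summary["INST_RETIRED.ANY"][1]
    clk_samples = summary["CPU_CLK_UNHALTED.REF_TSC"][1]
    # If the clock event were sampled only when the trigger fired, the two
    # sample counts would be expected to match; here they differ widely.
    print("sample count ratio:", clk_samples / inst_samples)
```

The instruction event is sampled every 100000000 events as requested, while the clock event keeps its own interval of about 2000003 events, which is consistent with the two events being sampled independently.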

Thanks,

Ziqiang      

 

Can VTune 2017 work under Windows 10 VSM?

I get an error about enabling vPMU, and I found some documentation that says Hyper-V must be uninstalled. However, I'm running Windows 10 Enterprise with VSM (virtualization-based security using Hyper-V), and I'd really rather not disable this amazing leap in Windows security if there is any way to get VTune working under this configuration.

Vtune crash with simple python application

I get a crash (segfault with no error log dumped) whenever I run vtune over a python application.  Here's a simple script that triggers the behavior  on my machine:

import numpy as np
import numpy.random as ra
import numpy.linalg as la

if __name__ == "__main__":
  n = 10000
  A = ra.random((n,n))
  b = ra.random((n,))

  c = np.dot(A,b)
  print(la.norm(c))

For example:

amplxe-cl -collect hotspots -- python test.py

Results in:

amplxe: Collection started. To stop the collection, either press CTRL-C or enter from another console window: amplxe-cl -r /home/messner/projects/neml/debug/r000hs -command stop.
250170.317306
amplxe: Warning: Cannot stop posix timer: __NR_timer_settime() system call returned -1.
amplxe: Collection stopped.
amplxe: Using result path `/home/messner/projects/neml/debug/r000hs'
amplxe: Executing actions 19 % Resolving module symbols                        
amplxe: Warning: Cannot locate file `test.py'.
amplxe: Executing actions 21 % Resolving information for `libc-dynamic.so'     
amplxe: Warning: Cannot locate debugging symbols for file `/opt/intel/vtune_amplifier_xe_2017.0.2.478468/lib64/pinruntime/libc-dynamic.so'.
amplxe: Executing actions 22 % Resolving information for `libc-dynamic.so'     
amplxe: Warning: Cannot locate debugging symbols for file `/opt/intel/vtune_amplifier_xe_2017.0.2.478468/lib64/libtpsstool.so'.
amplxe: Executing actions 22 % Resolving information for `libtatlas.so.3'      Segmentation fault

So the script runs, but there's some problem collecting the profiling results.

If I run in the debugger I get the very unhelpful:

...

Detaching after fork from child process 38459.
amplxe: Executing actions 19 % Resolving module symbols                        
amplxe: Warning: Cannot locate file `test.py'.
amplxe: Executing actions 21 % Resolving information for `libdl.so.2'          
amplxe: Warning: Cannot locate debugging symbols for file `/opt/intel/vtune_amplifier_xe_2017.0.2.478468/bin64/pinbin'.
amplxe: Executing actions 21 % Resolving information for `libtpsstool.so'      
amplxe: Warning: Cannot locate debugging symbols for file `/opt/intel/vtune_amplifier_xe_2017.0.2.478468/lib64/pinruntime/libpin3dwarf.so'.
amplxe: Executing actions 21 % Resolving information for `type_check.py'       
amplxe: Warning: Cannot locate debugging symbols for file `/opt/intel/vtune_amplifier_xe_2017.0.2.478468/lib64/libtpsstool.so'.
amplxe: Executing actions 22 % Resolving information for `arraysetops.py'      
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd9e2a700 (LWP 38455)]
0x00007fffea84e963 in ?? ()
   from /opt/intel/vtune_amplifier_xe_2017.0.2.478468/bin64/../lib64/../lib64/../lib64/libamplxe_ism_core_3.34.s

Any suggestions?

Thread Topic: Bug Report

Results Interpreting

My application processes a huge amount of data in a short loop. At present the application is single-threaded, and I am trying to speed up the single-threaded version as far as possible before moving to multiple threads.

General Exploration Summary page says

Elapsed Time: 10.346s
    Clockticks: 25,370,400,000
    Instructions Retired: 21,055,200,000
    CPI Rate: 1.205
    MUX Reliability: 0.917
    Front-End Bound: 3.6%
    Bad Speculation: 0.2%
    Back-End Bound: 77.2%
        Memory Bound: 46.8%
            L1 Bound: 0.0%
            L2 Bound: 4.3%
            L3 Bound: 0.8%
                Contested Accesses: 0.0%
                Data Sharing: 0.0%
                L3 Latency: 15.3%
                SQ Full: 16.2%
            DRAM Bound: 32.4%
                Memory Bandwidth: 26.2%
                Memory Latency: 73.8%
                    LLC Miss: 76.4%
            Store Bound: 0.0%
                Store Latency: 21.1%
                False Sharing: 0.0%
                Split Stores: 0.0%
                DTLB Store Overhead: 0.2%
        Core Bound: 30.3%
            Divider: 0.0%
            Port Utilization: 23.5%
                Cycles of 0 Ports Utilized: 43.5%
                Cycles of 1 Port Utilized: 16.2%
                Cycles of 2 Ports Utilized: 12.6%
                Cycles of 3+ Ports Utilized: 8.9%
                    Port 0: 18.4%
                    Port 1: 16.2%
                    Port 2: 21.5%
                    Port 3: 22.2%
                    Port 4: 3.3%
                    Port 5: 20.5%
    Retiring: 19.0%
        General Retirement: 19.0%
            FP Arithmetic: 30.6%
                FP x87: 0.0%
                FP Scalar: 0.0%
                FP Vector: 30.6%
            Other: 69.4%
        Microcode Sequencer: 0.0%
            Assists: 0.0%
    Total Thread Count: 1
    Paused Time: 3.071s

As far as I can see, the issue is in the Back-End, i.e. all instructions are fetched from DRAM by the Front-End, but the CPU idles at the stage where these instructions are executed. The code is sequential, without conditions. I see that the code is DRAM bound and, in particular, Memory Latency bound.

Does this mean that it is impossible to speed up the code because I am limited by DRAM parameters?
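
As a sanity check on the summary above, the headline numbers can be recomputed from the raw counters. This is a minimal sketch that uses only values copied from the report; it assumes CPI is clockticks divided by instructions retired and that the four top-level categories partition the pipeline slots.

```python
clockticks = 25_370_400_000
instructions_retired = 21_055_200_000


def cpi(ticks, instructions):
    """Cycles per instruction."""
    return ticks / instructions


# Percentages copied from the General Exploration summary.
top_level = {
    "Front-End Bound": 3.6,
    "Bad Speculation": 0.2,
    "Back-End Bound": 77.2,
    "Retiring": 19.0,
}
back_end_children = {"Memory Bound": 46.8, "Core Bound": 30.3}

if __name__ == "__main__":
    print("CPI:", round(cpi(clockticks, instructions_retired), 3))
    print("top-level sum:", round(sum(top_level.values()), 1))
    print("back-end children sum:", round(sum(back_end_children.values()), 1))
```

The recomputed CPI matches the reported 1.205, the top-level categories add up to 100%, and Memory Bound plus Core Bound comes to 77.1%, within rounding of the reported Back-End Bound value.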

Thread Topic: Question

Memory access and NUMA

Hi,

I wrote this short program to understand how memory access analysis and NUMA work. I am running a dual Xeon E5-2660v4. I am surprised to see that the QPI is heavily used despite the fact that my program uses the first-touch policy. Can anyone explain to me why there is so much traffic here?

int main() {
  int n = 1000000000;

  double* a = new double[n];
  double* b = new double[n];
  double* c = new double[n];
#pragma omp parallel for
  for (int k = 0; k < n; ++k) {
      a[k] = 0.0;
      b[k] = 0.0;
      c[k] = 0.0;
  }

#pragma omp parallel for
  for (int k = 0; k < n; ++k) {
      a[k] = b[k] + c[k];
  }

  delete[] c;
  delete[] b;
  delete[] a;

  return 0;
}
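
To put the program above in scale, its memory footprint can be worked out from the array sizes. This is only a sizing sketch: it assumes an 8-byte `double` and a 32-bit `int`, which is typical for x86-64 but not stated in the post, and it says nothing about where the pages end up.

```python
N = 1_000_000_000      # element count from the program
SIZEOF_DOUBLE = 8      # bytes; assumption, typical for x86-64
ARRAY_COUNT = 3        # arrays a, b and c
INT_MAX = 2**31 - 1    # assumption: 32-bit int


def footprint_bytes(n, arrays=ARRAY_COUNT, elem_size=SIZEOF_DOUBLE):
    """Total bytes allocated for `arrays` arrays of `n` elements each."""
    return n * arrays * elem_size


if __name__ == "__main__":
    total = footprint_bytes(N)
    print("total:", total / 1e9, "GB")
    print("total:", round(total / 2**30, 2), "GiB")
    print("loop index fits in int:", N < INT_MAX)
```

The three arrays together take 24 GB, so how much memory is free on each NUMA node when the first-touch loop runs is one thing worth checking alongside the thread placement.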

 

Multithreading issue associated with heap contention, but only seen on specific machine type

Hello,

I asked this question already in the Intel processor forum, but was referred to this forum. It concerns a multithreading issue associated with heap contention that only shows up on a machine with 2x E5-2650V3 processors (10 hardware cores each, i.e. 20 cores in total). The code below scales very well on a similar machine with 2x 8-core processors. On this specific machine type, however, VTune Amplifier indicates cache misses and the code runs about 10 times slower, while there are zero cache misses on the other machine. I also tried this using new/delete as well as HeapAlloc under Windows, etc.

I understand that this would cause some issues associated with the heap lock. However, I don't understand why this would work ok on one machine, but not on the other. Is there a way to find out more details, where / why the cache misses happen?

std::vector<std::future<void>> futures;
     for (auto iii = 0; iii != 40; iii++) {
         futures.push_back(std::async([]() {
             for (auto i = 0; i != 100000 / 40; i++) {
                 const int size = 10;
                 for (auto k = 0; k != size * size; k++) {
                     double* matrix1 = (double*)malloc(100 * sizeof(double));
                     double* matrix2 = (double*)malloc(100 * sizeof(double));
                     double* matrix3 = (double*)malloc(100 * sizeof(double));

                     for (auto i = 0; i != size; i++) {
                         for (auto j = 0; j != size; j++) {
                             matrix1[i * size + j] = std::rand() / RAND_MAX;
                             matrix2[i * size + j] = std::rand() / RAND_MAX;
                         }
                     }
                     double sum = 0;
                     for (auto i = 0; i != size; i++) {
                         for (auto j = 0; j != size; j++) {
                             for (auto k = 0; k != size; k++) {
                                 sum += matrix1[i * size + k] * matrix2[k * size + j];
                             }
                             matrix3[i * size + j] = sum;
                             sum = 0;
                         }
                     }
                     free(matrix1);
                     free(matrix2);
                     free(matrix3);
                 }
             }
         }));
     }
     for (auto& entry : futures)
       entry.wait();

Thread Topic: Question

evaluation license

hello,

I installed VTune in CLI mode; when I run it with an application, I get the following error:

 

amplxe: Warning: Skipped generation of report `summary': no valid license can be found (Could not find the Intel product license file. Suggestion: Please check if: (1) the environment variable INTEL_LICENSE_FILE points to the correct Intel license file directory and (2) this directory contains a valid license (.lic) file for this Intel product. Internal error code: `-76'.).

 

 

How can I get a license for the evaluation? I didn't see such a link on the download page.

Thanks in advance.

cinar.
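
The two checks suggested by the error message can be scripted. The sketch below is an illustration only: the helper `find_license_files` is hypothetical, it assumes that INTEL_LICENSE_FILE holds one or more paths separated by the platform path separator, and it ignores any non-path entries the variable may contain.

```python
import os
from pathlib import Path


def find_license_files(value):
    """Return the .lic files reachable from an INTEL_LICENSE_FILE-style value.

    Hypothetical helper: each entry is treated either as a directory to
    search for *.lic files or as a path to a single .lic file.
    """
    found = []
    for entry in value.split(os.pathsep):
        if not entry:
            continue
        path = Path(entry)
        if path.is_dir():
            found.extend(sorted(path.glob("*.lic")))
        elif path.is_file() and path.suffix == ".lic":
            found.append(path)
    return found


if __name__ == "__main__":
    value = os.environ.get("INTEL_LICENSE_FILE", "")
    files = find_license_files(value)
    if not value:
        print("INTEL_LICENSE_FILE is not set")
    elif not files:
        print("No .lic file found under:", value)
    else:
        for lic in files:
            print("Found license file:", lic)
```

Finding a .lic file this way only confirms that the file exists where the variable points; whether the license is valid for this product is something only the tool itself can report.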

Frequent irql_not_less_or_equal BSODs, VTune Amplifier XE 2017 Update 1 (build 486011)

Trying to profile an Unreal Engine 4-based application, suffering from rather frequent blue screens that are rendering this hopeless.

Unfortunately I can't get anything from the dump because I can't find kernel symbols (apparently I'm not alone https://social.msdn.microsoft.com/Forums/en-US/65db21ea-4c5b-4f24-ab26-0908479c977d/debug-symbols-for-4dac3b582a9147ecaed2644cb165222b1?forum=windbg ), so I'm not sure what else to say.

3: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

IRQL_NOT_LESS_OR_EQUAL (a)
An attempt was made to access a pageable (or completely invalid) address at an
interrupt request level (IRQL) that is too high.  This is usually
caused by drivers using improper addresses.
If a kernel debugger is available get the stack backtrace.
Arguments:
Arg1: ffffe60fc9f58310, memory referenced
Arg2: 0000000000000002, IRQL
Arg3: 0000000000000000, bitfield :
	bit 0 : value 0 = read operation, 1 = write operation
	bit 3 : value 0 = not an execute operation, 1 = execute operation (only on chips which support this level of status)
Arg4: fffff802da49f9be, address which referenced memory

Debugging Details:
------------------

***** Kernel symbols are WRONG. Please fix symbols to do analysis.

*************************************************************************
***                                                                   ***
***                                                                   ***
***    Either you specified an unqualified symbol, or your debugger   ***
<SNIP>
***                                                                   ***
*************************************************************************

DUMP_CLASS: 1

DUMP_QUALIFIER: 401

BUILD_VERSION_STRING:  14393.447.amd64fre.rs1_release_inmarket.161102-0100

SYSTEM_MANUFACTURER:  Gigabyte Technology Co., Ltd.

SYSTEM_PRODUCT_NAME:  To be filled by O.E.M.

SYSTEM_SKU:  To be filled by O.E.M.

SYSTEM_VERSION:  To be filled by O.E.M.

BIOS_VENDOR:  American Megatrends Inc.

BIOS_VERSION:  F1

BIOS_DATE:  10/24/2012

BASEBOARD_MANUFACTURER:  Gigabyte Technology Co., Ltd.

BASEBOARD_PRODUCT:  Z77-HD4

BASEBOARD_VERSION:  x.x

ADDITIONAL_DEBUG_TEXT:
You can run '.symfix; .reload' to try to fix the symbol path and load symbols.

WRONG_SYMBOLS_TIMESTAMP: 5819bd1f

WRONG_SYMBOLS_SIZE: 820000

FAULTING_MODULE: fffff802da410000 nt

DEBUG_FLR_IMAGE_TIMESTAMP:  5819bd1f

DUMP_TYPE:  1

BUGCHECK_P1: ffffe60fc9f58310

BUGCHECK_P2: 2

BUGCHECK_P3: 0

BUGCHECK_P4: fffff802da49f9be

READ_ADDRESS: *************************************************************************
***                                                                   ***
***                                                                   ***
***    Either you specified an unqualified symbol, or your debugger   ***
<SNIP>
***                                                                   ***
*************************************************************************
Unable to get size of nt!_MMPTE - probably bad symbols
 ffffe60fc9f58310

CURRENT_IRQL:  0

FAULTING_IP:
nt!KiCheckForKernelApcDelivery+fe
fffff802`da49f9be 498b4a30        mov     rcx,qword ptr [r10+30h]

CPU_COUNT: 8

CPU_MHZ: d4b

CPU_VENDOR:  GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 3a

CPU_STEPPING: 9

CPU_MICROCODE: 0,0,0,0 (F,M,S,R)  SIG: 1B'00000000 (cache) 0'00000000 (init)

ANALYSIS_SESSION_HOST:  TBS-BRADY

ANALYSIS_SESSION_TIME:  11-16-2016 14:17:51.0735

ANALYSIS_VERSION: 10.0.14321.1024 amd64fre

LAST_CONTROL_TRANSFER:  from fffff802da565629 to fffff802da55a510

STACK_TEXT:
ffff9481`76a06558 fffff802`da565629 : 00000000`0000000a ffffe60f`c9f58310 00000000`00000002 00000000`00000000 : nt!KeBugCheckEx
ffff9481`76a06560 fffff802`da563c07 : 00000000`00000000 00000000`0004b000 ffff9481`76a06700 fffff802`da47a146 : nt!setjmpex+0x3ee9
ffff9481`76a066a0 fffff802`da49f9be : ffffe60f`c9f58000 fffff804`dfdf1786 00000000`00000282 ffffe60f`c9f0e000 : nt!setjmpex+0x24c7
ffff9481`76a06830 fffff802`da49f8e1 : ffffe60f`b7e47620 fffff802`da4ce500 ffffe60f`00000000 00000000`00000000 : nt!KiCheckForKernelApcDelivery+0xfe
ffff9481`76a068c0 fffff802`da4ad057 : ffffe60f`c3f5b740 fffff802`da8deb28 fffff802`da74bab8 fffff802`da74bc98 : nt!KiCheckForKernelApcDelivery+0x21
ffff9481`76a068f0 fffff802`da8cc1a3 : ffffe60f`b7e47300 ffff9481`76a06980 00000000`00000000 fffff802`da8e199d : nt!FsRtlAcquireHeaderMutex+0x237
ffff9481`76a06940 fffff802`da8c76be : 00000000`00000000 ffffe60f`b7e47300 00000000`00000000 ffffe60f`c3f5b740 : nt!KeUserModeCallback+0x5b3
ffff9481`76a06a80 fffff802`da84c2a0 : ffffe60f`b7e47300 00000000`00000000 ffffe60f`b7e47300 00000000`00000000 : nt!FsRtlGetFileSize+0x19d2
ffff9481`76a06ac0 fffff802`da565193 : ffffe60f`b7e47300 ffff9481`76a06b80 00000000`00000010 ffffe60f`bd412710 : nt!IoOpenDeviceRegistryKey+0x148
ffff9481`76a06b00 00007ffb`34ec58c4 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!setjmpex+0x3a53
000000ec`804cf6c8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffb`34ec58c4


STACK_COMMAND:  kb

THREAD_SHA1_HASH_MOD_FUNC:  949bc9ed31f6f65b82927745a91c72095706179f

THREAD_SHA1_HASH_MOD_FUNC_OFFSET:  44b1e75a0f8bfc03454e521c838a99d1b55f7d87

THREAD_SHA1_HASH_MOD:  bc100a5647b828107ac4e18055e00abcbe1ec406

FOLLOWUP_IP:
nt!KiCheckForKernelApcDelivery+fe
fffff802`da49f9be 498b4a30        mov     rcx,qword ptr [r10+30h]

FAULT_INSTR_CODE:  304a8b49

SYMBOL_STACK_INDEX:  3

SYMBOL_NAME:  nt_wrong_symbols!5819BD1F820000

FOLLOWUP_NAME:  MachineOwner

BUGCHECK_STR:  5819BD1F

EXCEPTION_CODE: (NTSTATUS) 0x5819bd1f - <Unable to get error code text>

EXCEPTION_CODE_STR:  5819BD1F

EXCEPTION_STR:  WRONG_SYMBOLS

PROCESS_NAME:  ntoskrnl.wrong.symbols.exe

IMAGE_NAME:  ntoskrnl.wrong.symbols.exe

MODULE_NAME: nt_wrong_symbols

BUCKET_ID:  WRONG_SYMBOLS_X64_14393.447.amd64fre.rs1_release_inmarket.161102-0100_TIMESTAMP_161102-101703

DEFAULT_BUCKET_ID:  WRONG_SYMBOLS_X64_14393.447.amd64fre.rs1_release_inmarket.161102-0100_TIMESTAMP_161102-101703

PRIMARY_PROBLEM_CLASS:  WRONG_SYMBOLS

FAILURE_BUCKET_ID:  WRONG_SYMBOLS_X64_14393.447.amd64fre.rs1_release_inmarket.161102-0100_TIMESTAMP_161102-101703_5819BD1F_nt_wrong_symbols!5819BD1F820000

TARGET_TIME:  2016-11-16T18:59:48.000Z

OSBUILD:  14393

OSSERVICEPACK:  0

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK:  272

PRODUCT_TYPE:  1

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

OSEDITION:  Windows 10 WinNt TerminalServer SingleUserTS

OS_LOCALE:

USER_LCID:  0

OSBUILD_TIMESTAMP:  2016-11-02 06:17:03

BUILDDATESTAMP_STR:  161102-0100

BUILDLAB_STR:  rs1_release_inmarket

BUILDOSVER_STR:  10.0.14393.447.amd64fre.rs1_release_inmarket.161102-0100

ANALYSIS_SESSION_ELAPSED_TIME: ffc

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:wrong_symbols_x64_14393.447.amd64fre.rs1_release_inmarket.161102-0100_timestamp_161102-101703_5819bd1f_nt_wrong_symbols!5819bd1f820000

FAILURE_ID_HASH:  {d82425fb-28f9-fe3c-99c4-cbc6653270b1}

Followup:     MachineOwner
---------
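
A few fields in the first bugcheck above can be decoded by hand. The sketch below follows the Arg3 bit description printed in the dump itself (bit 0 for read or write, bit 3 for execute) and assumes that the CPU_MHZ and CPU_MODEL values are hexadecimal, which is how they appear to be printed.

```python
def decode_irql_bitfield(arg3):
    """Decode Arg3 of IRQL_NOT_LESS_OR_EQUAL as described in the dump text."""
    return {
        "operation": "write" if arg3 & 0x1 else "read",
        "execute": bool(arg3 & 0x8),
    }


def hex_field(text):
    """Convert a hexadecimal field from the dump into an integer."""
    return int(text, 16)


if __name__ == "__main__":
    print(decode_irql_bitfield(0x0))           # Arg3 from the first dump
    print("CPU_MHZ:", hex_field("d4b"))        # assumed to be hexadecimal
    print("CPU_MODEL:", hex_field("3a"))       # assumed to be hexadecimal
```

With Arg3 equal to 0, the faulting access was a read and not an instruction fetch, at the address given in Arg1.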

With driver verifier, starting a project in VTune Amplifier triggers an immediate bugcheck due to vtss.sys

 

4: kd> !analyze -v
*******************************************************************************
*                                                                             *
*                        Bugcheck Analysis                                    *
*                                                                             *
*******************************************************************************

DRIVER_PAGE_FAULT_BEYOND_END_OF_ALLOCATION (d6)
N bytes of memory was allocated and more than N bytes are being referenced.
This cannot be protected by try-except.
When possible, the guilty driver's name (Unicode string) is printed on
the bugcheck screen and saved in KiBugCheckDriver.
Arguments:
Arg1: ffffaa0a0f4e9000, memory referenced
Arg2: 0000000000000000, value 0 = read operation, 1 = write operation
Arg3: fffff8026d76f9a9, if non-zero, the address which referenced memory.
Arg4: 0000000000000000, (reserved)

Debugging Details:
------------------

***** Kernel symbols are WRONG. Please fix symbols to do analysis.

*************************************************************************
***                                                                   ***
***                                                                   ***
***    Either you specified an unqualified symbol, or your debugger   ***
<snip>
***                                                                   ***
*************************************************************************

DUMP_CLASS: 1

DUMP_QUALIFIER: 401

BUILD_VERSION_STRING:  14393.447.amd64fre.rs1_release_inmarket.161102-0100

SYSTEM_MANUFACTURER:  Gigabyte Technology Co., Ltd.

SYSTEM_PRODUCT_NAME:  To be filled by O.E.M.

SYSTEM_SKU:  To be filled by O.E.M.

SYSTEM_VERSION:  To be filled by O.E.M.

BIOS_VENDOR:  American Megatrends Inc.

BIOS_VERSION:  F1

BIOS_DATE:  10/24/2012

BASEBOARD_MANUFACTURER:  Gigabyte Technology Co., Ltd.

BASEBOARD_PRODUCT:  Z77-HD4

BASEBOARD_VERSION:  x.x

ADDITIONAL_DEBUG_TEXT:
You can run '.symfix; .reload' to try to fix the symbol path and load symbols.

WRONG_SYMBOLS_TIMESTAMP: 5819bd1f

WRONG_SYMBOLS_SIZE: 820000

FAULTING_MODULE: fffff802b1617000 nt

DEBUG_FLR_IMAGE_TIMESTAMP:  5819bd1f

DUMP_TYPE:  1

BUGCHECK_P1: ffffaa0a0f4e9000

BUGCHECK_P2: 0

BUGCHECK_P3: fffff8026d76f9a9

BUGCHECK_P4: 0

READ_ADDRESS: *************************************************************************
***                                                                   ***
***                                                                   ***
***    Either you specified an unqualified symbol, or your debugger   ***
<snip>
***                                                                   ***
*************************************************************************
Unable to get size of nt!_MMPTE - probably bad symbols
 ffffaa0a0f4e9000

FAULTING_IP:
vtss+f9a9
fffff802`6d76f9a9 668b02          mov     ax,word ptr [rdx]

MM_INTERNAL_CODE:  0

CPU_COUNT: 8

CPU_MHZ: d4b

CPU_VENDOR:  GenuineIntel

CPU_FAMILY: 6

CPU_MODEL: 3a

CPU_STEPPING: 9

CPU_MICROCODE: 0,0,0,0 (F,M,S,R)  SIG: 1B'00000000 (cache) 0'00000000 (init)

CURRENT_IRQL:  0

ANALYSIS_SESSION_HOST:  TBS-BRADY

ANALYSIS_SESSION_TIME:  11-16-2016 15:11:22.0217

ANALYSIS_VERSION: 10.0.14321.1024 amd64fre

LAST_CONTROL_TRANSFER:  from fffff802b17b2a47 to fffff802b1761510

STACK_TEXT:
ffff9780`95905e98 fffff802`b17b2a47 : 00000000`00000050 ffffaa0a`0f4e9000 00000000`00000000 ffff9780`95906190 : nt!KeBugCheckEx
ffff9780`95905ea0 fffff802`b16bf5da : 00000000`00000000 00000000`00000000 ffff9780`95906190 ffff9780`959061e8 : nt!memset+0x453c7
ffff9780`95905f90 fffff802`b176aafc : 00000000`00000000 fffff802`b16ab54b ffffaa0a`00000001 fffff802`b1939d00 : nt!RtlRbRemoveNode+0x866a
ffff9780`95906190 fffff802`6d76f9a9 : fffff802`6d76fda0 ffff9780`959063ea ffff9780`95906768 fffff3f9`fcfe7aa0 : nt!setjmpex+0x23bc
ffff9780`95906328 fffff802`6d76fda0 : ffff9780`959063ea ffff9780`95906768 fffff3f9`fcfe7aa0 fffff3f9`fcfe7f38 : vtss+0xf9a9
ffff9780`95906330 fffff802`6d7700bb : 00700066`006d005c 0065002e`0070006d ffff0000`00650078 00000000`00012354 : vtss+0xfda0
ffff9780`95906630 fffff802`b1ae5ca3 : fffff802`b19528c8 ffff8184`26d32080 ffff8184`1de4d040 00000000`00000000 : vtss+0x100bb
ffff9780`95906660 fffff802`b1afb829 : 00000000`0000000a ffff8184`265a8d10 ffff8184`26d32080 00007ffc`eba4cfff : nt!NtFindAtom+0x703
ffff9780`959066c0 fffff802`b1ad91e0 : ffff8184`1c78ddb0 00000000`00000000 ffff8184`271af800 ffff9780`959068c0 : nt!MmCopyVirtualMemory+0x1e89
ffff9780`95906820 fffff802`b1ad6b4f : ffff9780`95906900 ffff9780`00000008 ffff8184`19a77080 00000000`00000001 : nt!NtMapViewOfSection+0x2980
ffff9780`959069a0 fffff802`b176c193 : 00000000`0000003c ffff8184`19b1bcc0 000001d9`28105ec0 000001d9`28105e01 : nt!NtMapViewOfSection+0x2ef
ffff9780`95906a90 00007ffc`ef405364 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : nt!setjmpex+0x3a53
0000006c`94ace6c8 00000000`00000000 : 00000000`00000000 00000000`00000000 00000000`00000000 00000000`00000000 : 0x00007ffc`ef405364


STACK_COMMAND:  kb

THREAD_SHA1_HASH_MOD_FUNC:  18d8a631bd219a5904ed130829606979a2544e24

THREAD_SHA1_HASH_MOD_FUNC_OFFSET:  a47170e0300cde5ec6347f38fa16cc5931ab8a02

THREAD_SHA1_HASH_MOD:  46ffa982c1396e062fb0c183a98d68f0f84bb3df

FOLLOWUP_IP:
vtss+f9a9
fffff802`6d76f9a9 668b02          mov     ax,word ptr [rdx]

FAULT_INSTR_CODE:  66028b66

SYMBOL_STACK_INDEX:  4

FOLLOWUP_NAME:  MachineOwner

BUGCHECK_STR:  5819BD1F

EXCEPTION_CODE: (NTSTATUS) 0x5819bd1f - <Unable to get error code text>

EXCEPTION_CODE_STR:  5819BD1F

EXCEPTION_STR:  WRONG_SYMBOLS

PROCESS_NAME:  ntoskrnl.wrong.symbols.exe

IMAGE_NAME:  ntoskrnl.wrong.symbols.exe

MODULE_NAME: nt_wrong_symbols

SYMBOL_NAME:  nt_wrong_symbols!5819BD1F820000

BUCKET_ID:  WRONG_SYMBOLS_X64_14393.447.amd64fre.rs1_release_inmarket.161102-0100_TIMESTAMP_161102-101703

DEFAULT_BUCKET_ID:  WRONG_SYMBOLS_X64_14393.447.amd64fre.rs1_release_inmarket.161102-0100_TIMESTAMP_161102-101703

PRIMARY_PROBLEM_CLASS:  WRONG_SYMBOLS

FAILURE_BUCKET_ID:  WRONG_SYMBOLS_X64_14393.447.amd64fre.rs1_release_inmarket.161102-0100_TIMESTAMP_161102-101703_5819BD1F_nt_wrong_symbols!5819BD1F820000

TARGET_TIME:  2016-11-16T20:00:40.000Z

OSBUILD:  14393

OSSERVICEPACK:  0

SERVICEPACK_NUMBER: 0

OS_REVISION: 0

SUITE_MASK:  272

PRODUCT_TYPE:  1

OSPLATFORM_TYPE:  x64

OSNAME:  Windows 10

OSEDITION:  Windows 10 WinNt TerminalServer SingleUserTS

OS_LOCALE:

USER_LCID:  0

OSBUILD_TIMESTAMP:  2016-11-02 06:17:03

BUILDDATESTAMP_STR:  161102-0100

BUILDLAB_STR:  rs1_release_inmarket

BUILDOSVER_STR:  10.0.14393.447.amd64fre.rs1_release_inmarket.161102-0100

ANALYSIS_SESSION_ELAPSED_TIME: 46b11

ANALYSIS_SOURCE:  KM

FAILURE_ID_HASH_STRING:  km:wrong_symbols_x64_14393.447.amd64fre.rs1_release_inmarket.161102-0100_timestamp_161102-101703_5819bd1f_nt_wrong_symbols!5819bd1f820000

FAILURE_ID_HASH:  {d82425fb-28f9-fe3c-99c4-cbc6653270b1}

Followup:     MachineOwner
---------

 

Thread Topic: 

Bug Report

VTune corrupts the file system


I have been having a lot of trouble with VTune on Linux lately. I've been running VTune 2017.1 on Broadwell-EP under several operating systems (Ubuntu 16.04, CentOS 7, Debian 8), and after several days of automated tests with VTune, the file system at some point becomes corrupted.

A year ago I had a similar issue with VTune 2015/2016 and Ubuntu 14.04 on Ivy Bridge-EP. The solution back then was to run the automated VTune tests on RHEL 7 instead.

Now that I have new hardware, a new version of VTune, and a new version of Ubuntu, I was hoping the issue would disappear, but it has not. Because the file system becomes completely corrupted, I cannot debug the issue.

I run hotspot analysis: VTune starts the nightly build of the application in test mode and collects some data. Has anybody else experienced this?

Thread Topic: 

Bug Report