Hi,
I ran an HPC Performance analysis (VTune 2020u0) on an Intel 8280 system (RHEL 7.6) with default settings:
time mpirun -np $SLURM_NPROCS -ppn $SLURM_NTASKS_PER_NODE $OPTS amplxe-cl -collect hpc-performance -data-limit 0 -result-dir result_hpcperf -- ${APP_INSTALL_ROOT}/appname.exe
The analysis part
vtune: Executing actions 0 % ........ vtune: Executing actions 100 % done
took around 45 minutes, and the "result_hpcperf.nodeXX" directory held around 20 GB of data.
Q1: My Linux kernel version is 3.10.0-957.el7.x86_64. What is the default sampling interval in that case?
Q2: If I halve the sampling interval for an analysis, roughly how much elapsed time and output data should I expect for the VTune analysis and report generation part?
- I was expecting that if the sampling interval is halved (default 1 ms -> 0.5 ms), the analysis and result generation would take around 90 minutes and produce around 40-50 GB of data. Please let me know if my assumptions are incorrect.
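For reference, the estimate above is plain linear scaling. The sketch below assumes that both elapsed time and result size grow in proportion to the number of samples, that is, inversely with the sampling interval. That proportionality is my assumption, not documented VTune behavior, and the function name estimate is only for illustration.

```shell
#!/usr/bin/env bash
# Rough linear-scaling estimate (assumption: cost is inversely proportional
# to the sampling interval; not confirmed by any VTune documentation).
# Arguments: baseline value, baseline interval in ms, new interval in ms.
estimate() {
  local base_value=$1 base_interval_ms=$2 new_interval_ms=$3
  awk -v v="$base_value" -v b="$base_interval_ms" -v n="$new_interval_ms" \
      'BEGIN { printf "%.0f\n", v * b / n }'
}

# Baseline from my first run: 45 minutes and 20 GB at an assumed 1 ms default.
echo "Estimated elapsed minutes:  $(estimate 45 1 0.5)"
echo "Estimated result size (GB): $(estimate 20 1 0.5)"
```

This gives the 90 minutes and 40 GB figures I started from; the 40-50 GB range just adds some margin.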
Q3: Also, if I halve the sampling interval for an analysis, how much improvement in the accuracy of the reported metrics can I expect (in general, based on your observations with this tool)?
Based on this article (the "CPU sampling interval, ms" field), I assumed the default sampling interval is 1 ms, and I reran the HPC Performance analysis with sampling-interval set to 0.5 ms:
time mpirun -np $SLURM_NPROCS -ppn $SLURM_NTASKS_PER_NODE $OPTS amplxe-cl -collect hpc-performance -data-limit 0 -result-dir result_hpcperf -knob sampling-interval=0.5 -- ${APP_INSTALL_ROOT}/appname.exe
The last line to appear on stdout was:
vtune: Executing actions 0 %
Around 11 hours have elapsed since then, and around 150 GB of data has been generated in the results directory.
Within the results directory (find . -printf "%T+\t%p\n" | sort) I saw that the last file was changed around 11 hours ago, and that file has the following contents:
[user@headnode01 hpcperf_char_00003]$ cat result_hpcperf.node3/config/log.cfg
<?xml version='1.0' encoding='UTF-8'?>
<bag xmlns:int="http://www.w3.org/2001/XMLSchema#int" xmlns:long="http://www.w3.org/2001/XMLSchema#long">
 <message_entry_t int:status="2" cap="Data collection completed successfully" msg="" long:timeStamp="1586803953480"/>
 <message_entry_t int:status="2" cap="Data collection completed successfully" msg="" long:timeStamp="1586803953542"/>
 <message_entry_t int:status="2" cap="Data collection completed successfully" msg="" long:timeStamp="1586803953687"/>
 <message_entry_t int:status="2" cap="Data collection completed successfully" msg="" long:timeStamp="1586803953748"/>
 <message_entry_t int:status="2" cap="Data collection completed successfully" msg="" long:timeStamp="1586803954281"/>
 <message_entry_t int:status="1" cap="Data collection completed with warnings" msg="Please see warning messages for details. " long:timeStamp="1586809230671">
  <message msg="Analyzing data in the node-wide mode. The hostname (node61) will be added to the result path/name." int:severity="1"/>
  <message msg="Peak bandwidth measurement started." int:severity="1"/>
  <message msg="Peak bandwidth measurement finished." int:severity="1"/>
  <message msg="To enable hardware event-base sampling, VTune Profiler has disabled the NMI watchdog timer. The watchdog timer will be re-enabled after collection completes." int:severity="2"/>
  <message msg="Collection started." int:severity="1"/>
  <message msg="Collection stopped." int:severity="1"/>
 </message_entry_t>
</bag>
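In case it helps with reading the log: the sketch below converts the timestamps, assuming the long:timeStamp values are milliseconds since the Unix epoch. That is my guess from their magnitude, not something I confirmed in the VTune documentation. It also assumes GNU date, as shipped with RHEL.

```shell
#!/usr/bin/env bash
# First and last long:timeStamp values copied from log.cfg above.
# Assumption: they are milliseconds since the Unix epoch.
first_ms=1586803953480
last_ms=1586809230671

# Gap between the first and the last log entry, in whole seconds.
elapsed_s=$(( (last_ms - first_ms) / 1000 ))
echo "Gap between first and last entry: $(( elapsed_s / 60 )) min $(( elapsed_s % 60 )) s"

# Wall-clock time of the last entry in UTC (GNU date syntax).
date -u -d "@$(( last_ms / 1000 ))" '+%Y-%m-%d %H:%M:%S UTC'
```

Under that assumption the gap is 5277 seconds, a little under 88 minutes, between the first "completed successfully" entry and the final "completed with warnings" entry.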
Also, on the compute node (node3) I checked the running processes with the top command:
   PID USER PR NI    VIRT   RES  SHR S  %CPU %MEM     TIME+ COMMAND
127588 root 20  0 4128520 82480 3308 R 100.0  0.0 563:13.50 sep
    10 root 20  0       0     0    0 S   6.2  0.0   0:22.52 rcu_sched
     1 root 20  0   56068  8276 2620 S   0.0  0.0   0:26.51 systemd
Here too, it seems that the sep command (driver) has been running for ~9 hours at 100% CPU with negligible memory utilization. I am not sure whether the application and the sep driver are running fine. Is there a way to confirm this (via system logs or sep driver logs)?
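The manual checks I have been doing so far can be sketched as the small script below. It only uses standard tools (GNU find, du, and procps ps) and says nothing about sep internals. The function names and the default directory argument are placeholders I chose for illustration.

```shell
#!/usr/bin/env bash
# Print the most recently modified file under a directory (GNU find assumed).
newest_file() {
  find "$1" -type f -printf '%T@\t%p\n' | sort -n | tail -n 1 | cut -f2-
}

# Print the total size of a directory in human-readable form.
result_size() {
  du -sh "$1" | cut -f1
}

# Summarize what can be observed from outside the collector:
# result directory growth and the state of any process named "sep".
check_progress() {
  local dir=${1:-.}
  echo "Result size: $(result_size "$dir")"
  echo "Newest file: $(newest_file "$dir")"
  # etime = wall-clock time since start, time = accumulated CPU time.
  ps -C sep -o pid,stat,etime,time,rss,comm || echo "no process named sep found"
}
```

Running check_progress result_hpcperf.node3 on the compute node every few minutes shows whether the directory is still growing while sep accumulates CPU time, but it cannot tell me whether sep is doing useful work, which is why I am asking about logs.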
It would be very helpful if I could get an estimate of how long this analysis will take to finish in my scenario.
- I am asking because I will adjust the "walltime" for my VTune jobs on my cluster accordingly.
Please let me know if I can provide more information to help answer my questions.