Channel: Intel® VTune™ Profiler (Intel® VTune™ Amplifier)

Examining the effect of serialized memory access in multi-threaded software


Hello everyone,

I am working on a multi-threaded video encoder application (x265).

I need to show that, although increasing the number of threads can reduce the total run time, beyond a certain number of threads (cores) the memory resources become insufficient. That is, concurrent memory accesses from different cores build up a queue of requests at the DRAM, so memory latency starts to limit performance.
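The claim above can be framed with a simple bound model. This is a minimal sketch under stated assumptions, not a model of x265 itself: it assumes (hypothetically) that the CPU work `compute_s` splits evenly across threads, while the DRAM traffic `dram_bytes` must pass through one shared memory system with peak bandwidth `peak_bw`. All names are illustrative.

```python
# Hypothetical bound model (an assumption, not how x265 or VTune works):
# CPU work divides evenly across threads, but all threads share one
# memory system, so the time needed to move the DRAM traffic does not shrink.


def runtime_bound(threads: int, compute_s: float, dram_bytes: float, peak_bw: float) -> float:
    """Lower bound on run time: the slower of parallel compute and shared DRAM traffic."""
    if threads < 1:
        raise ValueError("threads must be >= 1")
    compute_time = compute_s / threads   # shrinks as threads are added
    memory_time = dram_bytes / peak_bw   # fixed: bandwidth is shared, not per-thread
    return max(compute_time, memory_time)


def saturation_threads(compute_s: float, dram_bytes: float, peak_bw: float) -> float:
    """Thread count at which per-thread compute time falls to the memory-time floor."""
    return compute_s * peak_bw / dram_bytes
```

In this model, adding threads beyond `saturation_threads(...)` no longer lowers the bound. On real hardware the knee is usually softer, because queueing at the memory controller raises latency as utilization grows, before peak bandwidth is actually reached.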

1- What do you think is the best method to obtain these results?

2- I have run tests with 2, 4, and 8 threads (cores) on my machine (Intel Ivy Bridge i7) using the Memory Access analysis in VTune. However, while the "Memory Latency" metric increases (2 threads: 0.048, 4 threads: 0.533, 8 threads: 0.735), the "Average Latency (cycles)" stays almost constant (around 11 or 12). Why might that happen? I expected the average latency to increase because of longer DRAM access times. Can anyone explain exactly what "Average Latency (cycles)" and "Memory Latency" measure? Does the average latency also include the memory latency?
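One possibility worth checking for question 2: if "Average Latency (cycles)" is averaged over all sampled loads, including cache hits (an assumption to verify against the VTune documentation, not a confirmed definition), then the few loads that reach DRAM contribute little to the mean. A minimal sketch of such a weighted average, where `mix` is a hypothetical list of (fraction of loads, latency in cycles) pairs, one per level of the memory hierarchy:

```python
import math


def average_load_latency(mix):
    """Weighted mean latency in cycles over all loads.

    `mix` is a list of (fraction_of_loads, latency_cycles) pairs; this
    weighting is an assumed model, not VTune's documented formula.
    """
    total = sum(frac for frac, _ in mix)
    if not math.isclose(total, 1.0):
        raise ValueError("fractions must sum to 1")
    return sum(frac * cycles for frac, cycles in mix)
```

With a hypothetical mix where 99% of loads take 4 cycles and 1% take 200 cycles, the mean is about 6 cycles; doubling the DRAM latency to 400 cycles raises it only to about 8 cycles. These figures are illustrative, not measurements.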

Thanks in advance,

Farhad


Viewing all articles
Browse latest Browse all 1347

Trending Articles



<script src="https://jsc.adskeeper.com/r/s/rssing.com.1596347.js" async> </script>