MPI Linpack from MKL, SSE4_2, turbo, and Skylake: SSE 4.2 threads run at the AVX2 turbo frequency

This is a follow-on from this related topic in the MKL Forum:
https://software.intel.com/en-us/forums/intel-math-kernel-library/topic/782951

In that situation, there were a couple of versions of Intel MPI 2017.* in which a few of the threads in each MPI process running Linpack with SSE 4.2 instructions would run at the AVX-512 turbo frequency rather than at the non-AVX frequency. (The other threads all ran at the non-AVX frequency, as expected.) I'm using the version of Linpack that comes with MKL; I find it at /opt/intel/compilers_and_libraries_2018.3.222/linux/mkl/benchmarks/mp_linpack/, for example.

That problem was avoided by using older or newer versions of MPI.

The current problem is more subtle. I suspect it comes from Intel MPI using "light-weight" AVX-512 instructions on Skylake, perhaps for copying data. The WikiChip page https://en.wikichip.org/wiki/intel/frequency_behavior says that "light-weight" AVX-512 instructions run at the AVX 2.0 frequency.

Typically it wouldn't be a problem for MPI communication to run at a lower frequency, as the core's frequency will drop from the non-AVX frequency to the AVX 2.0 frequency while executing these instructions, but will return to the non-AVX frequency soon after.

The problem, however, arises in jitter-sensitive environments. Even there it typically wouldn't matter, because the MPI code will likely be in a section that is not sensitive to jitter; other cores should not be affected, and the thread doing communication should soon return to its higher frequency.

However, Hewlett Packard Enterprise has a feature called Jitter Smoothing as part of its Intelligent System Tuning feature set (https://support.hpe.com/hpsc/doc/public/display?docId=a00018313en_us). When this feature is set to "Auto" mode, the server notices changes to the operating frequency of the cores and decreases the frequency of all cores on all processors accordingly. The system soon reaches a stable state in which frequency changes no longer occur, minimizing jitter -- but at the lower AVX 2.0 turbo frequency.

On my servers, I have this Jitter Smoothing feature enabled.  When I run, for example, the MPI version of Linpack with the environment variable MKL_ENABLE_INSTRUCTIONS=SSE4_2 on Skylake with turbo enabled, I see the operating frequency gradually drop from the non-AVX frequency to the AVX2 frequency.  This happens on all cores across the server, which is what Jitter Smoothing is expected to do. I do not see this happen when I run the OpenMP (non-MPI) version of Linpack; in that case all the CPUs stay at the non-AVX frequency for the duration of the workload.
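
For reference, here is a minimal sketch of how the per-core frequencies can be watched while the benchmark runs. It assumes a Linux system that reports a "cpu MHz" line for each logical core in /proc/cpuinfo; the one-second sampling interval is arbitrary, and tools such as turbostat would show the same thing. It is only an observation aid, not anything specific to Linpack or Intel MPI.

#!/usr/bin/env python3
# Minimal sketch: sample per-core "cpu MHz" from /proc/cpuinfo once per
# second and print min/avg/max, so a drop from the non-AVX frequency
# toward the AVX 2.0 frequency is visible while the benchmark runs.
import time

def core_mhz():
    """Return the list of per-core frequencies (MHz) from /proc/cpuinfo."""
    freqs = []
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("cpu MHz"):
                freqs.append(float(line.split(":")[1]))
    return freqs

if __name__ == "__main__":
    while True:
        f = core_mhz()
        print("min %7.1f  avg %7.1f  max %7.1f MHz"
              % (min(f), sum(f) / len(f), max(f)))
        time.sleep(1)

Run alongside the SSE 4.2 MPI Linpack, the reported frequencies should drift down toward the AVX 2.0 turbo frequency across all cores; with the OpenMP version they should hold at the non-AVX frequency.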

As I said, I hypothesize that Intel MPI is using some AVX-512 instructions, perhaps to copy data, and that this is causing my performance problem. If my hypothesis is correct, my question is whether there is a way to tell Intel MPI not to use AVX instructions, similar to the MKL_ENABLE_INSTRUCTIONS environment variable that Intel MKL uses.

 

