I am compiling the scientific program package Octopus 9.1 on a cluster, specifying the BLAS library with
-L${MKL_DIR} -Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic -lpthread -lm -ldl
BLACS with
-L${MKL_DIR} -Wl,-Bstatic -lmkl_scalapack_lp64
and ScaLAPACK with
-L${MKL_DIR} -Wl,-Bstatic -lmkl_scalapack_lp64
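For reference, these link lines go into Octopus's configure step roughly as follows. This is only a sketch of my own invocation: the --with-* option names, the install prefix, and the assumption that MKL_DIR already points at MKL's intel64 library directory all reflect my local setup and may differ elsewhere.

    # Sketch of my configure invocation; MKL_DIR is set in my environment to
    # MKL's lib/intel64 directory, and the prefix is just an example path.
    export FC=mpif90
    export CC=mpicc
    ./configure --prefix=$HOME/octopus-9.1 \
        --with-blas="-L${MKL_DIR} -Wl,-Bstatic -Wl,--start-group -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -Wl,--end-group -Wl,-Bdynamic -lpthread -lm -ldl" \
        --with-blacs="-L${MKL_DIR} -Wl,-Bstatic -lmkl_scalapack_lp64" \
        --with-scalapack="-L${MKL_DIR} -Wl,-Bstatic -lmkl_scalapack_lp64"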
All of those options and flags are what the Intel MKL Link Line Advisor suggests for my architecture. The compilers are OpenMPI's mpif90 and mpicc wrappers around the Intel 18.0.0 compilers. The program works fine and runs fast; there is nothing to worry about except for a few segfaults during the test run, which I suspect can be remedied by ulimit -s unlimited.

What I would like to know is why -lmkl_intel_lp64, -lmkl_sequential, -lmkl_core, as well as the BLACS and ScaLAPACK libraries, have to be linked statically. For instance, when the -Wl,-Bstatic and -Wl,-Bdynamic flags are removed, I get a segfault at runtime for every calculation I launch. The Octopus manual says nothing about which Intel libraries should be linked statically or dynamically; in fact, the options the Advisor recommends are architecture-dependent.

Moreover, if I switch to MPICH's wrappers around the same Intel compilers, the program runs significantly slower (one calculation took 50 seconds with OpenMPI versus 1 hour with MPICH), and the -Wl,-Bstatic and -Wl,-Bdynamic options have to be removed, otherwise I get a segfault. This is really bugging me: how can a mere difference in MPI implementation lead to such a large difference in performance and linking behavior? Any thoughts on this?
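In case it helps with diagnosing the difference, this is roughly how I compare what the two wrappers actually pass to the linker and what the resulting binary loads at runtime. The binary name and the mpirun process count are just examples from my runs, not anything prescribed by Octopus.

    # OpenMPI wrapper: show the link line it generates
    mpif90 --showme:link
    # MPICH wrapper: show the full command it would run
    mpif90 -show

    # Inspect the runtime dependencies of the binary; the MKL libraries wrapped
    # in -Wl,-Bstatic should not appear here, only the dynamically linked ones.
    ldd ./octopus

    # Raise the stack limit before running, since the test-run segfaults look
    # stack-related, then launch a calculation (process count is just an example).
    ulimit -s unlimited
    mpirun -np 16 ./octopus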