Hi, I'm trying to run Intel Cluster Checker (intel-clck-2019.3.5-025) and am getting an error in the hpl_cluster_performance module.
I've installed intel-mpi and intel-mkl, both at version 2019.4-070, and then sourced:
source /opt/intel/compilers_and_libraries_2019.4.243/linux/bin/compilervars.sh intel64
source /opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/bin/mklvars.sh intel64
source /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpivars.sh
(as well as the relevant clckvars.sh)
If I run:
clck -f clck_nodes -l debug -F hpl_cluster_performance &> clck_debug.log
I get this:
<snip>
openhpc-compute-0: [0] MPI startup(): libfabric version: 1.7.2a-impi
openhpc-compute-0:
openhpc-compute-0:
openhpc-compute-0: stderr (540 bytes):
openhpc-compute-0: Abort(1094799) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
openhpc-compute-0: MPIR_Init_thread(666)......:
openhpc-compute-0: MPID_Init(922).............:
openhpc-compute-0: MPIDI_NM_mpi_init_hook(719): OFI addrinfo() failed (ofi_init.h:719:MPIDI_NM_mpi_init_hook:No data available)
<snip>
Setting either
export FI_PROVIDER=sockets
or
export FI_PROVIDER=tcp
before running clck, as suggested in other threads here, still gives the same error message.
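To rule out the variable simply not reaching the processes that clck launches, a check along these lines could be run in the same shell first. This is only a sketch: it assumes a POSIX shell, it only checks the local node (whether the setting is forwarded to the other nodes is a separate question), and the fi_info step only runs if the libfabric fi_info utility happens to be on the PATH.

```shell
#!/bin/sh
# Sketch: confirm that FI_PROVIDER is really exported to child processes.
export FI_PROVIDER=tcp

# A child shell inherits only exported variables, just as clck would on this node.
provider_seen_by_child=$(sh -c 'printf "%s" "${FI_PROVIDER:-unset}"')
echo "child process sees FI_PROVIDER=${provider_seen_by_child}"

# Optional: list the providers libfabric can see (assumes fi_info is installed).
if command -v fi_info >/dev/null 2>&1; then
    fi_info -l
else
    echo "fi_info not found on PATH; skipping provider listing"
fi
```

If the child process reports "unset", the export did not take effect in that shell. If the requested provider is missing from the fi_info listing, that would narrow the problem to the libfabric installation rather than clck itself.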
Any suggestions, please?