Channel: Clusters and HPC Technology

Intel Cluster Checker OFI problem


Hi, I'm trying to run Intel Cluster Checker (intel-clck-2019.3.5-025) and am getting an error in the hpl_cluster_performance module.

I've installed intel-mpi and intel-mkl, both at version 2019.4-070, and then sourced:

source /opt/intel/compilers_and_libraries_2019.4.243/linux/bin/compilervars.sh intel64
source /opt/intel/compilers_and_libraries_2019.4.243/linux/mkl/bin/mklvars.sh intel64
source /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/mpivars.sh

(as well as the relevant clckvars.sh)
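If one of these scripts is missing or sourced from the wrong prefix, the wrong MPI can end up on PATH without any obvious error. A minimal sketch for catching that, assuming bash; the helper names and the INTEL_ROOT variable are hypothetical, and the default prefix is just the one from the commands above:

```shell
#!/usr/bin/env bash
# Hypothetical helpers for loading the Intel 2019.4 environment from this post.
# INTEL_ROOT defaults to the install prefix used above; override if different.
INTEL_ROOT="${INTEL_ROOT:-/opt/intel/compilers_and_libraries_2019.4.243/linux}"

# Source a script (passing any extra arguments) if it is readable;
# otherwise report it on stderr and return 1 instead of continuing silently.
source_if_present() {
  local script="$1"
  shift
  if [ -r "$script" ]; then
    # shellcheck disable=SC1090
    . "$script" "$@"
  else
    echo "missing: $script" >&2
    return 1
  fi
}

# Source the three scripts from the post; return 1 if any of them was missing.
load_intel_env() {
  local rc=0
  source_if_present "$INTEL_ROOT/bin/compilervars.sh" intel64 || rc=1
  source_if_present "$INTEL_ROOT/mkl/bin/mklvars.sh" intel64 || rc=1
  source_if_present "$INTEL_ROOT/mpi/intel64/bin/mpivars.sh" || rc=1
  return "$rc"
}

# Print where each named tool resolves on PATH, or "not found".
show_tool_paths() {
  local tool
  for tool in "$@"; do
    echo "$tool: $(command -v "$tool" || echo "not found")"
  done
}
```

Typical use before running clck would be `load_intel_env && show_tool_paths mpirun clck`, which makes it easy to spot a stale mpirun or a missing clckvars.sh.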

If I run:

clck -f clck_nodes -l debug -F hpl_cluster_performance &> clck_debug.log

I get this:

<snip>
openhpc-compute-0: [0] MPI startup(): libfabric version: 1.7.2a-impi
openhpc-compute-0:
openhpc-compute-0:
openhpc-compute-0: stderr (540 bytes):
openhpc-compute-0: Abort(1094799) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Init: Other MPI error, error stack:
openhpc-compute-0: MPIR_Init_thread(666)......:
openhpc-compute-0: MPID_Init(922).............:
openhpc-compute-0: MPIDI_NM_mpi_init_hook(719): OFI addrinfo() failed (ofi_init.h:719:MPIDI_NM_mpi_init_hook:No data available)

<snip>
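The "No data available" text at the end of the error appears to be the ENODATA message, which is typically what libfabric's fi_getinfo() reports when no provider matches the request, so one check is whether any provider is usable on the compute node at all. A sketch under these assumptions: the fi_info utility that ships with libfabric is on PATH after sourcing mpivars.sh, fi_info exits non-zero when it finds nothing, and FI_INFO_BIN plus the function names are hypothetical:

```shell
#!/usr/bin/env bash
# Sketch: check which libfabric providers fi_info reports as usable on this node.
# Assumes fi_info is on PATH; FI_INFO_BIN is a hypothetical override.
FI_INFO_BIN="${FI_INFO_BIN:-fi_info}"

# Fail early if the fi_info binary cannot be found at all.
require_fi_info() {
  command -v "$FI_INFO_BIN" >/dev/null 2>&1 || {
    echo "fi_info not found: $FI_INFO_BIN" >&2
    return 1
  }
}

# Return 0 if fi_info lists at least one interface for provider $1
# (relies on fi_info exiting non-zero when fi_getinfo() finds nothing).
provider_usable() {
  "$FI_INFO_BIN" -p "$1" >/dev/null 2>&1
}

# Print "<provider>: ok" or "<provider>: unavailable" for each argument.
report_providers() {
  require_fi_info || return 1
  local prov
  for prov in "$@"; do
    if provider_usable "$prov"; then
      echo "$prov: ok"
    else
      echo "$prov: unavailable"
    fi
  done
}
```

Running `report_providers tcp sockets verbs psm2` on openhpc-compute-0 itself would show whether the providers tried below are usable there; `fi_info -l` lists every provider the bundled libfabric was built with.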


Setting either

export FI_PROVIDER=sockets

or

export FI_PROVIDER=tcp

before running clck, as suggested in other threads here, still gives the same error message.
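To compare providers side by side, one option is to rerun the same clck command once per provider with extra Intel MPI and libfabric logging and keep one log per run. A sketch assuming bash; NODEFILE, LOG_DIR and the function names are hypothetical, while the clck options are the ones from the command above. Note that setting the variables in the launching shell does not by itself guarantee they reach the MPI ranks on the compute nodes; that depends on how clck starts mpirun.

```shell
#!/usr/bin/env bash
# Sketch: rerun the hpl_cluster_performance check once per libfabric provider,
# keeping a separate debug log for each run. NODEFILE and LOG_DIR are
# hypothetical settings; the clck options are the ones from the post.
NODEFILE="${NODEFILE:-clck_nodes}"
LOG_DIR="${LOG_DIR:-.}"

run_hpl_check_with_provider() {
  local prov="$1"
  local log="$LOG_DIR/clck_debug_${prov}.log"
  local rc
  # The variables below apply only to this one clck invocation:
  #   FI_PROVIDER   restricts libfabric to a single provider
  #   I_MPI_DEBUG   makes Intel MPI print its startup details
  #   FI_LOG_LEVEL  makes libfabric log provider initialization
  if FI_PROVIDER="$prov" I_MPI_DEBUG=5 FI_LOG_LEVEL=debug \
      clck -f "$NODEFILE" -l debug -F hpl_cluster_performance >"$log" 2>&1; then
    rc=0
  else
    rc=$?
  fi
  # Classify the run by whether the error from the post shows up in its log.
  if grep -qF "OFI addrinfo() failed" "$log"; then
    echo "$prov: OFI addrinfo() failed (exit $rc, log: $log)"
  else
    echo "$prov: no OFI addrinfo() error (exit $rc, log: $log)"
  fi
}

# Try each provider given as an argument, e.g. try_providers tcp sockets.
try_providers() {
  local prov
  for prov in "$@"; do
    run_hpl_check_with_provider "$prov"
  done
}
```

If every provider still fails the same way, the per-provider logs at least show which providers libfabric tried and rejected on the compute node.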

Any suggestions, please?

