Hello,
I installed the Omni-Path driver (IntelOPA-Basic.RHEL74-x86_64.10.6.1.0.2.tgz) on two identical KNL/F servers running CentOS (CentOS Linux release 7.4.1708 (Core)).
I ran the MPI benchmark provided by Intel (IMB-MPI1) using PSM2:
mpirun -PSM2 -host 10.0.0.5 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : -host 10.0.0.6 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv
The execution returned the following error:
[silvio@phi05 ~]$ mpirun -PSM2 -host 10.0.0.5 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : -host 10.0.0.6 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv
init_provider_list: using configuration file: /opt/intel/compilers_and_libraries_2018.1.163/linux/mpi/intel64/etc/tmi.conf
init_provider_list: valid configuration line: psm2 1.3 libtmip_psm2.so ""
init_provider_list: using configuration file: /opt/intel/compilers_and_libraries_2018.1.163/linux/mpi/intel64/etc/tmi.conf
init_provider_list: valid configuration line: psm 1.2 libtmip_psm.so ""
init_provider_list: valid configuration line: mx 1.0 libtmip_mx.so ""
init_provider_list: valid configuration line: psm2 1.3 libtmip_psm2.so ""
init_provider_list: valid configuration line: psm 1.2 libtmip_psm.so ""
init_provider_list: valid configuration line: mx 1.0 libtmip_mx.so ""
tmi_psm2_init: tmi_psm2_connect_timeout=180
init_provider_lib: using provider: psm2, version 1.3
tmi_psm2_init: tmi_psm2_connect_timeout=180
init_provider_lib: using provider: psm2, version 1.3
phi05.11971 Trying to connect to a HFI (subnet id - 0)on a different subnet - 1023
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 11971 RUNNING AT 10.0.0.5
= EXIT CODE: 134
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 11971 RUNNING AT 10.0.0.5
= EXIT CODE: 6
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
Intel(R) MPI Library troubleshooting guide:
https://software.intel.com/node/561764
===================================================================================
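For reference, this is how I check the port state and the assigned subnet prefix on each node. opainfo comes from the opa-basic-tools package installed by the bundle; reading gids/0 from sysfs is just my way of seeing the subnet prefix, and the hfi1_0 device name assumes a single HFI per node:

# Show HFI port state, LID, etc. (run on each host)
opainfo

# First 64 bits of GID 0 are the subnet prefix assigned to the port;
# it stays at the fe80:: default if no subnet manager has configured it
cat /sys/class/infiniband/hfi1_0/ports/1/gids/0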
I searched for this message on Google ("Trying to connect to a HFI (subnet id - 0)on a different subnet - 1023") and the only reference is the following source code:
https://github.com/01org/opa-psm2/blob/master/ptl_ips/ips_proto_connect.c
How do I put the two fabrics in the same subnet?
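One thing I am unsure about is whether a fabric manager needs to be running somewhere to assign both ports a common subnet prefix. A sketch of what I plan to try, assuming the opa-fm package from the Basic bundle is installed:

# Start the Intel OPA Fabric Manager on ONE node only; it should bring
# both ports to Active and assign them the same subnet prefix
sudo systemctl start opafm
sudo systemctl enable opafm   # optional: start at boot

# Then re-check the assigned subnet prefix on both hosts
cat /sys/class/infiniband/hfi1_0/ports/1/gids/0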
When I change to InfiniBand (-IB), it works:
mpirun -IB -host 10.0.0.5 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : -host 10.0.0.6 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv
#-----------------------------------------------------------------------------
# Benchmarking Sendrecv
# #processes = 2
#-----------------------------------------------------------------------------
      #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]  Mbytes/sec
           0         1000        17.79        17.79        17.79        0.00
           1         1000        18.11        18.11        18.11        0.11
           2         1000        18.05        18.05        18.05        0.22
           4         1000        18.08        18.08        18.08        0.44
           8         1000        18.05        18.05        18.05        0.89
          16         1000        18.06        18.06        18.06        1.77
          32         1000        18.99        18.99        18.99        3.37
          64         1000        19.05        19.07        19.06        6.71
         128         1000        19.20        19.20        19.20       13.33
         256         1000        19.96        19.97        19.97       25.64
         512         1000        20.22        20.22        20.22       50.63
        1024         1000        20.38        20.39        20.39      100.44
        2048         1000        24.70        24.71        24.70      165.78
        4096         1000        25.98        25.98        25.98      315.31
        8192         1000        55.57        55.59        55.58      294.75
       16384         1000        61.89        61.90        61.90      529.33
       32768         1000       112.95       113.01       112.98      579.89
       65536          640       158.22       158.23       158.22      828.37
      131072          320       297.40       297.50       297.45      881.16
      262144          160       599.27       600.30       599.78      873.38
      524288           80     31394.80     31489.45     31442.13       33.30
     1048576           40     28356.10     28414.67     28385.39       73.81
     2097152           20     31387.65     31661.40     31524.53      132.47
     4194304           10     38455.80     40408.99     39432.39      207.59
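For completeness, this is the more verbose way I run the PSM2 case while debugging. I_MPI_FABRICS and I_MPI_TMI_PROVIDER are from the Intel MPI 2018 reference and should be equivalent to the -PSM2 shortcut, and I_MPI_DEBUG=5 prints which fabric each rank actually selected:

# Select the TMI fabric with the psm2 provider explicitly
# (should match what the -PSM2 shortcut does)
export I_MPI_FABRICS=shm:tmi
export I_MPI_TMI_PROVIDER=psm2
export I_MPI_DEBUG=5
mpirun -host 10.0.0.5 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv : \
       -host 10.0.0.6 -n 1 /opt/intel/impi/2018.1.163/bin64/IMB-MPI1 Sendrecv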