Channel: Clusters and HPC Technology

MPI_Alltoall error when running more than 2 cores per node


We have 6 Intel(R) Xeon(R) CPU D-1557 @ 1.50GHz nodes, each containing 12 cores. hpcc version 1.5.0 has been compiled with Intel MPI and MKL. hpcc runs successfully when mpirun is configured for 6 nodes and 2 cores per node. However, specifying more than 2 cores per node (we have 12) causes the error "invalid error code ffffffff (Ring Index out of range) in MPIR_Alltoall_intra:204".

Any ideas as to what could be causing this issue?

The following environment variables have been set:
I_MPI_FABRICS=tcp
I_MPI_DEBUG=5
I_MPI_PIN_PROCESSOR_LIST=0,1,2,3,4,5,6,7,8,9,10,11

The MPI library version is:
Intel(R) MPI Library for Linux* OS, Version 2017 Update 3 Build 20170405 (id: 17193)

hosts.txt contains a list of 6 hostnames, one per node.

The line below shows how mpirun is invoked to run hpcc on all 6 nodes with 3 cores per node, followed by the resulting error output:
mpirun -print-rank-map -n 18 -ppn 3  --hostfile hosts.txt  hpcc

INTERNAL ERROR: invalid error code ffffffff (Ring Index out of range) in MPIR_Alltoall_intra:204
Fatal error in PMPI_Alltoall: Other MPI error, error stack:
PMPI_Alltoall(974)......: MPI_Alltoall(sbuf=0x7fcdb107f010, scount=2097152, dtype=USER<contig>, rbuf=0x7fcdd1080010, rcount=2097152, dtype=USER<contig>, comm=0x84000004) failed
MPIR_Alltoall_impl(772).: fail failed
MPIR_Alltoall(731)......: fail failed
MPIR_Alltoall_intra(204): fail failed
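
For reference, the value passed to -n in the mpirun line above is just the node count multiplied by the -ppn value. Below is a minimal sketch of that arithmetic; the helper name total_ranks and the NODES/PPN variables are made up for illustration and are not part of hpcc or Intel MPI. It only prints the command instead of running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of Intel MPI or hpcc): compute the total
# rank count for mpirun's -n from the node count and the -ppn value.
total_ranks() {
    nodes=$1
    ppn=$2
    echo $((nodes * ppn))
}

# Values taken from the post: 6 hosts in hosts.txt, 3 processes per node.
NODES=6
PPN=3
NP=$(total_ranks "$NODES" "$PPN")

# Print the command rather than executing it, so the sketch has no side effects.
echo "mpirun -print-rank-map -n $NP -ppn $PPN --hostfile hosts.txt hpcc"
```

With NODES=6 and PPN=2 (the configuration that works) this gives -n 12; with PPN=3 (the first failing configuration) it gives -n 18, matching the command shown above.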

Thanks!

 

