Dear all,
Good afternoon.
I have successfully installed Intel Parallel Studio 2019 Update 4 on my cluster, which runs Ubuntu 18.04 LTS.
The cluster consists of 4 nodes: a master node and 3 compute nodes on which I run my calculations.
I can run calculations on the master only, or on the compute nodes only, including several of them together.
However, when I request the master plus one of the compute nodes, I receive this error message:
Abort(543240207) on node 7 (rank 7 in comm 0): Fatal error in PMPI_Bcast: Other MPI error, error stack:
PMPI_Bcast(452)...................: MPI_Bcast(buf=0x5b7cee0, count=10, MPI_INTEGER, root=7, comm=MPI_COMM_WORLD) failed
PMPI_Bcast(438)...................:
MPIDI_SHMGR_Gather_generic(391)...:
MPIDI_NM_mpi_bcast(161)...........:
MPIR_Bcast_intra_tree(227)........: Failure during collective
MPIR_Bcast_intra_tree(219)........:
MPIR_Bcast_intra_tree_generic(180): Failure during collective
I also get an error when I run the Intel MPI Benchmarks as follows:
mpirun -hosts master,node1 -n 2 -ppn 1 /opt/intel/compilers_and_libraries_2019.4.243/linux/mpi/intel64/bin/IMB-MPI1
I receive this error message:
Abort(609312527) on node 1 (rank 1 in comm 0): Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(507)........................: MPI_Comm_split(MPI_COMM_WORLD, color=0, key=1, new_comm=0x6de6e4) failed
PMPI_Comm_split(489)........................:
MPIR_Comm_split_impl(167)...................:
MPIR_Allgather_intra_auto(145)..............: Failure during collective
MPIR_Allgather_intra_auto(141)..............:
MPIR_Allgather_intra_recursive_doubling(126):
MPIC_Sendrecv(344)..........................:
MPID_Isend(662).............................:
MPID_isend_unsafe(282)......................:
MPIDI_OFI_send_lightweight_request(106).....:
(unknown)(): Other MPI error
I also tried installing Intel Parallel Studio 2019 Update 5, but the problem persists.
All the best,
Lorenzo