Hello,
I am testing my HPC system, which has 33 nodes with 24 cores each (792 cores in total) and 370 GB of RAM per node, but I get the following error the second time I run the command mpiexec -f hosts2 -n 792 ./xhpl. I had run this command smoothly before and got output. Do you have any idea what the problem is?
The first execution of mpiexec -f hosts2 -n 792 ./xhpl produced a result of 3.096e+04 Gflops.
By the way, mpiexec -f hosts2 -n 480 ./xhpl works properly and produces 2.136e+04 Gflops.
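In case it helps to isolate the problem, I put together a minimal test that only calls MPI_Comm_split across all ranks (a rough sketch; the file name split_test.c and the build line are just my own choices, not anything from HPL), so I can check whether the failure comes from the MPI layer itself rather than from xhpl:

/* split_test.c -- minimal reproducer sketch for the MPI_Comm_split
 * failure shown in the error stack below (my own test code, not HPL). */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Comm newcomm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Same pattern as in the error stack: every rank passes color 0 and
     * its own rank as the key, so the split reproduces the layout of
     * MPI_COMM_WORLD in a new communicator. */
    MPI_Comm_split(MPI_COMM_WORLD, 0, rank, &newcomm);

    if (rank == 0)
        printf("MPI_Comm_split succeeded on %d ranks\n", size);

    MPI_Comm_free(&newcomm);
    MPI_Finalize();
    return 0;
}

I would build and launch it the same way as xhpl, e.g. mpicc split_test.c -o split_test and then mpiexec -f hosts2 -n 792 ./split_test (assuming mpicc is the compiler wrapper on this installation).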
Thank you.
An explanation of the input/output parameters follows:
T/V : Wall time / encoded variant.
N : The order of the coefficient matrix A.
NB : The partitioning blocking factor.
P : The number of process rows.
Q : The number of process columns.
Time : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.
The following parameter values will be used:
N : 300288
NB : 224
PMAP : Row-major process mapping
P : 24
Q : 33
PFACT : Right
NBMIN : 4
NDIV : 2
RFACT : Crout
BCAST : 1ringM
DEPTH : 1
SWAP : Mix (threshold = 64)
L1 : transposed form
U : transposed form
EQUIL : yes
ALIGN : 8 double precision words
--------------------------------------------------------------------------------
- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be 1.110223e-16
- Computational tests pass if scaled residuals are less than 16.0
Abort(205610511) on node 516 (rank 516 in comm 0): Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(507)................: MPI_Comm_split(MPI_COMM_WORLD, color=0, key=516, new_comm=0x7ffc61c8d818) failed
PMPI_Comm_split(489)................:
MPIR_Comm_split_impl(253)...........:
MPIR_Get_contextid_sparse_group(498): Failure during collective
Abort(876699151) on node 575 (rank 575 in comm 0): Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(507)................: MPI_Comm_split(MPI_COMM_WORLD, color=0, key=575, new_comm=0x7ffec32d2c18) failed
PMPI_Comm_split(489)................:
MPIR_Comm_split_impl(253)...........:
MPIR_Get_contextid_sparse_group(498): Failure during collective