
HPC Cluster HPL test error


Hello,

I am testing my HPC system, which has 33 nodes with 24 cores each (792 cores in total) and 370 GB of RAM per node. When I run the command mpiexec -f hosts2 -n 792 ./xhpl a second time, I get the error below, although the same command had run smoothly before and produced output. Do you have any idea what is causing this?

The first execution of mpiexec -f hosts2 -n 792 ./xhpl produced 3.096e+04 Gflops.

By the way, mpiexec -f hosts2 -n 480 ./xhpl works properly and produces 2.136e+04 Gflops.
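In case it is relevant, hosts2 is a plain Hydra-style machine file; a minimal sketch of its layout (node names here are placeholders, 24 slots per node as described above) looks like:

node01:24
node02:24
...
node33:24

With 33 entries of 24 slots each this provides exactly the 792 slots requested by -n 792.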

Thank you.

An explanation of the input/output parameters follows:
T/V    : Wall time / encoded variant.
N      : The order of the coefficient matrix A.
NB     : The partitioning blocking factor.
P      : The number of process rows.
Q      : The number of process columns.
Time   : Time in seconds to solve the linear system.
Gflops : Rate of execution for solving the linear system.

The following parameter values will be used:

N      :  300288 
NB     :     224 
PMAP   : Row-major process mapping
P      :      24 
Q      :      33 
PFACT  :   Right 
NBMIN  :       4 
NDIV   :       2 
RFACT  :   Crout 
BCAST  :  1ringM 
DEPTH  :       1 
SWAP   : Mix (threshold = 64)
L1     : transposed form
U      : transposed form
EQUIL  : yes
ALIGN  : 8 double precision words

--------------------------------------------------------------------------------

- The matrix A is randomly generated for each test.
- The following scaled residual check will be computed:
      ||Ax-b||_oo / ( eps * ( || x ||_oo * || A ||_oo + || b ||_oo ) * N )
- The relative machine precision (eps) is taken to be               1.110223e-16
- Computational tests pass if scaled residuals are less than                16.0
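As a rough sanity check on these parameters (my own back-of-the-envelope figures, not part of the HPL output): P x Q = 24 x 33 = 792 matches the 792 MPI ranks, and the coefficient matrix alone needs about

    N^2 * 8 bytes = 300288^2 * 8 bytes ~= 721 GB in total, i.e. 721 GB / 33 nodes ~= 22 GB per node,

which is far below the 370 GB available per node, so the failure does not look like a memory-capacity problem.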

Abort(205610511) on node 516 (rank 516 in comm 0): Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(507)................: MPI_Comm_split(MPI_COMM_WORLD, color=0, key=516, new_comm=0x7ffc61c8d818) failed
PMPI_Comm_split(489)................: 
MPIR_Comm_split_impl(253)...........: 
MPIR_Get_contextid_sparse_group(498): Failure during collective
Abort(876699151) on node 575 (rank 575 in comm 0): Fatal error in PMPI_Comm_split: Other MPI error, error stack:
PMPI_Comm_split(507)................: MPI_Comm_split(MPI_COMM_WORLD, color=0, key=575, new_comm=0x7ffec32d2c18) failed
PMPI_Comm_split(489)................: 
MPIR_Comm_split_impl(253)...........: 
MPIR_Get_contextid_sparse_group(498): Failure during collective
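The error stack points at the MPI_Comm_split call on MPI_COMM_WORLD (color = 0, key = rank) that xhpl issues while setting up its process grid. In case it helps to reproduce this outside HPL, here is a minimal test of the same call pattern (my own sketch, not taken from the xhpl source); it can be built with mpicc and launched the same way, e.g. mpiexec -f hosts2 -n 792 ./comm_split_test:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Comm split_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Same pattern as in the error stack above:
       color = 0 on every rank, key = the rank itself. */
    MPI_Comm_split(MPI_COMM_WORLD, 0, rank, &split_comm);

    if (rank == 0)
        printf("MPI_Comm_split succeeded across %d ranks\n", size);

    MPI_Comm_free(&split_comm);
    MPI_Finalize();
    return 0;
}

If this small program also fails on a repeated run at 792 ranks, the problem is likely in the MPI/fabric layer rather than in HPL itself.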
 

