Dear experts,
I compiled the IMB benchmarks with the Intel 19 compiler and Open MPI 4.0.3. All tests run OK except IMB_EXT.
If I run IMB_EXT on one node, everything is OK, but if I run it on 2 nodes, it stalls in the Accumulate benchmark. Several cases complete, but, for example,
Accumulate with #processes = 64 in Aggregate mode only prints this:
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] defects
0 1000 0.00 0.01 0.01 0.00
4 1000 1.60 1.60 1.60 0.00
8 1000 1.40 1.41 1.41 0.00
16 1000 1.42 1.44 1.44 0.00
32 1000 1.45 1.46 1.46 0.00
64 1000 1.53 1.54 1.53 0.00
128 1000 1.52 1.52 1.52 0.00
256 1000 1.55 1.56 1.56 0.00
512 1000 1.51 1.52 1.52 0.00
1024 1000 1.66 1.68 1.68 0.00
2048 1000 1.59 1.59 1.59 0.00
4096 1000 2.48 2.49 2.49 0.00
8192 1000 2.50 2.51 2.50 0.00
16384 1000 3.68 3.69 3.69 0.00
The program does not finish, but it is not doing anything either.
If I compile the tests with CPPFLAGS=-DCHECK, I get errors in other Accumulate tests:
#-----------------------------------------------------------------------------
# Benchmarking Accumulate
# #processes = 32
# ( 32 additional processes waiting in MPI_Barrier)
#-----------------------------------------------------------------------------
#
# MODE: NON-AGGREGATE
#
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] defects
0 100 7.66 7.69 7.68 0.00
0: Error Accumulate,size = 4,sample #0
Process 0: Got invalid buffer:
Buffer entry: 105.599998
pos: 0
Process 0: Expected buffer:
Buffer entry: 52.799999
4 100 117.85 117.95 117.90 1.00
0: Error Accumulate,size = 8,sample #0
Process 0: Got invalid buffer:
Buffer entry: 105.599998
pos: 0
Process 0: Expected buffer:
Buffer entry: 52.799999
.....
......
0: Error Accumulate,size = 4194304,sample #0
Process 0: Got invalid buffer:
Buffer entry: 105.499992
pos: 0
Process 0: Expected buffer:
Buffer entry: 52.799999
4194304 10 156021.70 194528.12 193324.60 0.00
#-----------------------------------------------------------------------------
# Benchmarking Accumulate
# #processes = 64
#-----------------------------------------------------------------------------
#
# MODE: AGGREGATE
#
#bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] defects
0 1000 0.00 0.01 0.01 0.00
4 1000 1.60 1.60 1.60 0.00
8 1000 1.40 1.41 1.41 0.00
16 1000 1.42 1.44 1.44 0.00
32 1000 1.45 1.46 1.46 0.00
64 1000 1.53 1.54 1.53 0.00
128 1000 1.52 1.52 1.52 0.00
256 1000 1.55 1.56 1.56 0.00
512 1000 1.51 1.52 1.52 0.00
1024 1000 1.66 1.68 1.68 0.00
2048 1000 1.59 1.59 1.59 0.00
4096 1000 2.48 2.49 2.49 0.00
8192 1000 2.50 2.51 2.50 0.00
16384 1000 3.68 3.69 3.69 0.00
In this test there are no errors, but it stops at the same point.
Any ideas about this problem?
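
To summarize output like the above, here is a minimal Python sketch that pulls the data rows out of a saved log, flags the ones with a non-zero defects column, and reports the last message size reached before the run stalled. The function names (parse_imb_rows, rows_with_defects, last_size_reached) are hypothetical helpers, not part of IMB, and the row layout is assumed to be the six-column one shown in the output above.

```python
import re

# Assumed row layout (as in the output above):
# #bytes #repetitions t_min[usec] t_max[usec] t_avg[usec] defects
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*$"
)

def parse_imb_rows(text):
    """Return one dict per data row; header, comment and error lines are skipped."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            rows.append({
                "bytes": int(m.group(1)),
                "repetitions": int(m.group(2)),
                "t_min": float(m.group(3)),
                "t_max": float(m.group(4)),
                "t_avg": float(m.group(5)),
                "defects": float(m.group(6)),
            })
    return rows

def rows_with_defects(rows):
    """Rows whose defects column is non-zero (the -DCHECK build reported a mismatch)."""
    return [r for r in rows if r["defects"] > 0.0]

def last_size_reached(rows):
    """Message size of the last row printed, or None if no row was printed."""
    return rows[-1]["bytes"] if rows else None
```

Feeding it the text of one benchmark section at a time keeps the "last size reached" meaningful, since each section restarts at 0 bytes.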