Channel: Clusters and HPC Technology
Viewing all articles
Browse latest Browse all 927

Intel mpirun error - AI workload

Hi,

  I tried to run one of my training workload models on a CentOS cluster for MPI analysis. The commands I used and the resulting error are shown below. I would appreciate your help in resolving the issue.

Commands used:

mpiexec -ppn 1 -- ./scripts/run_intelcaffe.sh --hostfile ~/mpd.hosts --solver models/intel_optimized_models/multinode/resnet50_8nodes_2s/solver.prototxt --network tcp --netmask enp175s0 --benchmark mpi

mpirun -ppn 1 -l amplxe-cl -collect hotspots -k sampling-mode=hw -result-dir results -- ./scripts/run_intelcaffe.sh --hostfile ~/mpd.hosts --solver models/intel_optimized_models/multinode/resnet50_8nodes_2s/solver.prototxt --network tcp --netmask enp175s0 --benchmark mpi

I keep getting the following error:

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 26 PID 72362 RUNNING AT node001
=   EXIT STATUS: 255
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 27 PID 72363 RUNNING AT node001
=   KILLED BY SIGNAL: 9 (Killed)
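
To make the failing ranks easier to compare, the termination banners above can be condensed to one line per rank. This is only a minimal sketch, assuming the launcher output has been saved to a file; the function name `summarize_terminations` and the file name `mpirun.log` are hypothetical, and the script merely reformats what the banners already report.

```shell
#!/usr/bin/env bash
# Summarize "BAD TERMINATION" banners from saved launcher output.
# Assumption: the output was captured to a file passed as the first argument.

summarize_terminations() {
    # Prints one line per banner: rank, pid, node, and the reported reason.
    awk '
        # Banner line such as: "=   RANK 26 PID 72362 RUNNING AT node001"
        /^= +RANK [0-9]+ PID [0-9]+ RUNNING AT / {
            rank = $3; pid = $5; node = $8
            next
        }
        # Reason line: either an exit status or a terminating signal.
        /^= +(EXIT STATUS|KILLED BY SIGNAL):/ {
            sub(/^= +/, "")
            printf "rank=%s pid=%s node=%s reason=\"%s\"\n", rank, pid, node, $0
        }
    ' "$1"
}
```

After sourcing the function, it could be called as `summarize_terminations mpirun.log`. For the two banners shown above it would print one line for rank 26 (exit status 255) and one for rank 27 (killed by signal 9).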
 

