Channel: Clusters and HPC Technology

Need to press "Enter" after each mpirun?


Hi, Everyone,

I am running hybrid MPI/OpenMP jobs on a 3-node InfiniBand Linux PC cluster. Each node runs one MPI process with 15 OpenMP threads, so a job uses 3 MPI processes with 15 threads each.

The hosts.txt file is as follows:

coflowrhc4-5:1
coflowrhc4-6:1
coflowrhc4-7:1

I wrote the following batch file:

/************** batch file******************/

export CMG_LIC_HOST=rlmserv
export exe=/cmg/dingjun/imexLocal/imex_xsamg_dave.exe
export LD_LIBRARY_PATH=/cmg/dingjun/imexLocal/linux_x64/lib
export OMP_SCHEDULE=static,1
export KMP_AFFINITY=compact,0

export datadir=/cmg/dingjun/imexdatasets/7testproblems/mx1041_rb
cd /cmg/dingjun/imexdatasets/7testproblems/mx1041_rb
mpirun -machinefile hosts.txt ${exe} -fgmres -f ${datadir}/mx1041x105x10loa2_rb_xsamg.dat -log -jacdoms 16 -parasol 16 -o mx1041x105x10loa2_rb_xsamg_3MPI15threads_run7

export datadir=/cmg/dingjun/imexdatasets/7testproblems/mx521_rb
cd /cmg/dingjun/imexdatasets/7testproblems/mx521_rb
mpirun -machinefile hosts.txt ${exe} -fgmres -f ${datadir}/mx521x469x20_rb_xsamg.dat -log -jacdoms 16 -parasol 16 -o mx5211x469x20_rb_xsamg_3MPI15threads_run1

export datadir=/cmg/dingjun/imexdatasets/7testproblems/spe10_rb
cd /cmg/dingjun/imexdatasets/7testproblems/spe10_rb
mpirun -machinefile hosts.txt ${exe} -fgmres -f ${datadir}/spe10_rb_xsamg.dat -log -jacdoms 16 -parasol 16 -o spe10_rb_xsamg_3MPI15threads_run1

/************** end of batch file******************/
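For reference, the same three runs can be written more compactly with a small helper function. This is only a sketch of the commands above and does not change how mpirun behaves. The names base, DRY_RUN and run_case are made up for illustration and are not part of my actual script. With DRY_RUN=1 (the default here) it only prints each command instead of launching it.

```shell
#!/bin/sh
# Sketch only: the same three runs as the batch file, driven by one helper.
# base, DRY_RUN and run_case are hypothetical names used for illustration.

export CMG_LIC_HOST=rlmserv
export exe=/cmg/dingjun/imexLocal/imex_xsamg_dave.exe
export LD_LIBRARY_PATH=/cmg/dingjun/imexLocal/linux_x64/lib
export OMP_SCHEDULE=static,1
export KMP_AFFINITY=compact,0

base=/cmg/dingjun/imexdatasets/7testproblems
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the commands, 0 = really run them

# run_case <dataset subdirectory> <input .dat file> <output name>
run_case() {
    datadir="${base}/$1"
    # Rebuild the positional parameters as the full mpirun command line.
    set -- mpirun -machinefile hosts.txt "${exe}" -fgmres \
        -f "${datadir}/$2" -log -jacdoms 16 -parasol 16 -o "$3"
    if [ "${DRY_RUN}" = "1" ]; then
        echo "cd ${datadir} && $*"
    else
        # Subshell so the cd does not leak into the next run.
        (cd "${datadir}" && "$@")
    fi
}

run_case mx1041_rb mx1041x105x10loa2_rb_xsamg.dat mx1041x105x10loa2_rb_xsamg_3MPI15threads_run7
run_case mx521_rb mx521x469x20_rb_xsamg.dat mx5211x469x20_rb_xsamg_3MPI15threads_run1
run_case spe10_rb spe10_rb_xsamg.dat spe10_rb_xsamg_3MPI15threads_run1
```

Running it with DRY_RUN=0 would launch the jobs one after another, exactly as the batch file does.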

The installed Intel MPI version is lmpi5.0.3.048, and the problem is as follows:

Each time mpirun finishes, I have to press the "Enter" key before the next mpirun starts. This makes it inconvenient to run jobs in batch mode. For example:

mpirun -machinefile hosts.txt ${exe} -fgmres -f ${datadir}/mx1041x105x10loa2_rb_xsamg.dat -log -jacdoms 16 -parasol 16 -o mx1041x105x10loa2_rb_xsamg_3MPI15threads_run7

When the above job finishes running on the 3 nodes, I have to press "Enter" on the keyboard, and only then the next job:

mpirun -machinefile hosts.txt ${exe} -fgmres -f ${datadir}/mx521x469x20_rb_xsamg.dat -log -jacdoms 16 -parasol 16 -o mx5211x469x20_rb_xsamg_3MPI15threads_run1

begins to run. Otherwise, the cluster hangs and the second job never starts.

Could you tell me what causes this problem? Thanks in advance.

I am looking forward to hearing from you.