Hi,
I am having trouble submitting a job remotely with a PBS script on an HPC cluster (lscpu output on the login node is below).
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Thread(s) per core: 1
Core(s) per socket: 8
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 45
Stepping: 7
CPU MHz: 2593.778
BogoMIPS: 5186.81
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 20480K
NUMA node0 CPU(s): 0-7
NUMA node1 CPU(s): 8-15
On our local server (2*8 cores with hyper-threading, i.e. 2*16 threads), when I submit the job (mpirun -np 6 -map-by node relion_refine_mpi ... -j 20) it runs fine with 6 processes and 20 threads each. On the local server we use OpenMPI.
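For clarity, this is the shape of the working local command (OpenMPI; the other relion_refine_mpi arguments are omitted):

mpirun -np 6 -map-by node relion_refine_mpi ... -j 20    # 6 MPI ranks, 20 threads per rank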
But when I submit the job to the above-mentioned cluster with a PBS script (attached), the program "relion_refine_mpi" does not run. I have also attached the output files.
For your information, the program I am trying to run is a scientific program (RELION), which needs to be compiled against OpenMPI. However, on the cluster, after compiling the software with the available modules, I cannot use OpenMPI to launch the job; I have to launch it through Intel MPI's mpirun.
On the forum of the program's developers, I was told that the cluster is not assigning any processes to the requested nodes, i.e. the mpirun command is not honoring the -np flag. I tried supplying the -np value both manually and from $PBS_NODEFILE, but neither of them worked.
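For reference, here is a simplified sketch of the kind of PBS script I have been trying; the queue name, module name, and node/ppn numbers are placeholders rather than the exact values from the attached script:

#!/bin/bash
#PBS -N relion_refine
#PBS -q workq                      # placeholder queue name
#PBS -l nodes=3:ppn=2              # placeholder request: 3 nodes, 2 MPI ranks per node
#PBS -l walltime=24:00:00
#PBS -j oe

cd $PBS_O_WORKDIR

# the cluster only provides Intel MPI; the module name here is a placeholder
module load intel-mpi

# number of slots PBS actually allocated (I also tried hard-coding -np 6)
NP=$(wc -l < $PBS_NODEFILE)

mpirun -np $NP -machinefile $PBS_NODEFILE relion_refine_mpi ... -j 20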
Please help me write the PBS script properly as soon as you can; it is really important for my research work.