Hi!
I'm trying to run LAMMPS across multiple nodes. As recommended on its documentation page, I'm using 2 OpenMP threads per MPI process and enough MPI processes to fill all the cores:
https://lammps.sandia.gov/doc/Speed_intel.html
This works fine when I run on a single node. But when I request, say, 4 nodes, the job only runs on 2 of them.
How do I distribute it across nodes? This is what I've tried:
mpirun -machinefile $PBS_NODEFILE -n 64 -ppn 16 \
-genv OMP_NUM_THREADS=2 -genv I_MPI_PIN_DOMAIN=omp \
lmp -in in.lammps -suffix hybrid intel omp -package intel 0 omp 2
Each node has 32 cores, so I'm trying to place 16 MPI processes on each node, with each process spawning 2 OpenMP threads (64 MPI processes in total across the 4 nodes). `lmp` is the LAMMPS executable.
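In case it helps to check my numbers, here is the arithmetic behind the `-n` and `-ppn` values as a small shell sketch. The variable names are only mine for illustration, and the values are the ones from my setup described above.

```shell
# Values from my setup (4 nodes, 32 cores per node, 2 OpenMP threads per MPI process).
nodes=4
cores_per_node=32
omp_threads=2

# One MPI process per group of omp_threads cores on each node.
ranks_per_node=$(( cores_per_node / omp_threads ))   # what I pass to -ppn
total_ranks=$(( nodes * ranks_per_node ))            # what I pass to -n

echo "-n $total_ranks -ppn $ranks_per_node"
```

This prints `-n 64 -ppn 16`, which matches what I'm passing to `mpirun`.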
What am I doing wrong?