I get errors using mpirun within a cpuset (regardless of whether the cset shield is activated or not):
cset set -lr
cset:
Name CPUs-X MEMs-X Tasks Subs Path
------------ ---------- - ------- - ----- ---- ----------
root 0-431 y 0-11 y 4956 2 /
user 24-431 n 1-11 n 0 0 /user
system 0-23 n 0 n 0 0 /system
which mpirun
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/bin/mpirun
cset proc --move -p $$ /
mpirun -np 10 ./wrf.exe   # works properly
cset proc --move -p $$ /system
mpirun -np 10 ./wrf.exe   # works properly
cset proc --move -p $$ /user
mpirun -np 10 ./wrf.exe   # ERROR!
/opt/intel/compilers_and_libraries_2020.0.166/linux/mpi/intel64/bin/mpirun: line 103: 343504 Segmentation fault (core dumped) mpiexec.hydra "$@" 0<&0
The error also occurs this way:
cset proc --exec -s /user mpirun -- -np 10 ./wrf.exe
The fact that the error happens only in the /user cpuset is quite strange, isn't it?
After all, /user differs from /system, where mpirun works properly, only in its CPU range (24-431 instead of 0-23) and its memory nodes (1-11 instead of 0)!
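To confirm what a shell is actually confined to before launching mpirun, one could print its cpuset and allowed CPU/memory-node lists straight from /proc. This is a minimal sketch, not a known fix: the function name show_affinity is hypothetical, and /proc/<pid>/cpuset is only present on kernels built with cpuset support. In /user it should report CPUs 24-431 and memory nodes 1-11, i.e. no CPU 0 and no node 0, which is the visible difference from /system.

```shell
#!/bin/bash
# Hypothetical helper: print the cpuset and the allowed CPU / memory-node
# lists of a process (default: the current shell).
show_affinity() {
  local pid=${1:-$$}
  # /proc/<pid>/cpuset exists only on kernels built with cpuset support.
  echo "cpuset: $(cat "/proc/$pid/cpuset" 2>/dev/null || echo unknown)"
  # Cpus_allowed_list / Mems_allowed_list are the masks the kernel enforces.
  grep -E '^(Cpus|Mems)_allowed_list:' "/proc/$pid/status"
}

show_affinity
```

For example, run `cset proc --move -p $$ /user` and then `show_affinity`, and repeat after moving the shell to /system, to compare the two outputs side by side.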
The error occurs whatever the -np value, and also without the -np flag.
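A small wrapper can make a sweep over -np values easier to read, because it turns the exit status into a one-line summary. This is a sketch under assumptions: the name report is hypothetical, and it assumes the mpirun wrapper script returns mpiexec.hydra's status as its own (plausible, since the segfault is reported at its line 103, but not verified). By shell convention a status above 128 means the process was killed by signal (status - 128), so a segmentation fault (SIGSEGV, signal 11) would show up as 139.

```shell
#!/bin/bash
# Hypothetical helper: run a command and summarize how it ended.
# A status above 128 conventionally means "killed by signal (status - 128)";
# 139 = 128 + 11 corresponds to SIGSEGV, i.e. a segmentation fault.
report() {
  local label=$1
  shift
  "$@"
  local rc=$?   # $? is expanded before `local` runs, so this is the command's status
  if [ "$rc" -gt 128 ]; then
    echo "$label: killed by signal $((rc - 128))"
  else
    echo "$label: exit status $rc"
  fi
}
```

Usage, from a shell moved into /user: `for np in 1 2 10; do report "np=$np" mpirun -np "$np" ./wrf.exe; done`, followed by `report "no -np" mpirun ./wrf.exe`.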
Can anybody help me?
Thanks from Italy,
Emanuele Lombardi
ifort (IFORT) 19.1.0.166 20191121
Intel(R) MPI Library for Linux* OS, Version 2019 Update 6 Build 20191024 (id: 082ae5608)
SLES15SP1
HP Superdome Flex (ex SGI UV)
topology
System type: Superdome Flex
System name: tiziano
Serial number: CZ20040JWV
12 Blades
432 CPUs (online: 0-431)
12 Nodes
2230 GB Memory Total
1 Co-processor
2 Fibre Channel Controllers
4 Network Controllers
1 SATA Storage Controller
1 USB Controller
1 VGA GPU
2 RAID Controllers
BTW, I had the same error in 2013, as you can see here:
https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technolog...