Dear Support,
I am currently running on RedHat Linux 6.2 64-bit with Intel compilers 12.1.0 and Intel MPI 4.0.3.008 over QLogic InfiniBand QDR (PSM). I am also using Intel Trace Analyzer and Collector 8.0.3.007.
I am trying to debug an MPI problem when running on a large number of cores (>6000) and I compile my application with "-check_mpi". My application is mixed FORTRAN, C, and C++ and most MPI calls are in FORTRAN.
I launch my MPI job with the following options:
mpiexec.hydra -env I_MPI_FABRICS tmi -env I_MPI_TMI_PROVIDER psm -env I_MPI_DEBUG 5 .......
As soon as I launch the application, the trace collector crashes with the following error:
[0] Intel(R) Trace Collector ERROR: cannot create socket: socket(): Too many open files
[32] Intel(R) Trace Collector ERROR: connection closed by peer #0, receiving remaining 8 of 8 bytes failed
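For context, "Too many open files" is the standard message for a process that has reached its per-process limit on open file descriptors, and each socket counts against that limit. A minimal sketch for checking the limit on a node, assuming a bash-like shell (the variable names are only for illustration):

```shell
# Soft limit: the number of open file descriptors a process may hold right now
soft_limit=$(ulimit -S -n)
# Hard limit: the ceiling up to which a non-root user may raise the soft limit
hard_limit=$(ulimit -H -n)
echo "soft nofile limit: ${soft_limit}"
echo "hard nofile limit: ${hard_limit}"
```

The values that matter are the ones seen by the MPI ranks on the compute nodes, which may differ from those of an interactive login shell.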
It works fine with fewer cores, but I need to debug on more than 6000 cores, since that is where my application starts having problems with MPI.
Any suggestions on how to overcome this limitation? Is there a way to have the trace collector run over InfiniBand instead of TCP sockets?
Thank you for your help.
Mohamad Sindi
EXPEC Advanced Research Center
Saudi Aramco