Hi,
We have a small cluster (a head node plus 4 compute nodes with 16 cores each) using Intel InfiniBand. It runs CentOS 6.6 (with the CentOS 6.5 kernel).
Intel Parallel Studio XE 2015 is installed on it, and I_MPI_FABRICS is set to "tmi" only by default.
When I submit a job (with Torque + Maui) that spans several nodes, for example this one:
#!/bin/bash
#PBS -N IMB-MPI1_intelmpi
#PBS -l walltime=2:00:00
#PBS -l nodes=3:ppn=4
cd $PBS_O_WORKDIR
export I_MPI_FABRICS=tmi
mpirun IMB-MPI1
the job runs fine without any problem.
Now I submit a job on a single node:
#!/bin/bash
#PBS -N IMB-MPI1_intelmpi
#PBS -l walltime=2:00:00
#PBS -l nodes=1:ppn=16
cd $PBS_O_WORKDIR
export I_MPI_FABRICS=tmi
mpirun IMB-MPI1
This job does not start and I get this message:
can't open /dev/ipath, network down
tmi fabric is not available and fallback fabric is not enabled
Is this expected behavior?
If I set I_MPI_FABRICS=dapl as the default instead, I don't have this problem at all.
How can I solve that?
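One idea I have not yet tested, so please treat it only as a sketch: as far as I understand the Intel MPI documentation, I_MPI_FABRICS also accepts "shm" on its own for single-node runs and an intra-node:inter-node pair such as "shm:tmi" for multi-node runs. Assuming that holds for our 2015 version, the job script could pick the fabric from the Torque node file. The function name pick_fabric is hypothetical and not part of Intel MPI or Torque.

```shell
#!/bin/bash
# Hypothetical helper (not part of Intel MPI or Torque): choose I_MPI_FABRICS
# from the job's node file. Assumes Intel MPI accepts "shm" for single-node
# jobs and an "intra-node:inter-node" pair such as "shm:tmi" otherwise.
pick_fabric() {
    local nodefile="$1"
    local nnodes
    # Torque repeats each host once per allocated core (ppn), so count
    # unique host names; tr strips any padding that wc may add.
    nnodes=$(sort -u "$nodefile" | wc -l | tr -d ' ')
    if [ "$nnodes" -le 1 ]; then
        # All ranks on one host: request shared memory only.
        echo "shm"
    else
        # Shared memory inside each node, TMI between nodes.
        echo "shm:tmi"
    fi
}

# Intended use inside the PBS script, before mpirun (commented out here):
#   export I_MPI_FABRICS=$(pick_fabric "$PBS_NODEFILE")
#   mpirun IMB-MPI1
```

I would still like to know whether the failure with "tmi" alone on a single node is expected, or whether it points to a problem with /dev/ipath on the compute nodes.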
Best regards,
Guillaume