Channel: Clusters and HPC Technology

mpiifort running fine on some nodes and showing "open_hca: device mlx4_0 not found" for others


Dear all,

Using mpiifort on a cluster results in the error "open_hca: device mlx4_0 not found" on some groups of nodes, while on others there is no error and mpiifort runs perfectly fine. All the nodes have the same hardware and software configuration. I have already looked at the similar topic at:

https://software.intel.com/en-us/forums/intel-clusters-and-hpc-technology/topic/393416

I applied the solution proposed there, commenting out the ofa-v2-mlx4_0-1 and ofa-v2-mlx4_0-2 lines in /etc/dat.conf, but it did not solve the issue.
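
A minimal sketch of that dat.conf change, for reference. It assumes GNU sed, and that each entry starts at the beginning of its line and is followed by whitespace. The helper names `comment_dat_entries` and `active_dat_entries` are hypothetical, and the file is passed as an argument so the edit can be tried on a copy rather than on /etc/dat.conf directly.

```shell
#!/bin/sh
# Hypothetical helper: comment out the ofa-v2-mlx4_0-1 and ofa-v2-mlx4_0-2
# entries in the DAT configuration file given as $1 (e.g. a copy of /etc/dat.conf).
comment_dat_entries() {
    conf="$1"
    # Keep a backup next to the file before editing it in place.
    cp "$conf" "$conf.bak"
    # Prefix '#' to lines that begin with either entry name followed by
    # whitespace, so differently named entries are left untouched.
    sed -i -E 's/^(ofa-v2-mlx4_0-[12])([[:space:]])/#\1\2/' "$conf"
}

# Hypothetical helper: print the names of the entries that are still active
# (first field of every non-empty, non-comment line).
active_dat_entries() {
    grep -v '^[[:space:]]*#' "$1" | awk 'NF { print $1 }'
}
```

Running `active_dat_entries` on a failing node and on a working node would show whether the two files really list the same active entries.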

Would you have any idea what might be wrong? I attach the error log as well as the ibstat output in case it helps:

$ ibstat

CA 'mlx4_0'
        CA type: MT4099
        Number of ports: 1
        Firmware version: 2.31.5050
        Hardware version: 1
        Node GUID: 0xf45214030090c050
        System image GUID: 0xf45214030090c053
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 56
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x0251486a
                Port GUID: 0xf45214030090c051
                Link layer: InfiniBand

Many thanks in advance,

Edrisse

