Channel: Clusters and HPC Technology

Intel MPI fails to load libmpi_lustre.so

Hello,

Under which circumstances might we see the following error?

[3] ERROR - ADIO_Init(): Can't load libmpi_lustre.so library: libmpi_lustre.so: cannot open shared object file: No such file or directory
[2] ERROR - ADIO_Init(): Can't load libmpi_lustre.so library: libmpi_lustre.so: cannot open shared object file: No such file or directory

This comes from a reduced test case in which only ranks 2 and 3 (of ranks 0-3) open a file with MPI_File_open. At that call the message above is printed and the job aborts. We run on RHEL6 x86_64.

When tracing the executable with strace, I can see that it tries to load libmpi_lustre.so from various directories, but not from the directory Intel MPI is installed in, even though that directory is part of the executable's RPATH:

$  ../libtool --mode=execute objdump -x pio_write | grep RPATH
  RPATH                /sw/rhel6-x64/intel/intel-14.0.3/lib/intel64:/sw/rhel6-x64/netcdf/netcdf_c-4.3.2-parallel-impi-intel14/lib:/sw/rhel6-x64/hdf5/hdf5-1.8.14-parallel-impi-intel14/lib:/sw/rhel6-x64/netcdf/parallel_netcdf-1.6.0-impi-intel14/lib:/sw/rhel6-x64/grib_api/grib_api-1.13.0-intel14/lib:/sw/rhel6-x64/sys/libaec-0.3.2-intel14/lib:/home/dkrz/k202069/opt/cdi-x64-linux-intel14-impi/lib:/home/dkrz/k202069/opt/ppm-x64-linux-intel14-impi/lib:/home/dkrz/k202069/opt/yaxt-x64-linux-intel14-impi/lib:/sw/rhel6-x64/hdf4/hdf-4.2.10-intel14/lib:/sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib:/opt/intel/mpi-rt/4.1
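Since the RPATH above is one long colon-separated string, it can be easier to inspect one entry per line and to check which entries actually contain the library. A minimal sketch, assuming a POSIX shell; the helper names `split_rpath` and `find_lib_in_path` are hypothetical and not part of libtool or Intel MPI:

```shell
# Hypothetical helper: print each entry of a colon-separated search path
# on its own line.
split_rpath() {
    printf '%s\n' "$1" | tr ':' '\n'
}

# Hypothetical helper: print every path entry that contains the given file.
#   $1 = colon-separated path (e.g. the RPATH value shown by objdump)
#   $2 = library file name   (e.g. libmpi_lustre.so)
find_lib_in_path() {
    split_rpath "$1" | while IFS= read -r dir; do
        if [ -e "$dir/$2" ]; then
            printf '%s\n' "$dir"
        fi
    done
}

# Possible usage (assumes the pio_write binary from this post):
#   rpath=$(../libtool --mode=execute objdump -x pio_write | awk '$1 == "RPATH" {print $2}')
#   find_lib_in_path "$rpath" libmpi_lustre.so
```

This only reports where the file exists among the RPATH entries; it says nothing about which of those entries the dynamic loader actually consults when the library is loaded at run time.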

strace excerpt:

open("/home/dkrz/k202069/Documents/work/dkrz/build/cdi-x64-linux-intel-impi/src/.libs/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/dkrz/k202069/opt/ppm-x64-linux-intel14-impi/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/dkrz/k202069/opt/yaxt-x64-linux-intel14-impi/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/grib_api/grib_api-1.13.0-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/netcdf/netcdf_c-4.3.2-parallel-impi-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/hdf4/hdf-4.2.10-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/hdf5/hdf5-1.8.14-parallel-impi-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/sys/libaec-0.3.2-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=54243, ...}) = 0
mmap(NULL, 54243, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f392e366000
close(3)                                = 0
open("/lib64/tls/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64/tls/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory)
open("/lib64/tls/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64/tls", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
open("/lib64/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64/x86_64", 0x7fffb78e3e10)   = -1 ENOENT (No such file or directory)
open("/lib64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64", {st_mode=S_IFDIR|0555, st_size=12288, ...}) = 0
open("/usr/lib64/tls/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/tls/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory)
open("/usr/lib64/tls/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/tls", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
open("/usr/lib64/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory)
open("/usr/lib64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64", {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0
munmap(0x7f392e366000, 54243)           = 0
write(2, "[3] ERROR - ADIO_Init(): ", 25) = 25
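To compare a trace like the one above against an expected directory without reading it line by line, the attempted locations can be extracted mechanically. A minimal sketch, assuming strace output in the format shown here is available on standard input; the helper names `tried_dirs` and `was_tried` are hypothetical:

```shell
# Hypothetical helper: read strace output on stdin and print the directory
# part of every open() call that targets the given library file name.
#   $1 = library file name (e.g. libmpi_lustre.so)
tried_dirs() {
    sed -n "s|.*open(\"\(.*\)/$1\", .*|\1|p"
}

# Hypothetical helper: succeed if the given directory was among those tried.
#   $1 = directory, $2 = library file name; strace output on stdin
was_tried() {
    tried_dirs "$2" | grep -Fqx "$1"
}

# Possible usage (assumes the trace was saved to a file named trace.txt):
#   tried_dirs libmpi_lustre.so < trace.txt
#   was_tried /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib libmpi_lustre.so < trace.txt \
#       || echo "Intel MPI lib directory was not searched"
```

The library name is used as a regular expression here, so the dots match any character; for a name like libmpi_lustre.so that is unlikely to cause false matches.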

So one can see that

/sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib

is not in the list of directories tried, but everything seems to be in place there:

$ ls -l /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so*
lrwxrwxrwx 1 someuser somegroup    20 2014-08-29 13:43:04 /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so -> libmpi_lustre.so.4.1
lrwxrwxrwx 1 someuser somegroup    20 2014-08-29 13:43:04 /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so.4.0 -> libmpi_lustre.so.4.1
-rwxrwxr-x 1 someuser somegroup 52279 2014-03-03 09:51:58 /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so.4.1

So my question is: how can I make Intel MPI try the correct place to load libmpi_lustre.so from?

Regards, Thomas

