Hello,
Under which circumstances might we see the following error:
[3] ERROR - ADIO_Init(): Can't load libmpi_lustre.so library: libmpi_lustre.so: cannot open shared object file: No such file or directory
[2] ERROR - ADIO_Init(): Can't load libmpi_lustre.so library: libmpi_lustre.so: cannot open shared object file: No such file or directory
This comes from a reduced test case in which only ranks 2 and 3 (out of ranks 0-3) open a file with MPI_File_open. At that point the messages above are printed and the job is aborted. We run on RHEL6 x86_64.
When tracing the executable with strace, I can see that it tries to load libmpi_lustre.so from various directories, but not from the one Intel MPI is installed in, which is also part of the executable's RPATH:
$ ../libtool --mode=execute objdump -x pio_write | grep RPATH
  RPATH  /sw/rhel6-x64/intel/intel-14.0.3/lib/intel64:/sw/rhel6-x64/netcdf/netcdf_c-4.3.2-parallel-impi-intel14/lib:/sw/rhel6-x64/hdf5/hdf5-1.8.14-parallel-impi-intel14/lib:/sw/rhel6-x64/netcdf/parallel_netcdf-1.6.0-impi-intel14/lib:/sw/rhel6-x64/grib_api/grib_api-1.13.0-intel14/lib:/sw/rhel6-x64/sys/libaec-0.3.2-intel14/lib:/home/dkrz/k202069/opt/cdi-x64-linux-intel14-impi/lib:/home/dkrz/k202069/opt/ppm-x64-linux-intel14-impi/lib:/home/dkrz/k202069/opt/yaxt-x64-linux-intel14-impi/lib:/sw/rhel6-x64/hdf4/hdf-4.2.10-intel14/lib:/sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib:/opt/intel/mpi-rt/4.1
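Since the RPATH above is one long colon-separated string, a small helper makes it easier to compare against the strace excerpt. The sketch below is a hypothetical shell function (`print_rpath_dirs` is not part of any tool), assuming a POSIX `sed`; it accepts either `objdump -x` output as above or `readelf -d` output, which would also show a RUNPATH entry if the binary had one.

```shell
# print_rpath_dirs (hypothetical helper): read `objdump -x` or `readelf -d`
# output on stdin and print every directory listed in an RPATH or RUNPATH
# entry, one directory per line.
print_rpath_dirs() {
  # readelf style:  0x...0f (RPATH)   Library rpath: [/a:/b]
  # objdump style:    RPATH  /a:/b
  sed -n -e 's/.*Library r[a-z]*path: \[\(.*\)].*/\1/p' \
         -e 's/^[[:space:]]*R\(UN\)\{0,1\}PATH[[:space:]]\{1,\}//p' |
    tr ':' '\n'   # split the colon-separated search path into lines
}
```

For example: `../libtool --mode=execute objdump -x pio_write | print_rpath_dirs`.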
strace excerpt:
open("/home/dkrz/k202069/Documents/work/dkrz/build/cdi-x64-linux-intel-impi/src/.libs/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/dkrz/k202069/opt/ppm-x64-linux-intel14-impi/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/home/dkrz/k202069/opt/yaxt-x64-linux-intel14-impi/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/grib_api/grib_api-1.13.0-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/netcdf/netcdf_c-4.3.2-parallel-impi-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/hdf4/hdf-4.2.10-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/hdf5/hdf5-1.8.14-parallel-impi-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/sw/rhel6-x64/sys/libaec-0.3.2-intel14/lib/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
fstat(3, {st_mode=S_IFREG|0644, st_size=54243, ...}) = 0
mmap(NULL, 54243, PROT_READ, MAP_PRIVATE, 3, 0) = 0x7f392e366000
close(3) = 0
open("/lib64/tls/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64/tls/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory)
open("/lib64/tls/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64/tls", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
open("/lib64/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory)
open("/lib64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/lib64", {st_mode=S_IFDIR|0555, st_size=12288, ...}) = 0
open("/usr/lib64/tls/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/tls/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory)
open("/usr/lib64/tls/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/tls", {st_mode=S_IFDIR|0555, st_size=4096, ...}) = 0
open("/usr/lib64/x86_64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64/x86_64", 0x7fffb78e3e10) = -1 ENOENT (No such file or directory)
open("/usr/lib64/libmpi_lustre.so", O_RDONLY) = -1 ENOENT (No such file or directory)
stat("/usr/lib64", {st_mode=S_IFDIR|0755, st_size=36864, ...}) = 0
munmap(0x7f392e366000, 54243) = 0
write(2, "[3] ERROR - ADIO_Init(): ", 25) = 25
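To see only the directories in which the library was looked for, without the surrounding stat/mmap noise, the strace output can be filtered. This is a sketch assuming the trace was written to a file (`trace.log` is a hypothetical name, e.g. produced with `strace -f -o trace.log ...`) and that `grep -o` is available (GNU and BSD grep support it). It matches any quoted path ending in the library name, so it should work for both `open(` and `openat(` lines.

```shell
# tried_dirs LIBNAME (hypothetical helper): read strace output on stdin and
# print, in order, every directory in which LIBNAME was looked for.
tried_dirs() {
  # grep -o emits each quoted path ending in /LIBNAME on its own line, even
  # when several syscalls were flattened onto one line.
  grep -o "\"[^\"]*/$1\"" |
    # strip the opening quote and the trailing "/LIBNAME" plus closing quote
    sed -e 's/^"//' -e "s|/$1\"\$||"
}
```

For example: `tried_dirs libmpi_lustre.so < trace.log`.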
So one can see that
/sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib
is not in the list of directories tried, but everything seems to be in place there:
$ ls -l /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so*
lrwxrwxrwx 1 someuser somegroup 20 2014-08-29 13:43:04 /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so -> libmpi_lustre.so.4.1
lrwxrwxrwx 1 someuser somegroup 20 2014-08-29 13:43:04 /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so.4.0 -> libmpi_lustre.so.4.1
-rwxrwxr-x 1 someuser somegroup 52279 2014-03-03 09:51:58 /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib/libmpi_lustre.so.4.1
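The comparison of RPATH entries against the directories actually tried can also be scripted. A minimal sketch, with `untried_dirs` as a hypothetical helper name and only POSIX `tr` and `grep -Fqx` assumed: it takes the colon-separated RPATH string and a newline-separated list of tried directories, and prints every RPATH entry that never appears in that list. Applied to the RPATH and the strace excerpt in this post, its output would include /sw/rhel6-x64/intel/impi/4.1.3.049/intel64/lib.

```shell
# untried_dirs RPATH TRIED (hypothetical helper): print each directory of the
# colon-separated RPATH that does not appear as a whole line in TRIED.
untried_dirs() {
  printf '%s\n' "$1" | tr ':' '\n' | while IFS= read -r dir; do
    [ -n "$dir" ] || continue   # skip empty entries such as "a::b"
    # -F: fixed string, -x: whole-line match, -q: quiet (status only)
    printf '%s\n' "$2" | grep -Fqx -- "$dir" || printf '%s\n' "$dir"
  done
}
```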
So my question is: how can I make Intel MPI try the correct place to load libmpi_lustre.so from?
Regards, Thomas