RESCOMP Archives

March 2006

RESCOMP@LISTSERV.MIAMIOH.EDU

Reply To: Research Computing Support <[log in to unmask]>, Robin <[log in to unmask]>
Date: Sun, 5 Mar 2006 18:50:50 -0500
Stephen Wright wrote:

>I Googled libimf.so; the first dozen hits were about this particular
>problem.  Apparently the Intel 9.0 compilers specify libimf.so after
>libm.so when linking, whereas it should be the other way around. Various
>fixes are recommended, such as using Intel 8.1 compilers instead or
>explicitly specifying the order of -limf and -lm on the command line. I'll
>try again, see how it works, and let you know.
>
>Maybe the sysadmin can fix this in the compiler configuration?
>  
>
This is caused by LD_LIBRARY_PATH not being passed along by SSH during a
non-interactive, password-less SSH session (which is what mpirun_ssh uses).
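One way to see this failure directly is to ask the dynamic loader which libraries it cannot resolve. The sketch below is only an illustration, not part of the cluster setup: the function name missing_libs is hypothetical, and it assumes glibc's ldd, which reports an unresolved library as "name => not found".

```shell
#!/bin/bash
# Hypothetical helper: print the shared libraries that a dynamically linked
# program needs but that the loader cannot find with the current
# LD_LIBRARY_PATH (one library name per line, nothing if all resolve).
missing_libs() {
    # glibc's ldd prints e.g. "libimf.so => not found" for unresolved libs;
    # errors (missing file, static binary, no ldd) are silenced.
    ldd "$1" 2>/dev/null | awk '/=> not found/ { print $1 }'
}
```

Running plain ldd through the same kind of non-interactive session, e.g. ssh <node> ldd "$PWD/mnm3mpi" with <node> standing in for a compute node, would be expected to show libimf.so as "not found" whenever the Intel library directory is missing from that session's LD_LIBRARY_PATH.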

I modified the bash startup script so that all default modules are also
loaded in password-less *non-interactive* SSH sessions.
As a result, all your dynamically linked apps will use Intel 9's libraries,
since Intel 9 is the default module for everyone. This gets rid of the
libimf.so error, so you should not see it again.
We had a very brief discussion of this issue on Thursday/Friday.
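The startup-script change described above can be sketched roughly as follows. This is a hypothetical ~/.bashrc excerpt, not the actual script that was modified; it assumes the Modules init script lives at /etc/profile.d/modules.sh and uses "intel" as a placeholder module name. The key point is that modules are loaded before any early return for non-interactive shells, because bash started by sshd to run a remote command typically reads ~/.bashrc even though it is not interactive.

```shell
# ~/.bashrc (hypothetical excerpt)

# Initialize Modules; this init script path is an assumption about the site.
if [ -f /etc/profile.d/modules.sh ]; then
    . /etc/profile.d/modules.sh
fi

# Load the default modules in every shell, including the non-interactive
# ones started by password-less ssh (e.g. from mpirun_ssh).
if type module >/dev/null 2>&1; then
    module load intel    # placeholder name for the default Intel 9 module
fi

# Stop here for non-interactive shells; interactive-only settings follow.
case $- in
    *i*) ;;
    *) return ;;
esac
```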

The core issue is making mpirun_ssh 'pass' the appropriate PATH and
LD_LIBRARY_PATH, consistent with Modules, so that the remote processes
do not simply get the *default* PATH and LD_LIBRARY_PATH.

I came to this after talking to two different Cisco guys and Dave, and
thinking about the issue for a while. It is about maintaining consistency
between the environment variables (PATH, LD_LIBRARY_PATH) and our
'Modules' setup.
To avoid burdening users unnecessarily, we ought to have a wrapper
script (/usr/local/topspin/mpi/mpich/bin/mpirun_ssh.wrap.sh).
If you use the wrapper script, it makes SSH pass in the appropriate PATH
and LD_LIBRARY_PATH (consistent with Modules) for you.
This is about setting up a sane framework for handling *multiple* versions
of apps and *multiple* versions of libraries, which is what Modules is for.
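A wrapper along these lines might look like the sketch below. It is hypothetical and is not the contents of mpirun_ssh.wrap.sh. It assumes mpirun_ssh behaves like MVAPICH's mpirun_rsh, which exports VAR=value arguments placed just before the executable to every remote process, and it only recognizes the -np and -hostfile options used in the job script quoted below.

```shell
#!/bin/bash
# Hypothetical sketch of a Modules-aware mpirun_ssh wrapper; this is NOT the
# actual /usr/local/topspin/mpi/mpich/bin/mpirun_ssh.wrap.sh.
# Assumption: mpirun_ssh, like MVAPICH's mpirun_rsh, exports VAR=value
# arguments placed just before the executable to the remote processes.

MPIRUN=${MPIRUN:-mpirun_ssh}

# Build the real command line in the global array CMD.
build_cmd() {
    CMD=("$MPIRUN")
    # Copy launcher options; -np and -hostfile each take one value.
    while [ $# -gt 0 ]; do
        case "$1" in
            -np|-hostfile)
                CMD+=("$1" "${2-}")
                shift $(( $# >= 2 ? 2 : 1 )) ;;
            -*)
                CMD+=("$1")
                shift ;;
            *)
                break ;;
        esac
    done
    # Forward the caller's Modules-managed search paths, then the program
    # and its own arguments.
    CMD+=("PATH=$PATH" "LD_LIBRARY_PATH=${LD_LIBRARY_PATH-}" "$@")
}

# With no arguments there is nothing to launch (this also lets the
# function be sourced and checked on its own).
if [ $# -gt 0 ]; then
    build_cmd "$@"
    exec "${CMD[@]}"
fi
```

Invoked as mpirun_ssh.wrap.sh -np 2 -hostfile $PBS_NODEFILE ./mnm3mpi, this sketch would run mpirun_ssh with PATH=... and LD_LIBRARY_PATH=... inserted before ./mnm3mpi. Passing the paths as arguments ties the remote environment to whatever modules the job's own shell has loaded, rather than to the remote startup files.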

Once everyone agrees on the problem (it's rather subtle) and on the
solution, we can move the wrapper script to mpirun_ssh so that it's
transparent to users.
Otherwise, we should move on to a different solution.

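I accidentally hit 'Send' on the previous email; please pardon the
unfinished sentences. By default, Linux does not include the current
directory in the PATH environment variable, which is why mpirun_ssh
could not find 'mnm3mpi' without the './' prefix.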
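A quick, self-contained way to see this is the sketch below, which creates a throwaway executable (the name not_on_path_demo is made up) in a temporary directory. With a typical PATH of /usr/bin:/bin, looking it up by bare name fails, the same way /usr/bin/env failed to find mnm3mpi, while the ./ form runs it directly.

```shell
#!/bin/bash
# Show that a program in the current directory is found as "./name" but
# not as a bare "name", because "." is not in a typical default PATH.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1
printf '#!/bin/sh\necho ran not_on_path_demo\n' > not_on_path_demo
chmod +x not_on_path_demo

# Bare name: env searches PATH (here a typical default PATH without ".").
if env PATH=/usr/bin:/bin not_on_path_demo >/dev/null 2>&1; then
    by_name="found"
else
    by_name="not found"
fi

# Explicit relative path: no PATH search is involved.
by_path=$(./not_on_path_demo)

echo "bare name: $by_name"
echo "./ prefix: $by_path"

# Clean up the temporary directory.
cd - >/dev/null || exit 1
rm -rf "$demo_dir"
```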

Thanks,
Robin


>- Steve
>
>  
>
>>3. Problem:  I tried to run the following PBS job:
>>
>>    
>>
>>>#PBS -l nodes=1:ppn=2
>>>#PBS -l walltime=06:00:00
>>>#PBS -j oe
>>>set -x
>>>date
>>>cd $TMPDIR
>>>
>>># copy the executable
>>>cp ${HOME}/mnm/mnm3mpi .
>>>cp ${HOME}/mnm/interval .
>>>
>>>ls
>>>
>>># run the executable
>>>mpirun_ssh -np 2 -hostfile $PBS_NODEFILE mnm3mpi > trial-0000.out
>>>
>>>date
>>>
>>># copy the output
>>>cp ./trial-0000.out ${HOME}/mnm
>>>      
>>>
>>Here's the output file showing an error:
>>
>>    
>>
>>>++ date
>>>Sun Mar  5 14:18:54 EST 2006
>>>++ cd /tmp/pbs.738.mulnx31
>>>++ cp /home/wrightse/mnm/mnm3mpi .
>>>++ cp /home/wrightse/mnm/interval .
>>>++ ls
>>>interval
>>>mnm3mpi
>>>++ mpirun_ssh -np 2 -hostfile /var/spool/PBS/aux/738.mulnx31 mnm3mpi
>>>/usr/bin/env: mnm3mpi/usr/bin/env: mnm3mpi: No such file or directory
>>>: No such file or directory
>>>++ date
>>>Sun Mar  5 14:18:54 EST 2006
>>>++ cp ./trial-0000.out /home/wrightse/mnm
>>>      
>>>
>>Suspecting that the path didn't include the '.' directory, I changed the
>>mpirun command to use './mnm3mpi' rather than 'mnm3mpi', which gave me the
>>following error instead:
>>
>>    
>>
>>>++ date
>>>Sun Mar  5 14:22:40 EST 2006
>>>++ cd /tmp/pbs.739.mulnx31
>>>++ cp /home/wrightse/mnm/mnm3mpi .
>>>++ cp /home/wrightse/mnm/interval .
>>>++ ls
>>>interval
>>>mnm3mpi
>>>++ mpirun_ssh -np 2 -hostfile /var/spool/PBS/aux/739.mulnx31 ./mnm3mpi
>>>./mnm3mpi: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
>>>./mnm3mpi: error while loading shared libraries: libimf.so: cannot open shared object file: No such file or directory
>>>++ date
>>>Sun Mar  5 14:22:40 EST 2006
>>>++ cp ./trial-0000.out /home/wrightse/mnm
>>>      
>>>
>>Any ideas what's going wrong here?
>>
>>Steve
>>
>>    
>>
