RESCOMP Archives

March 2006

RESCOMP@LISTSERV.MIAMIOH.EDU

Subject:
From:
"Stephen E. Wright" <[log in to unmask]>
Reply To:
Research Computing Support <[log in to unmask]>, Stephen E. Wright
Date:
Sun, 5 Mar 2006 17:17:09 -0500
This time I linked with -limf and -lm, in that order.  Also, given Robin's 
comments on environment, I qsub'd the job with -V.  The error is now:

>++ mpirun_ssh -np 2 -hostfile /var/spool/PBS/aux/749.mulnx31 ./mnm3mpi
>/usr/bin/X11/xauth:  error in locking authority file 
>/home/wrightse/.Xauthority
>accept: Resource temporarily unavailable
>Terminating processes..
>read: Success
>/var/spool/PBS/mom_priv/jobs/749.mulnx31.SC: line 14: 18537 Segmentation 
>fault      mpirun_ssh -np 2
>  -hostfile $PBS_NODEFILE ./mnm3mpi

The executable doesn't involve graphics or displays, so why is there an 
Xauth problem?

-- Steve

At 04:43 PM 3/5/2006, Robin wrote:
> > I tried some parallel code over the weekend, but ran into some problems.
> >
> > 1. Suggestion:  'mpicc' isn't understood by my makefile; 'which' identifies
> > it as an alias to 'mpicc.i', which does work in a makefile.  Maybe modules
> > could make this transparent.
>
>I'd like to discuss this one with everyone later.
>I believe that we ought to create a symlink mpicc -> mpicc.i
>rather than using an alias.
>
> > 2. Suggestion:  Loading mpi-topspin via 'modules' should add the mpich
> > include-path to the INCLUDE environment variable, so the user doesn't have
> > to hunt down its location.
>
>
>
> > 3. Problem:  I tried to run the following PBS job:
>
>mpirun_ssh uses SSH behind the scenes.
>
>SSH does not pass environment variables.
>
>That is, if you do 'module add mrbayes', your shell's PATH will include
>/software/mrbayes/mrbayes-3.1.2. But if you run ssh c1-1 'echo $PATH', you
>will see that c1-1, like every other compute node, (naturally) does not have
>'/software/mrbayes/mrbayes-3.1.2' on its PATH by default (unless you
>explicitly pass it in).
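
The point above can be reproduced locally without any SSH connection. This
is only a sketch: `env -i` stands in for the clean environment a remote
command starts with, and the variable name DEMO_MODULE_DIR is hypothetical
(the directory is the one from the example and need not exist).

```shell
#!/bin/sh
# Simulate 'module add': extend PATH in the submitting shell only.
DEMO_MODULE_DIR=/software/mrbayes/mrbayes-3.1.2   # hypothetical name, path from the example
PATH="$DEMO_MODULE_DIR:$PATH"
export PATH

# Stand-in for a remote command: a child shell started with an empty
# environment falls back to its own default PATH.
without=$(env -i /bin/sh -c 'echo "$PATH"')

# Passing the value in explicitly makes it visible to the child.
with=$(env -i PATH="$PATH" /bin/sh -c 'echo "$PATH"')

case "$without" in
    *"$DEMO_MODULE_DIR"*) echo "clean child: module dir present" ;;
    *)                    echo "clean child: module dir missing" ;;
esac
case "$with" in
    *"$DEMO_MODULE_DIR"*) echo "explicit pass: module dir present" ;;
    *)                    echo "explicit pass: module dir missing" ;;
esac
```

The second case is what either of the choices below has to arrange for the
real remote processes.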
>
>We have 2 choices:
>1) Ask users to pass in all the appropriate env variables
>2) Use a wrapper script to pass in all the needed details.
>    Yes.. other labs have wrapper scripts (or they could ask the
>    users to pass it on themselves), etc.
>
>    OSC uses mpiexec (as a wrapper script). I've written
>    /usr/local/topspin/mpi/mpich/bin/mpirun_ssh.wrap.sh, just to see
>    if this is workable. If that works and others , we
>    can deploy the wrapper script as mpirun_ssh.
>
>    On a default Linux install, PATH almost never includes the current directory.
>    For your own compiled apps (not managed via module, and thus
>    PATH), you either need to specify
>
>
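
The wrapper idea in choice 2 could look roughly like the following. This is
a sketch under assumptions, not the contents of mpirun_ssh.wrap.sh: the
function name build_env_prefix and the variable DEMO_VAR are hypothetical,
and the remote side is simulated with a local /bin/sh instead of ssh.

```shell
#!/bin/sh
# Hypothetical helper: print an "env NAME='value' ..." prefix for each named
# variable that is set in the current shell, so a remote command line can be
# started with those values.
build_env_prefix() {
    prefix="env"
    for name in "$@"; do
        # Skip variables that are not set in the submitting shell.
        eval "is_set=\${$name+yes}"
        [ "$is_set" = yes ] || continue
        eval "value=\$$name"
        # Escape embedded single quotes so the value survives re-parsing.
        escaped=$(printf '%s' "$value" | sed "s/'/'\\\\''/g")
        prefix="$prefix $name='$escaped'"
    done
    printf '%s\n' "$prefix"
}

# Demo: DEMO_VAR is not exported, so a child shell would not normally see it.
DEMO_VAR="hello world"
cmd="$(build_env_prefix DEMO_VAR) /bin/sh -c 'printf %s \"\$DEMO_VAR\"'"

# Stand-in for "ssh node <cmd>": the command string is re-parsed by a new shell.
result=$(/bin/sh -c "$cmd")
echo "child saw: $result"
```

A real wrapper would put the same prefix in front of the program name it
hands to the launcher, for variables such as PATH and LD_LIBRARY_PATH.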
>
> >>./mnm3mpi: error while loading shared libraries: libimf.so: cannot open
> >>shared object file: No such
> >>file or directory
>
>I just modified (over the weekend) the startup script so that the bash shell
>also loads all the *default* modules.
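
One way to check whether the loader can resolve every shared library in a
given environment is to run ldd on the executable there. A minimal sketch,
assuming a Linux system with ldd installed; the function name missing_libs
is hypothetical, and /bin/sh is used only as a stand-in for ./mnm3mpi.

```shell
#!/bin/sh
# Hypothetical helper: list the shared libraries that the dynamic loader
# cannot find for the given executable in the current environment.
# Prints nothing when every dependency resolves.
missing_libs() {
    ldd "$1" 2>/dev/null | grep 'not found'
}

target=/bin/sh   # stand-in; inside a job this would be ./mnm3mpi
unresolved=$(missing_libs "$target")

if [ -z "$unresolved" ]; then
    echo "all shared libraries resolved for $target"
else
    echo "unresolved libraries for $target:"
    echo "$unresolved"
fi
```

Running the same check inside the job script, on the compute node, shows
what the remote processes will actually see.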
>
>
> >
> >>#PBS -l nodes=1:ppn=2
> >>#PBS -l walltime=06:00:00
> >>#PBS -j oe
> >>set -x
> >>date
> >>cd $TMPDIR
> >>
> >># copy the executable
> >>cp ${HOME}/mnm/mnm3mpi .
> >>cp ${HOME}/mnm/interval .
> >>
> >>ls
> >>
> >># run the executable
> >>mpirun_ssh -np 2 -hostfile $PBS_NODEFILE mnm3mpi > trial-0000.out
> >>
> >>date
> >>
> >># copy the output
> >>cp ./trial-0000.out ${HOME}/mnm
> >
> > Here's the output file showing an error:
> >
> >>++ date
> >>Sun Mar  5 14:18:54 EST 2006
> >>++ cd /tmp/pbs.738.mulnx31
> >>++ cp /home/wrightse/mnm/mnm3mpi .
> >>++ cp /home/wrightse/mnm/interval .
> >>++ ls
> >>interval
> >>mnm3mpi
> >>++ mpirun_ssh -np 2 -hostfile /var/spool/PBS/aux/738.mulnx31 mnm3mpi
> >>/usr/bin/env: mnm3mpi/usr/bin/env: mnm3mpi: No such file or directory
> >>: No such file or directory
> >>++ date
> >>Sun Mar  5 14:18:54 EST 2006
> >>++ cp ./trial-0000.out /home/wrightse/mnm
> >
> > Suspecting that the path didn't include the '.' directory, I changed the
> > mpirun command to use './mnm3mpi' rather than 'mnm3mpi', which gave me the
> > following error instead:
> >
> >>++ date
> >>Sun Mar  5 14:22:40 EST 2006
> >>++ cd /tmp/pbs.739.mulnx31
> >>++ cp /home/wrightse/mnm/mnm3mpi .
> >>++ cp /home/wrightse/mnm/interval .
> >>++ ls
> >>interval
> >>mnm3mpi
> >>++ mpirun_ssh -np 2 -hostfile /var/spool/PBS/aux/739.mulnx31 ./mnm3mpi
> >>./mnm3mpi: error while loading shared libraries: libimf.so: cannot open
> >>shared object file: No such
> >>file or directory
> >>./mnm3mpi: error while loading shared libraries: libimf.so: cannot open
> >>shared object file: No such
> >>file or directory
> >>++ date
> >>Sun Mar  5 14:22:40 EST 2006
> >>++ cp ./trial-0000.out /home/wrightse/mnm
> >
> > Any ideas what's going wrong here?
> >
> > Steve
> >

_________________________________________
Stephen E. Wright, Associate Professor
Department of Mathematics & Statistics
Miami University, Oxford, OH  45056
ph: (513) 529-1837 , fax: (513) 529-1493
