Date: Fri, 10 Mar 2006 10:40:51 -0500
All,
I looked into an epilogue script to clean up leftover processes.
We have several choices for making this work:
(I) The epilogue script is basically a script to kill processes, so
while writing it I need to identify which processes to kill.
The only unique and reliable identifier I could use is the PBS Job ID.
I need to check whether each process is associated with the Job ID
and, if so, kill it.
As such, I need the wrapper script to transparently pass the PBS Job ID
to all processes. That way, the epilogue script can tell which processes
to kill. I have just updated the wrapper script to include the PBS Job ID.
(II) mpiexec is designed to integrate very tightly with Torque (PBS) and
supposedly cleans up left-over processes.
mpiexec has been installed at /software/mpiexec/0.80.
It compiled fine with PBSPro; however, there have been 2 reports on the
mpiexec homepage
http://www.osc.edu/~pw/mpiexec/index.php
that it doesn't work well with PBSPro.
Torque's and PBSPro's codebases forked some time ago, and they will
continue to diverge as time moves on. Unless we plan to switch to
Torque, I'm not sure mpiexec is suitable for a PBSPro environment.
mpiexec's behaviour depends on the PBS libraries (I'm guessing this
because it needs all of PBS's libraries and source to compile).
(III) Do the PBSPro and MPI integration differently (Chapter 10 of the
PBSPro manual).
It seems to me that PBSPro provides an alternate mpirun (yet another
script/program) for GigE Ethernet that essentially does what my wrapper
script does for MPI-Infiniband.
However, it doesn't provide an alternate mpirun for Infiniband (one that
would set the appropriate env variables); otherwise, we could use their
script/program instead of mine.
(IV) Propose another solution.
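To make option (I) concrete, here is a minimal Python sketch of the epilogue cleanup. It assumes Linux nodes, where a process's launch environment can be read from /proc/<pid>/environ, that the wrapper has exported PBS_JOBID into every MPI process, and that the job ID arrives as the epilogue's first argument. The function names are hypothetical; this is not PBSPro's or mpiclean's actual code.

```python
#!/usr/bin/env python3
"""Hypothetical epilogue sketch: kill leftover processes tagged with PBS_JOBID.

Assumptions: Linux /proc, and an MPI wrapper that exported PBS_JOBID into
every rank it launched. Not PBSPro's own cleanup mechanism.
"""
import os
import signal
import sys


def parse_environ(raw: bytes) -> dict:
    """Parse the NUL-separated KEY=VALUE records of /proc/<pid>/environ."""
    env = {}
    for entry in raw.split(b"\0"):
        if b"=" in entry:
            key, _, value = entry.partition(b"=")
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def find_job_pids(job_id: str, proc_root: str = "/proc") -> list:
    """Return the sorted PIDs whose environment has PBS_JOBID == job_id."""
    pids = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, name, "environ"), "rb") as f:
                env = parse_environ(f.read())
        except OSError:
            continue  # process already exited, or we lack permission
        if env.get("PBS_JOBID") == job_id:
            pids.append(int(name))
    return sorted(pids)


def kill_job_processes(job_id: str, proc_root: str = "/proc") -> list:
    """Send SIGKILL to every process tagged with job_id; return PIDs signalled."""
    killed = []
    for pid in find_job_pids(job_id, proc_root):
        if pid == os.getpid():
            continue  # never kill the epilogue itself
        try:
            os.kill(pid, signal.SIGKILL)
            killed.append(pid)
        except OSError:
            pass  # already gone, or not ours to kill
    return killed


if __name__ == "__main__":
    # Assumption: PBS hands the epilogue the job ID as its first argument.
    if len(sys.argv) > 1:
        kill_job_processes(sys.argv[1])
```

This only cleans the node it runs on, so if the epilogue runs only on the first execution host, something else would have to invoke it on the other nodes of the job. A process started with a scrubbed environment (without PBS_JOBID) would also be missed.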
In summary, the choices are:
1) Use a simple mpirun_ssh wrapper script that takes care of setting the
appropriate environment variables.
2) Switch to Torque/mpiexec.
3) Explore a different way to do MPI with PBSPro.
4) Propose another solution...
I've looked at it, and it makes sense to have the wrapper script, given
PBSPro's capabilities and the way we have PBSPro and MPI set up.
I also looked at another solution on the web (mpiclean). It cleans up
processes associated with the PBS_JOBID env variable, so it also assumes
that all the processes across the nodes carry PBS_JOBID, which would be
set up via the wrapper script.
So, we need to launch all the MPI sessions with PBS_JOBID (which my
wrapper script for MPI-IB needs to provide).
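The wrapper side can be sketched the same way: read the job's PBS_JOBID and PBS_NODEFILE (both set by PBS inside a job) and pass PBS_JOBID on the launcher's command line so every rank inherits it. The launcher name, its -np/-hostfile flags, and its acceptance of VAR=value pairs before the program are assumptions modelled on MVAPICH's mpirun_rsh; build_command is a hypothetical helper, not the actual wrapper.

```python
#!/usr/bin/env python3
"""Hypothetical MPI-IB wrapper sketch: forward PBS_JOBID to every rank.

Assumes an mpirun_rsh-style launcher that accepts VAR=value pairs before the
program name; the launcher name and flags are assumptions to adjust per site.
"""
import os
import sys

LAUNCHER = "mpirun_rsh"  # assumption: the site's MPI-IB launcher


def build_command(env: dict, program: list, launcher: str = LAUNCHER) -> list:
    """Build the launcher argv, tagging every rank with the job's PBS_JOBID."""
    job_id = env.get("PBS_JOBID")
    nodefile = env.get("PBS_NODEFILE")
    if not job_id or not nodefile:
        raise RuntimeError("not inside a PBS job: PBS_JOBID/PBS_NODEFILE unset")
    with open(nodefile) as f:
        nprocs = sum(1 for line in f if line.strip())  # one line per slot
    return [launcher, "-np", str(nprocs), "-hostfile", nodefile,
            "PBS_JOBID=" + job_id, *program]


if __name__ == "__main__" and len(sys.argv) > 1:
    cmd = build_command(dict(os.environ), sys.argv[1:])
    os.execvp(cmd[0], cmd)  # replace the wrapper with the launcher
```

A job script would call the wrapper with the program and its arguments in place of the launcher. Using os.execvp means the wrapper process is replaced by the launcher, so the job sees the launcher's exit status directly.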
I've got the solution ready; I'm just waiting on a decision. It's
just a *simple* script, and it's necessary because PBSPro doesn't
provide an alternate mpirun (for PBSPro-MPI integration) for
MPI-Infiniband. Otherwise, we could use PBSPro's wrapper instead of mine.
If this is taking too long to decide, we ought to set up a short
meeting, and I'll run a live job to show what is necessary and why.
Guys, I have looked at the breadth and depth of the solutions available.
Thanks,
--------------
Robin
513-529-1483
"Academic politics is the most vicious precisely because the stakes are
so small" - Kissinger