RESCOMP Archives

March 2006

RESCOMP@LISTSERV.MIAMIOH.EDU

Subject:
From:
Reply To:
Research Computing Support <[log in to unmask]>, Robin <[log in to unmask]>
Date:
Fri, 10 Mar 2006 10:40:51 -0500
All,

I looked at the epilogue script to clean up leftover processes.

We have several choices to make this work:

(I) The epilogue script is basically a script to kill processes, so I
need to identify which processes to kill.

	The only unique and reliable identifier I could use is the PBS Job ID.
	I need to check whether a process is associated with the Job ID and
then kill it.

	As such, I would need to use the wrapper script to transparently
pass the PBS Job ID to all processes. That way,
	the epilogue script can tell which processes to kill. I just updated
the wrapper script to include the PBS Job ID.

(II) mpiexec is designed to have a very tight integration with Torque  
(PBS) and supposedly will clean left-over processes up.

	mpiexec has been installed at /software/mpiexec/0.80.

	It compiled fine with PBSPro; however, there have been 2 reports on
the mpiexec homepage
	http://www.osc.edu/~pw/mpiexec/index.php
	that it doesn't work well with PBSPro.
	
	The Torque and PBSPro codebases forked some time ago, and they
will continue to diverge as time moves on. Unless we plan to switch
to Torque, I'm not quite sure mpiexec is suitable for a PBSPro env.
	
	mpiexec's behaviour depends on the PBS libraries (I'm guessing,
because it needs all the PBS libs and source to compile).

(III) Do the PBSPro and MPI integration differently.
	
	See Chapter 10 of the PBSPro manual.

	It seems to me that
	PBSPro provides an alternate mpirun (yet another script/program) for
GigE that essentially does what my wrapper script does (for
MPI-Infiniband).

	It doesn't provide an alternate mpirun for Infiniband (one that would
set the appropriate env variables). Otherwise, we could use their
script/program, and not mine.

(IV) Propose another solution.

In summary, the choices are:
1) Use a simple mpirun_ssh wrapper script that takes care of
setting the appropriate env variables.
2) Switch to Torque/mpiexec.
3) Explore a different way to do MPI with PBSPro.
4) Propose another solution.

	I've looked at it and it makes sense to have the wrapper script.
It fits PBSPro's capabilities and the way we have PBSPro and MPI set up.

	I looked at other solutions on the web (mpiclean). It would clean up
processes associated with the PBS_JOBID env variable, which would be
set up via the wrapper script. In other words, it also assumes that all
the processes across the nodes have PBS_JOBID in their environment.
So, we need to launch all the MPI sessions with PBS_JOBID set (which my
wrapper script for MPI-IB needs to provide).
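
	To make the cleanup idea concrete, here is a rough sketch of how an
epilogue-style cleanup could find processes by PBS_JOBID. It is not
mpiclean's code or the actual epilogue script. It assumes Linux compute
nodes where /proc/<pid>/environ is readable (the epilogue would run as
root), and the names read_environ, pids_for_job, kill_job_processes and
proc_root are made up for illustration. Note that /proc/<pid>/environ
shows the environment the process was started with, which is why the
variable has to be set at launch time.

```python
import os
import signal


def read_environ(pid, proc_root="/proc"):
    """Return the start-up environment of `pid` as a dict ({} if unreadable)."""
    try:
        with open(os.path.join(proc_root, str(pid), "environ"), "rb") as f:
            raw = f.read()
    except OSError:
        # Process already exited, or we lack permission to read it.
        return {}
    env = {}
    # Entries are NUL-separated "KEY=value" byte strings.
    for entry in raw.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep:
            env[key.decode(errors="replace")] = value.decode(errors="replace")
    return env


def pids_for_job(job_id, proc_root="/proc"):
    """List PIDs whose environment has PBS_JOBID equal to `job_id`."""
    matches = []
    for name in os.listdir(proc_root):
        if not name.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        pid = int(name)
        if pid == os.getpid():
            continue  # never target the cleanup script itself
        if read_environ(pid, proc_root).get("PBS_JOBID") == job_id:
            matches.append(pid)
    return sorted(matches)


def kill_job_processes(job_id, sig=signal.SIGTERM, proc_root="/proc"):
    """Send `sig` to every process tagged with `job_id`; return PIDs signalled."""
    signalled = []
    for pid in pids_for_job(job_id, proc_root):
        try:
            os.kill(pid, sig)
            signalled.append(pid)
        except (ProcessLookupError, PermissionError):
            pass  # process vanished or is not ours to signal
    return signalled
```

	The proc_root argument is only there so the scan can be tried against
a fake directory tree before pointing it at a real node.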

	I've got the solution ready... just waiting on the decision.  It's
just a *simple* script, and it's necessary because PBSPro doesn't
provide an alternate mpirun (for PBSPro-MPI integration) for
MPI-Infiniband. Otherwise, we could use PBSPro's wrapper and not mine.
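
	For illustration only, the core of such a wrapper could look roughly
like the sketch below. This is not my actual wrapper script. It assumes
the launcher runs a command line on each node (for example over ssh), so
prefixing the user's program with the standard `env` utility is enough
to carry PBS_JOBID along. The function names build_remote_command and
as_shell_string are hypothetical.

```python
import os
import shlex


def build_remote_command(user_argv, environ=None):
    """Prefix the user's command so it starts with PBS_JOBID in its environment.

    PBS sets PBS_JOBID in the job script's environment; we copy it onto the
    command line that the launcher will run on each node.
    """
    environ = os.environ if environ is None else environ
    job_id = environ.get("PBS_JOBID")
    if not job_id:
        raise RuntimeError("PBS_JOBID is not set; not running under PBS?")
    return ["env", "PBS_JOBID=" + job_id] + list(user_argv)


def as_shell_string(argv):
    """Quote an argv list so it survives being passed through a remote shell."""
    return " ".join(shlex.quote(arg) for arg in argv)
```

	With PBS_JOBID=1234.head in the environment, the command ./a.out
would be launched as "env PBS_JOBID=1234.head ./a.out" on each node, and
the epilogue can then find those processes by that Job ID.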

	If this is taking too long to decide, we ought to set up a short
meeting, and I'll run a live job to show what is necessary and why.

	Guys, I have looked at the breadth and depth of the solutions available.

Thanks,
--------------
Robin
[log in to unmask]
513-529-1483

"Academia politics is the most vicious precisely because the stake is  
so small" - Kissinger
