RESCOMP Archives

February 2006

RESCOMP@LISTSERV.MIAMIOH.EDU

Subject:
From:
Reply To: Research Computing Support <[log in to unmask]>, Robin <[log in to unmask]>
Date: Sun, 12 Feb 2006 12:43:09 -0500
Jaime E. Combariza wrote:

>1 - The head node has conflicts with the OpenManage software (Dell
>software that runs on the head node and monitors hardware). We can
>'uninstall' this package but we have not gotten an 'okay' from Dell. Dell
>has been notified and is consulting with its support team, and is
>supposed to get back to us as soon as possible. If not done by Monday noon
>I will contact Dell one more time.
>  
>
Just venting...

Time after time, be it Dell or Platform, they have failed to identify
whatever the problems are. In the end, Miami is the one that isolated
the root of the problems.

Dell/Platform failed to recognize the roll issues with the Intel compiler.
Recalling what Dell/Platform said about the crashes, they may as well
have said "the sun rises in the east and sets in the west".

We did the harder part. They just stood by and received the results of
Miami's investigation.

>Possible solutions (not tested with several users on the system).
>
>1 - do not run any OpenMP jobs.
>2 - do NOT kill any running jobs (on the head node); this may not be easy
>to implement. I have killed several serial jobs and the head node did not
>crash but I will not trust this node.
>3 - compile and run (interactively) tests on compute nodes. However, batch
>jobs need to be submitted from the head node.
>  
>
All the crashes seem to point to the very existence of OpenManage.

Just venting,
Robin
