Date: Sun, 12 Feb 2006 12:43:09 -0500
Jaime E. Combariza wrote:
>1 - The head node has conflicts with the openmanage software (Dell
>software that runs on the head node and monitors hardware). We can
>'uninstall' this package but we have not gotten an 'okay' from Dell. Dell
>has been notified and is consulting with its support team, and is
>supposed to get back to us as soon as possible. If not done by Monday noon
>I will contact Dell one more time.
>
>
Just venting...
Time after time, be it Dell or Platform, they have failed to identify
whatever the problems are. In the end, Miami is the one that isolated
the root of the problems.
Dell/Platform failed to recognize the roll issues with the Intel compiler.
Recalling what Dell/Platform said about the crashes,
they may as well have said "the sun rises in the east and sets in the west".
We did the harder part. They just stood there receiving the results of
Miami's investigation.
>Possible solutions (not tested with several users on the system).
>
>1 - do not run any openmp jobs.
>2 - do NOT kill any running jobs (on the head node); this may not be easy to
>implement. I have killed several serial jobs and the head node did not
>crash, but I will not trust this node.
>3 - compile and run (interactively) tests on compute nodes. However, batch
>jobs need to be submitted from the head node.
>
>
All the crashes seem to point to the very existence of OpenManage.
Just venting,
Robin