RESCOMP Archives

December 2006

RESCOMP@LISTSERV.MIAMIOH.EDU

From: jaime combariza <[log in to unmask]>
Reply To: Research Computing Support <[log in to unmask]>, jaime combariza <[log in to unmask]>
Date: Mon, 18 Dec 2006 08:52:48 -0500
Content-Type: text/plain

Henry:

Two things:

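1 - At the end of your script you have a "~". Please delete it. The shell
expands a lone "~" to your home directory (/home/wanx) and then tries to run
it as a command, which is what produces the "/home/wanx: is a directory"
message.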

2 - Signal 7 (Bus error) usually indicates a bad memory access. It was
raised by task 70 (task 68 in your second run), which was possibly running
in rack 2. Because the error is intermittent, it is difficult to say whether
the program is trying to access more memory than is available or whether
some nodes have faulty memory. I am cc'ing the rescomp group so they can run
some tests.
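To catch a stray "~" before submitting a job, you could scan the script for a line that holds only a tilde. The sketch below is a minimal version that assumes the job script is a plain-text file. `find_stray_tildes` is a hypothetical helper and is not part of PBS.

```python
from pathlib import Path
import sys


def find_stray_tildes(script_text: str) -> list[int]:
    """Return 1-based line numbers whose only content is a bare "~".

    The shell tilde-expands such a line to $HOME and then tries to execute
    it, which yields "<home directory>: is a directory".
    """
    hits = []
    for lineno, line in enumerate(script_text.splitlines(), start=1):
        if line.strip() == "~":
            hits.append(lineno)
    return hits


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_script.py job.pbs
    for path in sys.argv[1:]:
        for lineno in find_stray_tildes(Path(path).read_text()):
            print(f"{path}: line {lineno}: stray '~' (delete it)")
```

This check only flags a tilde that sits alone on a line. Legitimate uses such as `cd ~/run` are left alone.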

Frank, could you check whether some nodes are having memory problems? My
guess is rack 2. Henry was running a parallel job with 128 processes. Most of
the nodes in rack 1 were empty, so the job probably used nodes c1-1 to c2-?
(c1-2 and possibly c1-1 were being used).
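To find the node behind a failure, first pick out the task that died with a signal other than 9 (Killed), then look that task up in the job's node list. The sketch below assumes two things. First, it assumes mpiexec sends SIGKILL to the remaining tasks after the first one fails. Second, it assumes tasks fill the slots of `$PBS_NODEFILE` (one hostname per allocated processor) in order. Both assumptions should be checked against the local mpiexec placement policy. All of the function names are hypothetical.

```python
import re

SIGKILL = 9  # POSIX SIGKILL; assumed to mark tasks killed by mpiexec

# Matches warnings such as:
#   mpiexec: Warning: tasks 0-69,71-127 died with signal 9 (Killed).
#   mpiexec: Warning: task 70 died with signal 7 (Bus error).
WARNING_RE = re.compile(r"tasks? ([\d,\-]+) died with signal (\d+)")


def expand_ranges(spec: str) -> list[int]:
    """Expand a task list like "0-69,71-127" into individual task numbers."""
    tasks = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        tasks.extend(range(int(lo), int(hi or lo) + 1))
    return tasks


def task_signals(log: str) -> dict[int, int]:
    """Map each task number to the signal number it died with."""
    result = {}
    for spec, sig in WARNING_RE.findall(log):
        for task in expand_ranges(spec):
            result[task] = int(sig)
    return result


def originating_tasks(log: str) -> list[int]:
    """Tasks that died with something other than SIGKILL."""
    return sorted(t for t, s in task_signals(log).items() if s != SIGKILL)


def node_for_task(task: int, nodefile_lines: list[str]) -> str:
    """Host for a task, assuming tasks fill nodefile slots in order."""
    return nodefile_lines[task].strip()
```

On the first log in Henry's message, `originating_tasks` returns `[70]`. On the second it returns `[68]`. Passing the lines of that job's `$PBS_NODEFILE` to `node_for_task` would then give the host to test, subject to the ordering assumption above.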

Thanks

At 11:33 AM 12/16/2006, you wrote:
>Dear Jaime:
>
>Most of the time my jobs go through, but sometimes I get the following
>errors:
>
>=============================================================
>mpiexec: Warning: tasks 0-69,71-127 died with signal 9 (Killed).
>mpiexec: Warning: task 70 died with signal 7 (Bus error).
>
>real    0m2.310s
>user    0m0.007s
>sys     0m0.031s
>/var/spool/PBS/mom_priv/jobs/20969.mulnx37.SC: line 14: /home/wanx: is a directory
>
>=============================================================
>
>mpiexec: Warning: tasks 0-67,69-127 died with signal 9 (Killed).
>mpiexec: Warning: task 68 died with signal 7 (Bus error).
>
>real    0m5.974s
>user    0m0.009s
>sys     0m0.050s
>/var/spool/PBS/mom_priv/jobs/20971.mulnx37.SC: line 14: /home/wanx: is a directory
>=============================================================
>
>I am not sure whether this is common or not.
>
>Best,
>
>-Henry


_______
Jaime E. Combariza, Ph.D.
Assistant Director Research Computing
http://www.muohio.edu/researchcomputing
Miami University
(513) 529-5080  
