RESCOMP Archives

February 2006

RESCOMP@LISTSERV.MIAMIOH.EDU

Subject:
From:
"Jaime E. Combariza" <[log in to unmask]>
Reply To:
Research Computing Support <[log in to unmask]>, Jaime E. Combariza
Date:
Fri, 17 Feb 2006 18:18:54 -0500
That is a good test. I must admit I have not used rack 3 at all.
I ran the cpi test code on racks 1, 2, and 4 (c1-1 to c1-32, c2-1 to
c2-32, and c4-1 to c4-32, except c4-2 and c4-19) and it ran fine. It
did not run on rack 3.

You can see the log files in ~combarje/CPI, CPI-3, and CPI-4.

And just after Steve left !!!!!!!!!!@@@@@@@@@$$$$$$$$


> Hi Robin:
> 	From the host list, I found that both 4-2 and 4-19 are down.
> 	Also, when I try to launch the simple cpi job in
> /home/xudong/hpl/bin/Linux_ATHLON_VSIPL like this:
>
> /usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 81 -paramfile paramfile
> -hostfile list1 ./cpi
>
> 	I see an error message like this:
>
> 6] Abort: Got an asynchronous event: VAPI_PORT_ERROR
> (VAPI_EV_SYNDROME_NONE) at line 362 in file mpid/vapi/viainit.c
>
> 	This usually indicates a bad InfiniBand card or something wrong
> inside the InfiniBand switch.
>
> 	I have a list of the bad nodes from my 80-node test, and it has kept
> growing since then. This is an incomplete list:
>
> 3-1
> 3-2
> 3-7
> 3-8
> 3-9
> 3-15
> 3-19
> 3-21
> 3-22
> 	The bad host list is really too big, so I think something may be wrong
> in the switch. We need to make sure all the InfiniBand cards operate well
> before we go on to any Linpack testing.
>
> thanks
>
> xudong
>
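
Until the rack 3 problem is sorted out, a host list that skips the nodes reported bad so far could be generated with something like the rough Python sketch below. The c<rack>-<node> names follow the hosts mentioned above; the function name, the one-hostname-per-line layout, and the output file name in the usage note are assumptions, not taken from the cluster's actual setup.

```python
# Hypothetical helper: build a host list for the racks that passed the
# cpi test, leaving out nodes reported down or bad in this thread.

# Nodes reported down (4-2, 4-19) plus the incomplete rack 3 list.
BAD_NODES = {
    (4, 2), (4, 19),
    (3, 1), (3, 2), (3, 7), (3, 8), (3, 9),
    (3, 15), (3, 19), (3, 21), (3, 22),
}


def build_hostlist(racks, nodes_per_rack, bad_nodes):
    """Return hostnames like 'c1-1', omitting any (rack, node) in bad_nodes."""
    hosts = []
    for rack in racks:
        for node in range(1, nodes_per_rack + 1):
            if (rack, node) not in bad_nodes:
                hosts.append(f"c{rack}-{node}")
    return hosts


if __name__ == "__main__":
    # Assumed hostfile layout: one hostname per line.
    for host in build_hostlist([1, 2, 4], 32, BAD_NODES):
        print(host)
```

Redirecting the output to a file (for example `list_good`, a made-up name) would give something to pass to `-hostfile` in place of `list1`. The bad-node set would need updating as the list grows.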


-- 
Jaime E. Combariza
Assistant Director Research Computing
Academic Technology Services
[log in to unmask]
(513) 529-5080
Miami University
Oxford, Ohio 45056
