Date: Fri, 17 Feb 2006 18:18:54 -0500
Content-Type: text/plain
That is a good test. I must admit I have not used rack 3 at all.
I ran the cpi test code on rack 1 (c1-1 to c1-32, c2-1 to c2-32, and
c4-1 to c4-32, except c4-2 and c4-19) and it ran fine. It did not run
on rack 3.
You can see the log files in ~combarje/CPI CPI-3 CPI-4
And just after Steve left !!!!!!!!!!@@@@@@@@@$$$$$$$$
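For reference, a hostfile covering the run described above could be generated with a short script. This is only a sketch: the node-naming scheme (c1-1 through c4-32) and the two down nodes (c4-2, c4-19) are taken from the run above, and the output filename list1 is borrowed from the launch command quoted below.

```shell
#!/bin/sh
# Emit one hostname per line for racks c1, c2, and c4 (nodes 1-32 each),
# skipping the two nodes known to be down (c4-2 and c4-19).
for rack in c1 c2 c4; do
  for n in $(seq 1 32); do
    host="$rack-$n"
    case "$host" in
      c4-2|c4-19) continue ;;   # known-down nodes, leave them out
    esac
    echo "$host"
  done
done > list1
```

This yields 94 usable hosts (3 x 32 minus the two down nodes).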
> Hi Robin:
> From the host list, I found that both 4-2 and 4-19 are down.
> Also, when I try to launch the simple cpi job at
> /home/xudong/hpl/bin/Linux_ATHLON_VSIPL, like:
>
> /usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 81 -paramfile paramfile
> -hostfile list1 ./cpi
>
> I have seen an error message like:
>
> [6] Abort: Got an asynchronous event: VAPI_PORT_ERROR
> (VAPI_EV_SYNDROME_NONE) at line 362 in file mpid/vapi/viainit.c
>
> come out. Usually this indicates a bad InfiniBand card or a problem
> inside the InfiniBand switch.
>
> I have a list of bad nodes from my 80-node testing, and the list has
> kept growing since then. This is an incomplete list:
>
> 3-1
> 3-2
> 3-7
> 3-8
> 3-9
> 3-15
> 3-19
> 3-21
> 3-22
> The bad host list is really too big, so I suspect something is wrong
> in the switch. We need to make sure all the InfiniBand cards are
> operating well before we go on to any Linpack testing.
>
> thanks
>
> xudong
>
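The filtering Xudong describes, dropping the known-bad nodes from a hostfile before any Linpack run, can be done mechanically. A minimal sketch, assuming (this is a guess) that rack 3 has nodes 3-1 through 3-32, and using the bad-node list quoted above:

```shell
#!/bin/sh
# Record the known-bad nodes from the list above.
cat > badnodes <<'EOF'
3-1
3-2
3-7
3-8
3-9
3-15
3-19
3-21
3-22
EOF
# Candidate hostfile for rack 3 (nodes 3-1 .. 3-32, assumed range).
seq 1 32 | sed 's/^/3-/' > rack3.all
# Keep only hosts that are NOT on the bad list (exact whole-line match).
grep -vxFf badnodes rack3.all > rack3.good
wc -l < rack3.good   # prints 23 (32 candidates minus 9 bad nodes)
```

The resulting rack3.good could then be passed to mpirun_ssh via -hostfile once the bad cards and switch have actually been checked.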
--
Jaime E. Combariza
Assistant Director Research Computing
Academic Technology Services
[log in to unmask]
(513) 529-5080
Miami University
Oxford, Ohio 45056