RESCOMP Archives

February 2006

RESCOMP@LISTSERV.MIAMIOH.EDU

Subject:
From:
Xudong Yu <[log in to unmask]>
Reply To:
Research Computing Support <[log in to unmask]>, Xudong Yu <[log in to unmask]>
Date:
Fri, 17 Feb 2006 16:59:23 -0500
Hi Robin:
	From the host list, I found that both 4-2 and 4-19 are down.
	Also, when I try to launch the simple cpi job in /home/xudong/hpl/bin/Linux_ATHLON_VSIPL like this:

/usr/local/topspin/mpi/mpich/bin/mpirun_ssh -np 81 -paramfile paramfile -hostfile list1 ./cpi

	I see error messages like this:

6] Abort: Got an asynchronous event: VAPI_PORT_ERROR (VAPI_EV_SYNDROME_NONE) at line 362 in file mpid/vapi/viainit.c

	This usually indicates a bad InfiniBand card or a problem inside the InfiniBand switch.
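
	To see which ranks report this error, the saved mpirun output can be scanned. This is only a minimal sketch: the function name and the pattern are illustrative (they are not part of MPICH or the Topspin tools), and the pattern is modeled only on the one sample line quoted above. The opening bracket is treated as optional because the sample line starts without it. Mapping a rank back to a node depends on how mpirun_ssh assigns ranks to hostfile entries, which the sketch does not assume.

```python
import re

# Hypothetical pattern, modeled only on the sample line quoted above:
# an optional "[", the rank number, "]", then "Abort:" and VAPI_PORT_ERROR.
ABORT_RE = re.compile(r"^\s*\[?(\d+)\]\s+Abort:.*VAPI_PORT_ERROR")


def failing_ranks(lines):
    """Return the sorted rank numbers that reported a VAPI_PORT_ERROR abort."""
    ranks = set()
    for line in lines:
        match = ABORT_RE.match(line)
        if match:
            ranks.add(int(match.group(1)))
    return sorted(ranks)
```

	For example, failing_ranks(open("mpirun.log")) would scan a log file, where mpirun.log is a made-up name for wherever the job output was saved.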

	I have a list of the bad nodes from my 80-node test, and it has kept growing since then. This is an incomplete list:

3-1
3-2
3-7
3-8
3-9
3-15
3-19
3-21
3-22
	The bad host list is really too big, so I think something may be wrong in the switch. We need to make sure
all the InfiniBand cards are operating well before we go on to any Linpack testing.
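
	Until the cards are checked, one way to keep testing on the remaining nodes is to drop the known bad hosts from the hostfile. This is a rough sketch only: it assumes the hostfile has one host name per line and that the names match the short form used in the list above, which may not be how list1 is actually written. The function name is made up for illustration.

```python
def filter_hostfile(host_lines, bad_nodes):
    """Return hostfile entries whose host name is not in bad_nodes.

    Assumes one host name per line; blank lines are skipped.
    """
    bad = set(bad_nodes)
    kept = []
    for line in host_lines:
        name = line.strip()
        if not name or name in bad:
            continue
        kept.append(name)
    return kept


# Bad nodes copied from the incomplete list above.
BAD_NODES = ["3-1", "3-2", "3-7", "3-8", "3-9", "3-15", "3-19", "3-21", "3-22"]
```

	The result of filter_hostfile(open("list1"), BAD_NODES) could then be written out as a new hostfile, with -np lowered to match the number of hosts left.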

thanks

xudong
