LISTSERV - CLEANACCESS Archives - LISTSERV.MIAMIOH.EDU

Subject:

Re: CAM GUI unavailable, HA not detecting the failure

From:

Jeremy Wood <[log in to unmask]>

Reply To:

Cisco Clean Access Users and Administrators <[log in to unmask]>

Date:

Tue, 14 Apr 2009 16:13:58 -0400

Actually, I have seen that too. We have a pair of CAMs and 2 pairs of
CASs (Inband and OOB). It's only happened twice, and each time I
noticed that the java process was at 99.9%. Since then I wrote a
little shell script and dropped it into cron.weekly.

<SCRIPT>
#!/bin/bash

# First, check that the DB is running. An empty query prints nothing on
# success, so any output (e.g. a connection error) means the DB is down.
BASEDB="psql -h 127.0.0.1 controlsmartdb postgres -A -q -t"

${BASEDB} -c "" > /tmp/dbtest 2>&1
TEMP=$(wc -l < /tmp/dbtest)
rm -f /tmp/dbtest

if [ "${TEMP}" -ne 0 ]; then
    exit 1
fi

/perfigo/common/bin/fostate.sh > /tmp/fostate

# DB is running; only vacuum if this node is the active one.
if grep -q '^My node is active' /tmp/fostate; then
    su - postgres -c "vacuumdb -h 127.0.0.1 -a -f"
fi
</SCRIPT>

I wasn't sure whether running this on the standby DB was a good idea, so
I check fostate first. If you're not running HA, you really just
need the "su" line in a file. I borrowed the DB check from the nightly
backup cron too...thanks Cisco :)
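
If you want to try the active-node test without touching a real CAM, it
can be pulled out into a small function that reads the fostate output on
stdin. This is only a sketch: is_active_node is a made-up helper name, and
the only thing it relies on from the script is the "My node is active"
line in the fostate.sh output.

```shell
#!/bin/bash

# Hypothetical helper (name is an assumption, not part of the CAM tooling).
# Returns 0 if the text on stdin contains a line starting with
# "My node is active", the same pattern the script greps for.
is_active_node() {
    grep -q '^My node is active'
}

# Sketch of how it could be used on a CAM (paths as in the script):
#   if /perfigo/common/bin/fostate.sh | is_active_node; then
#       su - postgres -c "vacuumdb -h 127.0.0.1 -a -f"
#   fi
```

Reading from stdin avoids the /tmp/fostate temp file, and you can feed the
function sample text with echo to see how it behaves before putting it in cron.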

--Jeremy

On Mon, Apr 13, 2009 at 16:19, Matt Grover <[log in to unmask]> wrote:
> We are running into an issue where the active CAM becomes unavailable and HA
> does not fail over to the standby CAM.  We get notice of the problem because
> users are no longer able to log in through their CAS server.  They get the
> login page but they are unable to get a successful login.  Behind the scenes
> it is occurring because the active CAM is no longer processing the
> authentication requests.  Once we receive this problem report, attempts to
> open the web GUI on the service IP and direct IP of the active CAM fail.
> For failover to occur we have to trigger it manually,
> usually by dropping the network connection on the active (but faulty) CAM.  A
> service restart on the faulty CAM will restore it to normal operation.  So
> far logs have failed to show anything of merit regarding the failure.  This
> has occurred under 4.5.0 and 4.5.1.
>
> Has anyone seen this type of issue?
>
