CLEANACCESS Archives

October 2005

CLEANACCESS@LISTSERV.MIAMIOH.EDU

Subject:
From:
Michael Grinnell <[log in to unmask]>
Reply To:
Perfigo SecureSmart and CleanMachines Discussion List <[log in to unmask]>
Date:
Mon, 10 Oct 2005 22:26:46 -0400
Jason,

I'm very worried by your email as we're planning on upgrading to  
3.5.6 tomorrow, and we've made very similar modifications to our  
managers.  We've had other odd issues with manager failover in the  
past (and currently), but nothing quite like what you describe.  In  
troubleshooting it with TAC, they sent me this:

... messages similar to "SEVERE: Failed to execute SQL on remote  
database" will appear in /perfigo/logs/perfigo-log0.log.0 file if  
synchronization fails.

Do you see anything there?
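
In case it saves some time, here is a minimal sketch of how one might scan that log for the message. The path and message text are the ones TAC quoted above. The function name is my own, and since TAC said "messages similar to", an exact substring match may miss variants.

```python
from pathlib import Path

# Message text and log path as quoted from TAC; everything else is illustrative.
SYNC_ERROR = "SEVERE: Failed to execute SQL on remote database"
LOG_PATH = "/perfigo/logs/perfigo-log0.log.0"


def find_sync_failures(lines, needle=SYNC_ERROR):
    """Return (line_number, text) pairs for every line containing the needle."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if needle in line
    ]


if __name__ == "__main__":
    path = Path(LOG_PATH)
    if path.exists():
        # errors="replace" keeps the scan going if the log has odd bytes in it.
        with path.open(errors="replace") as handle:
            for number, text in find_sync_failures(handle):
                print(f"{number}: {text}")
    else:
        print(f"{LOG_PATH} not found")
```

Run it on each manager; no output means no line in that file matched the quoted text exactly.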

Also, I've had problems in the past when CCA (either for an upgrade
or some other purpose) needed to drop the database and recreate it
with a new schema, because of the additional users and permissions we
had set on user_info, etc. In those cases, however, the error messages
from postgres were always pretty clear about what the problem was, and
postgres would generally refuse to let the CCA script run until I
fixed it. I take it from your description that your problem is a
little more advanced than that, but just in case:

1) I think you said it was, but have you verified that the database
was created successfully on the problem manager?
2) I think you said that you removed the read-only user from both
managers. Is this right?
3) Have you verified that all of the "grant foo on bar to read-only"
permissions have been removed on both managers?
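
For question 3, one way to check is to take a schema-only dump of the database on each manager (for example with pg_dump's schema-only option) and search it for leftover GRANT statements that name the role. This is only a sketch: the function name is hypothetical, it assumes each GRANT sits on a single line as pg_dump normally writes it, and the role name is a parameter because I'm not sure whether yours is spelled "read_only" or "read-only".

```python
import re

# Matches a single-line GRANT statement and captures the grantee list after TO.
GRANT_LINE = re.compile(r"^\s*GRANT\b.*\bTO\b(.*);", re.IGNORECASE)


def grants_for_role(schema_sql, role):
    """Return GRANT statements in a schema dump whose grantees include `role`."""
    wanted = role.lower()
    hits = []
    for line in schema_sql.splitlines():
        match = GRANT_LINE.match(line)
        if match is None:
            continue
        for grantee in match.group(1).split(","):
            tokens = grantee.split()
            if not tokens:
                continue
            # Take the last token so the older "TO GROUP name" form also works,
            # and strip the quotes that names containing a hyphen require.
            name = tokens[-1].strip('"').lower()
            if name == wanted:
                hits.append(line.strip())
                break
    return hits


if __name__ == "__main__":
    sample = (
        "GRANT SELECT ON TABLE user_info TO read_only;\n"
        "GRANT ALL ON TABLE user_info TO postgres;\n"
    )
    for statement in grants_for_role(sample, "read_only"):
        print(statement)
```

An empty result on both managers would suggest the grants are gone, at least as far as the dump shows.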

Thanks,

Michael Grinnell
Network Security Administrator
The American University
e-mail: [log in to unmask]

On Oct 10, 2005, at 8:48 PM, Jason Richardson wrote:

> Hi all, ever since upgrading our two CAS and CAM servers from 3.5.3.1
> to 3.5.5, and the agent to 3.5.8 (the Cisco SE that we trust to give
> us good advice was not comfortable with 3.5.6 or 3.5.8 yet), we have
> been unable to get our CAMs to sync the database.  We have two for HA,
> but we have been running with only our primary since last Wed. AM,
> when we completed the upgrade.  I've pasted my tech's explanation of
> the issue below.  Please let us know if you have experienced the same
> or anything like it, because we have pretty much exceeded the
> knowledge of the Cisco L2 who has been working with us.  The current
> status is that the back-up CAM has been reinstalled, but it will not
> sync with the primary because it hangs on a non-existent postgres
> user group named "read_only".  The accounts that we created were
> read-only, but they have been removed.
>
> TIA,
>
> ---
> Jason Richardson
> Manager, IT Security and Client Development
> Enterprise Systems Support
> Northern Illinois University
>
>
> We had a bit of a meltdown with the backup CAM. We upgraded to version
> 3.5.5 last Wednesday, and after the patch the failover stopped syncing
> with the main database. Our upgrade happened at about 5 AM Wednesday
> morning, and the backup had a copy of the database until 5:11 AM. The
> standby was still sending the heartbeat; the data just wasn't in sync.
> I had made some changes to the CAMs a while back to allow read-only
> access to the database, but after the upgrade all the changes had
> reverted to the original configuration.
>
> What I had done before the upgrade:
> Added IP addresses to pg_hba.conf to allow access to the database
> Created a read-only account so as not to use the admin account.
>
> With these changes, the main and failover were syncing fine until the
> upgrade. Thursday I realized that the changes I had made had been
> reverted to defaults, so I added them back in. After doing so, I was
> able to read the data in the backup and noticed that there was no data
> since 5:11 AM Wednesday morning.
>
> Our Network Engineers contacted Cisco and were told that because of
> what I had done, they were unable to help and that we would therefore
> need to re-install the standby. This is where we are now.
>
> I would really like to know what may have caused this loss of
> communication between the databases. I'm fairly positive the changes I
> made would not have done it, as it was syncing fine after I had made
> them, and the problem arose after the upgrade, which set everything
> back to defaults.
>
>
