CLEANACCESS Archives

October 2005

CLEANACCESS@LISTSERV.MIAMIOH.EDU

Subject:
From:
"Rajesh Nair (rajnair)" <[log in to unmask]>
Reply To:
Perfigo SecureSmart and CleanMachines Discussion List <[log in to unmask]>
Date:
Mon, 10 Oct 2005 19:10:08 -0700
Hi Jason,

I am from Cisco and I know about this case.  I don't think this has
anything to do with the upgrade to ver 3.5.5. 

I believe what happened on your CAMs is that the package that you
installed (pgadmin or something similar) modified the database system
tables.  For instance, one of the differences I noticed was that there
were entries in pg_group, whereas on an "unmodified" CAM there are no
entries in pg_group.

The package that you installed on the CAM modified NOT ONLY the system
tables but also modified the CCA database (controlsmartdb).  Try the
following on your CAM that is up and running:
# psql -h 127.0.0.1 controlsmartdb postgres
controlsmartdb=# \dp

What you will see is that each of the tables has some additional access
privilege information.  This information, as you can see, has been
written into the controlsmartdb database. 

Hence, what happens is that when the inactive box is rebooted/restarted
and gets the database snapshot from its peer for synchronization, the
database snapshot will contain information about these privileges (e.g.
it will contain instructions to grant read access to <foo_table_name>
for the "ReadOnly" group).  However, such access privilege information
is not valid on this inactive machine because those groups (e.g.
ReadOnly) do not exist for whatever reason.  You will have to consult
the documentation for the package you installed for more information. 
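
To see which groups a snapshot refers to, something like the following
might help.  This is only a rough sketch: it assumes the snapshot is
available as a plain-text SQL dump and that the grants are written in
the form GRANT ... TO GROUP "name"; which I have not verified for the
CAM snapshot format.  The function name list_granted_groups and the
file name snapshot.sql are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the distinct group names that GRANT
# statements in a plain-text SQL dump refer to.
# Assumes lines of the form:
#   GRANT SELECT ON TABLE foo TO GROUP "ReadOnly";
# Grants to individual users (no GROUP keyword) are ignored.
list_granted_groups() {
    grep '^GRANT .* TO GROUP ' "$1" \
        | sed -e 's/.* TO GROUP //' -e 's/;.*$//' -e 's/"//g' \
        | sort -u
}

# Usage (file name is an assumption):
#   list_granted_groups snapshot.sql
```

Any name printed here that does not exist as a group on the inactive
machine would be a candidate for the failure described above.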

I suspect that the only issue would be the group information and the
entries in the pg_group system table.  However, I am unsure as I don't
know what package was installed and may not be able to comment about the
package because we won't necessarily know how it functions. 

Hence, what the Cisco TAC engineer told you is entirely reasonable from
his/her point-of-view.  Since they don't know what package you
installed, nor could they be expected to know how that package affects
the system, they would not be able to hazard any suggestions.  

I can offer a suggestion - but please note that this is only a
suggestion and may not work at all and should be taken with more than a
pinch of salt because I am totally unfamiliar with the package you tried
to install on the CAM.  :-) Sorry, I have to provide the disclaimer up
front. 

I suspect that if you try to replicate the entries from pg_group on the
working system to the pg_group on the inactive system and then try the
failover, it might work.  If the only issue with the database restore is
that the appropriate pg_group entries (e.g. ReadOnly) are not available,
this might work.  However, you might very well run into other issues
(e.g. other changes to the pg_catalog system tables that I am unaware of
at this point) and this might only be the first one.
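
As a rough illustration of what replicating the entries could look
like, the sketch below turns a list of group names into CREATE GROUP
statements.  It is untested on a CAM and rests on the assumption that
recreating the groups by name is enough; the helper name
emit_create_groups is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: read group names on stdin (one per line) and
# print one CREATE GROUP statement per non-empty name.
# Assumption: recreating the group by name is sufficient for the
# restore; group membership and IDs are not handled here.
emit_create_groups() {
    while IFS= read -r name; do
        [ -n "$name" ] || continue
        printf 'CREATE GROUP "%s";\n' "$name"
    done
}

# Usage, on the inactive CAM (review the output before running it):
#   echo ReadOnly | emit_create_groups
#   echo ReadOnly | emit_create_groups | psql -h 127.0.0.1 controlsmartdb postgres
```

The group names themselves can be read on the working CAM with
"SELECT groname FROM pg_group;" from the psql prompt.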

Please let me know how things proceed.

Regards and hope this helps,
-Rajesh.

-----Original Message-----
From: Perfigo SecureSmart and CleanMachines Discussion List
[mailto:[log in to unmask]] On Behalf Of Jason Richardson
Sent: Monday, October 10, 2005 5:49 PM
To: [log in to unmask]
Subject: Problems with database sync between CAMs after upgrade to ver.
3.5.5

Hi all, ever since upgrading our two CAS and CAM servers from 3.5.3.1 to
3.5.5, and the agent to 3.5.8, (the Cisco SE that we trust to give us
good advice was not comfortable with 3.5.6 or 3.5.8 yet), we have been
unable to get our CAMs to sync the database.  We have two for HA, but we
have only been running with our primary since last Wed. AM when we
completed the upgrade.  I've pasted my tech's explanation of the issue
below.  Please let us know if you have experienced the same or anything
like it, because we have pretty much exceeded the knowledge of the Cisco
L2 engineer who has been working with us.  The current status is that the back-up
CAM has been reinstalled, but it will not sync with the primary because
it hangs on a non-existent postgres user group named "read_only".  The
accounts that we created were read only but they have been removed.

TIA,

---
Jason Richardson
Manager, IT Security and Client Development Enterprise Systems Support
Northern Illinois University

 
We had a bit of a meltdown with the backup CAM. We upgraded to version
3.5.5 last Wednesday and after the patch the failover stopped syncing
with the main database. Our upgrade happened at about 5 AM Wednesday
morning and the backup had a copy of the database until 5:11 AM. The
standby was still sending the heartbeat, just the data wasn't in sync. I
had made some changes to the CAMs a while back to allow read only access
to the database, but after the upgrade all the changes had reverted to
original configuration. 
 
What I had done before the upgrade: 
Added IP addresses to pg_hba.conf to allow access to the database
Created a read-only account so as not to use the admin account. 
 
With these changes, the main and failover were syncing fine until the
upgrade. Thursday I realized that the changes I had made had been
reverted to defaults so I added them back in. After doing so, I was able
to read the data in the backup and noticed that there was no data since
5:11 AM Wednesday morning. 
 
Our Network Engineers contacted Cisco and were told that because of what
I had done, Cisco was unable to help and that we therefore need to
re-install the standby. This is where we are now. 
 
I would really like to know what may have caused this loss of
communication between the databases. I'm fairly positive the changes I
made would not have done it as it was syncing fine after I had made
those and the problem arose after the upgrade which set it to defaults.
 
