2008-11-21 21:08:00
If one or more of the replicas are out of sync, login attempts by users may fail, assuming that the BoKS client on the server in question was looking at the out-of-sync BoKS replica. Other nasty stuff may also occur.
Standard procedure is to follow these steps:
All commands are run in a BoKS shell, on the master server unless specified otherwise.
# /opt/boksm/sbin/boksadm -S boksdiag list
Since last pckt: The number of minutes/seconds since the BoKS master last sent a communication packet to the respective replica server. This should never exceed a couple of minutes.
Since last fail: The number of days/hours/minutes since the BoKS master was last unable to update the database on the respective replica server. If only a couple of hours are listed, you'll know that the replica server had a recent failure.
Since last sync: The number of days/hours/minutes since BoKS last sent a database update to the respective replica server.
Last status: Yes indeed! The last known status of the replica server in question. OK means that the server is running perfectly and that updates are received. Loading means that the server was just restarted and is still loading the database or any updates. Down indicates that the replica server is down or even dead.
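The column descriptions above translate into a simple health check. Below is a minimal Python sketch. It assumes you have already extracted the relevant columns for each replica from the `boksdiag list` output. The exact output layout isn't covered here, so the parsing is left out. The `ReplicaStatus` class, the `needs_attention` function and the 120-second threshold (standing in for "a couple of minutes") are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical threshold mirroring "should never exceed a couple of minutes".
MAX_SECONDS_SINCE_PACKET = 120


@dataclass
class ReplicaStatus:
    """One replica's row from `boksdiag list`, assumed to be parsed already."""
    name: str
    seconds_since_last_packet: int
    last_status: str  # "OK", "Loading" or "Down", per the descriptions above


def needs_attention(replica: ReplicaStatus) -> list[str]:
    """Return the reasons a replica looks unhealthy (empty list if it looks fine)."""
    reasons = []
    if replica.seconds_since_last_packet > MAX_SECONDS_SINCE_PACKET:
        reasons.append("no packet from the master for over two minutes")
    status = replica.last_status.strip().lower()
    if status == "down":
        reasons.append("replica is reported as Down")
    elif status == "loading":
        reasons.append("replica is still loading (recently restarted)")
    elif status != "ok":
        reasons.append(f"unexpected status {replica.last_status!r}")
    return reasons


if __name__ == "__main__":
    # Example rows with made-up values, purely for illustration.
    rows = [
        ReplicaStatus("replica1", 35, "OK"),
        ReplicaStatus("replica2", 900, "Down"),
    ]
    for row in rows:
        problems = needs_attention(row)
        print(row.name, "->", "; ".join(problems) if problems else "looks fine")
```

A "Loading" replica is flagged but isn't necessarily broken. Give it a few minutes and check again before digging deeper.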
This should be pretty self-explanatory. Read the /var/opt/boksm/boks_errlog file on both the master and the replicas to see if you can detect any errors there. If the log file does mention the hosts involved, you should be able to find the cause of the problem pretty quickly.
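The error log can get long, so filtering it on the hostnames involved can save time. Here is a minimal Python sketch of that filter. The function name `lines_mentioning` and the placeholder hostname `replica1` are hypothetical. It does a plain substring match, much like `grep`, and makes no assumptions about the log's line format.

```python
from collections.abc import Iterable
from pathlib import Path

# Log location as given in the text above.
BOKS_ERRLOG = Path("/var/opt/boksm/boks_errlog")


def lines_mentioning(lines: Iterable[str], hostnames: Iterable[str]) -> list[str]:
    """Return the lines that mention any of the given hostnames.

    This is a plain substring match, so "replica1" also matches "replica10".
    """
    hosts = list(hostnames)
    return [line.rstrip("\n") for line in lines if any(h in line for h in hosts)]


if __name__ == "__main__":
    if BOKS_ERRLOG.exists():
        with BOKS_ERRLOG.open(encoding="utf-8", errors="replace") as log:
            # "replica1" is a placeholder for the replica you are investigating.
            for hit in lines_mentioning(log, ["replica1"]):
                print(hit)
```

Run it on the master and on each replica, since the text suggests checking the log in both places.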
Keon> boksdiag download -force $hostname
This will push a database update to the replica. Perform another boksdiag list to see if it worked. Re-read the BoKS error log file to see if things have cleared up.
Keon> ps -ef | grep -i drainmast
This should show two drainmast processes running. If there aren't, you should see errors about this in the error logs and in Tivoli.
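If you want to script the "two drainmast processes" check, the Python sketch below counts matching lines in `ps -ef` output. Like `grep -i`, the match is case-insensitive. It skips the grep command itself so the count isn't inflated by one. The function name `count_drainmast` is hypothetical. The expected count of two comes from the text.

```python
def count_drainmast(ps_output: str) -> int:
    """Count drainmast processes in `ps -ef` output.

    Matching is case-insensitive like `grep -i`, and the grep command
    itself is skipped so it is not counted as a drainmast process.
    """
    count = 0
    for line in ps_output.splitlines():
        lowered = line.lower()
        if "drainmast" in lowered and "grep" not in lowered:
            count += 1
    return count


EXPECTED_DRAINMAST = 2  # per the text: two drainmast processes should be running
```

On the master you could pass it `subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout` and compare the result with `EXPECTED_DRAINMAST`.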
Keon> Boot -k
Keon> ps -ef | grep -i boks (kill any remaining BoKS processes)
Keon> Boot
Check to see if the two drainmast processes stay up. Keep checking for at least two minutes. If one of them crashes again, try the following:
Check to see that /opt/boksm/lib/boks_drainmast is still linked to boks_drainmast_d, which should be in the same directory. Also check to see that boks_drainmast_d is still the same file as boks_drainmast_d.nonstripped.
If it isn't, copy boks_drainmast_d to boks_drainmast_d.orig and then copy the non-stripped version over boks_drainmast_d. This way, any core file the process produces will be useful to TFS Technology.
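Both file checks can be sketched in Python as follows. The file names come from the text. The function `check_drainmast_binaries` is hypothetical. The text doesn't say whether "linked" means a symbolic or a hard link, so the sketch uses `os.path.samefile()`, which accepts either. "The same file" is interpreted as identical content.

```python
import filecmp
import os
from pathlib import Path

# Directory as given in the text above.
BOKS_LIB = Path("/opt/boksm/lib")


def check_drainmast_binaries(lib_dir: Path) -> list[str]:
    """Return problems found with the drainmast binaries (empty list if none)."""
    link = lib_dir / "boks_drainmast"
    daemon = lib_dir / "boks_drainmast_d"
    nonstripped = lib_dir / "boks_drainmast_d.nonstripped"

    missing = [f"{p} is missing" for p in (link, daemon, nonstripped) if not p.exists()]
    if missing:
        return missing

    problems = []
    # samefile() is true for both a symlink and a hard link to boks_drainmast_d.
    if not os.path.samefile(link, daemon):
        problems.append(f"{link} no longer points to {daemon.name}")
    # Byte-for-byte comparison; a stripped binary will differ in size and content.
    if not filecmp.cmp(daemon, nonstripped, shallow=False):
        problems.append(f"{daemon.name} differs from {nonstripped.name}")
    return problems


if __name__ == "__main__":
    if BOKS_LIB.is_dir():
        for problem in check_drainmast_binaries(BOKS_LIB):
            print(problem)
```

If the second check reports a difference, do the copy described above by hand. The sketch deliberately doesn't modify anything.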
Keon> Boot -k
Keon> Boot
Keon> ls -al /core
Check that the core file was just created by boks_drainmast_d.
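A quick way to confirm that /core is fresh is to compare its modification time with the current time. The Python sketch below does only that. The function `is_recent` and its five-minute default are hypothetical choices. To confirm that the core actually came from boks_drainmast_d, you still need a tool such as `file` or your debugger.

```python
import time
from pathlib import Path


def is_recent(path: Path, max_age_seconds: float = 300) -> bool:
    """True if `path` exists and was modified within the last max_age_seconds.

    The five-minute default is an arbitrary choice for "just created".
    """
    try:
        age = time.time() - path.stat().st_mtime
    except FileNotFoundError:
        return False
    return age <= max_age_seconds


if __name__ == "__main__":
    print("/core is fresh" if is_recent(Path("/core")) else "/core is missing or old")
```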
Keon> Boot -k
Keon> cd /var/opt/boksm/data
Keon> tar -cvf masterspool.tar master_spool
Keon> rm master_spool/*
Keon> Boot
Things should now be back to normal. Send both the tar file and the core file to TFS Technology (support@tfstech.com).
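The archive-and-clear sequence above (`tar -cvf masterspool.tar master_spool` followed by `rm master_spool/*`) can be sketched in Python using the standard `tarfile` module. The function name `archive_and_clear_spool` is hypothetical. It mirrors the shell commands: subdirectories and hidden files inside master_spool are archived but not deleted, just as the `*` glob would skip them. Only run something like this after `Boot -k`, as in the steps above.

```python
import tarfile
from pathlib import Path

# Directory as given in the text above; run only after `Boot -k`.
DATA_DIR = Path("/var/opt/boksm/data")


def archive_and_clear_spool(data_dir: Path) -> Path:
    """Equivalent of `tar -cvf masterspool.tar master_spool` plus `rm master_spool/*`.

    Returns the path of the tar file. Like the shell glob, hidden files and
    subdirectories inside master_spool are left alone.
    """
    spool = data_dir / "master_spool"
    archive = data_dir / "masterspool.tar"
    with tarfile.open(archive, "w") as tar:
        tar.add(spool, arcname="master_spool")  # recursive, like tar -c
    for entry in list(spool.iterdir()):
        if entry.is_file() and not entry.name.startswith("."):
            entry.unlink()
    return archive
```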
Keon> boksdiag fque -master
If any messages are stuck, there is most likely still something wrong with the drainmast processes. You may want to try restarting the BoKS master software. Do NOT reboot the master server! Restart the software using the Boot command. If that doesn't help, repeat the drainmast troubleshooting steps above.
Verify that the BoKS communication between the master and the replica itself is up and running.
Keon> cadm -l -f bcastaddr -h $replica
If this doesn't work, re-check the error logs on the replica and proceed with the host key check below.
On the replica system run:
Keon> hostkey
Take the output from that command and run the following on the master:
Keon> dumpbase | grep $hostkey
If this doesn't return the configuration for the replica server, the keys have become unsynchronized. If you make any changes you will need to restart the BoKS processes, using the Boot command.
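The host key comparison amounts to a grep of the `dumpbase` output for the replica's key. The Python sketch below does the same thing. The function name `config_for_hostkey` is hypothetical. It makes no assumptions about the layout of `dumpbase` lines; it only strips the trailing newline that captured command output usually carries from the key.

```python
def config_for_hostkey(dumpbase_output: str, hostkey: str) -> list[str]:
    """Return the dumpbase lines containing the replica's host key, like grep.

    An empty result suggests the host keys have become unsynchronized.
    """
    key = hostkey.strip()  # captured command output usually ends with a newline
    if not key:
        raise ValueError("empty host key")
    return [line for line in dumpbase_output.splitlines() if key in line]
```

Feed it the `dumpbase` output captured on the master, together with the `hostkey` output from the replica.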
Keon> dumpbase | grep RNAME | grep $replica
The TYPE field in the definition of the replica should be set to 261. Anything else is wrong, so you need to update the configuration in the BoKS database. Either that or have SecOPS do it for you.
On the replica system, review the settings in /etc/opt/boksm/ENV.
If all of the above fails you should really get cracking with the debugger. Refer to the appropriate chapter of this manual for details.
All content, with exception of "borrowed" blogpost images, or unless otherwise indicated, is copyright of Tess Sluijter. The character Kilala the cat-demon is copyright of Rumiko Takahashi and used here without permission.