2009-08-04 22:01:00
The BoKS infrastructure is pretty much rock solid and will not let you down under normal circumstances. However, "normal" doesn't always happen so it's good to prepare for a disaster. What happens if you lose a replica or two? What happens if the BoKS master server itself is dead? It pays to come prepared!
Luckily BoKS replica servers are pretty expendable. One needs at least one replica server per physical location, though it pays to have more than one. Moreover you may want to have a replica per section of your network.
By having a good amount of replica servers you won't be caught off guard by a network failure. Having a set of replicas per data center ensures that all your hosts will remain funcional, even if your WAN connections die. And having a replica per network section will allow you to keep operating, despite failure of backbone routers and such.
Should you ever feel the need to add more replica servers, then you can take the following step to create new ones. However, keep in mind that you'll need to be able to communicate with the master server, so this won't do you any good if the network's already dead.
First, modify the host record of your targeted client system through the BoKS GUI. Change the host type from UNIXBOKSHOST to BOKSREPLICA. Then, on the client system perform the following commands.
# /opt/boksm/sbin/boksadm -S
BoKS> vi $BOKS_etc/ENV #set SHM_SIZE to 16000
BoKS> convert -v server
Stopping daemons...
Setting BOKSINIT=server in ENV file...
Restarting daemons...
Conversion from client to replica done.
BoKS> Boot -k
BoKS> Boot
Finally, also restart the BoKS master software. Running "boksdiag list" should now show the new replica server, which is probably still loading its copy of the database.
Without a working master server the BoKS infrastructure will keep on functioning. However, it is impossible to make any changes to the database and thus it's a good idea to restore your master as soon as possible. It's a good idea to promote a replica to master status if you think it'll take you more than a few hours (a day?) to fix the server.
Log in to your chosen replica and perform the following actions. Start off by checking the boks_errlog file to see if the replica itself isn't broken.
$ /opt/boksm/sbin/boksadm -S
BoKS> tail -30 /var/opt/boksm/boks_errlog
...
...
BoKS> convert –v master
Stopping daemons...
Setting BOKSINIT=master in ENV file...
Restarting daemons...
Conversion from replica to master done.
BoKS> boksdiag list
SERVER SINCE LAST SINCE LAST SINCE LAST COUNT LAST
REPHOST5 00:49 523D 5:19:20 04:49 1853521 OK
REPHOST4 00:49 136D 22:21:35 04:49 526392 OK
REPHOST3 00:49 04:50 726768 OK
REPHOST2 00:49 107D 5:05:33 04:49 425231 OK
REPHOST 02:59 02:13 11:44 148342 DOWN
BoKS> boksdiag sequence
...
T7 13678d 8:33:46 5053 (5053)
...
T9 13178d 11:05:23 7919 (7919)
...
T15 13178d 11:03:16 1865 (1865)
...
Now log in to the remaining replica servers and compare the output of the "boksdiag sequence" commands. Alternatively you can run the check_boks_replication script to automate the process. Either way, none of the replicas should either be ahead of the new master, nor should it lag too far behind. If you do find that the replication is broken we'll need to proceed with troubleshooting.
Assuming that you will not be using your new master server permanently you will want to go back to your original BoKS master at some point in time. Let's assume that you've repaired whatever damage there was and that the system is now ready to resume its duty.
It's crucial that the original master be converted to a client system before booting it up fully. Perform the following in single user mode.
$ /opt/boksm/sbin/boksadm -S
BoKS> convert –v client
Stopping daemons...
Setting BOKSINIT=client in ENV file...
Restarting daemons...
Conversion from master to client done.
BoKS> cd /var/opt/boksm/data
BoKS> rm *.dat
BoKS> rm sequence
You may now boot the original master server into multi-user mode and let it rejoin the BoKS infrastructure as a client. Afterwards, convert it into a replica server per the instructions in the first paragraph of this page.
Once the original master server has become a fully functioning replica server you may start thinking about dismantling the temporary master. This process will actually be quite similar to what we've done before. Basically you:
kilala.nl tags: boks, sysadmin,
View or add comments (curr. 0)
All content, with exception of "borrowed" blogpost images, or unless otherwise indicated, is copyright of Tess Sluijter. The character Kilala the cat-demon is copyright of Rumiko Takahashi and used here without permission.