BoKS troubleshooting: login and communications issues on the client

2008-11-21 21:04:00

The basics: verifying the proper functioning of a BoKS client

These easy steps will show you whether your new client is working as it should.

  1. Check the boks_errlog in $BOKS_var.
  2. Run cadm -l -f bcastaddr -h $client from the BoKS master (in a BoKS shell).
  3. Try to login to the new client.

If all three steps go through without error, your system is as healthy as a very healthy good thing... or something.
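To make those checks concrete, here is a minimal sketch. It assumes that $BOKS_var is the default /var/opt/boksm and that $client holds the new client's hostname; adjust both to your environment.

On the client:

Keon> tail -20 /var/opt/boksm/boks_errlog

On the master, in a BoKS shell:

Keon> cadm -l -f bcastaddr -h $client

Finally, try an interactive login to the new client, for instance over SSH or on its console.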



You can't log in to a BoKS client

Most obviously, we can't do our work on that particular server and neither can our customers. Naturally, this is something that needs to be fixed quite urgently!

  1. Check BoKS transaction log.
  2. Check if you can log in.
  3. Check BoKS communications.
  4. Check bcastaddr and bremotever files.
  5. Check BoKS port number.
  6. Check node keys.
  7. Check BoKS error logs.
  8. Debug servc process on replica server or relevant process on client.

All commands are run in a BoKS shell, on the master server unless specified otherwise.



1. Check BoKS transaction log.

Keon> cd /var/opt/boksm/data
Keon> grep $user LOG | bkslog -f - -wn

This should give you enough output to ascertain why a certain user cannot log in. If there is no output at all, do the following:

Keon> cd /var/junkyard/bokslogs
Keon> for file in `ls -lrt | tail -5 | awk '{print $9}'`
> do
>   grep $user $file | bkslog -f - -wn
> done

If this doesn't provide any output either, perform step 2 as well to see whether we sysadmins can log in.



2. Check if you can log in.

Pretty self-explanatory, isn't it? Try logging in to the client yourself.
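A minimal sketch, assuming the client accepts SSH (or telnet on older setups) and that your own account is BoKS-enabled on it:

Keon> ssh $client

Keon> telnet $client

If your own login works while the customer's doesn't, the problem most likely lies with that user's account or access routes rather than with the client itself.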



3. Check BoKS communications.

Keon> cadm -l -f bcastaddr -h $client
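If this check fails or hangs, it can help to run the exact same command against a client that is known to be working, just to rule out a problem on the master's side. Here $good_client is a hypothetical placeholder for such a host:

Keon> cadm -l -f bcastaddr -h $good_client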



4. Check bcastaddr and bremotever files.

Log in to the client through its console port.

Keon> cat /etc/opt/boksm/bcastaddr

Keon> cat /etc/opt/boksm/bremotever

These two files should match the same files on another working client; do not compare against a replica or the master, since the files are different there. If you make any changes you will need to restart the BoKS processes using the Boot command.
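As a quick way of comparing, assuming the broken client can reach a known-good client over SSH ($good_client is a placeholder for a working client that is neither master nor replica):

Keon> ssh $good_client cat /etc/opt/boksm/bcastaddr | diff - /etc/opt/boksm/bcastaddr

Keon> ssh $good_client cat /etc/opt/boksm/bremotever | diff - /etc/opt/boksm/bremotever

No output from diff means the files match.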



5. Check BoKS port number.

On the client and master run:

Keon> getent services boks

This should return the same value for the BoKS base port on both systems. If it doesn't, check /etc/services or NIS+ (whichever is in use). If you make any changes you will need to restart the BoKS processes using the Boot command.
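For example, assuming the service entries live in /etc/services rather than NIS+:

Keon> grep boks /etc/services

The boks entry (and any related boks_* entries your installation defines) should be identical on the master and the client; 6500/tcp is the usual default for the base port, but your site may differ.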



6. Check node keys.

On the client system run:

Keon> hostkey

Take the output from that command and run the following on the master:

Keon> dumpbase | grep $hostkey

If this doesn't return the definition for the client host, the keys have become unsynchronized. Reset them and restart the BoKS processes on the client using the Boot command.
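It can also help to look the client up by name, simply to confirm that a host entry exists in the database at all. This is just a grep over the same dumpbase output:

Keon> dumpbase | grep -i $client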



7. Check BoKS error logs.

This should be pretty self-explanatory. Read the /var/opt/boksm/boks_errlog file on both the master and the client to see if you can detect any errors there. If the log file mentions anything about the hosts involved, you should be able to find the cause of the problem pretty quickly.
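For example, to narrow the master's log down to lines that mention the troublesome client (assuming $client expands to the hostname as it appears in the log):

Keon> grep -i $client /var/opt/boksm/boks_errlog | tail -20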



8. Debug servc process on replica server or relevant process on client.

If all of the above fails, you should really get cracking with the debugger. Refer to the appropriate chapter of this manual for details (see the chapter "SCENARIO: Setting a trace within BoKS").

NOTE: If you need to restart the BoKS software on the client without logging in, try doing so using a remote management tool, like Tivoli.



The client queues are filling up or you can't communicate with the client

The whole of BoKS is still up and running and everything else is working fine; the only client(s) that won't work are the one(s) that have stuck queues. The only way you'll find out about this is by running boksdiag fque -bridge, which reports all of the queues that are stuck.
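For reference, that check is run on the master in a BoKS shell:

Keon> boksdiag fque -bridge

Any client with messages stuck in its queue should show up in the output.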

  1. Check if client is up and running.
  2. Check BoKS communications.
  3. Check node keys.
  4. Check BoKS error logs.

All commands are run in a BoKS shell, on the master server unless specified otherwise.



1. Check if client is up and running.

Keon> ping $client

Also ask your colleagues whether they're working on the system; maybe they're performing maintenance.
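If ping succeeds but you suspect the BoKS client software itself has died, a rough process check on the client can help (the exact process names differ per BoKS version, so treat this as a sketch):

Keon> ps -ef | grep -i boks | grep -v grep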



2. Check BoKS communications.

Keon> cadm -l -f bcastaddr -h $client



3. Check node keys.

On the client system run:

Keon> hostkey

Take the output from that command and run the following on the master:

Keon> dumpbase | grep $hostkey

If this doesn't return the definition for the client host, the keys have become unsynchronized. Reset them and restart the BoKS client software using the Boot command.



4. Check BoKS error logs.

This should be pretty self-explanatory. Read the /var/opt/boksm/boks_errlog file on both the master and the client to see if you can detect any errors there. If the log file mentions anything about the hosts involved, you should be able to find the cause of the problem pretty quickly.

NOTE: What can we do about a stuck queue?

If you're really desperate to get rid of the queue, do the following:

Keon> boksdiag fque -bridge -delete $client-ip
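After deleting the stuck queue, re-run the earlier check to confirm that the client's queue is really gone and stays empty:

Keon> boksdiag fque -bridge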

At one point in time we thought it would be wise to manually delete messages from the spool directories. Do not, under any circumstances, touch the crypt_spool and master_spool directories in /var/opt/boksm. Really: DON'T DO THIS! It is unnecessary and will lead to trouble with BoKS.


