Kilala.nl - Personal website of Thomas Sluyter

Hooray for Google's free projects

2017-05-11 21:04:00

A few weeks ago, I reopened commenting on this site after having it locked behind logins for years. Since then the number of spam submissions has been growing steadily. Sucks, so I finally took the time to implement proper spam checking. Enter Google's free project reCaptcha. Of course I realize that, if something's free on the web, it probably means that I'm the product being sold. I'll have to poke around the code to see what it actually does :)

CodexWorld have a great tutorial on getting reCaptcha to work in a basic script. Took me less than an hour to get it all set up! Lovely!



Getting with the times: website renovation

2017-01-19 22:18:00

It's been roughly eight years since I started work on KilalaCMS, the code that runs this website. She's served me well and I haven't had many headaches. Early on, Dick offered me lots of great help in sanitizing input, putting up at least some SQL injection protection. In the end it might not be much to look at, but she's mine :)

A few months back Dreamhost warned their customers who were still on PHP 5.5 that this version would soon be dropped from their servers. In other words: go check your code. Obviously KilalaCMS was behind the times, so I've now taken some time to adjust things here and there so it works in PHP 7.0. I've also taken the liberty to default everything to HTTPS, using a free SSL cert from Let's Encrypt. Dreamhost took care of the latter part for me. Good service!

I may run into a bug or two, but so far things are looking good!

EDIT: Kudos by the way to Dreamhost for their tech support! As part of the reno, I'd decided to run an "sqlmap" test against my DEV site, to make sure I wasn't leaving SQLI in plain sight. After the first tentative probe, the server slammed the door on my nose! They've got their boxes set up quite nicely, to prevent attacks like these. Nice! Had a chat with their support people and we worked out a nice way for me to test, without affecting my site or any of the other folks hosted on my box. 



Nagios script: check_cnr

2009-09-14 22:05:00

This script is used to monitor the basic processes that go with Cisco's CNR (Network Registrar), which can be likened to a DHCP server. Cisco's Support Wiki described CNR as follows:

Cisco CNS Network Registrar is a full-featured DNS/DHCP system that provides scalable naming and addressing services for service provider and enterprise networks. Cisco CNS Network Registrar dramatically improves the reliability of naming and addressing services for enterprise networks. For cable ISPs, Cisco CNS Network Registrar provides scalable DNS and DHCP services and forms the basis of a DOCSIS cable modem provisioning system.

As said, my script only checks the basics of CNR to ensure that the required daemons are running. It does not actually check any of the functionality, though it may be expanded to include this later on.


Usage of check_cnr

./check_cnr [-nagios|-tivoli] [-d -o FILE]
-nagios	Nagios output mode (default)
-tivoli	Tivoli output mode
-d	Debug mode
-o 	Output file for debug logging

Output

Depending on which mode you've selected the output of the script will differ slightly.

In Tivoli mode the output will be limited to a numerical value, as the script is to be used as a "numeric script". 0 = OK, 1 = WARNING/UNKNOWN, 2 = SEVERE. The exit code of the script will be identical to this value.

In Nagios mode the exit code of the script will be similar to Tivoli's, with the exception that the value 3 portrays an unknown state. The output on stdout includes the service name and state (CNR OK/NOK) and a helpful error message.
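To make the difference between the two modes concrete, here is a minimal sketch. The function name report_state and its arguments are made up for this illustration and are not taken from check_cnr.sh itself; it only mirrors the mapping described above.

```shell
#!/bin/sh
# Hypothetical helper, not part of check_cnr.sh.
# Maps a state name to the output and exit code for each output mode.
report_state() {
    mode="$1"      # "nagios" or "tivoli"
    state="$2"     # "ok", "warning", "critical" or "unknown"
    message="$3"   # free-form text, only shown in Nagios mode

    case "$state" in
        ok)       code=0 ;;
        warning)  code=1 ;;
        critical) code=2 ;;
        *)        code=3 ;;
    esac

    if [ "$mode" = "tivoli" ]; then
        # Tivoli mode has no separate unknown state: fold it into 1.
        if [ "$code" -eq 3 ]; then
            code=1
        fi
        # A "numeric script" prints nothing but the number.
        echo "$code"
    else
        if [ "$code" -eq 0 ]; then
            echo "CNR OK - $message"
        else
            echo "CNR NOK - $message"
        fi
    fi
    return "$code"
}

report_state nagios ok "all daemons running"   # prints "CNR OK - all daemons running"
```

In a real monitor you would end with "exit $?" right after the call, so the exit code matches what was printed.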


Limitations


Download

Download check_cnr.sh
$ wc check_cnr.sh
189     666    4531 check_cnr.sh

$ cksum check_cnr.sh
4161895780 4531 check_cnr.sh


The scope of variables in shell scripts

2008-01-01 00:00:00

Just today I ran into something shiny that piqued my interest. A shell script I'd written in Bash didn't work like I expected it to, with regards to the scope of a variable. I thought the incident was interesting enough to report, although I won't go into the whole scoping story too deeply.

What it basically boils down to is that there was a difference in the way two shells handle a certain situation. A difference that I didn't expect to be there. Not that exciting, but still very educational.


Scope?

Yeah. In most programming languages variables have a certain range within your program, within which they can be used. Some variables only exist within one subroutine, while others exist across the whole program or even across multiple parts of the whole.

In shell scripting things aren't that complicated, luckily. In most cases a variable that's set in one part of the script can be used in every other part of the script. There are some notable exceptions, one of which I ran into today without realising it.
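A small sketch of the general rule and of one such exception, the subshell. All names here are invented for the illustration; the only assumption is a POSIX-style shell.

```shell
#!/bin/sh
# Illustrative names only; none of these come from the script discussed here.
GREETING="hello"

change_in_function() {
    # A function runs in the current shell, so this assignment sticks.
    GREETING="changed in function"
}

change_in_subshell() {
    # Parentheses start a subshell: the assignment is lost when it ends.
    ( GREETING="changed in subshell" )
}

change_in_function
AFTER_FUNCTION="$GREETING"

change_in_subshell
AFTER_SUBSHELL="$GREETING"

echo "After function: $AFTER_FUNCTION"
echo "After subshell: $AFTER_SUBSHELL"
```

Both lines report "changed in function": the assignment made inside the parentheses never reaches the parent shell.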


The real code

My situation:

I have a command that outputs a number of lines, some of which I need. The lines that I'm interested in consist of various fields, two of which I need as variables. Depending on the value of one of these variables, a counter needs to be incremented.

I guess that sounds kinda complicated, so here's the real code snippet:

function check_transport_paths
{
    TOTAL=`scstat -W | grep "Transport path:" | wc -l`
    let COUNT=0

    scstat -W | grep "Transport path:" | awk '{print $3" "$6}' | while read PATH STATUS
    do
        if [ $STATUS == "online" ]
        then
            let COUNT=$COUNT+1
        fi
    done

    if [ $COUNT -lt 1 ]
    then
        echo "NOK - No transport paths online."
        exit $STATE_CRITICAL
    elif [ $COUNT -lt $TOTAL ]
    then
        echo "NOK - One or more transport paths offline."
        exit $STATE_WARNING
    fi
}


Where it goes wrong

While testing my script, I found out that $COUNT would never retain the value it gained in the while-loop. This of course led to the script always failing the check. After some fiddling about, I found out that the problem lay in the use of the while loop: it was being used at the end of a pipe.

To illustrate, the following -does- work.

let COUNT=0
while read i
do
    let COUNT=$COUNT+$i
    echo $COUNT
done

echo "Total is $COUNT."

This leads to the following output.

$ ./baka.sh
1
1
2
3
3
6
4
10
^D
Total is 10.

However, if I were to create a script called neko.sh that outputs the numbers one through four on separate lines, which is then used in baka.sh... well... it doesn't work :D Regardez!

let COUNT=0
./neko.sh | while read i
do
    let COUNT=$COUNT+$i
    echo $COUNT
done

echo "Total is $COUNT."

This gives the following output

1
3
6
10
Total is 0.

Conclusions

After discussing the matter with two of my colleagues (one of them as puzzled as I was, and the other knowing what was going wrong) we came to the following conclusions.

This conclusion is supported by an example in the "Advanced Bash-scripting guide" by Mendel Cooper. In that example an additional comment is made about the scoping of variables with redirected while loops. The comment warns that older shells branch a redirected while into a sub-shell, but also says that Bash and Ksh handle this properly.

I guess our version of Bash is too old :3

Work around
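One common way around the problem is to keep the while loop out of the pipeline and feed it through a redirect instead. The sketch below assumes a POSIX-style shell; the neko function is only a stand-in for neko.sh from the example above.

```shell
#!/bin/sh
# Stand-in for neko.sh: prints the numbers 1 through 4, one per line.
neko() {
    printf '%s\n' 1 2 3 4
}

COUNT=0

# The loop reads from a here-document instead of sitting at the end of
# a pipe, so it runs in the current shell and COUNT survives "done".
while read -r i
do
    COUNT=$((COUNT + i))
done <<EOF
$(neko)
EOF

echo "Total is $COUNT."
```

This prints "Total is 10." Bash also offers process substitution, as in "done < <(./neko.sh)", which does the same thing without the here-document. Newer Bash versions additionally have "shopt -s lastpipe", which runs the last part of a pipeline in the current shell as long as job control is off.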

A word of thanks

I'd like to thank my colleagues Dennis Roos and Tom Scholten for spending a spare hour with me, hacking at this problem. And I'd like to thank Ondrej Jombik for pointing out the fact that this article didn't make my conclusions very clear in its original version.



Tivoli script: check_ntpconfig.sh

2007-08-30 11:46:00

This script was written at the time I was hired by T-Systems.

This script is an evolution of my earlier check_ntp_config. This time it's meant for use with Tivoli, although modifying it for use with Nagios is trivial. The script was written to be usable on at least five different Unices, though I've been having trouble with Darwin/OS X.

The script was tested on Red Hat Linux, Tru64, HP-UX, AIX and Solaris. Only Darwin seems to have problems.

Just like my other recent Nagios scripts, check_ntpconfig.sh comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution.
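The script below repeats the same "is DEBUG set?" test in many places. As a rough sketch of the idea, the pattern could be wrapped in one small helper. The debug function and the default file name used here are assumptions for this example and do not appear in the script itself.

```shell
#!/bin/sh
# Hypothetical helper; check_ntpconfig.sh writes the test out inline instead.
DEBUG="${DEBUG:-0}"
DEBUGFILE="${DEBUGFILE:-/tmp/example-debug.txt}"

debug() {
    # Append the message to the debug file only when DEBUG is above zero.
    if [ "$DEBUG" -gt 0 ]; then
        echo "$*" >> "$DEBUGFILE"
    fi
    return 0
}

debug "=== SETUP ==="
debug "OS name is $(uname)"
```

With DEBUG left at zero nothing is written at all, so the calls can stay in the script permanently.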


#!/usr/bin/ksh
#
# NTP configuration check script for Tivoli.
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of T-Systems, CSS-CCTMO, the Netherlands
# Last Modified: 13-09-2007
# 
# Usage: ./check_ntp_config
#
# Description:
#   Well, there's not much to tell. We have no way of making sure that our 
# NTP clients are all configured in the right way, so I thought I'd make
# a Nagios check for it. ^_^ After that came this derivative Tivoli script.
#   You can change the NTP config at the top of this script, to match your
# own situation.
#
# Limitations:
#   This script should work fine on Solaris, HP-UX, AIX, Tru64 and some
# flavors of Linux. So far Darwin-compatibility has eluded me.
#
# Output:
#   If the NTP client config does not match what has been defined at the 
# top of this script, the script will echo $STATE_NOK. In this case, the 
# STATE variables contain a zero and a one, so you'll need to use a 
# "Numeric Script" monitor definition in Tivoli. Anything above zero is bad.
#
# Other notes:
#   If you ever run into problems with the script, set the DEBUG variable
# to 1. I'll need the output the script generates to do troubleshooting.
# See below for details.
#   I realise that all the debugging commands strewn throughout the script
# may make things a little harder to read. But in the end I'm sure it was
# well worth adding them. It makes troubleshooting so much easier. :3
#

### SETTING THINGS UP ###
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
PROGNAME="./check_ntp_config"
STATE_NOK="1"
STATE_OK="0"

. /opt/Tivoli/lcf/dat/dm_env.sh >/dev/null 2>&1


### DEFINING THE NTP CLIENT CONFIGURATION AS IT SHOULD BE ###
NTPSERVERS="192.168.22.7 192.168.25.7 192.168.16.7"


### DEBUGGING SETUP ###
# Cause you never know when you'll need to squash a bug or two
DEBUG="1"

if [[ $DEBUG -gt 0 ]]
then
        DEBUGFILE="/tmp/thomas-debug.txt"
	if [[ -f $DEBUGFILE ]]
	then
            rm $DEBUGFILE >/dev/null 2>&1
	    [[ $? -gt 0 ]] && echo "Removing old debug file failed."
	    touch $DEBUGFILE
	fi
fi


### REQUISITE COMMAND LINE STUFF ###

print_usage() {
	echo ""
	echo "Usage: $PROGNAME"
}

print_help() {
	echo ""
	echo "NTP client configuration monitor plugin for Tivoli."
	echo ""
	echo "This plugin was not developed by IBM."
	echo "Please do not e-mail them for support on this plugin, since"
	echo "they won't know what you're talking about :P"
	echo ""
	echo "For contact info, read the plugin itself..."
	echo ""
	print_usage
	echo ""
}

while test -n "$1" 
do
	case "$1" in
	  *) print_help; exit $STATE_OK;;
	esac
done


### DEFINING SUBROUTINES ###

function SetupEnv
{
    case $(uname) in
	Linux) 	CFGFILE="/etc/ntp.conf"; 
		IPCMD="host" 
		IPMOD="tail -1"
		NAMEMOD="tail -1"
		IPFIELD="4"
		NAMEFIELD="5" 
		GREP="egrep -e" ;;
	SunOS) 	CFGFILE="/etc/inet/ntp.conf"
		IPCMD="getent hosts"
		IPMOD=""
		NAMEMOD=""
		IPFIELD="1"
		NAMEFIELD="2"
		GREP="egrep -e" ;;
	Darwin) CFGFILE="/etc/ntp.conf"
		IPCMD="host"
		IPMOD=""
		NAMEMOD=""
		IPFIELD="4"
		NAMEFIELD="1"
		GREP="egrep -e" ;;
	AIX)    CFGFILE="/etc/ntp.conf"
		IPCMD="host"
		IPMOD=""
		NAMEMOD=""
		IPFIELD="3"
		NAMEFIELD="1"
		GREP="egrep -e" ;;
	HP-UX)  CFGFILE="/etc/ntp.conf"
		IPCMD="nslookup"
		IPMOD="grep ^\"Address\""
		NAMEMOD="grep ^\"Name\""
		IPFIELD="2"
		NAMEFIELD="2"
		GREP="egrep -e" ;;
	OSF1)   CFGFILE="/etc/ntp.conf"
		IPCMD="nslookup"
		IPMOD="grep ^\"Address\" | tail -1"
		NAMEMOD="grep ^\"Name\" |tail -1"
		IPFIELD="2"
		NAMEFIELD="2"
		GREP="egrep -e" ;;
	*) echo "Sorry. OS not supported."; exit 1 ;;
    esac

    FAULT=0

    if [[ $DEBUG -gt 0 ]]
    then
	echo "=== SETUP ===" >> $DEBUGFILE
	echo "OS name is $(uname)" >> $DEBUGFILE
	echo "CFGFILE is $CFGFILE" >> $DEBUGFILE
	echo "IPCMD is $IPCMD" >> $DEBUGFILE
	echo "IPMOD is $IPMOD" >> $DEBUGFILE
	echo "NAMEMOD is $NAMEMOD" >> $DEBUGFILE
	echo "IPFIELD is $IPFIELD" >> $DEBUGFILE
	echo "NAMEFIELD is $NAMEFIELD" >> $DEBUGFILE
	echo "" >> $DEBUGFILE
	echo "NTPSERVERS is $NTPSERVERS" >> $DEBUGFILE
	echo "" >> $DEBUGFILE
    fi
} 

function ListInConf
{
    if [[ -z $NTPSERVERS ]]
    then
	# Log first: nothing after the exit would ever run.
	[[ $DEBUG -gt 0 ]] && echo "NTPSERVERS variable not set." >> $DEBUGFILE
	echo "You haven't configured this monitor yet. Set \$NTPSERVERS."; exit 0
    else

    for HOST in $(echo $NTPSERVERS)
    do
    SKIPIP=0
    SKIPNAME=0

    if [[ $DEBUG -gt 0 ]]
    then
	echo "=== LISTINCONF ===" >> $DEBUGFILE
	echo "HOST is $HOST" >> $DEBUGFILE
	echo "" >> $DEBUGFILE
    fi

        if [[ -z $(echo $HOST | $GREP [a-z,A-Z]) ]]	    
        then
            IPADDRESS="$HOST"
	    TEST=$($IPCMD $HOST 2>/dev/null)

	    if [[ ( $? -eq 0 ) && ( -z $(echo $TEST | $GREP NXDOMAIN) ) ]] 
	    then
		[[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
            	HOSTNAME=$($IPCMD $HOST 2>/dev/null | $NAMEMOD | cut -f$NAMEFIELD -d" " | cut -f1 -d.)
	    else
		[[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
		HOSTNAME=""
	    fi

	    # -z tests for an empty string; -eq would compare numerically.
	    if [[ -z $HOSTNAME ]]
	    then
	    	QUERY="$IPADDRESS"
	    	[[ $DEBUG -gt 0 ]] && echo "Skipping hostname verification" >> $DEBUGFILE
	    else
	    	QUERY="$HOSTNAME $IPADDRESS"	
	    	[[ $DEBUG -gt 0 ]] && echo "Checking both IP and name." >> $DEBUGFILE
	    fi
        else
            HOSTNAME="$HOST"
	    TEST=$($IPCMD $HOST 2>/dev/null)

	    if [[ ( $? -eq 0 ) && ( -z $(echo $TEST | $GREP NXDOMAIN) ) ]] 
	    then
		[[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
            	IPADDRESS=$($IPCMD $HOST 2>/dev/null | $IPMOD | cut -f$IPFIELD -d" ")
	    else
		[[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
		IPADDRESS=""
	    fi

	    # -z tests for an empty string; -eq would compare numerically.
	    if [[ -z $IPADDRESS ]]
	    then
		QUERY="$HOSTNAME"
		[[ $DEBUG -gt 0 ]] && echo "Skipping IP address verification" >> $DEBUGFILE
	    else
		QUERY="$HOSTNAME $IPADDRESS"	
		[[ $DEBUG -gt 0 ]] && echo "Checking both IP and name." >> $DEBUGFILE
	    fi
        fi

    if [[ $DEBUG -gt 0 ]]
    then
	echo "IPADDRESS is $IPADDRESS" >> $DEBUGFILE
	echo "HOSTNAME is $HOSTNAME" >> $DEBUGFILE
	echo "" >> $DEBUGFILE
    fi

	for NAME in `echo $QUERY`
	do
       	    [[ -z $($GREP $NAME $CFGFILE | $GREP "server") ]] && let FAULT=$FAULT+1
	done

    done

    fi
}

function ConfInList
{
    NUMSERVERS=$($GREP ^"server" $CFGFILE | wc -l)

    if [[ $DEBUG -gt 0 ]]
    then
	echo "=== CONFINLIST ===" >> $DEBUGFILE
	echo "Number of \"server\" lines in $CFGFILE is $NUMSERVERS" >> $DEBUGFILE
	echo "" >> $DEBUGFILE
    fi

    if [[ $($GREP ^"server" $CFGFILE | wc -l) -gt 0 ]]
    then

	for HOST in $(cat $CFGFILE | $GREP ^"server" | awk '{print $2}')
	do
		if [[ $DEBUG -gt 0 ]]
		then
			echo "HOST is $HOST" >> $DEBUGFILE
			echo "" >> $DEBUGFILE
		fi
		if [[ -z $(echo $HOST | $GREP [a-z,A-Z]) ]]	    
		then
			IPADDRESS="$HOST"
	    		TEST=$($IPCMD $HOST 2>/dev/null)

			if [[ ( $? -eq 0 ) && ( -z $(echo $TEST | $GREP NXDOMAIN) ) ]] 
			then
			    [[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
            		    HOSTNAME=$($IPCMD $HOST 2>/dev/null | $NAMEMOD | cut -f$NAMEFIELD -d" " | cut -f1 -d.)
			else
			    [[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
			    HOSTNAME=""
	    		fi

			if [[ -z $HOSTNAME ]]
			then
			    QUERY="$IPADDRESS"
			    [[ $DEBUG -gt 0 ]] && echo "Skipping hostname verification" >> $DEBUGFILE
			else
			    QUERY="$HOSTNAME $IPADDRESS"	
			    [[ $DEBUG -gt 0 ]] && echo "Checking both IP and name." >> $DEBUGFILE
			fi
		else
			HOSTNAME="$HOST"
	    		TEST=$($IPCMD $HOST 2>/dev/null)

			if [[ ( $? -eq 0 ) && ( -z $(echo $TEST | $GREP NXDOMAIN) ) ]] 
			then
			    [[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
            		    IPADDRESS=$($IPCMD $HOST 2>/dev/null | $IPMOD | cut -f$IPFIELD -d" ")
			else
			    [[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
			    IPADDRESS=""
	    		fi

			if [[ -z $IPADDRESS ]]
			then
				QUERY="$HOSTNAME"
				[[ $DEBUG -gt 0 ]] && echo "Skipping IP address verification" >> $DEBUGFILE
			else
				QUERY="$HOSTNAME $IPADDRESS"	
				[[ $DEBUG -gt 0 ]] && echo "Checking both IP and name." >> $DEBUGFILE
			fi
		fi

		if [[ $DEBUG -gt 0 ]]
		then
			echo "IPADDRESS is $IPADDRESS" >> $DEBUGFILE
			echo "HOSTNAME is $HOSTNAME" >> $DEBUGFILE
			echo "" >> $DEBUGFILE
		fi

		for NAME in `echo $QUERY`
		do
		    [[ -z $(echo $NTPSERVERS | $GREP $NAME) ]] && let FAULT=$FAULT+1
		done

	done
    fi
}

### FINALLY, THE MAIN ROUTINE ###

SetupEnv

    if [[ $DEBUG -gt 0 ]]
    then
	echo "=== STARTING MAIN PHASE ===" >> $DEBUGFILE
	echo "" >> $DEBUGFILE
	echo "=== NTP CONFIG FILE ===" >> $DEBUGFILE
	cat $CFGFILE | grep -v ^"\#" >> $DEBUGFILE
	echo "" >> $DEBUGFILE
	echo "" >> $DEBUGFILE
    fi

ListInConf
ConfInList

# Nothing caused us to exit early, so we're okay.
if [[ $FAULT -gt 0 ]]
then
    echo "$STATE_NOK"
    exit $STATE_NOK
else
    echo "$STATE_OK"
    exit $STATE_OK
fi


Nagios script: check_log2

2006-06-01 00:00:00

This script was written at the time I was hired by UPC / Liberty Global.

Improved log checker for Solaris, with state retention.

I found that the version of check_log included in the default monitor package doesn't work perfectly on Solaris: it needs a bit of tweaking... Which is what I've done for the script.

Also, I've added state retention. It's a bit of a hack, but hey! I needed a quick solution.

The original script sends a Critical when it detects the string you've queried the log file for, but it clears that same Critical immediately if the same message is not repeated once the monitor runs again. Meaning that, if there are no updates to your log file, the Critical will only be around until the next time the monitor runs.

Not very handy if the Critical occurs during the night.

This new version of the script creates a file called $oldlog.STATE in /usr/local/nagios/var (which should be 755, nagios:nagios), which contains the exit status for the last detected _changed_ status... If there are no changes detected in your log file, this old exit state is repeated.
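The idea behind the state file, stripped down to its bare bones, looks roughly like this. STATEFILE, save_state and last_state are names invented for this sketch, and falling back to OK when no state has been stored yet is an assumption of the example.

```shell
#!/bin/sh
# Sketch only: these names do not appear in check_log2 itself.
STATEFILE="${STATEFILE:-/tmp/example.STATE}"

save_state() {
    # Remember the exit status of the last run that saw a change.
    echo "$1" > "$STATEFILE"
}

last_state() {
    # Repeat the remembered status; fall back to 0 (OK) if none is stored.
    if [ -s "$STATEFILE" ]; then
        cat "$STATEFILE"
    else
        echo 0
    fi
}
```

When the log file has not changed since the previous run, the monitor can finish with exit "$(last_state)", which keeps a Critical visible until something new shows up in the log.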

The script has been tested on Solaris 8, Mac OS X 10.4 and Redhat ES3.

UPDATE 19/06/2006:

Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!

Also stomped out a few horrendous bugs! I'm very sorry for putting out such a buggy script earlier... If you've started using the script in your environment, please download the latest version. Thanks to Ali Khan for pointing out these mistakes.


#!/bin/bash
#
# Log file pattern detector plugin for Nagios
# Written by Ethan Galstad (nagios@nagios.org)
# Last Modified: 07-31-1999
# Updated by Thomas Sluyter (nagiosATkilalaDOTnl)
# Last Modified: 19-06-2006
#
# Usage: ./check_log2 -F log_file -O old_log_file -Q pattern
#
# Description:
#
# This plugin will scan a log file (specified by the log_file option)
# for a specific pattern (specified by the pattern option).  Successive
# calls to the plugin script will only report *new* pattern matches in the
# log file, since a copy of the log file from the previous run is saved
# to old_log_file.
#
# Output:
#
# On the first run of the plugin, it will return an OK state with a message
# of "Log check data initialized".  On successive runs, it will return an OK
# state if *no* pattern matches have been found in the *difference* between the
# log file and the older copy of the log file.  If the plugin detects any 
# pattern matches in the log diff, it will return a CRITICAL state and print
# out a message in the following format: "(x) last_match", where "x" is the
# total number of pattern matches found in the file and "last_match" is the
# last entry in the log file which matches the pattern.
#
# Notes:
#
# If you use this plugin make sure to keep the following in mind:
#
#    1.  The "max_attempts" value for the service should be 1, as this
#        will prevent Nagios from retrying the service check (the
#        next time the check is run it will not produce the same results).
#
#    2.  The "notify_recovery" value for the service should be 0, so that
#        Nagios does not notify you of "recoveries" for the check.  Since
#        pattern matches in the log file will only be reported once and not
#        the next time, there will always be "recoveries" for the service, even
#        though recoveries really don't apply to this type of check.
#
#    3.  You *must* supply a different old_file_log for each service that
#        you define to use this plugin script - even if the different services
#        check the same log_file for pattern matches.  This is necessary
#        because of the way the script operates.
#
#    4.  Changes to the script were made by Thomas Sluyter (nagios@kilala.nl).
#	 The first set of changes will allow the script to run properly on Solaris, which
#	 it did not do by default. The second set of changes will allow the following:
#	 * State retention. If a NOK was generated at point A in time and it is not repeated
# 	   at A+1, then an OK is sent to Nagios. Not something that you would like to happen.
#	   I've added the $oldlog.STATE trigger file which retains the last exitstatus. Should
# 	   there be no new lines added to the log, check_log will simply repeat the last state
#	   instead of give an OK.
#
# Examples:
#
# Check for login failures in the syslog...
#
#   check_log -F /var/log/messages -O /usr/local/nagios/var/check_log.badlogins.old -Q "LOGIN FAILURE"
#
# Check for port scan alerts generated by Psionic's PortSentry software...
#
#   check_log -F /var/log/messages -O /usr/local/nagios/var/check_log.portscan.old -Q "attackalert"
#

# Paths to commands used in this script.  These
# may have to be modified to match your system setup.

PATH="/usr/bin:/usr/sbin:/bin:/sbin"

PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`

#. $PROGPATH/utils.sh
. /usr/local/nagios/libexec/utils.sh

print_usage() {
    echo "Usage: $PROGNAME -F logfile -O oldlog -Q query"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Log file pattern detector plugin for Nagios"
    echo ""
    support
}

# Make sure the correct number of command line
# arguments have been supplied

if [ $# -lt 6 ]; then
    print_usage
    exit $STATE_UNKNOWN
fi

# Grab the command line arguments

exitstatus=$STATE_WARNING #default
while test -n "$1"; do
    case "$1" in
        --help)
            print_help
            exit $STATE_OK
            ;;
        -h)
            print_help
            exit $STATE_OK
            ;;
        -F)
            logfile=$2
            shift
            ;;
        -O)
            oldlog=$2
            shift
            ;;
        -Q)
            query=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit $STATE_UNKNOWN
            ;;
    esac
    shift
done

# If the source log file doesn't exist, exit

if [ ! -e $logfile ]; then
    echo "Log check error: Log file $logfile does not exist!"
    # Store the state before exiting; nothing after the exit would run.
    echo $STATE_UNKNOWN > $oldlog.STATE
    exit $STATE_UNKNOWN
fi

# If the oldlog file doesn't exist, this must be the first time
# we're running this test, so copy the original log file over to
# the old diff file and exit

if [ ! -e $oldlog ]; then
    cat $logfile > $oldlog
    if [ `tail -1 $logfile | grep -i "$query" | wc -l` -gt 0 ]
    then
        echo "Log check data initialized... Last line contained error message."
        echo $STATE_CRITICAL > $oldlog.STATE
	exit $STATE_CRITICAL
    else
        echo "Log check data initialized..."
        echo $STATE_OK > $oldlog.STATE
        exit $STATE_OK
    fi
fi

# A bug which was caught very late:
# If newlog is shorter than oldlog, the diff used below will return
# false positives for the query because they will be in $oldlog. Why?
# Because $oldlog is not rolled over / rotated, like $newlog. I need 
# to fix this in a kludgy way.

if [ `wc -l $logfile|awk '{print $1}'` -lt `wc -l $oldlog|awk '{print $1}'` ]
then
    rm $oldlog
    cat $logfile > $oldlog
    if [ `tail -1 $logfile | grep -i "$query" | wc -l` -gt 0 ]
    then
        echo "Log check data re-initialized... Last line contained error message."
        echo $STATE_CRITICAL > $oldlog.STATE
	exit $STATE_CRITICAL
    else
        echo "Log check data re-initialized..."
        echo $STATE_OK > $oldlog.STATE
        exit $STATE_OK
    fi
fi

# Everything seems fine, so compare it to the original log now

# The temporary file that the script should use while
# processing the log file.
if command -v mktemp >/dev/null 2>&1; then
    tempdiff=`mktemp /tmp/check_log.XXXXXXXXXX`
else
    tempdate=`/bin/date '+%H%M%S'`
    tempdiff="/tmp/check_log.${tempdate}"
    touch $tempdiff
fi

diff $logfile $oldlog > $tempdiff

if [ `wc -l $tempdiff|awk '{print $1}'` -eq 0 ]
then
     rm $tempdiff
     touch $oldlog.STATE
     exitstatus=`cat $oldlog.STATE`
     echo "LOG FILE - No status change detected. Status = $exitstatus"
     exit $exitstatus
fi

# Count the number of matching log entries we have
count=`grep -c "$query" $tempdiff`

# Get the last matching entry in the diff file
lastentry=`grep "$query" $tempdiff | tail -1`

rm -f $tempdiff
cat $logfile > $oldlog

if [ "$count" = "0" ]; then # no matches, exit with no error
    echo "Log check ok - 0 pattern matches found"
    exitstatus=$STATE_OK
else # Print total match count and the last entry we found
#    echo "($count) $lastentry"
    echo "Log check NOK - $lastentry"
    exitstatus=$STATE_CRITICAL
    echo $STATE_CRITICAL > $oldlog.STATE
fi

exit $exitstatus


echo "Starting clean"
rm /tmp/foobar /usr/local/nagios/var/foobar*
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "Starting normally"
echo "normal"
echo "normal" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "normal"
echo "normal" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "critical"
echo "neko" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "Log rotation with crit"
rm /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "critical"
echo "neko" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "Normal log rotation"
rm /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""



Nagios script: check_log3

2006-06-01 00:00:00

This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.

Today I made an improved version of the Nagios monitor "check_log2", which is now aptly called "check_log3". It includes all the improvements I originally added to "check_log2", so you can simply use this as a drop-in replacement.

Version 3 of this script gives you the option to add a second query to the monitor.

The previous two incarnations of the script only allowed you to search for one query and would return a Critical if it was found. Now you can also add a query which will result in a Warning message. Goody! :3
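Boiled down, checking two queries could look something like the sketch below. The helper classify_lines is hypothetical and not part of check_log3. Letting a Critical match win over a Warning match is an assumption of this example.

```shell
#!/bin/sh
# Sketch only: classify_lines is a made-up helper, not taken from check_log3.
# Reads log lines on stdin; prints 2 (Critical), 1 (Warning) or 0 (OK).
classify_lines() {
    critquery="$1"
    warnquery="$2"
    lines=$(cat)

    # Assumption: a Critical match takes precedence over a Warning match.
    if printf '%s\n' "$lines" | grep -q -e "$critquery"; then
        echo 2
    elif printf '%s\n' "$lines" | grep -q -e "$warnquery"; then
        echo 1
    else
        echo 0
    fi
}
```

For example, piping the new lines of a log file into classify_lines "panic" "flapping" would print 1 when only the word "flapping" turns up.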

1st of Feb, 2006:

Kyle Tucker pointed out that he had problems running this script with bash on Solaris. The changes he suggested have been worked into the newer version. Thanks Kyle :)

5th of Mar, 2006:

I finally got round to fixing the script according to all the changes Kyle (and others) suggested. So here's another try! Right now I've tested the script on Red Hat, Mac OS X and Solaris, so it should be much better than before.

19th of June, 2006:

Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!

Also stomped out a few horrendous bugs! I'm very sorry for putting out such a buggy script earlier... If you've started using the script in your environment, please download the latest version. Thanks to Ali Khan for pointing out these mistakes.



#!/bin/bash
#
# Log file pattern detector plugin for Nagios
# Written by Ethan Galstad (nagios@nagios.org)
# Last Modified: 07-31-1999
# Heavily modified by Thomas Sluyter (nagiosATkilalaDOTnl)
# Last Modified: 19-06-2006
#
# Usage: ./check_log3 -F log_file -O old_log_file -C crit-pattern -W warn-pattern
#
# Description:
#
# This plugin will scan a log file (specified by the log_file option)
# for specific patterns (specified by the XXX-pattern options).  Successive
# calls to the plugin script will only report *new* pattern matches in the
# log file, since a copy of the log file from the previous run is saved
# to old_log_file.
#
# Output:
#
# On the first run of the plugin, it will return an OK state with a message
# of "Log check data initialized".  On successive runs, it will return an OK
# state if *no* pattern matches have been found in the *difference* between the
# log file and the older copy of the log file.  If the plugin detects any 
# pattern matches in the log diff, it will return a CRITICAL state and print
# out a message in the following format: "(x) last_match", where "x" is the
# total number of pattern matches found in the file and "last_match" is the
# last entry in the log file which matches the pattern.
#
# Notes:
#
# If you use this plugin make sure to keep the following in mind:
#
#    1.  The "max_attempts" value for the service should be 1, as this
#        will prevent Nagios from retrying the service check (the
#        next time the check is run it will not produce the same results).
#
#    2.  The "notify_recovery" value for the service should be 0, so that
#        Nagios does not notify you of "recoveries" for the check.  Since
#        pattern matches in the log file will only be reported once and not
#        the next time, there will always be "recoveries" for the service, even
#        though recoveries really don't apply to this type of check.
#
#    3.  You *must* supply a different old_file_log for each service that
#        you define to use this plugin script - even if the different services
#        check the same log_file for pattern matches.  This is necessary
#        because of the way the script operates.
#
#    4.  Changes to the script were made by Thomas Sluyter (cailin@kilala.nl).
#	 * The first set of changes will allow the script to run properly on Solaris, which
#	   it did not do by default. The second set of changes will allow the following:
#	 * State retention. In the original script, if a NOK was put into the log file
#	   at point A in time and it is not repeated at A+1, then an OK is sent to Nagios.
#	   Not something that you would like to happen.
#	      I've added the $oldlog.STATE trigger file, which retains the last exit status.
#	   Should no new lines be added to the log, check_log will simply repeat the last
#	   state instead of giving an OK.
#	      In order for this state retention to work properly your client system MUST
#	   HAVE THE DIRECTORY /usr/local/nagios/var.
#        * Two queries. In the original script you could only enter one query which, when
#	   found, would result in  a Critical message being sent to Nagios. I've added the 
#	   possibility to add another query, which will result in a Warning message.
#	 * Bugfix: changed all instances of "crit-count" and "warn-count" to "critcount" and
#	   "warncount" after a tip from Kyle Tucker who ran into problems running this script
#	   with bash on Solaris.
#

# Paths to commands used in this script.  These
# may have to be modified to match your system setup.

PATH="/usr/bin:/usr/sbin:/bin:/sbin"

PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`

#. $PROGPATH/utils.sh
. /usr/local/nagios/libexec/utils.sh

print_usage() {
    echo "Usage: $PROGNAME -F logfile -O oldlog -C CRITquery -W WARNquery"
    echo "Usage: $PROGNAME --help"
    echo "Usage: $PROGNAME --version"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Log file pattern detector plugin for Nagios"
    echo ""
    support
}

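# Handle the help options first, since they come without
# any of the other arguments.

case "$1" in
    --help|-h)
        print_help
        exit $STATE_OK
        ;;
esac

# Make sure the correct number of command line
# arguments have been supplied

if [ $# -lt 8 ]; then
    print_usage
    exit $STATE_UNKNOWN
fi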

# Grab the command line arguments

exitstatus=$STATE_WARNING #default
while test -n "$1"; do
    case "$1" in
        --help)
            print_help
            exit $STATE_OK
            ;;
        -h)
            print_help
            exit $STATE_OK
            ;;
        -F)
            logfile=$2
            shift
            ;;
        -O)
            oldlog=$2
            shift
            ;;
        -C)
            CRITquery=$2
            shift
            ;;
        -W)
            WARNquery=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit $STATE_UNKNOWN
            ;;
    esac
    shift
done

# If the source log file doesn't exist, exit

if [ ! -e $logfile ]; then
    echo "Log check error: Log file $logfile does not exist!"
    # Save the state before exiting; nothing after the exit is ever run.
    echo $STATE_UNKNOWN > $oldlog.STATE
    exit $STATE_UNKNOWN
fi

# If the dump/temp log file doesn't exist, this must be the first time
# we're running this test, so copy the original log file over to
# the old diff file and exit

if [ ! -e $oldlog ]; then
    cat $logfile > $oldlog

    TEMPcount=0
    let TEMPcount=$TEMPcount+$(tail -1 $logfile | grep -i $WARNquery | wc -l | awk '{print $1}')
    let TEMPcount=$TEMPcount+$(tail -1 $logfile | grep -i $CRITquery | wc -l | awk '{print $1}')

    if [ $TEMPcount -gt 0 ]
    then
       echo "Log check data initialized... Last line contained error message."
       echo $STATE_WARNING > $oldlog.STATE
       exit $STATE_WARNING
    else
       echo "Log check data initialized..."
       echo $STATE_OK > $oldlog.STATE
       exit $STATE_OK
    fi
fi

# A bug which was caught very late:
# If newlog is shorter than oldlog, the diff used below will return
# false positives for the query, because the old matches will still be
# in $oldlog. Why? Because $oldlog is not rolled over / rotated, like
# $newlog. I need to fix this in a kludgy way.

if [ `wc -l $logfile|awk '{print $1}'` -lt `wc -l $oldlog|awk '{print $1}'` ]
then
    rm $oldlog
    cat $logfile > $oldlog
    TEMPcount=0
    let TEMPcount=$TEMPcount+$(tail -1 $logfile | grep -i $WARNquery | wc -l | awk '{print $1}')
    let TEMPcount=$TEMPcount+$(tail -1 $logfile | grep -i $CRITquery | wc -l | awk '{print $1}')

    if [ $TEMPcount -gt 0 ]
    then
       echo "Log check data initialized... Last line contained error message."
       echo $STATE_WARNING > $oldlog.STATE
       exit $STATE_WARNING
    else
       echo "Log check data initialized..."
       echo $STATE_OK > $oldlog.STATE
       exit $STATE_OK
    fi
fi

# The oldlog file exists, so compare it to the original log now

# The temporary file that the script should use while
# processing the log file. Use mktemp if it can be found in $PATH.
if type mktemp >/dev/null 2>&1; then
    tempdiff=`mktemp /tmp/check_log.XXXXXXXXXX`
else
    tempdate=`/bin/date '+%H%M%S'`
    tempdiff="/tmp/check_log.${tempdate}"
    touch $tempdiff
fi

diff $logfile $oldlog > $tempdiff

if [ `wc -l $tempdiff | awk '{print $1}'` -eq 0 ]
then
     rm $tempdiff
     touch $oldlog.STATE
     exitstatus=`cat $oldlog.STATE`
     echo "LOG FILE - No status change detected. Status = $exitstatus"
     exit $exitstatus
fi

# Count the number of matching log entries we have
CRITcount=`grep -c "$CRITquery" $tempdiff`
WARNcount=`grep -c "$WARNquery" $tempdiff`

# Get the last matching entry in the diff file
CRITlastentry=`grep "$CRITquery" $tempdiff | tail -1`
WARNlastentry=`grep "$WARNquery" $tempdiff | tail -1`

rm $tempdiff
cat $logfile > $oldlog

if [ "$CRITcount" -gt 0 ]; then
    	echo "($CRITcount) $CRITlastentry"
    	echo $STATE_CRITICAL > $oldlog.STATE
	exit $STATE_CRITICAL
fi

if [ "$WARNcount" -gt 0 ]; then
    	echo "($WARNcount) $WARNlastentry"
    	echo $STATE_WARNING > $oldlog.STATE
	exit $STATE_WARNING
fi

echo "Log check ok - 0 pattern matches found"
echo $STATE_OK > $oldlog.STATE
exit $STATE_OK



echo "Starting clean"
rm /tmp/foobar /usr/local/nagios/var/foobar*
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "Starting normally"
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
echo "warning"
echo "bla" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
echo "critical"
echo "neko" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
echo "warning"
echo "bla" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "Log rotation with crit"
rm /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
echo "critical"
echo "neko" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "Log rotation with warn"
rm /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
echo "warning"
echo "bla" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "Normal log rotation"
rm /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
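
The state retention described in the check_log header can be tried out in isolation. Below is a minimal sketch, not the plugin's own code: the helper names save_state and last_state are made up for illustration, and falling back to UNKNOWN when no state has been saved yet is an assumption of this sketch.

```shell
# Nagios return codes, normally supplied by utils.sh.
STATE_OK=0; STATE_WARNING=1; STATE_CRITICAL=2; STATE_UNKNOWN=3

# Remember the exit status of the current run.
# (hypothetical helper) usage: save_state STATEFILE STATUS
save_state() {
    echo "$2" > "$1"
}

# Print the status saved by the previous run.
# Assumption of this sketch: report UNKNOWN when nothing was saved yet.
# (hypothetical helper) usage: last_state STATEFILE
last_state() {
    if [ -s "$1" ]; then
        cat "$1"
    else
        echo "$STATE_UNKNOWN"
    fi
}

# Example: a run that found a critical match, followed by a run
# in which the log did not change.
statefile="${TMPDIR:-/tmp}/check_log_demo.$$.STATE"
save_state "$statefile" "$STATE_CRITICAL"
echo "No new lines, repeating state: $(last_state "$statefile")"
rm -f "$statefile"
```

Keeping the state in a file next to the old copy of the log means every service definition automatically gets its own state, as long as each one uses its own -O file.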



Nagios script: check_named

2006-06-01 00:00:00

This script was written at the time I was hired by UPC / Liberty Global.

Basic monitor to check whether BIND is up and running. It checks for a number of processes and tries to perform a basic lookup using the localhost.

This script was quickly hacked together for my current customer, as a Q&D solution for their monitoring needs. It's no beauty, but it works. Written in ksh and tested with:

A Critical is sent if:

A) one or more of the required processes is not running, or

B) the script is unable to perform a basic lookup using the localhost.
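
The two criteria can be sketched as a pair of small functions. This is a hedged illustration rather than the plugin itself: count_procs and can_resolve are hypothetical names, and can_resolve assumes that your nslookup sets a non-zero exit status on failure, which not every version does.

```shell
# Count process-list lines that mention NAME, ignoring the grep itself
# and anything run by the nagios account. Reads "ps -ef" style text on stdin.
# (hypothetical helper) usage: ps -ef | count_procs NAME
count_procs() {
    grep "$1" | grep -v grep | grep -v nagios | wc -l | awk '{print $1}'
}

# Succeed when the name server at SERVER answers a query for HOSTNAME.
# Assumption: nslookup returns non-zero when the lookup fails.
# (hypothetical helper) usage: can_resolve HOSTNAME SERVER
can_resolve() {
    nslookup "$1" "$2" >/dev/null 2>&1
}

# Typical use inside a check:
#   [ "$(ps -ef | count_procs named)" -lt 1 ] && echo "named is not running"
#   can_resolve www.google.com localhost || echo "lookup via localhost failed"
```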

UPDATE 19/06/2006:

Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!


#!/usr/bin/bash
#
# DNS / Named process monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 19-06-2006
# 
# Usage: ./check_named
#
# Description:
# This plugin determines whether the named DNS server
# is running properly. It will check the following:
# * Are all required processes running?
# * Is it possible to make DNS requests?
#
# Limitations:
# Currently this plugin will only function correctly on Solaris systems.
#
# Output:
# The script returns a CRIT when the abovementioned criteria are
# not matched.
#

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
        echo "WARNING:"
        echo "This script was originally written for use on Solaris."
        echo "You may run into some problems running it on this host."
        echo ""
        echo "Please verify that the script works before using it in a"
        echo "live environment. You can easily disable this message after"
        echo "testing the script."
        echo ""
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_named"
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

print_usage() {
	echo "Usage: $PROGNAME"
	echo "Usage: $PROGNAME --help"
}

print_help() {
	echo ""
	print_usage
	echo ""
	echo "Named DNS monitor plugin for Nagios"
	echo ""
	echo "This plugin was not developed by the Nagios Plugin group."
	echo "Please do not e-mail them for support on this plugin, since"
	echo "they won't know what you're talking about :P"
	echo ""
	echo "For contact info, read the plugin itself..."
}

while test -n "$1" 
do
	case "$1" in
	  --help) print_help; exit $STATE_OK;;
	  -h) print_help; exit $STATE_OK;;
	  *) print_usage; exit $STATE_UNKNOWN;;
	esac
done

check_processes()
{
	PROCESS="0"
	if [ `ps -ef | grep named | grep -v grep | grep -v nagios | wc -l` -lt 1 ]; then 
		echo "NAMED NOK - One or more processes not running"
		exitstatus=$STATE_CRITICAL
		exit $exitstatus
	fi
}

check_service()
{
	SERVICE=0
	nslookup www.google.com localhost >/dev/null 2>&1
	# Any non-zero exit status counts as a failed lookup.
	if [ $? -ne 0 ]; then SERVICE=1;fi

	if [ $SERVICE -eq 1 ]; then 
		echo "NAMED NOK - Unable to perform a DNS lookup using localhost."
		exitstatus=$STATE_CRITICAL
		exit $exitstatus
	fi
}

check_processes
check_service

echo "NAMED OK - Everything running like it should"
exitstatus=$STATE_OK
exit $exitstatus


Nagios script: check_networking

2006-06-01 00:00:00

This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.

I couldn't find an easy way to check whether all interfaces of a host are up and running from the -inside-, so I wrote a Nagios plugin to do this.

Naturally you could also try to ping all of the IP addresses of all of these network cards, but this isn't always possible. Lord knows how many routing issues I had to fight through to get our current IP set monitored. I guess using this script is a bit easier :)

The script was tested on Redhat ES3, Mac OSX and Solaris. Its basic requirement is the Korn shell (due to some conversions happening inside the script). On Linux/RH you'll need mii-tool (and sudo) and on Solaris you'll need Perl (for one lousy piece of math :p ).
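
The conversions mentioned above serve one goal: finding the first address of each subnet, which the script assumes to be the router. The script itself does this with binary strings and ksh's typeset. The sketch below gets the same answer with plain shell arithmetic instead; it is an illustration under that assumption, and the function names are made up.

```shell
# Turn a dotted quad into a single integer.
# The function body runs in a subshell, so changing IFS stays local.
# (hypothetical helper) usage: ip_to_int 192.168.1.37
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Turn an integer back into a dotted quad.
# (hypothetical helper) usage: int_to_ip 3232235777
int_to_ip() {
    echo "$(( ($1 >> 24) & 255 )).$(( ($1 >> 16) & 255 )).$(( ($1 >> 8) & 255 )).$(( $1 & 255 ))"
}

# Network address = IP AND netmask; the assumed router is the next address.
# (hypothetical helper) usage: first_host IP NETMASK
first_host() {
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    int_to_ip $(( (ip & mask) + 1 ))
}

first_host 192.168.1.37 255.255.255.0
```

Note that this expects the netmask in dotted decimal notation, which is what Linux ifconfig prints. Mac OS X and Solaris print it in hexadecimal, so it would need converting first.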

EDIT:

Oh! Just like my other recent Nagios scripts, check_networking comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution.
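
The debugging pattern boils down to one guarded write to a file. A minimal sketch follows; the debug function is a hypothetical wrapper for illustration, while the script itself repeats the test on every line.

```shell
# Defaults as used at the top of the plugin; override them in the environment.
DEBUG="${DEBUG:-0}"
DEBUGFILE="${DEBUGFILE:-/tmp/foobar}"

# Append a message to DEBUGFILE, but only when DEBUG is greater than zero.
# Always returns 0, so it is safe to call from anywhere in a check.
# (hypothetical helper) usage: debug MESSAGE...
debug() {
    if [ "$DEBUG" -gt 0 ]; then
        echo "$*" >> "$DEBUGFILE"
    fi
    return 0
}

debug "- Starting check_ping -"
```

Wrapping the test in a function keeps the main logic readable, at the cost of one extra function call per message.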



#!/usr/bin/ksh
#
# Basic UNIX networking check script.
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide SYS, the Netherlands
# Last Modified: 22-06-2006
#
# Usage: ./check_networking
#
# Description:
#   This plugin determines whether the local host's network interfaces
# are all up and running like they should. It uses the following
# questions to determine this.
# * Does /sbin/mii-tool report any problems? (Linux only)
# * Are the gateways for each subnet pingable?
#
# Limitations:
# * I have no clue whether mii-tool is something specific to Redhat ES3,
#   or whether all Linii have it. 
# * Sudo access to mii-tool is required for the nagios account.
# * Perl is required on Solaris, to do just a tiny bit of math.
# * KSH is required.
# * The script assumes that the first available IP from a subnet is the
#   router.
#
# Output:
#   The script returns a CRIT when one of the criteria mentioned
# above is not matched.
#
# Other notes:
#   I wish I'd learn Perl. I'm sure that doing all of this stuff in Perl
# would have cut down on the size of this script tremendously. Ah well.
#   If you ever run into problems with the script, set the DEBUG variable
# to 1. I'll need the output the script generates to do troubleshooting.
# See below for details. 
#   I realise that all the debugging commands strewn throughout the script
# may make things a little harder to read. But in the end I'm sure it was
# well worth adding them. It makes troubleshooting so much easier. :3
#

# Enabling the following dumps information into DEBUGFILE at various
# stages during the execution of this script.
DEBUG="0"
DEBUGFILE="/tmp/foobar"


### REQUISITE NAGIOS USER INTERFACE STUFF ###

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_networking"
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

[ $DEBUG -gt 0 ] && rm $DEBUGFILE 

print_usage() {
        echo "Usage: $PROGNAME"
        echo "Usage: $PROGNAME --help"
}

print_help() {
        echo ""
        print_usage
        echo ""
        echo "Basic UNIX networking check plugin for Nagios"
        echo ""
        echo "This plugin was not developed by the Nagios Plugin group."
        echo "Please do not e-mail them for support on this plugin, since"
        echo "they won't know what you're talking about :P"
        echo ""
        echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
        case "$1" in
          --help) print_help; exit $STATE_OK;;
          -h) print_help; exit $STATE_OK;;
          *) print_usage; exit $STATE_UNKNOWN;;
        esac
done


### SETTING UP THE ENVIRONMENT ###

# Host OS check and warning message
MIITOOL="0"
if [ -f /sbin/mii-tool ]
then
        MIITOOL="1"

        sudo /sbin/mii-tool >/dev/null 2>&1
        if [ $? -gt 0 ]
        then
                echo "ERROR: sudo permissions"
                echo ""
                echo "This script requires that the Nagios user account has"
                echo "sudo permissions for the mii-tool command. Currently it"
                echo "does not have these permissions. Please fix this."
                echo ""
                exit $STATE_UNKNOWN
        fi
fi


### SUB-ROUTINE DEFINITIONS ### 

function convert_base
{
        typeset -i${2:-16} x
        x=$1
        echo $x
}

function subnet_router
{
[ $DEBUG -gt 0 ] && echo "- Starting subnet_router -" >> $DEBUGFILE
    first="0"; second="0"; third="0"; fourth="0"
    first=`echo $1 | cut -c 1-8`; FIRST=`convert_base 2#$first 10`
[ $DEBUG -gt 0 ] && echo "First: $first $FIRST" >> $DEBUGFILE
    second=`echo $1 | cut -c 9-16`; SECOND=`convert_base 2#$second 10`
[ $DEBUG -gt 0 ] && echo "Second: $second $SECOND" >> $DEBUGFILE
    third=`echo $1 | cut -c 17-24`; THIRD=`convert_base 2#$third 10`
[ $DEBUG -gt 0 ] && echo "Third: $third $THIRD" >> $DEBUGFILE
    fourth=`echo $1 | cut -c 25-32`
    [ `echo $fourth|wc -c` -gt 1 ] || fourth="0"
    TEMPCOUNT=`echo $fourth | wc -c | awk '{print $1}'`
    let PADDING=9-$TEMPCOUNT 
[ $DEBUG -gt 0 ] && echo "Fourth: padding fourth with $PADDING zeroes" >> $DEBUGFILE
    i=1
    while ((i <= $PADDING));
    do
       fourth=$fourth"0" 
       let i=$i+1
    done
    FOURTH=`convert_base 2#$fourth 10`; let FOURTH=$FOURTH+1
[ $DEBUG -gt 0 ] && echo "Fourth: $fourth $FOURTH" >> $DEBUGFILE

    echo "$FIRST.$SECOND.$THIRD.$FOURTH"
}

gather_interfaces_linux()
{
[ $DEBUG -gt 0 ] && echo "- Starting gather_interfaces_linux -" >> $DEBUGFILE
    for INTF in `ifconfig -a | grep ^[a-z] | grep -v ^lo | awk '{print $1}'`
    do
	if [ `echo $INTF | grep : | wc -l` -gt 0 ]
	then
            export INTERFACES="`echo $INTF|awk -F: '{print $1}'` $INTERFACES"
	else
            export INTERFACES="$INTF $INTERFACES"
	fi
    done

    INTFCOUNT=`echo $INTERFACES | wc -w`
[ $DEBUG -gt 0 ] && echo "Interfaces: There are $INTFCOUNT interfaces: $INTERFACES." >> $DEBUGFILE
    if [ $INTFCOUNT -lt 1 ] 
    then
	echo "NOK - No active network interfaces."
	exit $STATE_CRITICAL
    fi
}

gather_interfaces_darwin()
{
[ $DEBUG -gt 0 ] && echo "- Starting gather_interfaces_darwin -" >> $DEBUGFILE
    for INTF in `ifconfig -a | grep ^[a-z] | grep -v ^gif | grep -v ^stf | grep -v ^lo | awk '{print $1}'`
    do
        [ `echo $INTF | grep : | wc -l` -gt 0 ] && INTF=`echo $INTF|awk -F: '{print $1}'`
	# Skip inactive interfaces, but keep looking at the remaining ones.
	[ `ifconfig $INTF | grep "status: inactive" | wc -l` -gt 0 ] && continue
        INTERFACES="$INTF $INTERFACES" 
    done

    INTFCOUNT=`echo $INTERFACES | wc -w`
[ $DEBUG -gt 0 ] && echo "Interfaces: There are $INTFCOUNT interfaces: $INTERFACES." >> $DEBUGFILE
    if [ $INTFCOUNT -lt 1 ] 
    then
	echo "NOK - No active network interfaces."
	exit $STATE_CRITICAL
    fi
}

gather_gateway_linux()
{
[ $DEBUG -gt 0 ] && echo "- Starting gather_gateway_linux for interface $1 -" >> $DEBUGFILE
    MASKBIN=""
    MASK=`ifconfig $1 | grep Mask | awk '{print $4}' | awk -F: '{print $2}'` 
    for PART in `echo $MASK | awk -F. '{print $1" "$2" "$3" "$4}'`
    do
        MASKBIN="$MASKBIN`convert_base $PART 2  | awk -F# '{print $2}'`"
    done
[ $DEBUG -gt 0 ] && echo "Mask: $MASK $MASKBIN" >> $DEBUGFILE

        BITCOUNT=`echo $MASKBIN | grep -o 1 | wc -l | awk '{print $1}'`

[ $DEBUG -gt 0 ] && echo "Bitcount: $BITCOUNT" >> $DEBUGFILE

    IPBIN=""
    IP=`ifconfig $1 | grep "inet addr" | awk '{print $2}' | awk -F: '{print $2}'` 
    for PART in `echo $IP | awk -F. '{print $1" "$2" "$3" "$4}'`
    do
        TEMPBIN=`convert_base $PART 2 | awk -F# '{print $2}'`
        TEMPCOUNT=`echo $TEMPBIN | wc -c | awk '{print $1}'`
        let PADDING=9-$TEMPCOUNT
        i=1
        while ((i <= $PADDING));
        do
            IPBIN=$IPBIN"0" 
            let i=$i+1
        done
        IPBIN=$IPBIN$TEMPBIN
    done
[ $DEBUG -gt 0 ] && echo "IP address: $IP $IPBIN" >> $DEBUGFILE

    CUT="1-$BITCOUNT"
[ $DEBUG -gt 0 ] && echo "Cutting: Cutting chars $CUT" >> $DEBUGFILE
    NETBIN=`echo $IPBIN | cut -c $CUT`
[ $DEBUG -gt 0 ] && echo "Netbin: $NETBIN" >> $DEBUGFILE
    ROUTER=`subnet_router $NETBIN`
[ $DEBUG -gt 0 ] && echo "Router: $ROUTER" >> $DEBUGFILE
    echo $ROUTER
}

gather_gateway_darwin()
{
[ $DEBUG -gt 0 ] && echo "- Starting gather_gateway_darwin for interface $1 -" >> $DEBUGFILE
    MASKBIN=""
    [ `uname` == "Darwin" ] && MASK=`ifconfig $1 | grep netmask | awk '{print $4}' | awk -Fx '{print $2}'`
    [ `uname` == "SunOS" ] && MASK=`ifconfig $1 | grep netmask | awk '{print $4}'`
    for PART in `echo 1 3 5 7`
    do
	let PLUSPART=$PART+1
	MASKPART=`echo $MASK | cut -c $PART-$PLUSPART`
        MASKBIN="$MASKBIN`convert_base 16#$MASKPART 2  | awk -F# '{print $2}'`"
    done
[ $DEBUG -gt 0 ] && echo "Mask: $MASK $MASKBIN" >> $DEBUGFILE

    BITCOUNT=`echo $MASKBIN | grep -o 1 | wc -l | awk '{print $1}'`
[ $DEBUG -gt 0 ] && echo "Bitcount: $BITCOUNT" >> $DEBUGFILE

    IPBIN=""
    IP=`ifconfig $1 | grep "inet " | awk '{print $2}'`
    for PART in `echo $IP | awk -F. '{print $1" "$2" "$3" "$4}'`
    do
        TEMPBIN=`convert_base $PART 2 | awk -F# '{print $2}'`
        TEMPCOUNT=`echo $TEMPBIN | wc -c | awk '{print $1}'`
        let PADDING=9-$TEMPCOUNT
        i=1
        while ((i <= $PADDING));
        do
            TEMPBIN="0"$TEMPBIN
            let i=$i+1
        done
        IPBIN=$IPBIN$TEMPBIN
    done
[ $DEBUG -gt 0 ] && echo "IP address: $IP $IPBIN" >> $DEBUGFILE

    CUT="1-$BITCOUNT"
[ $DEBUG -gt 0 ] && echo "Cutting: cutting chars $CUT" >> $DEBUGFILE
    NETBIN=`echo $IPBIN | cut -c $CUT`
[ $DEBUG -gt 0 ] && echo "Netbin: $NETBIN" >> $DEBUGFILE
    ROUTER=`subnet_router $NETBIN`
[ $DEBUG -gt 0 ] && echo "Router: $ROUTER" >> $DEBUGFILE
    echo $ROUTER
}

gather_gateway_sunos()
{
[ $DEBUG -gt 0 ] && echo "- Starting gather_gateway_sunos for interface $1 -" >> $DEBUGFILE
    MASKBIN=""
    [ `uname` == "Darwin" ] && MASK=`ifconfig $1 | grep netmask | awk '{print $4}' | awk -Fx '{print $2}'`
    [ `uname` == "SunOS" ] && MASK=`ifconfig $1 | grep netmask | awk '{print $4}'`
    for PART in `echo 1 3 5 7`
    do
        let PLUSPART=$PART+1
        MASKPART=`echo $MASK | cut -c $PART-$PLUSPART`
        MASKBIN="$MASKBIN`convert_base 16#$MASKPART 2  | awk -F# '{print $2}'`"
    done
[ $DEBUG -gt 0 ] && echo "Mask: $MASK $MASKBIN" >> $DEBUGFILE

# This piece of kludge also requires that all tabs are removed from the beginning of each line.
# Additional character needed to trick the counter below
# Shitty thing is that it doesn't work. Stupid "let" aryth engine...
#MASKBIN="$MASKBIN-"
#[ $DEBUG -gt 0 ] && echo "Bitcount: kludged binmask is $MASKBIN" >> $DEBUGFILE
#
#IFS="1"
#read TEMP << EOT
#echo $MASKBIN
#EOT
#let "BITCOUNT=(${#TEMP[@]} - 1)"
#IFS=" "

# The kludge above was replaced by this one line of Perl. 

    BITCOUNT=`echo $MASKBIN | perl -ne 'while(/1/g){++$count}; print "$count"'`
[ $DEBUG -gt 0 ] && echo "Bitcount: $BITCOUNT" >> $DEBUGFILE

    IPBIN=""
    IP=`ifconfig $1 | grep "inet " | awk '{print $2}'`
    for PART in `echo $IP | awk -F. '{print $1" "$2" "$3" "$4}'`
    do
[ $DEBUG -gt 0 ] && echo "IP part: converting part $PART" >> $DEBUGFILE
        TEMPBIN=`convert_base $PART 2 | awk -F# '{print $2}'`
[ $DEBUG -gt 0 ] && echo "IP part: converted part is $TEMPBIN" >> $DEBUGFILE
        TEMPCOUNT=`echo $TEMPBIN | wc -c | awk '{print $1}'`
[ $DEBUG -gt 0 ] && echo "IP part: this part is $TEMPCOUNT chars long." >> $DEBUGFILE
        let PADDING=9-$TEMPCOUNT
[ $DEBUG -gt 0 ] && echo "IP part: will be padded with $PADDING zeroes" >> $DEBUGFILE
        i=1
        while ((i <= $PADDING));
        do
            TEMPBIN="0"$TEMPBIN
            let i=$i+1
        done
        IPBIN=$IPBIN$TEMPBIN
    done
[ $DEBUG -gt 0 ] && echo "IP address: $IP $IPBIN" >> $DEBUGFILE

    CUT="1-$BITCOUNT"
[ $DEBUG -gt 0 ] && echo "Cutting: cutting chars $CUT" >> $DEBUGFILE
    NETBIN=`echo $IPBIN | cut -c $CUT`
[ $DEBUG -gt 0 ] && echo "Netbin: $NETBIN" >> $DEBUGFILE
    ROUTER=`subnet_router $NETBIN`
[ $DEBUG -gt 0 ] && echo "Router: $ROUTER" >> $DEBUGFILE
    echo $ROUTER
}

check_miitool()
{
[ $DEBUG -gt 0 ] && echo "- Starting check_miitool -" >> $DEBUGFILE
    COUNT="0"
    for INTF in `echo $INTERFACES`
    do
        [ `sudo /sbin/mii-tool $INTF | head -1 | grep -c ok` -gt 0 ] || let COUNT=$COUNT+1
        [ `sudo /sbin/mii-tool $INTF | head -1 | grep -c 100baseTx-FD` -gt 0 ] || let COUNT=$COUNT+1
        [ `sudo /sbin/mii-tool $INTF | head -1 | grep -c 1000baseTx-FD` -gt 0 ] || let COUNT=$COUNT+1
    done

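    # Do not use a ( ... ) subshell here: an exit inside it would only
    # leave the subshell and the check would carry on.
    if [ $COUNT -gt $INTFCOUNT ]
    then
        echo "NOK - Problem with one of the interfaces"
        exit $STATE_CRITICAL
    fi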
}

check_ping()
{
[ $DEBUG -gt 0 ] && echo "- Starting check_ping -" >> $DEBUGFILE
    INTF=""
    for INTF in `echo $INTERFACES`
    do
	case `uname` in
	    Linux) GATEWAY=`gather_gateway_linux $INTF`;;
	    Darwin) GATEWAY=`gather_gateway_darwin $INTF`;;
	    SunOS) GATEWAY=`gather_gateway_sunos $INTF`;;
	    *) echo "OS not supported by this check."; exit $STATE_UNKNOWN;;
	esac
[ $DEBUG -gt 0 ] && echo "Gateway: $GATEWAY" >> $DEBUGFILE

 	ping -c 3 $GATEWAY >/dev/null 2>&1
        if [ $? -gt 0 ] 
        then
            echo "NOK - Problem pinging gateway $GATEWAY"; exit $STATE_CRITICAL
        fi
    done
}


### THE MAIN ROUTINE FINALLY STARTS ###

case `uname` in
            Linux) gather_interfaces_linux;;
            Darwin) gather_interfaces_darwin;;
            #SunOS) gather_interfaces_sunos;;
            SunOS) gather_interfaces_linux;;
            *) echo "OS not supported by this check."; exit $STATE_UNKNOWN;;
        esac

[ $MIITOOL -eq 1 ] && check_miitool

check_ping

# None of the other subroutines forced us to exit 1 before here, so let's quit with a 0.
echo "OK - Everything running like it should"
exit $STATE_OK



Nagios script: check_nfs_stale

2006-06-01 00:00:00

This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.

There really isn't much to say... This script is so fscking basic that it shames me to even put it up here among all the other projects.


#!/usr/bin/bash
#
# NFS stale mounts monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide, the Netherlands
# Last Modified: 13-07-2006
# 
# Usage: ./check_nfs_stale
#
# Description:
# This script couldn't be simpler than it is. It just checks to see
# whether there are any stale NFS mounts present on the system. 
#
# Limitations:
#   This script should work properly on all implementations of Linux, Solaris
# and Mac OS X.
#
# Output:
# If there are stale NFS mounts, a CRIT is issued.
#

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_nfs_stale"
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh


### REQUISITE NAGIOS COMMAND LINE STUFF ###

print_usage() {
	echo "Usage: $PROGNAME"
	echo "Usage: $PROGNAME --help"
}

print_help() {
	echo ""
	print_usage
	echo ""
	echo "NFS stale mounts monitor plugin for Nagios"
	echo ""
	echo "This plugin was not developed by the Nagios Plugin group."
	echo "Please do not e-mail them for support on this plugin, since"
	echo "they won't know what you're talking about :P"
	echo ""
	echo "For contact info, read the plugin itself..."
}

while test -n "$1" 
do
	case "$1" in
	  --help) print_help; exit $STATE_OK;;
	  -h) print_help; exit $STATE_OK;;
	  *) print_usage; exit $STATE_UNKNOWN;;
	esac
done

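# df usually reports stale handles on stderr, so fold stderr into the pipe.
# The exit must not sit inside a ( ... ) subshell, or the script carries on.
if [ `df -k 2>&1 | grep "Stale NFS file handle" | wc -l` -gt 0 ]
then
	echo "NOK - Stale NFS mounts."
	exit $STATE_CRITICAL
fi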

# Nothing caused us to exit early, so we're okay.
echo "OK - No stale NFS mounts."
exit $STATE_OK



Nagios script: check_nsca

2006-06-01 00:00:00

This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.

At $CLIENT we've often run into problems with the NSCA daemon, where the daemon would not crash per se, but where it would also not process incoming service checks. The nsca process was still running, but it simply wasn't transferring the incoming results to the Nagios command file.

I was amazed to find that nobody else had written a script to do this! So I quickly wrote one.
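
The idea is a simple round trip: push a recognisable test result through NSCA and then look for it in the Nagios log. A minimal sketch of the two halves follows, with hypothetical function names. It assumes the tab-separated input layout that send_nsca reads for passive service checks (host, service, return code, plugin output).

```shell
# Build one passive service check result for send_nsca:
# host, service description, return code and plugin output, tab-separated.
# (hypothetical helper) usage: nsca_message HOST SERVICE CODE TEXT
nsca_message() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$3" "$4"
}

# Succeed if MARKER shows up in the last LINES lines of LOGFILE.
# (hypothetical helper) usage: seen_in_log LOGFILE MARKER LINES
seen_in_log() {
    tail -n "$3" "$1" | grep "$2" >/dev/null
}

# Typical use, with paths that are assumptions about your installation:
#   stamp=`date +%Y%m%d%H%M`
#   nsca_message `hostname` FOOBAR 0 "$stamp test message" | \
#       /usr/local/nagios/bin/send_nsca -H localhost -c /usr/local/nagios/etc/send_nsca.cfg
#   sleep 10
#   seen_in_log /usr/local/nagios/var/nagios.log "$stamp" 1000 || echo "NSCA is stuck"
```

Putting a timestamp in the message is what makes the check trustworthy: an old test message further up in the log cannot be mistaken for the one that was just sent.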


#!/usr/bin/bash
#
# NSCA Nagios service results monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide, the Netherlands
# Last Modified: 16-08-2006
# 
# Usage: ./check_nsca
#
# Description:
# Aside from checking whether the NSCA process is still running, this script
# also attempts to insert a message into the Nagios queue. After sending a 
# message to the NSCA daemon, it will verify that the message is received by
# Nagios, by checking the nagios.log file. 
#
# Limitations:
#   This script should work properly on all implementations of Linux, Solaris
# and Mac OS X.
#
# Output:
# If the NSCA daemon, or something along the message path, is borked, a 
# CRIT message will be issued. 
#

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_nsca"
PATH="/usr/bin:/usr/sbin:/bin:/sbin"

NAGIOSHOME="/usr/local/nagios"
LIBEXEC="$NAGIOSHOME/libexec"
NAGVAR="$NAGIOSHOME/var"
NAGBIN="$NAGIOSHOME/bin"
NAGETC="$NAGIOSHOME/etc"

. $LIBEXEC/utils.sh


### REQUISITE NAGIOS COMMAND LINE STUFF ###

print_usage() {
	echo "Usage: $PROGNAME"
	echo "Usage: $PROGNAME --help"
}

print_help() {
	echo ""
	print_usage
	echo ""
	echo "NSCA Nagios service results monitor plugin for Nagios"
	echo ""
	echo "This plugin was not developed by the Nagios Plugin group."
	echo "Please do not e-mail them for support on this plugin, since"
	echo "they won't know what you're talking about :P"
	echo ""
	echo "For contact info, read the plugin itself..."
}

while test -n "$1" 
do
	case "$1" in
	  --help) print_help; exit $STATE_OK;;
	  -h) print_help; exit $STATE_OK;;
	  *) print_usage; exit $STATE_UNKNOWN;;
	esac
done


### PLATFORM INDEPENDENCE ###

case `uname` in
	Linux) PSLIST="ps -ef";;
	SunOS) PSLIST="ps -ef";;
	Darwin) PSLIST="ps -ajx";;
	*) PSLIST="ps -ef";;
    esac


### CHECKING FOR THE NSCA PROCESS ###

# The exit must not sit inside a ( ... ) subshell, or the script carries on.
if [ `$PSLIST | grep nsca | grep -v grep | wc -l` -lt 1 ]
then
	echo "NSCA process not running."
	exit $STATE_CRITICAL
fi


### INSERTING A TEST MESSAGE ###

DATE=`date +%Y%m%d%H%M`
STRING="`hostname`\tFOOBAR\t0\t$DATE This is a test of the emergency broadcast system.\n"

echo -e "$STRING" | $NAGBIN/send_nsca -H localhost -c $NAGETC/send_nsca.cfg >/dev/null 2>&1


### CHECKING THE NAGIOS LOG FILE ###

sleep 10

if [ `tail -1000 $NAGVAR/nagios.log | grep "emergency broadcast system" | grep $DATE | wc -l` -lt 1 ] 
then
	# Giving it a second try
	sleep 10
	if [ `tail -5000 $NAGVAR/nagios.log | grep "emergency broadcast system" | grep $DATE | wc -l` -lt 1 ]	
	then
		echo "NSCA daemon not processing check results."
		exit $STATE_CRITICAL
	fi
fi


### EXITING NORMALLY ###

echo "OK - NSCA working like it should."
exit $STATE_OK



Nagios script: check_ntp_config

2006-06-01 00:00:00

This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.

As far as I know there was no Nagios plugin that allowed you to really check your client configuration. I mean, it would be nice to know for sure that all your systems are syncing against the proper server... Wouldn't it?

The script was tested on Redhat ES3, Mac OS X and Solaris. Its basic requirement is the bash shell.
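
The simplest form of such a check just looks for the expected server in the client's configuration file. The sketch below shows only that part and is not the plugin's own logic, which also resolves the server's IP address. The function name is made up, and it assumes an awk that understands -v (on older Solaris that means nawk).

```shell
# Succeed if CFGFILE has an active (uncommented) "server" line for SERVER.
# (hypothetical helper) usage: ntp_conf_has_server CFGFILE SERVER
ntp_conf_has_server() {
    awk -v want="$2" '
        $1 == "server" && $2 == want { found = 1 }
        END { exit !found }
    ' "$1"
}

# Typical use:
#   ntp_conf_has_server /etc/ntp.conf ntp.wxs.nl || echo "NTP config is off"
```

Matching on whole fields rather than grepping for the name avoids false hits on commented-out lines and on servers whose names merely contain the expected one.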

EDIT:

Oh! Just like my other recent Nagios scripts, check_ntp_config comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution.



#!/usr/bin/bash
#
# NTP client configuration monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide, the Netherlands
# Last Modified: 10-07-2006
# 
# Usage: ./check_ntp_config
#
# Description:
#   Well, there's not much to tell. We have no way of making sure that our 
# NTP clients are all configured in the right way, so I thought I'd make
# a Nagios check for it. ^_^ 
#   You can change the NTP config at the top of this script, to match your
# own situation.
#
# Limitations:
#   This script should work properly on all implementations of Linux, Solaris
# and Mac OS X.
#
# Output:
#   If the NTP client config does not match what has been defined at the 
# top of this script, the script will return a WARN.
#
# Other notes:
#   If you ever run into problems with the script, set the DEBUG variable
# to 1. I'll need the output the script generates to do troubleshooting.
# See below for details.
#   I realise that all the debugging commands strewn throughout the script
# may make things a little harder to read. But in the end I'm sure it was
# well worth adding them. It makes troubleshooting so much easier. :3
#

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_ntp_config"
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh


### DEFINING THE NTP CLIENT CONFIGURATION AS IT SHOULD BE ###
NTP_SERVER="ntp.wxs.nl"


### DEBUGGING SETUP ###
# Cause you never know when you'll need to squash a bug or two
DEBUG="0"

if [ $DEBUG -gt 0 ]
then
        DEBUGFILE="/tmp/foobar"
        rm $DEBUGFILE >/dev/null 2>&1
fi


### REQUISITE NAGIOS COMMAND LINE STUFF ###

print_usage() {
	echo "Usage: $PROGNAME"
	echo "Usage: $PROGNAME --help"
}

print_help() {
	echo ""
	print_usage
	echo ""
	echo "NTP client configuration monitor plugin for Nagios"
	echo ""
	echo "This plugin is not developed by the Nagios Plugin group."
	echo "Please do not e-mail them for support on this plugin, since"
	echo "they won't know what you're talking about :P"
	echo ""
	echo "For contact info, read the plugin itself..."
}

while test -n "$1" 
do
	case "$1" in
	  --help) print_help; exit $STATE_OK;;
	  -h) print_help; exit $STATE_OK;;
	  *) print_usage; exit $STATE_UNKNOWN;;
	esac
done


### DEFINING SUBROUTINES ###

function gather_config()
{
    case `uname` in
	Linux) CFGFILE="/etc/ntp.conf"; IP_SERVER=`host $NTP_SERVER | awk '{print $4}'` ;;
	SunOS) CFGFILE="/etc/inet/ntpd.conf"; IP_SERVER=`getent hosts $NTP_SERVER | awk '{print $1}'`;;
	Darwin) CFGFILE="/etc/ntp.conf"; IP_SERVER=`host $NTP_SERVER | awk '{print $4}'` ;;
	*) ;;
    esac

    REAL_SERVER=`cat $CFGFILE | grep ^server | awk '{print $2}'`

[ $DEBUG -gt 0 ] && echo "Gather_config: Host name for required server is $NTP_SERVER." >> $DEBUGFILE
[ $DEBUG -gt 0 ] && echo "Gather_config: IP address for required server is $IP_SERVER." >> $DEBUGFILE
[ $DEBUG -gt 0 ] && echo "Gather_config: currently configured server is $REAL_SERVER." >> $DEBUGFILE
} 

function check_config()
{
    # Quoted, so an empty or multi-word value cannot break the test.
    if [ "$REAL_SERVER" != "$NTP_SERVER" ]
    then
	if [ "$REAL_SERVER" != "$IP_SERVER" ]
	then
	    echo "NOK - NTP client is not configured to speak to $NTP_SERVER"
	    exit $STATE_WARNING
	fi
    fi
}


### FINALLY, THE MAIN ROUTINE ###

gather_config
check_config

# Nothing caused us to exit early, so we're okay.
echo "OK - NTP client configured correctly."
exit $STATE_OK
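
The comparison in check_config assumes the config file holds exactly one server line. As a rough sketch of how several server lines could be handled, assuming an ntp.conf-style file where the second field of each "server" line is the host: the function name and its arguments are hypothetical and not part of the plugin above.

```shell
#!/bin/bash
# Sketch: succeed if any "server" line in the given file names one of the
# accepted hosts. Usage: ntp_server_configured <cfgfile> <host-or-ip>...
ntp_server_configured() {
    local cfgfile="$1"; shift
    local configured wanted
    # Take the second field of every line whose first field is "server".
    for configured in $(awk '$1 == "server" {print $2}' "$cfgfile"); do
        for wanted in "$@"; do
            if [ "$configured" = "$wanted" ]; then
                return 0
            fi
        done
    done
    return 1
}
```

In the plugin this could be called as `ntp_server_configured "$CFGFILE" "$NTP_SERVER" "$IP_SERVER"`, returning a WARN when it fails.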


Nagios script: check_processes

2006-06-01 00:00:00

This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.

A very simple script that takes a list of processes, instead of a single process name (as is the case with check_process). This should make monitoring a basic list of processes a lot easier. I really should change the script so that it takes the process list from the command line, instead of from the $LIST variable that's defined internally. I'll do that when I have the time.

Until I've made that change, I use the script by copying check_processes to a new file that serves one specific purpose. For example, check_linux_processes and check_solaris_processes check a list of processes that should be up and running on Linux and Solaris respectively.

This check script should work on just about any UNIX OS.



#!/bin/bash
#
# Process monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide, the Netherlands
# Last Modified: 13-07-2006
# 
# Usage: ./check_solaris_processes
#
# Description:
# This script couldn't be simpler than it is. It just checks to see
# whether a predefined list of processes is up and running. 
#
# Limitations:
#   This script should work properly on all implementations of Linux, Solaris
# and Mac OS X.
#
# Output:
# If one of the processes is down, a CRIT is issued.
#

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_linux_processes"
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh


### DEFINING THE PROCESS LIST ###
LIST="init"


### REQUISITE NAGIOS COMMAND LINE STUFF ###

print_usage() {
        echo "Usage: $PROGNAME"
        echo "Usage: $PROGNAME --help"
}

print_help() {
        echo ""
        print_usage
        echo ""
        echo "Basic processes list monitor plugin for Nagios"
        echo ""
        echo "This plugin is not developed by the Nagios Plugin group."
        echo "Please do not e-mail them for support on this plugin, since"
        echo "they won't know what you're talking about :P"
        echo ""
        echo "For contact info, read the plugin itself..."
}

while test -n "$1" 
do
        case "$1" in
          --help) print_help; exit $STATE_OK;;
          -h) print_help; exit $STATE_OK;;
          *) print_usage; exit $STATE_UNKNOWN;;
        esac
done


### FINALLY THE MAIN ROUTINE ###

COUNT="0"
DOWN=""

for PROCESS in `echo $LIST`
do
        if [ `ps -ef | grep -i $PROCESS | grep -v grep | wc -l` -lt 1 ]
        then
                let COUNT=$COUNT+1
                DOWN="$DOWN $PROCESS"
        fi
done

if [ $COUNT -gt 0 ]
then
        echo "NOK - $COUNT processes not running: $DOWN"
        exit $STATE_CRITICAL
fi

# Nothing caused us to exit early, so we're okay.
echo "OK - All requisite processes running."
exit $STATE_OK
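
For what it's worth, here is a minimal sketch of what the command-line version mentioned above might look like. It is an assumption of mine, not the finished change: the hypothetical find_missing function takes the process names as arguments and reads the list of running process names from standard input, one per line, matching whole names instead of substrings.

```shell
#!/bin/bash
# Sketch: print the names (given as arguments) that do not appear in the
# process list read from stdin. Nothing is printed when all are present.
find_missing() {
    local running name missing=""
    running=$(cat)    # process names, one per line
    for name in "$@"; do
        # -x matches the whole line, -F treats the name as a fixed string.
        if ! printf '%s\n' "$running" | grep -qxF -- "$name"; then
            missing="$missing $name"
        fi
    done
    printf '%s\n' "${missing# }"
}
```

A possible call would be `ps -e -o comm= | find_missing init sshd`. Mind that the exact contents of the comm column differ per OS (some print a full path), so test this on your own systems first.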


Nagios script: check_ram

2006-06-01 00:00:00

This script was written at the time I was hired by UPC / Liberty Global.

Basic monitor to check percentage of used physical RAM.

This script was quickly hacked together for my current customer, as a Q&D solution for their monitoring needs. It's no beauty, but it works. Written in ksh and tested with:

UPDATE 19/06/2006:

Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!

I've also -finally- changed the script so that it takes the Warning and Critical percentages from the command line.

UPDATE 15/07/2006:

Whoops... I just noticed that the file had gone missing <3



#!/bin/ksh
#
# Free physical RAM monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 20-10-2006
# 
# Usage: ./check_ram
#
# Description:
# This plugin determines how much of the physical RAM in the 
# system is in use.
#
# Limitations:
# Currently this plugin will only function correctly on Solaris systems.
# And it really is only useful at DTV Labs.
#
# Output:
# The script returns either a WARN or a CRIT, depending on the 
# percentage of free physical memory.
#

# Enabling the following dumps information into DEBUGFILE at various
# stages during the execution of this script.
DEBUG="1"
DEBUGFILE="/tmp/foobar"
rm $DEBUGFILE >/dev/null 2>&1
echo "Starting script check_ram." > $DEBUGFILE

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
        echo "WARNING:"
        echo "This script was originally written for use on Solaris."
        echo "You may run into some problems running it on this host."
        echo ""
        echo "Please verify that the script works before using it in a"
        echo "live environment. You can easily disable this message after"
        echo "testing the script."
        echo ""
        exit 1
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/usr/local/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

print_usage() {
        echo "Usage: $PROGNAME warning-percentage critical-percentage"
        echo ""
        echo "e.g. : $PROGNAME 15 5"
        echo "This will start alerting when more than 85% of RAM has"
        echo "been used."
        echo ""
}

print_help() {
        echo ""
        print_usage
        echo ""
        echo "Free physical RAM plugin for Nagios"
        echo ""
        echo "This plugin is not developed by the Nagios Plugin group."
        echo "Please do not e-mail them for support on this plugin, since"
        echo "they won't know what you're talking about :P"
        echo ""
        echo "For contact info, read the plugin itself..."
}

# Check for the help flags first, so that a lone --help exits with OK.
case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) if [ $# -lt 2 ]; then print_help; exit $STATE_WARNING;fi ;;
esac

RAM_WARN=$1
RAM_CRIT=$2
[ $DEBUG -gt 0 ] && echo "Warning and Critical percentages are $RAM_WARN and $RAM_CRIT." >> $DEBUGFILE

if [ $RAM_WARN -le $RAM_CRIT ]
then
        echo "Warning percentage should be larger than critical percentage."
        exit $STATE_WARNING
fi

check_space()
{
[ $DEBUG -gt 0 ] && echo "Starting check_space." >> $DEBUGFILE
        TOTALSPACE=0
        TOTALSPACE=`prtconf | grep ^"Memory size" | awk '{print $3}'`
[ $DEBUG -gt 0 ] && echo "Total space is $TOTALSPACE." >> $DEBUGFILE

        TOTALFREE=0
        TOTALFREE=`vmstat 2 2 | tail -1 | awk '{print $5}'`
[ $DEBUG -gt 0 ] && echo "Free space is $TOTALFREE." >> $DEBUGFILE
        let TOTALFREE=$TOTALFREE/1000
[ $DEBUG -gt 0 ] && echo "Free space, div1000 is $TOTALFREE." >> $DEBUGFILE
}

check_percentile() 
{
[ $DEBUG -gt 0 ] && echo "Starting check_percentile." >> $DEBUGFILE
        FRACTION=`echo "scale=2; $TOTALFREE/$TOTALSPACE" | bc`
[ $DEBUG -gt 0 ] && echo "Fraction is $FRACTION." >> $DEBUGFILE

        PERCENT=`echo "scale=2; $FRACTION*100" | bc | awk -F. '{print $1}'`
[ $DEBUG -gt 0 ] && echo "Percentile is $PERCENT." >> $DEBUGFILE

        if [ $PERCENT -lt $RAM_CRIT ]; then
[ $DEBUG -gt 0 ] && echo "$PERCENT is smaller than $RAM_CRIT. Critical." >> $DEBUGFILE
          echo "RAM NOK - Less than $RAM_CRIT % of physical RAM is unused."
          exitstatus=$STATE_CRITICAL
          exit $exitstatus
        fi

        if [ $PERCENT -lt $RAM_WARN ]; then
[ $DEBUG -gt 0 ] && echo "$PERCENT is smaller than $RAM_WARN. Warning." >> $DEBUGFILE
          echo "RAM NOK - Less than $RAM_WARN % of physical RAM is unused."
          exitstatus=$STATE_WARNING
          exit $exitstatus
        fi
}

check_space
check_percentile

[ $DEBUG -gt 0 ] && echo "$PERCENT is greater than $RAM_WARN. OK." >> $DEBUGFILE
echo "RAM OK - $TOTALFREE MB out of $TOTALSPACE MB RAM unused."
exitstatus=$STATE_OK
exit $exitstatus
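
The percentage math and the threshold logic can also be done with plain shell arithmetic. A small sketch under a few assumptions: free memory comes in as kilobytes (as vmstat reports it), total memory as megabytes (as prtconf reports it), and I divide by 1024 here where the script above divides by 1000. Both function names are made up for this example.

```shell
#!/bin/bash
# Sketch: whole-number percentage of free RAM.
# Usage: free_percent <free_kb> <total_mb>
free_percent() {
    local free_kb="$1" total_mb="$2"
    echo $(( free_kb / 1024 * 100 / total_mb ))
}

# Sketch: map a free percentage onto a state name.
# Usage: classify_free <percent> <warn_percent> <crit_percent>
classify_free() {
    local percent="$1" warn="$2" crit="$3"
    if [ "$percent" -lt "$crit" ]; then
        echo "CRITICAL"
    elif [ "$percent" -lt "$warn" ]; then
        echo "WARNING"
    else
        echo "OK"
    fi
}
```

With 524288 KB free out of 2048 MB, `free_percent 524288 2048` gives 25, and `classify_free 25 15 5` reports OK.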



Nagios script: check_suncluster

2006-06-01 00:00:00

This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.

A few of our projects and services are run on Solaris systems running Sun Cluster software. Since there were no Nagios scripts available to perform checks against Sun Cluster I made a basic script that checks the most important factors.

This script performs a different function, depending on the parameter with which it is called. This allows you to define multiple service checks in Nagios, without needing separate check scripts for each.

EDIT:

Oh! Just like my other recent Nagios scripts, check_suncluster comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution. And like my other, recent scripts it also comes with its own test script.



#!/usr/bin/ksh
#
# Nagios check script for Sun Cluster.
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide SYS, the Netherlands
# Last Modified: 25-09-2006
#
# Usage: ./check_suncluster [-t, -q, -g, -G resource-group, -r, -R resource, -i]
#
# Description:
# This script is capable of performing a number of basic checks on a 
# system running Sun Cluster. Depending on the parameter you pass to 
# it, it will check:
# * Transport paths (-t).
# * Quorum (-q).
# * Resource groups (-g).
# * One selected resource group (-G).
# * Resources (-r).
# * One selected resource (-R).
# * IPMP groups (-i).
#
# Limitations:
# This script will only work with Korn shell, due to some funky while
# looping with pipe forking. Bash doesn't handle this very gracefully,
# due to its sub-shell variable scoping. Maybe I really should learn
# to program in Perl.   
#
# Output:
# * Transport paths return a WARN when one of the paths is down and a
#   CRIT when all paths are offline. 
# * Quorum returns a WARN when not all, but enough quorum devices are
#   available. It returns a CRIT when quorum cannot be reached.
# * Resource groups returns a CRIT when a group is offline on all nodes
#   and a WARN if a group is in an unstable state.
# * Resources returns a CRIT when a resource is offline on all nodes
#   and a WARN if a resource is in an unstable state.
# * IPMP groups returns a CRIT when a group is offline.
#
# Other notes:
# Aside from the debugging output that I've built into most of my recent
# scripts, this check script will also have a testing mode  hacked on, as
# a bag on the side. This testing mode is only engaged when the test_check_suncluster
# script is being run and will intentionally "break" a few things, to 
# verify the failure options of this check script.
#

# Enabling the following dumps information into DEBUGFILE at various
# stages during the execution of this script.
DEBUG=0
DEBUGFILE="/tmp/foobar"

if [ -f /tmp/neko-wa-baka ]
then
	if [ `cat /tmp/neko-wa-baka` == "Nyo!" ]
	then
	   TESTING="1"
	else
	   TESTING="0"
	fi
else
	TESTING="0"
fi


### REQUISITE NAGIOS USER INTERFACE STUFF ###

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin:/usr/cluster/bin"
LIBEXEC="/usr/local/nagios/libexec"
PROGNAME="check_suncluster"
. $LIBEXEC/utils.sh

[ $DEBUG -gt 0 ] && rm $DEBUGFILE 

print_usage() {
        echo "Usage: $PROGNAME [-t, -q, -g, -G resource-group, -r, -R resource, -i]"
        echo "Usage: $PROGNAME --help"
}

print_help() {
        echo ""
        print_usage
        echo ""
        echo "Sun Cluster check plugin for Nagios"
        echo ""
        echo "-t: check transport paths"
        echo "-q: check quorum"
        echo "-g: check resource groups"
        echo "-G: check one individual resource group"
        echo "-r: check all resources"
        echo "-R: check one individual resource"
        echo "-i: check IPMP groups"
        echo ""
        echo "This plugin is not developed by the Nagios Plugin group."
        echo "Please do not e-mail them for support on this plugin, since"
        echo "they won't know what you're talking about :P"
        echo ""
        echo "For contact info, read the plugin itself..."
}


### SUB-ROUTINE DEFINITIONS ### 

function check_transport_paths
{
[ $DEBUG -gt 0 ] && echo "Starting check_transport_path subroutine." >> $DEBUGFILE

	TOTAL=`scstat -W | grep "Transport path:" | wc -l`
	let COUNT=0

	# The loop variable is not called PATH: that would clobber the shell's search path.
	scstat -W | grep "Transport path:" | awk '{print $3" "$6}' | while read TPATH STATUS
	do
[ $DEBUG -gt 0 ] && echo "Before math, Count has the value of $COUNT." >> $DEBUGFILE
		if [ $STATUS == "online" ]
		then
		   let COUNT=$COUNT+1
		fi
[ $DEBUG -gt 0 ] && echo "Path: $TPATH has status $STATUS" >> $DEBUGFILE
[ $DEBUG -gt 0 ] && echo "Count: $COUNT online transport paths." >> $DEBUGFILE
	done

[ $DEBUG -gt 0 ] && echo "Count: Outside the loop it has a value of $COUNT." >> $DEBUGFILE
[ $TESTING -gt 0 ] && COUNT="0"

	if [ $COUNT -lt 1 ]
	then
	   echo "NOK - No transport paths online."
	   exit $STATE_CRITICAL
	elif [ $COUNT -lt $TOTAL ]
	then
	   echo "NOK - One or more transport paths offline."
	   exit $STATE_WARNING
	fi
}

function check_quorum
{
[ $DEBUG -gt 0 ] && echo "Starting check_quorum subroutine." >> $DEBUGFILE
	NEED=`scstat -q | grep "votes needed:" | awk '{print $4}'`
	PRES=`scstat -q | grep "votes present:" | awk '{print $4}'`

[ $DEBUG -gt 0 ] && echo "Quorum needed: $NEED" >> $DEBUGFILE
[ $DEBUG -gt 0 ] && echo "Quorum present: $PRES" >> $DEBUGFILE

[ $TESTING -gt 0 ] && PRES="0"
	if [ $PRES -ge $NEED ]
	then
[ $DEBUG -gt 0 ] && echo "Enough quorum votes." >> $DEBUGFILE
		scstat -q | grep "votes:" | awk '{print $3" "$6}' | while read VOTE STATUS
		do
[ $DEBUG -gt 0 ] && echo "Vote: $VOTE has status $STATUS." >> $DEBUGFILE
			if [ $STATUS != "Online" ] 
			then
			   echo "NOK - Quorum vote $VOTE not available."
			   exit $STATE_WARNING
			fi
		done		
	else
[ $DEBUG -gt 0 ] && echo "Not enough quorum." >> $DEBUGFILE
		echo "NOK - Not enough quorum votes present."
		exit $STATE_CRITICAL
	fi
}

function check_resource_groups
{
[ $DEBUG -gt 0 ] && echo "Starting check_resource_groups subroutine." >> $DEBUGFILE
	scstat -g | grep "Group:" | awk '{print $2}' | sort -u | while read GROUP
	do
	ONLINE=`scstat -g | grep "Group: $GROUP" | grep "Online" | wc -l`
	WEIRD=`scstat -g | grep "Group: $GROUP" | grep -v "Resources" | grep -v "Online" | grep -v "Offline" | wc -l`
[ $DEBUG -gt 0 ] && echo "Resource Group $GROUP has $ONLINE instances online." >> $DEBUGFILE
[ $DEBUG -gt 0 ] && echo "Resource Group $GROUP has $WEIRD instances in a weird state." >> $DEBUGFILE
[ $TESTING -gt 0 ] && ONLINE="0"
		if [ $ONLINE -lt 1 ] 
		then
		   echo "NOK - Resource group $GROUP not online."
		   exit $STATE_CRITICAL
		fi
                if [ $WEIRD -gt 1 ]
                then
                   echo "NOK - Resource group $GROUP is in an unstable state."
                   exit $STATE_WARNING
                fi
	done
}

function check_resource_grp
{
[ $DEBUG -gt 0 ] && echo "Starting check_resource_grp subroutine." >> $DEBUGFILE
[ $DEBUG -gt 0 ] && echo "Selected group: $RGROUP" >> $DEBUGFILE
	ONLINE=`scstat -g | grep $RGROUP | grep "Online" | wc -l`
	WEIRD=`scstat -g | grep $RGROUP | grep -v "Resources" | grep -v "Online" | grep -v "Offline" | wc -l`
[ $DEBUG -gt 0 ] && echo "Resource Group $RGROUP has $ONLINE instances online." >> $DEBUGFILE
[ $DEBUG -gt 0 ] && echo "Resource Group $RGROUP has $WEIRD instances in a weird state." >> $DEBUGFILE
[ $TESTING -gt 0 ] && ONLINE="0"
	if [ $ONLINE -lt 1 ] 
	then
	   echo "NOK - Resource group $RGROUP not online."
	   exit $STATE_CRITICAL
	fi
	if [ $WEIRD -gt 1 ]
        then
           echo "NOK - Resource group $RGROUP is in an unstable state."
           exit $STATE_WARNING
        fi
}

function check_resources
{
[ $DEBUG -gt 0 ] && echo "Starting check_resources subroutine." >> $DEBUGFILE
	RESOURCES=`scstat -g | grep "Resource:" | awk '{print $2}' | sort -u`
[ $DEBUG -gt 0 ] && echo "List of resources to check: $RESOURCES" >> $DEBUGFILE
	for RESOURCE in `echo $RESOURCES`
	do
	ONLINE=`scstat -g | grep "Resource: $RESOURCE" | awk '{print $4}' | grep "Online" | wc -l` 
	WEIRD=`scstat -g | grep "Resource: $RESOURCE" | awk '{print $4}' | grep -v "Online" | grep -v "Offline" | wc -l`
[ $DEBUG -gt 0 ] && echo "Resource $RESOURCE has $ONLINE instances online." >> $DEBUGFILE
[ $DEBUG -gt 0 ] && echo "Resource $RESOURCE has $WEIRD instances in a weird state." >> $DEBUGFILE
[ $TESTING -gt 0 ] && ONLINE="0"
		if [ $ONLINE -lt 1 ] 
		then
		   echo "NOK - Resource $RESOURCE not online."
		   exit $STATE_CRITICAL
		fi
                if [ $WEIRD -gt 1 ]
                then
                   echo "NOK - Resource $RESOURCE is in an unstable state."
                   exit $STATE_WARNING
                fi
	done
}

function check_rsrce
{
[ $DEBUG -gt 0 ] && echo "Starting check_rsrce subroutine." >> $DEBUGFILE
[ $DEBUG -gt 0 ] && echo "Selected resource: $RSRCE" >> $DEBUGFILE
	ONLINE=`scstat -g | grep "Resource: $RSRCE" | awk '{print $4}' | grep "Online" | wc -l`
	WEIRD=`scstat -g | grep "Resource: $RSRCE" | awk '{print $4}' | grep -v "Online" | grep -v "Offline" | wc -l`
[ $DEBUG -gt 0 ] && echo "Resource $RSRCE has $ONLINE instances online." >> $DEBUGFILE
[ $DEBUG -gt 0 ] && echo "Resource $RSRCE has $WEIRD instances in a weird state." >> $DEBUGFILE
[ $TESTING -gt 0 ] && ONLINE="0"
	if [ $ONLINE -lt 1 ] 
	then
	   echo "NOK - Resource $RSRCE not online."
	   exit $STATE_CRITICAL
	fi
	if [ $WEIRD -gt 1 ]
        then
           echo "NOK - Resource $RSRCE is in an unstable state."
           exit $STATE_WARNING
        fi
}

function check_ipmp
{
[ $DEBUG -gt 0 ] && echo "Starting check_ipmp subroutine." >> $DEBUGFILE
	scstat -i | grep "IPMP Group:" | awk '{print $3" "$5}' | while read GROUP STATUS
	do
[ $DEBUG -gt 0 ] && echo "IPMP Group: $GROUP has status $STATUS" >> $DEBUGFILE
		if [ $STATUS != "Online" ] 
		then
		   echo "NOK - IPMP group $GROUP not online."
		   exit $STATE_CRITICAL
		fi
if [ $TESTING -gt 0 ]
then
   echo "NOK - IPMP group $GROUP not online."
   exit $STATE_CRITICAL
fi
	done
}

### THE MAIN ROUTINE FINALLY STARTS ###

[ $DEBUG -gt 0 ] && echo "Starting main routine." >> $DEBUGFILE

if [ $# -lt 1 ]
then
	print_usage
	exit $STATE_UNKNOWN
fi

[ $DEBUG -gt 0 ] && echo "At least one argument." >> $DEBUGFILE
[ $DEBUG -gt 0 ] && echo "" >> $DEBUGFILE

case "$1" in
	--help) print_help; exit $STATE_OK;;
	-h) print_help; exit $STATE_OK;;
	-t) check_transport_paths;;
	-q) check_quorum;;
	-g) check_resource_groups;;
	-G) RGROUP="$2"; check_resource_grp;;
	-r) check_resources;;
	-R) RSRCE="$2"; check_rsrce;;
	-i) check_ipmp;;
	*) print_usage; exit $STATE_UNKNOWN;;
esac

[ $DEBUG -gt 0 ] && echo "No problems. Exiting normally." >> $DEBUGFILE

# None of the other subroutines forced us to exit 1 before here, so let's quit with a 0.
echo "OK - Everything running like it should"
exit $STATE_OK
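
About the Korn shell limitation mentioned in the header: in bash with default settings, a while loop at the end of a pipeline runs in a subshell, so a counter changed inside the loop is lost afterwards. Below is a minimal sketch of the problem and one possible workaround. The sample data is made up and only stands in for filtered scstat output, and none of these function names exist in check_suncluster.

```shell
#!/bin/bash
# Made-up "path status" pairs, standing in for filtered scstat -W output.
sample_paths() {
    printf '%s\n' 'node1:qfe0 online' 'node2:qfe0 faulted' 'node1:qfe1 online'
}

# Loop fed through a pipe: in bash (default settings) the increments happen
# in a subshell, so the function prints 0.
count_with_pipe() {
    local count=0 tpath status
    sample_paths | while read -r tpath status; do
        if [ "$status" = "online" ]; then count=$((count + 1)); fi
    done
    echo "$count"
}

# Loop fed through a here-document: the loop runs in the current shell,
# so the counter survives and the function prints 2.
count_with_redirect() {
    local count=0 tpath status
    while read -r tpath status; do
        if [ "$status" = "online" ]; then count=$((count + 1)); fi
    done <<EOF
$(sample_paths)
EOF
    echo "$count"
}
```

Korn shell runs the last part of a pipeline in the current shell, which is why the piped loops in check_suncluster keep their counters there.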

The accompanying test script, test_check_suncluster:

#!/usr/bin/bash

function testrun()
{
	echo "Running without parameters."
	/usr/local/nagios/libexec/check_suncluster 
	echo "Exit code is $?."
	echo ""

	echo "Testing transport paths."
	/usr/local/nagios/libexec/check_suncluster -t
	echo "Exit code is $?."
	echo ""

	echo "Quorum votes."
	/usr/local/nagios/libexec/check_suncluster -q
	echo "Exit code is $?."
	echo ""

	echo "Checking all resource groups."
	/usr/local/nagios/libexec/check_suncluster -g
	echo "Exit code is $?."
	echo ""

	echo "Checking individual resource groups."
	for GROUP in `scstat -g | grep "Group:" | awk '{print $2}' | sort -u`
	do
		echo "Running for group $GROUP."
		/usr/local/nagios/libexec/check_suncluster -G $GROUP
		echo "Exit code is $?."
		echo ""
	done

	echo "Checking all resources."
	/usr/local/nagios/libexec/check_suncluster -r
	echo "Exit code is $?."
	echo ""
	
	echo "Checking individual resources."
	for RESOURCE in `scstat -g | grep "Resource:" | awk '{print $2}' | sort -u`
	do
		echo "Running for resource $RESOURCE."
		/usr/local/nagios/libexec/check_suncluster -R $RESOURCE
		echo "Exit code is $?."
		echo ""
	done
	
	echo "Checking IPMP groups."
	/usr/local/nagios/libexec/check_suncluster -i
	echo "Exit code is $?."
	echo ""
}

function breakstuff()
{
	# Now we'll start breaking things!!
	echo ""
	echo "Now it's time to start breaking things! Gruaargh!"
	echo "Mind you, it's all fake and simulated. I am not changing -anything-"
	echo "about the cluster itself."
	echo ""
	
	echo "Nyo!" > /tmp/neko-wa-baka 
}

echo "Starting clean"
rm /tmp/neko-wa-baka /tmp/foobar >/dev/null 2>&1
echo ""

testrun
breakstuff
testrun

echo "Starting clean at the end"
rm /tmp/neko-wa-baka  >/dev/null 2>&1
echo ""

Nagios script: retrieve_custom_snmp

2006-06-01 00:00:00

This script was written while I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.

One of the things we've been looking into recently, is running the standard Nagios plugins through SNMP instead of through NRPE. Putting aside the discussion of the various merits and flaws such a solution has, let's say that it works nicely.

How do you do this?

In your snmpd.conf add lines like:

exec .1.3.6.1.4.1.6886.4.1.1 check_load /usr/local/nagios/libexec/check_load

exec .1.3.6.1.4.1.6886.4.1.2 check_mem /usr/local/nagios/libexec/check_mem w 85 c 95

exec .1.3.6.1.4.1.6886.4.1.3 check_swap /usr/local/nagios/libexec/check_swap -w 15% -c 5%

This tells the SNMP daemon to run the check_load script when someone asks for object .1.3.6.1.4.1.6886.4.1.1 (and likewise check_mem for .2 and check_swap for .3). The exit code of the script will be placed in OID.100.1 and the first line of its output in OID.101.1. The retrieve_custom_snmp script retrieves those two values through SNMP and returns them to Nagios.

Your checkcommands.cfg should contain something like:

define command{
        command_name    retrieve_custom_snmp
        command_line    $USER1$/retrieve_custom_snmp -H $HOSTADDRESS$ -o $ARG1$
        }

The "-o" parameter takes the OID you have selected for your custom check.

Now... How do you select an OID? There are two ways:

1. The WRONG way = randomly selecting some OID. You might pick an OID which is needed for other monitoring purposes in your network.

2. The RIGHT way = requesting a private Enterprise ID for your company at IANA. You are free to build an SNMP tree beneath this EID. For example, the EID 6886 mentioned above is registered to KPN (my current client). The sub-tree .4.1 contains all OIDs referring to Nagios checks performed by my department.

Before sending out that request, please check the current EID list to see if your company already owns a private subtree. If that's the case, contact the "owner" to request your own part of the subtree.

UPDATE (2006-10-02):

Thanks to the kind folks on the Nagios Users ML I've found out that my original version of the script was totally bug-ridden. I've made a big bunch of adjustments and now the script should work properly. Thanks especially to Andreas Ericsson.



#!/bin/bash
#
# Script to retrieve custom SNMP objects set using the "exec" handler
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide, the Netherlands
# Last Modified: 18-07-2006
# 
# Usage: ./retrieve_custom_snmp -H hostname -o OID
#
# Description:
#   On our Nagios client systems we use a lot of custom MIB OIDs which are
# registered under our own Enterprise ID. A whole bunch of the 
# original Nagios scripts are run through the SNMP daemon and their exit
# codes and output are appended to specific OIDs. This all happens using the
# SNMP "exec" handler.
#   Unfortunately the default check_snmp script doesn't allow for easy 
# handling of these objects, so I hacked together a quick script. 
#
# So basically this script doesn't do any checking. It just retrieves 
# information :)
#
# Limitations:
# This script should work properly on all implementations of Linux, Solaris
# and Mac OS X.
#
# Output:
# The exit code is the exit code retrieved from OID.100.1. It is temporarily
# stored in $EXITCODE.
# The output string is the string retrieved from OID.101.1. It is
# temporarily stored in $OUTPUT.
#
# Other notes:
#   If you ever run into problems with the script, set the DEBUG variable
# to 1. I'll need the output the script generates to do troubleshooting.
# See below for details.
#   I realise that all the debugging commands strewn throughout the script
# may make things a little harder to read. But in the end I'm sure it was
# well worth adding them. It makes troubleshooting so much easier. :3
#   Also, for some reason the case statement with the shifts (to detect
# passed options) doesn't seem to be working right. FIXME!
#
# Check command definition:
# define command{
#       command_name    retrieve_custom_snmp
#       command_line    $USER1$/retrieve_custom_snmp -H $HOSTADDRESS$ -o $ARG1$
#		}
#

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh
PROGNAME="retrieve_custom_snmp"
COMMUNITY="public"

[ `uname` == "SunOS" ] && SNMPGET="/usr/local/bin/snmpget -Oqv -v 2c -c $COMMUNITY"
[ `uname` == "Darwin" ] && SNMPGET="/usr/bin/snmpget -Oqv -v 2c -c $COMMUNITY"
[ `uname` == "Linux" ] && SNMPGET="/usr/bin/snmpget -Oqv -v 2c -c $COMMUNITY"

### DEBUGGING SETUP ###
# Cause you never know when you'll need to squash a bug or two
DEBUG="0"

if [ $DEBUG -gt 0 ]
then
        DEBUGFILE="/tmp/foobar"
        rm $DEBUGFILE >/dev/null 2>&1
fi


### REQUISITE NAGIOS COMMAND LINE STUFF ###

print_usage() {
	echo "Usage: $PROGNAME -H hostname -o OID"
	echo "Usage: $PROGNAME --help"
}

print_help() {
	echo ""
	print_usage
	echo ""
	echo "Script to retrieve the status for custom SNMP objects."
	echo ""
	echo "This plugin is not developed by the Nagios Plugin group."
	echo "Please do not e-mail them for support on this plugin, since"
	echo "they won't know what you're talking about :P"
	echo ""
	echo "For contact info, read the plugin itself..."
}

while test -n "$1"; do
    case "$1" in
        --help)
            print_help
            exit $STATE_OK
            ;;
        -h)
            print_help
            exit $STATE_OK
            ;;
        -H)
			HOST=$2
	        shift
            ;;
        -o)
	    	OID=$2
	    	STATUS="$OID.100.1"
	    	STRING="$OID.101.1"
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit $STATE_UNKNOWN
            ;;
    esac
    shift
done


### FINALLY... RETRIEVING THE VALUES ###

EXITCODE=`$SNMPGET $HOST $STATUS`
[ $DEBUG -gt 0 ] && echo "Retrieve exit code is $EXITCODE" >> $DEBUGFILE
 
OUTPUT=`$SNMPGET $HOST $STRING | sed 's/"//g'`
[ $DEBUG -gt 0 ] && echo "Retrieve status message is: $OUTPUT" >> $DEBUGFILE

echo "$OUTPUT"
# Fall back to UNKNOWN when no exit code could be retrieved at all.
exit ${EXITCODE:-$STATE_UNKNOWN}
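
One caveat with passing a retrieved value straight to exit: when the SNMP query fails, the value may be empty or an error message instead of a number. A minimal sketch of a guard, assuming the usual Nagios plugin return codes 0 to 3, where 3 means UNKNOWN. The function name is hypothetical and not part of the script above.

```shell
#!/bin/bash
# Sketch: print the value if it is a valid Nagios return code (0-3),
# otherwise print 3 (UNKNOWN).
sanitize_exit_code() {
    case "$1" in
        0|1|2|3) echo "$1" ;;
        *)       echo 3 ;;
    esac
}
```

The last line of the plugin could then read something like `exit $(sanitize_exit_code "$EXITCODE")`.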


Nagios script: check_fwm

2005-07-01 00:00:00

This script was written at the time I was hired by UPC / Liberty Global.

Basic monitor that checks if the Checkpoint Firewall-1 Management software is up and running. It checks for a number of processes and ports.

This script was quickly hacked together for my current customer, as a Q&D solution for their monitoring needs. It's no beauty, but it works. Written in ksh and tested with:

The script sends a Critical if:

A) One or more processes are not running, or

B) One or more ports are not available for connections.

UPDATE 19/06/2006:

Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!


#!/usr/bin/bash
#
# Firewall-1 process monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 19-06-2006
# 
# Usage: ./check_fwm
#
# Description:
# This plugin determines whether the Firewall-1 management
# software is running properly. It will check the following:
# * Are all required processes running?
# * Are all the required TCP/IP ports open?
#
# Limitations:
# Currently this plugin will only function correctly on Solaris systems.
#
# Output:
# The script returns a CRIT when one of the criteria mentioned
# above is not matched.
#

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
        echo "WARNING:"
        echo "This script was originally written for use on Solaris."
        echo "You may run into some problems running it on this host."
        echo ""
        echo "Please verify that the script works before using it in a"
        echo "live environment. You can easily disable this message after"
        echo "testing the script."
        echo ""
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

print_usage() {
	echo "Usage: $PROGNAME"
	echo "Usage: $PROGNAME --help"
}

print_help() {
	echo ""
	print_usage
	echo ""
	echo "Firewall-1 monitor plugin for Nagios"
	echo ""
	echo "This plugin was not developed by the Nagios Plugin group."
	echo "Please do not e-mail them for support on this plugin, since"
	echo "they won't know what you're talking about :P"
	echo ""
	echo "For contact info, read the plugin itself..."
}

while test -n "$1" 
do
	case "$1" in
	  --help) print_help; exit $STATE_OK;;
	  -h) print_help; exit $STATE_OK;;
	  *) print_usage; exit $STATE_UNKNOWN;;
	esac
done

check_processes()
{
	PROCESS="0"
	# PROCLIST="cpd fwd fwm cpwd cpca cpmad cplmd cpstat cpshrd cpsnmpd"
	PROCLIST="cpd fwd fwm cpwd cpca cpmad cpstat cpsnmpd"
	for PROC in `echo $PROCLIST`; do
	if [ `ps -ef | grep $PROC | grep -v grep | wc -l` -lt 1 ]; then PROCESS=1;fi
	done

	if [ $PROCESS -eq 1 ]; then 
		echo "FWM NOK - One or more processes not running"
		exitstatus=$STATE_CRITICAL
		exit $exitstatus
	fi
}

check_ports()
{
	PORTS="0"
	PORTLIST="256 257 18183 18184 18187 18190 18191 18192 18196 18264"
	for NUM in `echo $PORTLIST`; do
	if [ `netstat -an | grep LISTEN | grep $NUM | grep -v grep | wc -l` -lt 1 ]; then PORTS=1;fi
	done

	if [ $PORTS -eq 1 ]; then 
		echo "FWM NOK - One or more TCP/IP ports not listening."
		exitstatus=$STATE_CRITICAL
		exit $exitstatus
	fi
}

check_processes
check_ports

echo "FWM OK - Everything running like it should"
exitstatus=$STATE_OK
exit $exitstatus
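
One thing to keep in mind with check_ports: a plain `grep $NUM` also matches longer port numbers, so a listener on 18264 would satisfy a check for port 1826. Below is a minimal sketch of a stricter match. It is not part of the plugin: the helper name `port_is_listening` and the sample netstat lines are made up for illustration, and it assumes a POSIX awk (on older Solaris that probably means nawk rather than /usr/bin/awk).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of the original plugin.
# Reads `netstat -an` style output on stdin and succeeds only if a LISTEN
# line has an address field ending in exactly ".PORT" or ":PORT".
port_is_listening() {
    local port="$1"
    awk -v port="$port" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++) {
                n = split($i, parts, /[.:]/)
                if (n > 1 && parts[n] == port) { found = 1 }
            }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Made-up sample in Solaris netstat layout, standing in for real output.
sample_netstat() {
    cat <<'EOF'
      *.256                *.*                0      0 49152      0 LISTEN
      *.18264              *.*                0      0 49152      0 LISTEN
EOF
}

for port in 256 1826; do
    if sample_netstat | port_is_listening "$port"; then
        echo "port $port: listening"
    else
        echo "port $port: not listening"
    fi
done
```

In a live check the function would be fed with `netstat -an | port_is_listening 256` instead of the sample data.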


Nagios script: check_load2

2005-07-01 00:00:00

This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.

We are currently in the process of distributing a standard set of Nagios monitoring scripts to over 300 client systems. One of the metrics we would like to monitor is the three load averages (or as Dr. Gunther calls them: the LaLaLa triplets).

Since these 300 servers aren't all alike, we are bound to run into systems with one, two, four, eight or more processors. That makes it hard to write one standard configuration, since you would have to define separate load average levels for WARN and CRIT for each type of system. Why? Because a quad-processor system can take much more load than a single-processor system.

One way to get around this would be to define separate host groups, based on the number of processors in a system. You could then define a unique check_load command for each CPU host group.

I've gone the other way around though...

My work-around is to replace check_load with check_load2. This script takes no command line parameters and works on the basis of standard multipliers. We are of the opinion that the number of processors multiplied by a certain factor (150%? 200%? and so on) is a good enough way to define these WARN and CRIT levels. These multipliers can easily be modified (at the top of the script) to fit what -you- think is a worrying level of activity.
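
To make the multiplier idea concrete, here is a small worked example. It is only a sketch: the function name `load_exceeds` is invented for this illustration and it uses awk for the arithmetic, whereas the plugin itself relies on bc.

```shell
#!/usr/bin/env bash
# Illustrative only: the function name and the use of awk are assumptions;
# the plugin itself does this arithmetic with bc.

# Succeeds when the measured load has reached NUMPROCS * FACTOR.
load_exceeds() {
    local numprocs="$1" factor="$2" load="$3"
    awk -v n="$numprocs" -v f="$factor" -v l="$load" \
        'BEGIN { exit((l >= n * f) ? 0 : 1) }'
}

# A quad-CPU box with a 1-minute load of 7.20:
# the WARN level is 4 * 2.00 = 8.00, so this prints "below WARN".
if load_exceeds 4 2.00 7.20; then
    echo "at or above WARN"
else
    echo "below WARN"
fi
```

The same load of 7.20 on a single-processor system would be well past the 1-minute CRIT level of 1 * 3.00, which is the whole point of scaling by processor count.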

This script was tested on Redhat ES3, Solaris 8 and Mac OS X 10.4. It should run on other versions of these OSes as well.

EDIT:

Oh! Just like my other recent Nagios scripts, check_load2 comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution.
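
The debugging option amounts to a guarded echo into a file. A minimal sketch of the same idea wrapped in a helper follows; the `debug` function and the default file name are assumptions made for this example, since the plugin repeats the test-and-echo on every debug line instead.

```shell
#!/usr/bin/env bash
# Sketch only: the debug() helper and the default file name are assumptions.
DEBUG="${DEBUG:-0}"
DEBUGFILE="${DEBUGFILE:-/tmp/check_load2.debug}"

debug() {
    # Append the message only when DEBUG is larger than zero.
    if [ "$DEBUG" -gt 0 ]; then
        echo "$*" >> "$DEBUGFILE"
    fi
    # Always succeed, so a debug line can never change the plugin's result.
    return 0
}
```

A call such as `debug "Numprocs: Number of processors detected is $NUMPROCS."` then writes to the debug file when DEBUG is 1 and does nothing when it is 0.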


#!/usr/bin/bash
#
# CPU load monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide, the Netherlands
# Last Modified: 22-06-2006
# 
# Usage: ./check_load2
#
# Description:
#   Ethan's original version of the check_load script is very flexible.
# It allows you to specifically set WARN and CRIT levels regarding 
# the CPU load of the system you're monitoring.
#   However: flexibility is not always a good thing. Say for example that
# you want to monitor the CPU load across a few hundred systems having
# various CPU configurations. You -could- define host groups for single, dual,
# quad (and so on) processor systems and assign unique check_load command
# definitions to each group.
#   Or you could write a script which checks the amount of active CPUs and
# then makes an educated guess at the WARN and CRIT levels for the system. 
# In most cases this should really be enough. 
#
# Limitations:
# This script should work properly on all implementations of Linux, Solaris
# and Mac OS X.
#
# Output:
# Depending on the levels defined at the top of the script,
# the script returns an OK, WARN or CRIT to Nagios based on CPU load.
#
# Other notes:
#   If you ever run into problems with the script, set the DEBUG variable
# to 1. I'll need the output the script generates to do troubleshooting.
# See below for details.
#   I realise that all the debugging commands strewn throughout the script
# may make things a little harder to read. But in the end I'm sure it was
# well worth adding them. It makes troubleshooting so much easier. :3
#

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh


### DEBUGGING SETUP ###
# Cause you never know when you'll need to squash a bug or two
DEBUG="1"
DEBUGFILE="/tmp/foobar"
rm -f "$DEBUGFILE"


### REQUISITE NAGIOS COMMAND LINE STUFF ###

print_usage() {
	echo "Usage: $PROGNAME"
	echo "Usage: $PROGNAME --help"
}

print_help() {
	echo ""
	print_usage
	echo ""
	echo "Semi-intelligent CPU load monitor plugin for Nagios"
	echo ""
	echo "This plugin was not developed by the Nagios Plugin group."
	echo "Please do not e-mail them for support on this plugin, since"
	echo "they won't know what you're talking about :P"
	echo ""
	echo "For contact info, read the plugin itself..."
}

while test -n "$1" 
do
	case "$1" in
	  --help) print_help; exit $STATE_OK;;
	  -h) print_help; exit $STATE_OK;;
	  *) print_usage; exit $STATE_UNKNOWN;;
	esac
done


### SETTING UP THE WARN AND CRIT FACTORS ###
# Please be aware that these are -factors- and not real load average values.
# The numbers below will be multiplied by the amount of processors to come
# to the desired WARN and CRIT levels. Feel free to adjust these factors, if
# you feel the need to tweak them.

WARN_1min="2.00"
WARN_5min="1.50"
WARN_15min="1.50"
[ $DEBUG -gt 0 ] && echo "Factors: warning factors are at $WARN_1min, $WARN_5min, $WARN_15min." >> $DEBUGFILE

CRIT_1min="3.00"
CRIT_5min="2.00"
CRIT_15min="2.00"
[ $DEBUG -gt 0 ] && echo "Factors: critical factors are at $CRIT_1min, $CRIT_5min, $CRIT_15min." >> $DEBUGFILE


### DEFINING SUBROUTINES ###

function gather_procs_linux()
{
    NUMPROCS=`cat /proc/cpuinfo | grep ^processor | wc -l` 
[ $DEBUG -gt 0 ] && echo "Numprocs: Number of processors detected is $NUMPROCS." >> $DEBUGFILE
} 

function gather_procs_sunos()
{
    NUMPROCS=`/usr/bin/mpstat | grep -v CPU | wc -l` 
[ $DEBUG -gt 0 ] && echo "Numprocs: Number of processors detected is $NUMPROCS." >> $DEBUGFILE
}

function gather_procs_darwin()
{
    NUMPROCS=`/usr/bin/hostinfo | grep "Default processor set" | awk '{print $8}'` 
[ $DEBUG -gt 0 ] && echo "Numprocs: Number of processors detected is $NUMPROCS." >> $DEBUGFILE
}

function gather_load_linux()
{
    REAL_1min=`cat /proc/loadavg | awk '{print $1}'`
    REAL_5min=`cat /proc/loadavg | awk '{print $2}'`
    REAL_15min=`cat /proc/loadavg | awk '{print $3}'`
[ $DEBUG -gt 0 ] && echo "Gather_load: Detected load averages are $REAL_1min, $REAL_5min, $REAL_15min." >> $DEBUGFILE
}

function gather_load_sunos()
{
    REAL_1min=`w | grep "load average" | awk -F, '{print $4}' | awk '{print $3}'`
    REAL_5min=`w | grep "load average" | awk -F, '{print $5}'`
    REAL_15min=`w | grep "load average" | awk -F, '{print $6}'`
[ $DEBUG -gt 0 ] && echo "Gather_load: Detected load averages are $REAL_1min, $REAL_5min, $REAL_15min." >> $DEBUGFILE
}

function gather_load_darwin()
{
    REAL_1min=`sysctl -n vm.loadavg | awk '{print $1}'`
    REAL_5min=`sysctl -n vm.loadavg | awk '{print $2}'`
    REAL_15min=`sysctl -n vm.loadavg | awk '{print $3}'`
[ $DEBUG -gt 0 ] && echo "Gather_load: Detected load averages are $REAL_1min, $REAL_5min, $REAL_15min." >> $DEBUGFILE
}

function check_load()
{
    WARN="0"; CRIT="0"

    [ `echo "if(($NUMPROCS * $WARN_1min) > $REAL_1min) 0; if(($NUMPROCS * $WARN_1min) <= $REAL_1min) 1" | bc` -gt 0 ] && let WARN=$WARN+1
    [ `echo "if(($NUMPROCS * $WARN_5min) > $REAL_5min) 0; if(($NUMPROCS * $WARN_5min) <= $REAL_5min) 1" | bc` -gt 0 ] && let WARN=$WARN+1
    [ `echo "if(($NUMPROCS * $WARN_15min) > $REAL_15min) 0; if(($NUMPROCS * $WARN_15min) <= $REAL_15min) 1" | bc` -gt 0 ] && let WARN=$WARN+1
[ $DEBUG -gt 0 ] && echo "Check_load: warning levels are `echo "$NUMPROCS * $WARN_1min"|bc`, `echo "$NUMPROCS * $WARN_5min"|bc`, `echo "$NUMPROCS * $WARN_15min"|bc`," >> $DEBUGFILE

    [ `echo "if(($NUMPROCS * $CRIT_1min) > $REAL_1min) 0; if(($NUMPROCS * $CRIT_1min) <= $REAL_1min) 1" | bc` -gt 0 ] && let CRIT=$CRIT+1
    [ `echo "if(($NUMPROCS * $CRIT_5min) > $REAL_5min) 0; if(($NUMPROCS * $CRIT_5min) <= $REAL_5min) 1" | bc` -gt 0 ] && let CRIT=$CRIT+1
    [ `echo "if(($NUMPROCS * $CRIT_15min) > $REAL_15min) 0; if(($NUMPROCS * $CRIT_15min) <= $REAL_15min) 1" | bc` -gt 0 ] && let CRIT=$CRIT+1
[ $DEBUG -gt 0 ] && echo "Check_load: critical levels are `echo "$NUMPROCS * $CRIT_1min"|bc`, `echo "$NUMPROCS * $CRIT_5min"|bc`, `echo "$NUMPROCS * $CRIT_15min"|bc`," >> $DEBUGFILE

    # Test CRIT first, and use { } rather than ( ) so that exit ends the script.
    [ $CRIT -gt 0 ] && { echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min"; exit $STATE_CRITICAL; }
    [ $WARN -gt 0 ] && { echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min"; exit $STATE_WARNING; }
}

### FINALLY, THE MAIN ROUTINE ###

NUMPROCS="0"

case `uname` in
            Linux) gather_procs_linux; gather_load_linux; check_load;;
            Darwin) gather_procs_darwin; gather_load_darwin; check_load;;
            SunOS) gather_procs_sunos; gather_load_sunos; check_load;;
            *) echo "OS not supported by this check."; exit $STATE_UNKNOWN;;
esac

# Nothing caused us to exit early, so we're okay.
echo "OK - load averages are at $REAL_1min, $REAL_5min, $REAL_15min"
exit $STATE_OK
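
A general shell point that matters for one-line checks like the ones in check_load: an `exit` inside parentheses only leaves the subshell that the parentheses start, while an `exit` inside braces ends the script itself. The demonstration below is a stand-alone sketch with made-up function names, not code from the plugin.

```shell
#!/usr/bin/env bash
# Demonstration only; the function names are made up for this example.

with_parens() {
    # ( ... ) runs in a subshell: exit 2 only ends that subshell.
    true && (echo "NOK"; exit 2)
    echo "still running"
}

with_braces() {
    # { ...; } runs in the current shell: exit 2 ends the caller.
    true && { echo "NOK"; exit 2; }
    echo "still running"
}

# Each call is wrapped in $( ) so the demo itself survives the exit.
parens_rc=0; parens_out=$(with_parens) || parens_rc=$?
braces_rc=0; braces_out=$(with_braces) || braces_rc=$?

echo "parens: rc=$parens_rc, lines=$(echo "$parens_out" | wc -l | tr -d ' ')"
echo "braces: rc=$braces_rc, lines=$(echo "$braces_out" | wc -l | tr -d ' ')"
```

With parentheses the function carries on and would go on to report OK after having printed NOK; with braces it stops with the intended exit code.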



Nagios script: check_ntp_s

2005-07-01 00:00:00

This script was written at the time I was hired by UPC / Liberty Global.

Basic monitor that checks if the NTP server is up and running. It checks for the xntpd process and whether the server has drifted from its higher level Stratum server.

This script was quickly hacked together for my current customer, as a Q&D solution for their monitoring needs. It's no beauty, but it works. Written in ksh and tested with:

The script sends a Critical if:

A) One or more processes are not running, or

B) The server's clock has drifted too far from its higher level Stratum server.

Requires the "check_ntp" plugin which is part of the default monitor package.

UPDATE 19/06/2006:

Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!



#!/usr/bin/bash
#
# NTP server process monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 19-06-2006
# 
# Usage: ./check_ntp_s
#
# Description:
# This plugin determines whether the Nagios client is functioning 
# properly as an NTP server. It does this by checking:
# * Are all required processes running?
# * Is the server's time up to scratch with its higher stratum server?
#
# Limitations:
# Currently this plugin will only function correctly on Solaris systems.
#
# Output:
# The script returns a CRIT when one of the abovementioned criteria
# is not matched.
#

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
        echo "WARNING:"
        echo "This script was originally written for use on Solaris."
        echo "You may run into some problems running it on this host."
        echo ""
        echo "Please verify that the script works before using it in a"
        echo "live environment. You can easily disable this message after"
        echo "testing the script."
        echo ""
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

print_usage() {
	echo "Usage: $PROGNAME"
	echo "Usage: $PROGNAME --help"
}

print_help() {
	echo ""
	print_usage
	echo ""
	echo "NTP server plugin for Nagios"
	echo ""
	echo "This plugin was not developed by the Nagios Plugin group."
	echo "Please do not e-mail them for support on this plugin, since"
	echo "they won't know what you're talking about :P"
	echo ""
	echo "For contact info, read the plugin itself..."
}

while test -n "$1" 
do
	case "$1" in
	  --help) print_help; exit $STATE_OK;;
	  -h) print_help; exit $STATE_OK;;
	  *) print_usage; exit $STATE_UNKNOWN;;
	esac
done

check_processes()
{
	PROCESS="0"
	if [ `ps -ef | grep xntpd | grep -v grep | grep -v nagios | wc -l` -lt 1 ]; then PROCESS=1;fi
	if [ $PROCESS -eq 1 ]; then 
		echo "NTP-S NOK - One or more processes not running"
		exitstatus=$STATE_CRITICAL
		exit $exitstatus
	fi
}

check_time()
{
	TIME="0"
	#SERVERS="ntp0.nl.net ntp1.nl.net ntp2.nl.net"
	SERVERS="nl-ams99z-a02-01"
	for SERV in `echo $SERVERS`; do
		if [ "`/usr/local/nagios/libexec/check_ntp -H $SERV | awk '{print $2}'`" != "OK:" ]; then
			TIME=1
		else
			TIME=0
			break
		fi
	done
	if [ $TIME -eq 1 ]; then
		echo "NTP-S NOK - Time not in synch with higher Stratum."
		exitstatus=$STATE_CRITICAL
		exit $exitstatus
	fi
}

check_processes
check_time

echo "NTP-S OK - Everything running like it should"
exitstatus=$STATE_OK
exit $exitstatus
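
The "first upstream server that answers OK wins" loop in check_time can also be written against the exit code of the check instead of its text output, since Nagios plugins return 0 for OK. The following is a sketch under that assumption; `CHECK_CMD` and `any_server_ok` are names invented for this example.

```shell
#!/usr/bin/env bash
# Sketch only. CHECK_CMD and any_server_ok are names invented for this
# example; the original plugin parses the text output of check_ntp instead.
CHECK_CMD="${CHECK_CMD:-/usr/local/nagios/libexec/check_ntp}"

# Succeeds as soon as one server in the list passes the check.
any_server_ok() {
    local serv
    for serv in "$@"; do
        if "$CHECK_CMD" -H "$serv" >/dev/null 2>&1; then
            return 0
        fi
    done
    # No server passed, or the list was empty.
    return 1
}
```

Used as `any_server_ok ntp0.nl.net ntp1.nl.net || exit $STATE_CRITICAL`, an empty or failed answer from the check counts as "not OK" rather than slipping through.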



Nagios script: check_postfix

2005-07-01 00:00:00

This script was written at the time I was hired by UPC / Liberty Global.

Basic monitor that checks if Postfix is up and running. It checks for a number of processes and ports.

This script was quickly hacked together for my current customer, as a Q&D solution for their monitoring needs. It's no beauty, but it works. Written in ksh and tested with:

The script sends a Critical if:

A) One or more processes are not running, or

B) One or more ports are not available for connections.

UPDATE 19/06/2006:

Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!



#!/usr/bin/bash
#
# Postfix process monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 19-06-2006
# 
# Usage: ./check_postfix
#
# Description:
# This plugin determines whether the Postfix SMTP server
# is running properly. It will check the following:
# * Are all required processes running?
# * Are all the required TCP/IP ports open?
#
# Limitations:
# Currently this plugin will only function correctly on Solaris systems.
#
# Output:
# Script returns a CRIT when one of the abovementioned criteria is 
# not matched
#

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
        echo "WARNING:"
        echo "This script was originally written for use on Solaris."
        echo "You may run into some problems running it on this host."
        echo ""
        echo "Please verify that the script works before using it in a"
        echo "live environment. You can easily disable this message after"
        echo "testing the script."
        echo ""
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

print_usage() {
	echo "Usage: $PROGNAME"
	echo "Usage: $PROGNAME --help"
}

print_help() {
	echo ""
	print_usage
	echo ""
	echo "Postfix monitor plugin for Nagios"
	echo ""
	echo "This plugin was not developed by the Nagios Plugin group."
	echo "Please do not e-mail them for support on this plugin, since"
	echo "they won't know what you're talking about :P"
	echo ""
	echo "For contact info, read the plugin itself..."
}

while test -n "$1" 
do
	case "$1" in
	  --help) print_help; exit $STATE_OK;;
	  -h) print_help; exit $STATE_OK;;
	  *) print_usage; exit $STATE_UNKNOWN;;
	esac
done

check_processes()
{
	PROCESS="0"
	PROCLIST="smtpd qmgr pickup master sendmail"
	for PROC in `echo $PROCLIST`; do
	if [ `ps -ef | grep $PROC | grep -v grep | wc -l` -lt 1 ]; then 
		if [ $PROC == "smtpd" ]; then
			if [ `ps -ef | grep proxymap | grep -v grep | wc -l` -lt 1 ]; then
				PROCESS=1
			else
				PROCESS=0
			fi
		else
			PROCESS=1
		fi
	fi
	done

	if [ $PROCESS -eq 1 ]; then 
		echo "SMTP-S NOK - One or more processes not running"
		exitstatus=$STATE_CRITICAL
		exit $exitstatus
	fi
}

check_ports()
{
	PORTS="0"
	PORTLIST="25"
	for NUM in `echo $PORTLIST`; do
	if [ `netstat -an | grep LISTEN | grep $NUM | grep -v grep | wc -l` -lt 1 ]; then PORTS=1;fi
	done

	if [ $PORTS -eq 1 ]; then 
		echo "SMTP-S NOK - One or more TCP/IP ports not listening."
		exitstatus=$STATE_CRITICAL
		exit $exitstatus
	fi
}

check_processes
check_ports

echo "SMTP-S OK - Everything running like it should"
exitstatus=$STATE_OK
exit $exitstatus
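
Because `grep $PROC` matches anywhere in the `ps -ef` line, a name like "master" is also found inside unrelated commands. A possible refinement, shown here only as a sketch, compares whole command names. The helper `missing_procs` is hypothetical, and it assumes its input holds one command name per line, for example from `ps -e -o comm`.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not taken from the original plugin.
# stdin: one command name per line (for example from `ps -e -o comm`).
# args:  the process names that must be present.
# Prints every required name that is absent; fails if any is missing.
missing_procs() {
    local running name missing=0
    running=$(sed 's|.*/||')      # keep only the basename of each command
    for name in "$@"; do
        if ! printf '%s\n' "$running" | grep -qx -- "$name"; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}
```

A call like `ps -e -o comm | missing_procs qmgr pickup master` prints nothing and succeeds when all three are running, and otherwise lists the names that were not found.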




Nagios script: check_retro_client

2005-07-01 00:00:00

This script was written at the time I was hired by UPC / Liberty Global.

Basic monitor that checks if the Retrospect client is up and running.

This script was quickly hacked together for my current customer, as a Q&D solution for their monitoring needs. It's no beauty, but it works. Written in ksh and tested with:

The script sends a Critical if the required process is not running.

UPDATE 19/06/2006:

Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!



#!/usr/bin/bash
#
# Retrospect Backup Client monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 19-06-2006
# 
# Usage: ./check_retro_client
#
# Description:
# This plugin determines whether the Retrospect backup client 
# is running properly. It will check the following:
# * Are all required processes running?
#
# Limitations:
# Currently this plugin will only function correctly on Solaris systems.
#
# Output:
# The script returns a CRIT when the abovementioned criteria are
# not matched
#

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
        echo "WARNING:"
        echo "This script was originally written for use on Solaris."
        echo "You may run into some problems running it on this host."
        echo ""
        echo "Please verify that the script works before using it in a"
        echo "live environment. You can easily disable this message after"
        echo "testing the script."
        echo ""
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

print_usage() {
	echo "Usage: $PROGNAME"
	echo "Usage: $PROGNAME --help"
}

print_help() {
	echo ""
	print_usage
	echo ""
	echo "Retrospect Backup Client monitor plugin for Nagios"
	echo ""
	echo "This plugin was not developed by the Nagios Plugin group."
	echo "Please do not e-mail them for support on this plugin, since"
	echo "they won't know what you're talking about :P"
	echo ""
	echo "For contact info, read the plugin itself..."
}

while test -n "$1" 
do
	case "$1" in
	  --help) print_help; exit $STATE_OK;;
	  -h) print_help; exit $STATE_OK;;
	  *) print_usage; exit $STATE_UNKNOWN;;
	esac
done

check_processes()
{
	PROCESS="0"
	if [ `ps -ef | grep retroclient | grep -v grep | grep -v nagios | wc -l` -lt 1 ]; then 
		echo "RETROSPECT NOK - One or more processes not running"
		exitstatus=$STATE_CRITICAL
		exit $exitstatus
	fi
}

check_processes

echo "RETROSPECT OK - Everything running like it should"
exitstatus=$STATE_OK
exit $exitstatus
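
The `grep -v grep` step in check_processes is there to keep grep from counting itself. A common alternative is to wrap the first letter of the pattern in brackets, so that grep's own command line no longer matches. The sketch below shows the idea on made-up sample data; `bracket_pattern` and `sample_ps` are names invented for this example.

```shell
#!/usr/bin/env bash
# Sketch only; bracket_pattern and sample_ps are invented for this example.

# Turns "retroclient" into "[r]etroclient".
bracket_pattern() {
    local name="$1"
    printf '[%s]%s\n' "${name:0:1}" "${name:1}"
}

pattern=$(bracket_pattern retroclient)

# Made-up process listing: the real client plus the grep that looks for it.
sample_ps() {
    cat <<EOF
root    101  1  /usr/sbin/retroclient
nagios  202  1  grep $pattern
EOF
}

# Counts only the real client; the grep line contains "[r]etroclient",
# which the regular expression [r]etroclient does not match.
sample_ps | grep -c -- "$pattern"
```

In a live check the sample would be replaced by `ps -ef`, and the count compared against the expected minimum just as the plugin does now.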



Nagios script: check_squid

2005-07-01 00:00:00

This script was written at the time I was hired by UPC / Liberty Global.

The text I wrote on Nagios Exchange about this script has been lost. I guess it speaks for itself :)



#!/usr/bin/bash
#
# Squid process monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 19-06-2006
# 
# Usage: ./check_squid
#
# Description:
# This plugin determines whether the Squid proxy server
# is running properly. It will check the following:
# * Are all required processes running?
# * Are all the required TCP/IP ports open?
#
# Limitations:
# Currently this plugin will only function correctly on Solaris systems.
#
# Output:
# The script returns a CRIT when the abovementioned criteria are
# not matched
#

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
        echo "WARNING:"
        echo "This script was originally written for use on Solaris."
        echo "You may run into some problems running it on this host."
        echo ""
        echo "Please verify that the script works before using it in a"
        echo "live environment. You can easily disable this message after"
        echo "testing the script."
        echo ""
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

print_usage() {
	echo "Usage: $PROGNAME"
	echo "Usage: $PROGNAME --help"
}

print_help() {
	echo ""
	print_usage
	echo ""
	echo "Squid monitor plugin for Nagios"
	echo ""
	echo "This plugin was not developed by the Nagios Plugin group."
	echo "Please do not e-mail them for support on this plugin, since"
	echo "they won't know what you're talking about :P"
	echo ""
	echo "For contact info, read the plugin itself..."
}

while test -n "$1" 
do
	case "$1" in
	  --help) print_help; exit $STATE_OK;;
	  -h) print_help; exit $STATE_OK;;
	  *) print_usage; exit $STATE_UNKNOWN;;
	esac
done

check_processes()
{
	PROCESS="0"
	if [ `ps -ef | grep squid | grep -v grep | grep -v nagios | wc -l` -lt 2 ]; then 
		echo "SQUID NOK - One or more processes not running"
		exitstatus=$STATE_CRITICAL
		exit $exitstatus
	fi
}

check_ports()
{
	PORTS=0
	PORTLIST="8080 3128 3130"
	for NUM in `echo $PORTLIST`; do
	if [ `netstat -an | grep LISTEN | grep $NUM | grep -v grep | wc -l` -lt 1 ]; then PORTS=1;fi
	done

	if [ $PORTS -eq 1 ]; then 
		echo "SQUID NOK - One or more TCP/IP ports not listening."
		exitstatus=$STATE_CRITICAL
		exit $exitstatus
	fi
}

check_processes
check_ports

echo "SQUID OK - Everything running like it should"
exitstatus=$STATE_OK
exit $exitstatus

