2017-05-11 21:04:00
A few weeks ago, I reopened commenting on this site after having it locked behind logins for years. Since then the number of spam submissions has been growing steadily. Sucks, so I finally took the time to implement proper spam checking. Enter Google's free project reCaptcha. Of course I realize that, if something's free on the web, it probably means that I'm the product being sold. I'll have to poke around the code to see what it actually does :)
CodexWorld have a great tutorial on getting reCaptcha to work in a basic script. Took me less than an hour to get it all set up! Lovely!
kilala.nl tags: programming,
2017-01-19 22:18:00
It's been roughly eight years since I started work on KilalaCMS, the code that runs this website. She's served me well and I haven't had many headaches. Early on, Dick offered me lots of great help in sanitizing input, putting up at least some SQL injection protection. In the end it might not be much to look at, but she's mine :)
A few months back Dreamhost sent their customers who were still on PHP 5.5 a warning that said version would soon be dropped from their servers. Thus, it was a warning to go check your code. Obviously KilalaCMS was behind the times, so I've now taken some time to adjust things here and there so it works in PHP 7.0. I've also taken the liberty to default everything to HTTPS, using a free SSL cert from Let's Encrypt. Dreamhost took care of the latter part for me. Good service!
I may run into a bug or two, but so far things are looking good!
EDIT: Kudos by the way to Dreamhost for their tech support! As part of the reno, I'd decided to run an "sqlmap" test against my DEV site, to make sure I wasn't leaving SQLI in plain sight. After the first tentative probe, the server slammed the door on my nose! They've got their boxes set up quite nicely, to prevent attacks like these. Nice! Had a chat with their support people and we worked out a nice way for me to test, without affecting my site or any of the other folks hosted on my box.
kilala.nl tags: programming, website,
2009-09-14 22:05:00
This script is used to monitor the basic processes that go with Cisco's CNR (Network Registrar), which can be likened to a DHCP server. Cisco's Support Wiki described CNR as follows:
Cisco CNS Network Registrar is a full-featured DNS/DHCP system that provides scalable naming and addressing services for service provider and enterprise networks. Cisco CNS Network Registrar dramatically improves the reliability of naming and addressing services for enterprise networks. For cable ISPs, Cisco CNS Network Registrar provides scalable DNS and DHCP services and forms the basis of a DOCSIS cable modem provisioning system.
As said, my script only checks the basics of CNR to ensure that the required daemons are running. It does not actually check any of the functionality, though at a later point in time it may be expanded to include this.
./check_cnr [-nagios|-tivoli] [-d -o FILE]
-nagios Nagios output mode (default)
-tivoli Tivoli output mode
-d Debug mode
-o Output file for debug logging
Depending on which mode you've selected the output of the script will differ slightly.
In Tivoli mode the output will be limited to a numerical value, as the script is to be used as a "numeric script". 0 = OK, 1 = WARNING/UNKNOWN, 2 = SEVERE. The exit code of the script will be identical to this value.
In Nagios mode the exit code of the script will be similar to Tivoli's, with the exception that the value 3 portrays an unknown state. The output on stdout includes the service name and state (CNR OK/NOK) and a helpful error message.
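To make the difference between the two modes a bit more concrete, here's a minimal sketch in shell. Mind you, the function name report_state and its arguments are made up for this illustration; they are not taken from check_cnr itself, which may well do things differently.

```shell
#!/bin/sh
# Hypothetical helper, NOT part of check_cnr: map a state onto the two
# output conventions described above.
#   $1 = mode (nagios|tivoli)
#   $2 = state (OK|WARNING|CRITICAL|UNKNOWN)
#   $3 = message to show in Nagios mode
report_state() {
    mode="$1"
    state="$2"
    message="$3"

    case "$state" in
        OK)       code=0 ;;
        WARNING)  code=1 ;;
        CRITICAL) code=2 ;;
        *)        code=3 ;;
    esac

    if [ "$mode" = "tivoli" ]
    then
        # A Tivoli numeric script only knows 0, 1 and 2, so an unknown
        # state is folded into 1.
        if [ "$code" -eq 3 ]
        then
            code=1
        fi
        echo "$code"
    else
        # Nagios wants a readable line on stdout plus the exit code.
        if [ "$code" -eq 0 ]
        then
            echo "CNR OK - $message"
        else
            echo "CNR NOK - $message"
        fi
    fi

    return "$code"
}
```

Calling report_state tivoli UNKNOWN "whatever" prints just the number 1 and returns 1, while report_state nagios UNKNOWN "whatever" prints a "CNR NOK" line and returns 3.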
$ wc check_cnr.sh
189 666 4531 check_cnr.sh
$ cksum check_cnr.sh
4161895780 4531 check_cnr.sh
kilala.nl tags: nagios, unix, programming,
2008-01-01 00:00:00
Just today I ran into something shiny that piqued my interest. A shell script I'd written in Bash didn't work like I expected it to, with regards to the scope of a variable. I thought the incident was interesting enough to report, although I won't go into the whole scoping story too deeply.
What it basically boils down to is that there was a difference in the way two shells handle a certain situation. A difference that I didn't expect to be there. Not that exciting, but still very educational.
Yeah. In most programming languages variables have a certain range within your program, within which they can be used. Some variables only exist within one subroutine, while others exist across the whole program or even across multiple parts of the whole.
In shell scripting things aren't that complicated, luckily. In most cases a variable that's set in one part of the script can be used in every other part of the script. There are some notable exceptions, one of which I ran into today without realising it.
My situation:
I have a command that outputs a number of lines, some of which I need. The lines that I'm interested in consist of various fields, two of which I need as variables. Depending on the value of one of these variables, a counter needs to be incremented.
I guess that sounds kinda complicated, so here's the real code snippet:
function check_transport_paths
{
    TOTAL=`scstat -W | grep "Transport path:" | wc -l`
    let COUNT=0
    scstat -W | grep "Transport path:" | awk '{print $3" "$6}' | while read PATH STATUS
    do
        if [ $STATUS == "online" ]
        then
            let COUNT=$COUNT+1
        fi
    done
    if [ $COUNT -lt 1 ]
    then
        echo "NOK - No transport paths online."
        exit $STATE_CRITICAL
    elif [ $COUNT -lt $TOTAL ]
    then
        echo "NOK - One or more transport paths offline."
        exit $STATE_WARNING
    fi
}
While testing my script, I found out that $COUNT would never retain the value it gained in the while-loop. This of course led to the script always failing the check. After some fiddling about, I found out that the problem lay in the use of the while loop: it was being used at the end of a pipe.
To illustrate, the following -does- work.
let COUNT=0
while read i
do
    let COUNT=$COUNT+$i
    echo $COUNT
done
echo "Total is $COUNT."
This leads to the following output.
$ ./baka.sh
1
1
2
3
3
6
4
10
^D
Total is 10.
However, if I were to create a script called neko.sh that outputs the numbers one through four on separate lines, which is then used in baka.sh... well... it doesn't work :D Regardez!
let COUNT=0
./neko.sh | while read i
do
    let COUNT=$COUNT+$i
    echo $COUNT
done
echo "Total is $COUNT."
This gives the following output
1
3
6
10
Total is 0.
After discussing the matter with two of my colleagues (one of them as puzzled as I was, and the other knowing what was going wrong) we came to the following conclusions.
This conclusion is supported by an example in the "Advanced Bash-scripting guide" by Mendel Cooper. In that example an additional comment is made about the scoping of variables with redirected while loops. The comment warns that older shells branch a redirected while into a sub-shell, but also says that Bash and Ksh handle this properly.
I guess our version of Bash is too old :3
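One way to dodge the problem, as far as I know, is to keep the while loop out of the pipeline and feed it through a redirect instead. Below is a minimal sketch of that idea. The neko function is only a stand-in for neko.sh, and the two count_ functions are names I made up for the comparison; this is not the fix that ended up in the real script.

```shell
#!/bin/sh
# Stand-in for neko.sh: print the numbers one through four.
neko() {
    printf '%s\n' 1 2 3 4
}

# Variant 1: the loop sits at the end of a pipe. In Bash (default
# settings) the loop runs in a sub-shell, so the outer PIPED_COUNT
# never sees the additions.
count_piped() {
    PIPED_COUNT=0
    neko | while read -r i
    do
        PIPED_COUNT=$((PIPED_COUNT + i))
    done
    echo "$PIPED_COUNT"
}

# Variant 2: the command output is handed to the loop through a
# here-document, so the loop itself stays in the current shell.
count_redirected() {
    REDIR_COUNT=0
    while read -r i
    do
        REDIR_COUNT=$((REDIR_COUNT + i))
    done <<EOF
$(neko)
EOF
    echo "$REDIR_COUNT"
}

echo "Piped total is $(count_piped)."
echo "Redirected total is $(count_redirected)."
```

In Bash with default settings the piped total should come out as 0 and the redirected total as 10. In Korn shell both variants are expected to give 10, since ksh runs the last part of a pipeline in the current shell.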
I'd like to thank my colleagues Dennis Roos and Tom Scholten for spending a spare hour with me, hacking at this problem. And I'd like to thank Ondrej Jombik for pointing out the fact that this article didn't make my conclusions very clear in its original version.
kilala.nl tags: unix, sysadmin, programming,
2007-08-30 11:46:00
This script was written at the time I was hired by T-Systems.
This script is an evolution of my earlier check_ntp_config. This time it's meant for use with Tivoli, although modifying it for use with Nagios is trivial. The script was written to be usable on at least five different Unices, though I've been having trouble with Darwin/OS X.
The script was tested on Red Hat Linux, Tru64, HP-UX, AIX and Solaris. Only Darwin seems to have problems.
Just like my other recent Nagios scripts, check_ntpconfig.sh comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution.
#!/usr/bin/ksh
#
# NTP configuration check script for Tivoli.
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of T-Systems, CSS-CCTMO, the Netherlands
# Last Modified: 13-09-2007
#
# Usage: ./check_ntp_config
#
# Description:
# Well, there's not much to tell. We have no way of making sure that our
# NTP clients are all configured in the right way, so I thought I'd make
# a Nagios check for it. ^_^ After that came this derivative Tivoli script.
# You can change the NTP config at the top of this script, to match your
# own situation.
#
# Limitations:
# This script should work fine on Solaris, HP-UX, AIX, Tru64 and some
# flavors of Linux. So far Darwin-compatibility has eluded me.
#
# Output:
# If the NTP client config does not match what has been defined at the
# top of this script, the script will echo $STATE_NOK. In this case, the
# STATE variables contain a zero and a one, so you'll need to use a
# "Numeric Script" monitor definition in Tivoli. Anything above zero is bad.
#
# Other notes:
# If you ever run into problems with the script, set the DEBUG variable
# to 1. I'll need the output the script generates to do troubleshooting.
# See below for details.
# I realise that all the debugging commands strewn throughout the script
# may make things a little harder to read. But in the end I'm sure it was
# well worth adding them. It makes troubleshooting so much easier. :3
#

### SETTING THINGS UP ###

PATH="/usr/bin:/usr/sbin:/bin:/sbin"
PROGNAME="./check_ntp_config"
STATE_NOK="1"
STATE_OK="0"

. /opt/Tivoli/lcf/dat/dm_env.sh >/dev/null 2>&1

### DEFINING THE NTP CLIENT CONFIGURATION AS IT SHOULD BE ###

NTPSERVERS="192.168.22.7 192.168.25.7 192.168.16.7"

### DEBUGGING SETUP ###
# Cause you never know when you'll need to squash a bug or two

DEBUG="1"

if [[ $DEBUG -gt 0 ]]
then
    DEBUGFILE="/tmp/thomas-debug.txt"
    if [[ -f $DEBUGFILE ]]
    then
        rm $DEBUGFILE >/dev/null 2>&1
        [[ $? -gt 0 ]] && echo "Removing old debug file failed."
        touch $DEBUGFILE
    fi
fi

### REQUISITE COMMAND LINE STUFF ###

print_usage() {
    echo ""
    echo "Usage: $PROGNAME"
}

print_help() {
    echo ""
    echo "NTP client configuration monitor plugin for Tivoli."
    echo ""
    echo "This plugin was not developed by IBM."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
    echo ""
    print_usage
    echo ""
}

while test -n "$1"
do
    case "$1" in
        *) print_help; exit $STATE_OK;;
    esac
done

### DEFINING SUBROUTINES ###

function SetupEnv
{
    case $(uname) in
        Linux)
            CFGFILE="/etc/ntp.conf"
            IPCMD="host"
            IPMOD="tail -1"
            NAMEMOD="tail -1"
            IPFIELD="4"
            NAMEFIELD="5"
            GREP="egrep -e"
            ;;
        SunOS)
            CFGFILE="/etc/inet/ntp.conf"
            IPCMD="getent hosts"
            IPMOD=""
            NAMEMOD=""
            IPFIELD="1"
            NAMEFIELD="2"
            GREP="egrep -e"
            ;;
        Darwin)
            CFGFILE="/etc/ntp.conf"
            IPCMD="host"
            IPMOD=""
            NAMEMOD=""
            IPFIELD="4"
            NAMEFIELD="1"
            GREP="egrep -e"
            ;;
        AIX)
            CFGFILE="/etc/ntp.conf"
            IPCMD="host"
            IPMOD=""
            NAMEMOD=""
            IPFIELD="3"
            NAMEFIELD="1"
            GREP="egrep -e"
            ;;
        HP-UX)
            CFGFILE="/etc/ntp.conf"
            IPCMD="nslookup"
            IPMOD="grep ^\"Address\""
            NAMEMOD="grep ^\"Name\""
            IPFIELD="2"
            NAMEFIELD="2"
            GREP="egrep -e"
            ;;
        OSF1)
            CFGFILE="/etc/ntp.conf"
            IPCMD="nslookup"
            IPMOD="grep ^\"Address\" | tail -1"
            NAMEMOD="grep ^\"Name\" |tail -1"
            IPFIELD="2"
            NAMEFIELD="2"
            GREP="egrep -e"
            ;;
        *)
            echo "Sorry. OS not supported."
            exit 1
            ;;
    esac

    FAULT=0

    if [[ $DEBUG -gt 0 ]]
    then
        echo "=== SETUP ===" >> $DEBUGFILE
        echo "OS name is $(uname)" >> $DEBUGFILE
        echo "CFGFILE is $CFGFILE" >> $DEBUGFILE
        echo "IPCMD is $IPCMD" >> $DEBUGFILE
        echo "IPMOD is $IPMOD" >> $DEBUGFILE
        echo "NAMEMOD is $NAMEMOD" >> $DEBUGFILE
        echo "IPFIELD is $IPFIELD" >> $DEBUGFILE
        echo "NAMEFIELD is $NAMEFIELD" >> $DEBUGFILE
        echo "" >> $DEBUGFILE
        echo "NTPSERVERS is $NTPSERVERS" >> $DEBUGFILE
        echo "" >> $DEBUGFILE
    fi
}

# Check that every server listed in $NTPSERVERS occurs in the config file.
function ListInConf
{
    if [[ -z $NTPSERVERS ]]
    then
        [[ $DEBUG -gt 0 ]] && echo "NTPSERVERS variable not set." >> $DEBUGFILE
        echo "You haven't configured this monitor yet. Set \$NTPSERVERS."
        exit 0
    else
        for HOST in $(echo $NTPSERVERS)
        do
            SKIPIP=0
            SKIPNAME=0

            if [[ $DEBUG -gt 0 ]]
            then
                echo "=== LISTINCONF ===" >> $DEBUGFILE
                echo "HOST is $HOST" >> $DEBUGFILE
                echo "" >> $DEBUGFILE
            fi

            if [[ -z $(echo $HOST | $GREP [a-z,A-Z]) ]]
            then
                # No letters in there, so we were given an IP address.
                IPADDRESS="$HOST"
                TEST=$($IPCMD $HOST 2>/dev/null)
                if [[ ( $? -eq 0 ) && ( -z $(echo $TEST | $GREP NXDOMAIN) ) ]]
                then
                    [[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
                    HOSTNAME=$($IPCMD $HOST 2>/dev/null | $NAMEMOD | cut -f$NAMEFIELD -d" " | cut -f1 -d.)
                else
                    [[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
                    HOSTNAME=""
                fi

                if [[ -z $HOSTNAME ]]
                then
                    QUERY="$IPADDRESS"
                    [[ $DEBUG -gt 0 ]] && echo "Skipping hostname verification" >> $DEBUGFILE
                else
                    QUERY="$HOSTNAME $IPADDRESS"
                    [[ $DEBUG -gt 0 ]] && echo "Checking both IP and name." >> $DEBUGFILE
                fi
            else
                # We were given a host name.
                HOSTNAME="$HOST"
                TEST=$($IPCMD $HOST 2>/dev/null)
                if [[ ( $? -eq 0 ) && ( -z $(echo $TEST | $GREP NXDOMAIN) ) ]]
                then
                    [[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
                    IPADDRESS=$($IPCMD $HOST 2>/dev/null | $IPMOD | cut -f$IPFIELD -d" ")
                else
                    [[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
                    IPADDRESS=""
                fi

                if [[ -z $IPADDRESS ]]
                then
                    QUERY="$HOSTNAME"
                    [[ $DEBUG -gt 0 ]] && echo "Skipping IP address verification" >> $DEBUGFILE
                else
                    QUERY="$HOSTNAME $IPADDRESS"
                    [[ $DEBUG -gt 0 ]] && echo "Checking both IP and name." >> $DEBUGFILE
                fi
            fi

            if [[ $DEBUG -gt 0 ]]
            then
                echo "IPADDRESS is $IPADDRESS" >> $DEBUGFILE
                echo "HOSTNAME is $HOSTNAME" >> $DEBUGFILE
                echo "" >> $DEBUGFILE
            fi

            for NAME in `echo $QUERY`
            do
                [[ -z $($GREP $NAME $CFGFILE | $GREP "server") ]] && let FAULT=$FAULT+1
            done
        done
    fi
}

# Check that every server in the config file occurs in $NTPSERVERS.
function ConfInList
{
    NUMSERVERS=$($GREP ^"server" $CFGFILE | wc -l)

    if [[ $DEBUG -gt 0 ]]
    then
        echo "=== CONFINLIST ===" >> $DEBUGFILE
        echo "Number of \"server\" lines in $CFGFILE is $NUMSERVERS" >> $DEBUGFILE
        echo "" >> $DEBUGFILE
    fi

    if [[ $($GREP ^"server" $CFGFILE | wc -l) -gt 0 ]]
    then
        for HOST in $(cat $CFGFILE | $GREP ^"server" | awk '{print $2}')
        do
            if [[ $DEBUG -gt 0 ]]
            then
                echo "HOST is $HOST" >> $DEBUGFILE
                echo "" >> $DEBUGFILE
            fi

            if [[ -z $(echo $HOST | $GREP [a-z,A-Z]) ]]
            then
                # No letters in there, so the config file lists an IP address.
                IPADDRESS="$HOST"
                TEST=$($IPCMD $HOST 2>/dev/null)
                if [[ ( $? -eq 0 ) && ( -z $(echo $TEST | $GREP NXDOMAIN) ) ]]
                then
                    [[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
                    HOSTNAME=$($IPCMD $HOST 2>/dev/null | $NAMEMOD | cut -f$NAMEFIELD -d" " | cut -f1 -d.)
                else
                    [[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
                    HOSTNAME=""
                fi

                if [[ -z $HOSTNAME ]]
                then
                    QUERY="$IPADDRESS"
                    [[ $DEBUG -gt 0 ]] && echo "Skipping hostname verification" >> $DEBUGFILE
                else
                    QUERY="$HOSTNAME $IPADDRESS"
                    [[ $DEBUG -gt 0 ]] && echo "Checking both IP and name." >> $DEBUGFILE
                fi
            else
                # The config file lists a host name.
                HOSTNAME="$HOST"
                TEST=$($IPCMD $HOST 2>/dev/null)
                if [[ ( $? -eq 0 ) && ( -z $(echo $TEST | $GREP NXDOMAIN) ) ]]
                then
                    [[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
                    IPADDRESS=$($IPCMD $HOST 2>/dev/null | $IPMOD | cut -f$IPFIELD -d" ")
                else
                    [[ $DEBUG -gt 0 ]] && echo "TEST is $TEST" >> $DEBUGFILE
                    IPADDRESS=""
                fi

                if [[ -z $IPADDRESS ]]
                then
                    QUERY="$HOSTNAME"
                    [[ $DEBUG -gt 0 ]] && echo "Skipping IP address verification" >> $DEBUGFILE
                else
                    QUERY="$HOSTNAME $IPADDRESS"
                    [[ $DEBUG -gt 0 ]] && echo "Checking both IP and name." >> $DEBUGFILE
                fi
            fi

            if [[ $DEBUG -gt 0 ]]
            then
                echo "IPADDRESS is $IPADDRESS" >> $DEBUGFILE
                echo "HOSTNAME is $HOSTNAME" >> $DEBUGFILE
                echo "" >> $DEBUGFILE
            fi

            for NAME in `echo $QUERY`
            do
                [[ -z $(echo $NTPSERVERS | $GREP $NAME) ]] && let FAULT=$FAULT+1
            done
        done
    fi
}

### FINALLY, THE MAIN ROUTINE ###

SetupEnv

if [[ $DEBUG -gt 0 ]]
then
    echo "=== STARTING MAIN PHASE ===" >> $DEBUGFILE
    echo "" >> $DEBUGFILE
    echo "=== NTP CONFIG FILE ===" >> $DEBUGFILE
    cat $CFGFILE | grep -v ^"\#" >> $DEBUGFILE
    echo "" >> $DEBUGFILE
    echo "" >> $DEBUGFILE
fi

ListInConf
ConfInList

# Nothing caused us to exit early, so we're okay.
if [[ $FAULT -gt 0 ]]
then
    echo "$STATE_NOK"
    exit $STATE_NOK
else
    echo "$STATE_OK"
    exit $STATE_OK
fi
kilala.nl tags: unix, sysadmin, programming,
2006-06-01 00:00:00
This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.
A few of our projects and services are run on Solaris systems running Sun Cluster software. Since there were no Nagios scripts available to perform checks against Sun Cluster I made a basic script that checks the most important factors.
This script performs a different function, depending on the parameter with which it is called. This allows you to define multiple service checks in Nagios, without needing separate check scripts for each.
EDIT:
Oh! Just like my other recent Nagios scripts, check_suncluster comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution. And like my other, recent scripts it also comes with its own test script.
#!/usr/bin/ksh
#
# Nagios check script for Sun Cluster.
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide SYS, the Netherlands
# Last Modified: 25-09-2006
#
# Usage: ./check_suncluster [-t, -q, -g, -G resource-group, -r, -R resource, -i]
#
# Description:
# This script is capable of performing a number of basic checks on a
# system running Sun Cluster. Depending on the parameter you pass to
# it, it will check:
# * Transport paths (-t).
# * Quorum (-q).
# * Resource groups (-g).
# * One selected resource group (-G).
# * Resources (-r).
# * One selected resource (-R).
# * IPMP groups (-i).
#
# Limitations:
# This script will only work with Korn shell, due to some funky while
# looping with pipe forking. Bash doesn't handle this very gracefully,
# due to its sub-shell variable scoping. Maybe I really should learn
# to program in Perl.
#
# Output:
# * Transport paths return a WARN when one of the paths is down and a
# CRIT when all paths are offline.
# * Quorum returns a WARN when not all, but enough quorum devices are
# available. It returns a CRIT when quorum cannot be reached.
# * Resource groups returns a CRIT when a group is offline on all nodes
# and a WARN if a group is in an unstable state.
# * Resources returns a CRIT when a resource is offline on all nodes
# and a WARN if a resource is in an unstable state.
# * IPMP groups returns a CRIT when a group is offline.
#
# Other notes:
# Aside from the debugging output that I've built into most of my recent
# scripts, this check script will also have a testing mode hacked on, as
# a bag on the side. This testing mode is only engaged when the test_check_suncluster
# script is being run and will intentionally "break" a few things, to
# verify the failure options of this check script.
#

# Enabling the following dumps information into DEBUGFILE at various
# stages during the execution of this script.
DEBUG=0
DEBUGFILE="/tmp/foobar"

if [ -f /tmp/neko-wa-baka ]
then
    if [ `cat /tmp/neko-wa-baka` == "Nyo!" ]
    then
        TESTING="1"
    else
        TESTING="0"
    fi
else
    TESTING="0"
fi

### REQUISITE NAGIOS USER INTERFACE STUFF ###

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin:/usr/cluster/bin"
LIBEXEC="/usr/local/nagios/libexec"
PROGNAME="check_suncluster"

. $LIBEXEC/utils.sh

[ $DEBUG -gt 0 ] && rm $DEBUGFILE

print_usage() {
    echo "Usage: $PROGNAME [-t, -q, -g, -G resource-group, -r, -R resource, -i]"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Sun Cluster check plugin for Nagios"
    echo ""
    echo "-t: check transport paths"
    echo "-q: check quorum"
    echo "-g: check resource groups"
    echo "-G: check one individual resource group"
    echo "-r: check all resources"
    echo "-R: check one individual resource"
    echo "-i: check IPMP groups"
    echo ""
    echo "This plugin was not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

### SUB-ROUTINE DEFINITIONS ###

function check_transport_paths
{
    [ $DEBUG -gt 0 ] && echo "Starting check_transport_path subroutine." >> $DEBUGFILE
    TOTAL=`scstat -W | grep "Transport path:" | wc -l`
    let COUNT=0

    scstat -W | grep "Transport path:" | awk '{print $3" "$6}' | while read PATH STATUS
    do
        [ $DEBUG -gt 0 ] && echo "Before math, Count has the value of $COUNT." >> $DEBUGFILE
        if [ $STATUS == "online" ]
        then
            let COUNT=$COUNT+1
        fi
        [ $DEBUG -gt 0 ] && echo "Path: $PATH has status $STATUS" >> $DEBUGFILE
        [ $DEBUG -gt 0 ] && echo "Count: $COUNT online transport paths." >> $DEBUGFILE
    done

    [ $DEBUG -gt 0 ] && echo "Count: Outside the loop it has a value of $COUNT." >> $DEBUGFILE
    [ $TESTING -gt 0 ] && COUNT="0"

    if [ $COUNT -lt 1 ]
    then
        echo "NOK - No transport paths online."
        exit $STATE_CRITICAL
    elif [ $COUNT -lt $TOTAL ]
    then
        echo "NOK - One or more transport paths offline."
        exit $STATE_WARNING
    fi
}

function check_quorum
{
    [ $DEBUG -gt 0 ] && echo "Starting check_quorum subroutine." >> $DEBUGFILE
    NEED=`scstat -q | grep "votes needed:" | awk '{print $4}'`
    PRES=`scstat -q | grep "votes present:" | awk '{print $4}'`
    [ $DEBUG -gt 0 ] && echo "Quorum needed: $NEED" >> $DEBUGFILE
    [ $DEBUG -gt 0 ] && echo "Quorum present: $PRES" >> $DEBUGFILE
    [ $TESTING -gt 0 ] && PRES="0"

    if [ $PRES -ge $NEED ]
    then
        [ $DEBUG -gt 0 ] && echo "Enough quorum votes." >> $DEBUGFILE
        scstat -q | grep "votes:" | awk '{print $3" "$6}' | while read VOTE STATUS
        do
            [ $DEBUG -gt 0 ] && echo "Vote: $VOTE has status $STATUS." >> $DEBUGFILE
            if [ $STATUS != "Online" ]
            then
                echo "NOK - Quorum vote $VOTE not available."
                exit $STATE_WARNING
            fi
        done
    else
        [ $DEBUG -gt 0 ] && echo "Not enough quorum." >> $DEBUGFILE
        echo "NOK - Not enough quorum votes present."
        exit $STATE_CRITICAL
    fi
}

function check_resource_groups
{
    [ $DEBUG -gt 0 ] && echo "Starting check_resource_groups subroutine." >> $DEBUGFILE

    scstat -g | grep "Group:" | awk '{print $2}' | sort -u | while read GROUP
    do
        ONLINE=`scstat -g | grep "Group: $GROUP" | grep "Online" | wc -l`
        WEIRD=`scstat -g | grep "Group: $GROUP" | grep -v "Resources" | grep -v "Online" | grep -v "Offline" | wc -l`
        [ $DEBUG -gt 0 ] && echo "Resource Group $GROUP has $ONLINE instances online." >> $DEBUGFILE
        [ $DEBUG -gt 0 ] && echo "Resource Group $GROUP has $WEIRD instances in a weird state." >> $DEBUGFILE
        [ $TESTING -gt 0 ] && ONLINE="0"

        if [ $ONLINE -lt 1 ]
        then
            echo "NOK - Resource group $GROUP not online."
            exit $STATE_CRITICAL
        fi
        if [ $WEIRD -gt 1 ]
        then
            echo "NOK - Resource group $GROUP is in an unstable state."
            exit $STATE_WARNING
        fi
    done
}

function check_resource_grp
{
    [ $DEBUG -gt 0 ] && echo "Starting check_resource_grp subroutine." >> $DEBUGFILE
    [ $DEBUG -gt 0 ] && echo "Selected group: $RGROUP" >> $DEBUGFILE
    ONLINE=`scstat -g | grep $RGROUP | grep "Online" | wc -l`
    WEIRD=`scstat -g | grep $RGROUP | grep -v "Resources" | grep -v "Online" | grep -v "Offline" | wc -l`
    [ $DEBUG -gt 0 ] && echo "Resource Group $RGROUP has $ONLINE instances online." >> $DEBUGFILE
    [ $DEBUG -gt 0 ] && echo "Resource Group $RGROUP has $WEIRD instances in a weird state." >> $DEBUGFILE
    [ $TESTING -gt 0 ] && ONLINE="0"

    if [ $ONLINE -lt 1 ]
    then
        echo "NOK - Resource group $RGROUP not online."
        exit $STATE_CRITICAL
    fi
    if [ $WEIRD -gt 1 ]
    then
        echo "NOK - Resource group $RGROUP is in an unstable state."
        exit $STATE_WARNING
    fi
}

function check_resources
{
    [ $DEBUG -gt 0 ] && echo "Starting check_resources subroutine." >> $DEBUGFILE
    RESOURCES=`scstat -g | grep "Resource:" | awk '{print $2}' | sort -u`
    [ $DEBUG -gt 0 ] && echo "List of resources to check: $RESOURCES" >> $DEBUGFILE

    for RESOURCE in `echo $RESOURCES`
    do
        ONLINE=`scstat -g | grep "Resource: $RESOURCE" | awk '{print $4}' | grep "Online" | wc -l`
        WEIRD=`scstat -g | grep "Resource: $RESOURCE" | awk '{print $4}' | grep -v "Online" | grep -v "Offline" | wc -l`
        [ $DEBUG -gt 0 ] && echo "Resource $RESOURCE has $ONLINE instances online." >> $DEBUGFILE
        [ $DEBUG -gt 0 ] && echo "Resource $RESOURCE has $WEIRD instances in a weird state." >> $DEBUGFILE
        [ $TESTING -gt 0 ] && ONLINE="0"

        if [ $ONLINE -lt 1 ]
        then
            echo "NOK - Resource $RESOURCE not online."
            exit $STATE_CRITICAL
        fi
        if [ $WEIRD -gt 1 ]
        then
            echo "NOK - Resource $RESOURCE is in an unstable state."
            exit $STATE_WARNING
        fi
    done
}

function check_rsrce
{
    [ $DEBUG -gt 0 ] && echo "Starting check_rsrce subroutine." >> $DEBUGFILE
    [ $DEBUG -gt 0 ] && echo "Selected resource: $RSRCE" >> $DEBUGFILE
    ONLINE=`scstat -g | grep "Resource: $RSRCE" | awk '{print $4}' | grep "Online" | wc -l`
    WEIRD=`scstat -g | grep "Resource: $RSRCE" | awk '{print $4}' | grep -v "Online" | grep -v "Offline" | wc -l`
    [ $DEBUG -gt 0 ] && echo "Resource $RSRCE has $ONLINE instances online." >> $DEBUGFILE
    [ $DEBUG -gt 0 ] && echo "Resource $RSRCE has $WEIRD instances in a weird state." >> $DEBUGFILE
    [ $TESTING -gt 0 ] && ONLINE="0"

    if [ $ONLINE -lt 1 ]
    then
        echo "NOK - Resource $RSRCE not online."
        exit $STATE_CRITICAL
    fi
    if [ $WEIRD -gt 1 ]
    then
        echo "NOK - Resource $RSRCE is in an unstable state."
        exit $STATE_WARNING
    fi
}

function check_ipmp
{
    [ $DEBUG -gt 0 ] && echo "Starting check_ipmp subroutine." >> $DEBUGFILE

    scstat -i | grep "IPMP Group:" | awk '{print $3" "$5}' | while read GROUP STATUS
    do
        [ $DEBUG -gt 0 ] && echo "IPMP Group: $GROUP has status $STATUS" >> $DEBUGFILE
        if [ $STATUS != "Online" ]
        then
            echo "NOK - IPMP group $GROUP not online."
            exit $STATE_CRITICAL
        fi
        if [ $TESTING -gt 0 ]
        then
            echo "NOK - IPMP group $GROUP not online."
            exit $STATE_CRITICAL
        fi
    done
}

### THE MAIN ROUTINE FINALLY STARTS ###

[ $DEBUG -gt 0 ] && echo "Starting main routine." >> $DEBUGFILE

if [ $# -lt 1 ]
then
    print_usage
    exit $STATE_UNKNOWN
fi

[ $DEBUG -gt 0 ] && echo "More than one argument." >> $DEBUGFILE
[ $DEBUG -gt 0 ] && echo "" >> $DEBUGFILE

case "$1" in
    --help) print_help; exit $STATE_OK;;
    -h) print_help; exit $STATE_OK;;
    -t) check_transport_paths;;
    -q) check_quorum;;
    -g) check_resource_groups;;
    -G) RGROUP="$2"; check_resource_grp;;
    -r) check_resources;;
    -R) RSRCE="$2"; check_rsrce;;
    -i) check_ipmp;;
    *) print_usage; exit $STATE_UNKNOWN;;
esac

[ $DEBUG -gt 0 ] && echo "No problems. Exiting normally." >> $DEBUGFILE

# None of the other subroutines forced us to exit 1 before here, so let's quit with a 0.
echo "OK - Everything running like it should"
exit $STATE_OK
#!/usr/bin/bash

function testrun()
{
    echo "Running without parameters."
    /usr/local/nagios/libexec/check_suncluster
    echo "Exit code is $?."
    echo ""

    echo "Testing transport paths."
    /usr/local/nagios/libexec/check_suncluster -t
    echo "Exit code is $?."
    echo ""

    echo "Quorum votes."
    /usr/local/nagios/libexec/check_suncluster -q
    echo "Exit code is $?."
    echo ""

    echo "Checking all resource groups."
    /usr/local/nagios/libexec/check_suncluster -g
    echo "Exit code is $?."
    echo ""

    echo "Checking individual resource groups."
    for GROUP in `scstat -g | grep "Group:" | awk '{print $2}' | sort -u`
    do
        echo "Running for group $GROUP."
        /usr/local/nagios/libexec/check_suncluster -G $GROUP
        echo "Exit code is $?."
        echo ""
    done

    echo "Checking all resources."
    /usr/local/nagios/libexec/check_suncluster -r
    echo "Exit code is $?."
    echo ""

    echo "Checking individual resources."
    for RESOURCE in `scstat -g | grep "Resource:" | awk '{print $2}' | sort -u`
    do
        echo "Running for resource $RESOURCE."
        /usr/local/nagios/libexec/check_suncluster -R $RESOURCE
        echo "Exit code is $?."
        echo ""
    done

    echo "Checking IPMP groups."
    /usr/local/nagios/libexec/check_suncluster -i
    echo "Exit code is $?."
    echo ""
}

function breakstuff()
{
    # Now we'll start breaking things!!
    echo ""
    echo "Now it's time to start breaking things! Gruaargh!"
    echo "Mind you, it's all fake and simulated. I am not changing -anything-"
    echo "about the cluster itself."
    echo ""
    echo "Nyo!" > /tmp/neko-wa-baka
}

echo "Starting clean"
rm /tmp/neko-wa-baka /tmp/foobar >/dev/null 2>&1
echo ""

testrun
breakstuff
testrun

echo "Starting clean at the end"
rm /tmp/neko-wa-baka >/dev/null 2>&1
echo ""
kilala.nl tags: nagios, unix, programming,
2006-06-01 00:00:00
This script was written while I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.
One of the things we've been looking into recently, is running the standard Nagios plugins through SNMP instead of through NRPE. Putting aside the discussion of the various merits and flaws such a solution has, let's say that it works nicely.
How do you do this?
In your snmpd.conf add lines like:
exec .1.3.6.1.4.1.6886.4.1.1 check_load /usr/local/nagios/libexec/check_load
exec .1.3.6.1.4.1.6886.4.1.2 check_mem /usr/local/nagios/libexec/check_mem -w 85 -c 95
exec .1.3.6.1.4.1.6886.4.1.3 check_swap /usr/local/nagios/libexec/check_swap -w 15% -c 5%
What this does, is tell the SNMP daemon to run the check_load script when someone asks for object .1.3.6.1.4.1.6886.4.1.1 (or .2, or .3). The exit code for the script will be placed in OID.100.1 and the first line of output will be placed in OID.101.1. This script retrieves those two values through SNMP and returns them to Nagios.
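To show the idea in a few lines, here's a rough sketch of how those two values can be fetched by hand. It uses the .100.1 and .101.1 suffixes and the snmpget options that the full script further down this page uses. The three function names are made up for this example, and the "public" community string is an assumption; adjust it to your own setup.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Given the base OID of an "exec" line, print the OID that holds the
# exit code of the check.
exec_result_oid() {
    echo "$1.100.1"
}

# Given the base OID of an "exec" line, print the OID that holds the
# first line of output of the check.
exec_output_oid() {
    echo "$1.101.1"
}

# Fetch both values from a host and hand them back Nagios-style.
# Assumes net-snmp's snmpget is in the PATH and that the agent accepts
# SNMP v2c with the "public" community.
#   $1 = host name or address, $2 = base OID of the exec line
fetch_exec_check() {
    host="$1"
    base="$2"
    code=$(snmpget -Oqv -v 2c -c public "$host" "$(exec_result_oid "$base")") || return 3
    text=$(snmpget -Oqv -v 2c -c public "$host" "$(exec_output_oid "$base")" | sed 's/"//g')
    echo "$text"
    return "$code"
}
```

Running fetch_exec_check somehost .1.3.6.1.4.1.6886.4.1.1 would then print the first line of check_load's output and return its exit code, or return 3 (unknown) when the exit code cannot be retrieved.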
Your checkcommands.cfg should contain something like:
define command{
command_name retrieve_custom_snmp
command_line $USER1$/retrieve_custom_snmp -H $HOSTADDRESS$ -o $ARG1$
}
The "-o" parameter takes the OID you have selected for your custom check.
Now... How do you select an OID? There are two ways:
1. The WRONG way = randomly selecting some OID. You might pick an OID which is needed for other monitoring purposes in your network.
2. The RIGHT way = requesting a private Enterprise ID for your company at IANA. You are free to build an SNMP tree beneath this EID. For example, the EID 6886 mentioned above is registered to KPN (my current client). The sub-tree .4.1 contains all OIDs referring to Nagios checks performed by my department.
Before sending out that request, please check the current EID list to see if your company already owns a private subtree. If that's the case, contact the "owner" to request your own part of the subtree.
UPDATE (2006-10-02):
Thanks to the kind folks on the Nagios Users ML I've found out that my original version of the script was totally bug-ridden. I've made a big bunch of adjustments and now the script should work properly. Thanks especially to Andreas Ericsson.
#!/bin/bash
#
# Script to retrieve custom SNMP objects set using the "exec" handler
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide, the Netherlands
# Last Modified: 18-07-2006
#
# Usage: ./retrieve_custom_snmp -H hostname -o OID
#
# Description:
# On our Nagios client systems we use a lot of custom MIB OIDs which are
# registered under our own Enterprise ID. A whole bunch of the
# original Nagios scripts are run through the SNMP daemon and their exit
# codes and output are appended to specific OIDs. This all happens using the
# SNMP "exec" handler.
# Unfortunately the default check_snmp script doesn't allow for easy
# handling of these objects, so I hacked together a quick script.
#
# So basically this script doesn't do any checking. It just retrieves
# information :)
#
# Limitations:
# This script should work properly on all implementations of Linux, Solaris
# and Mac OS X.
#
# Output:
# The exit code is the exit code retrieved from OID.100.1. It is temporarily
# stored in $EXITCODE.
# The output string is the string retrieved from OID.101.1. It is temporarily
# stored in $OUTPUT.
#
# Other notes:
# If you ever run into problems with the script, set the DEBUG variable
# to 1. I'll need the output the script generates to do troubleshooting.
# See below for details.
# I realise that all the debugging commands strewn throughout the script
# may make things a little harder to read. But in the end I'm sure it was
# well worth adding them. It makes troubleshooting so much easier. :3
# Also, for some reason the case statement with the shifts (to detect
# passed options) doesn't seem to be working right. FIXME!
#
# Check command definition:
# define command{
# command_name retrieve_custom_snmp
# command_line $USER1$/retrieve_custom_snmp -H $HOSTADDRESS$ -o $ARG1$
# }
#

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"

. $LIBEXEC/utils.sh

PROGNAME="retrieve_custom_snmp"
COMMUNITY="public"

[ `uname` == "SunOS" ] && SNMPGET="/usr/local/bin/snmpget -Oqv -v 2c -c $COMMUNITY"
[ `uname` == "Darwin" ] && SNMPGET="/usr/bin/snmpget -Oqv -v 2c -c $COMMUNITY"
[ `uname` == "Linux" ] && SNMPGET="/usr/bin/snmpget -Oqv -v 2c -c $COMMUNITY"

### DEBUGGING SETUP ###
# Cause you never know when you'll need to squash a bug or two

DEBUG="0"

if [ $DEBUG -gt 0 ]
then
    DEBUGFILE="/tmp/foobar"
    rm $DEBUGFILE >/dev/null 2>&1
fi

### REQUISITE NAGIOS COMMAND LINE STUFF ###

print_usage() {
    echo "Usage: $PROGNAME -H hostname -o OID"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Script to retrieve the status for custom SNMP objects."
    echo ""
    echo "This plugin was not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"; do
    case "$1" in
        --help)
            print_help
            exit $STATE_OK
            ;;
        -h)
            print_help
            exit $STATE_OK
            ;;
        -H)
            HOST=$2
            shift
            ;;
        -o)
            OID=$2
            STATUS="$OID.100.1"
            STRING="$OID.101.1"
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit $STATE_UNKNOWN
            ;;
    esac
    shift
done

### FINALLY... RETRIEVING THE VALUES ###

EXITCODE=`$SNMPGET $HOST $STATUS`
[ $DEBUG -gt 0 ] && echo "Retrieve exit code is $EXITCODE" >> $DEBUGFILE

OUTPUT=`$SNMPGET $HOST $STRING | sed 's/"//g'`
[ $DEBUG -gt 0 ] && echo "Retrieve status message is: $OUTPUT" >> $DEBUGFILE

echo $OUTPUT
exit $EXITCODE
kilala.nl tags: nagios, unix, programming,
View or add comments (curr. 0)
2006-06-01 00:00:00
This script was written at the time I was hired by UPC / Liberty Global.
Basic monitor to check percentage of used physical RAM.
This script was quickly hacked together for my current customer, as a Q&D solution for their monitoring needs. It's no beauty, but it works. Written in ksh and tested with:
UPDATE 19/06/2006:
Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!
I've also -finally- changed the script so that it takes the Warning and Critical percentages from the command line.
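For those who just want the gist of the threshold logic: the sketch below is a simplified bash rendition, not the plugin itself. The function name ram_state and its four arguments are made up for illustration, and it uses plain integer arithmetic instead of bc.

```shell
#!/bin/bash
# Sketch only: ram_state is a hypothetical helper, not part of check_ram.
# Arguments: free MB, total MB, warning percentage, critical percentage.
ram_state() {
    local free_mb=$1 total_mb=$2 warn=$3 crit=$4
    # Integer percentage of RAM that is still unused (rounded down)
    local percent=$(( free_mb * 100 / total_mb ))

    if [ "$percent" -lt "$crit" ]; then
        echo "CRITICAL - $percent% of physical RAM unused"
    elif [ "$percent" -lt "$warn" ]; then
        echo "WARNING - $percent% of physical RAM unused"
    else
        echo "OK - $percent% of physical RAM unused"
    fi
}
```

Calling it as "ram_state 400 4096 15 5" mirrors running the plugin with a warning percentage of 15 and a critical percentage of 5.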
UPDATE 15/07/2006:
Whoops... I just noticed that the file had gone missing <3
#!/bin/ksh
#
# Free physical RAM monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 20-10-2006
#
# Usage: ./check_ram
#
# Description:
#   This plugin determines how much of the physical RAM in the
#   system is in use.
#
# Limitations:
#   Currently this plugin will only function correctly on Solaris systems.
#   And it really is only useful at DTV Labs.
#
# Output:
#   The script returns either a WARN or a CRIT, depending on the
#   percentage of free physical memory.
#

# Enabling the following dumps information into DEBUGFILE at various
# stages during the execution of this script.
DEBUG="1"
DEBUGFILE="/tmp/foobar"
rm $DEBUGFILE >/dev/null 2>&1
echo "Starting script check_ram." > $DEBUGFILE

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
    echo "WARNING:"
    echo "This script was originally written for use on Solaris."
    echo "You may run into some problems running it on this host."
    echo ""
    echo "Please verify that the script works before using it in a"
    echo "live environment. You can easily disable this message after"
    echo "testing the script."
    echo ""
    exit 1
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_ram"
PATH="/usr/bin:/usr/sbin:/bin:/usr/local/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

print_usage() {
    echo "Usage: $PROGNAME warning-percentage critical-percentage"
    echo ""
    echo "e.g. : $PROGNAME 15 5"
    echo "This will start alerting when more than 85% of RAM has"
    echo "been used."
    echo ""
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Free physical RAM plugin for Nagios"
    echo ""
    echo "This plugin was not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

if [ $# -lt 2 ]; then print_help; exit $STATE_WARNING; fi

case "$1" in
    --help) print_help; exit $STATE_OK;;
    -h) print_help; exit $STATE_OK;;
    *) if [ $# -lt 2 ]; then print_help; exit $STATE_WARNING; fi ;;
esac

RAM_WARN=$1
RAM_CRIT=$2
[ $DEBUG -gt 0 ] && echo "Warning and Critical percentages are $RAM_WARN and $RAM_CRIT." >> $DEBUGFILE

# The comparison needs the value of RAM_CRIT, so mind the dollar sign.
if [ $RAM_WARN -le $RAM_CRIT ]
then
    echo "Warning percentage should be larger than critical percentage."
    exit $STATE_WARNING
fi

check_space() {
    [ $DEBUG -gt 0 ] && echo "Starting check_space." >> $DEBUGFILE

    TOTALSPACE=0
    TOTALSPACE=`prtconf | grep ^"Memory size" | awk '{print $3}'`
    [ $DEBUG -gt 0 ] && echo "Total space is $TOTALSPACE." >> $DEBUGFILE

    TOTALFREE=0
    TOTALFREE=`vmstat 2 2 | tail -1 | awk '{print $5}'`
    [ $DEBUG -gt 0 ] && echo "Free space is $TOTALFREE." >> $DEBUGFILE
    let TOTALFREE=$TOTALFREE/1000
    [ $DEBUG -gt 0 ] && echo "Free space, div1000 is $TOTALFREE." >> $DEBUGFILE
}

check_percentile() {
    [ $DEBUG -gt 0 ] && echo "Starting check_percentile." >> $DEBUGFILE

    FRACTION=`echo "scale=2; $TOTALFREE/$TOTALSPACE" | bc`
    [ $DEBUG -gt 0 ] && echo "Fraction is $FRACTION." >> $DEBUGFILE
    PERCENT=`echo "scale=2; $FRACTION*100" | bc | awk -F. '{print $1}'`
    [ $DEBUG -gt 0 ] && echo "Percentile is $PERCENT." >> $DEBUGFILE

    if [ $PERCENT -lt $RAM_CRIT ]; then
        [ $DEBUG -gt 0 ] && echo "$PERCENT is smaller than $RAM_CRIT. Critical." >> $DEBUGFILE
        echo "RAM NOK - Less than $RAM_CRIT % of physical RAM is unused."
        exitstatus=$STATE_CRITICAL
        exit $exitstatus
    fi

    if [ $PERCENT -lt $RAM_WARN ]; then
        [ $DEBUG -gt 0 ] && echo "$PERCENT is smaller than $RAM_WARN. Warning." >> $DEBUGFILE
        echo "RAM NOK - Less than $RAM_WARN % of physical RAM is unused."
        exitstatus=$STATE_WARNING
        exit $exitstatus
    fi
}

check_space
check_percentile

[ $DEBUG -gt 0 ] && echo "$PERCENT is greater than $RAM_WARN. OK." >> $DEBUGFILE
echo "RAM OK - $TOTALFREE MB out of $TOTALSPACE MB RAM unused."
exitstatus=$STATE_OK
exit $exitstatus
kilala.nl tags: nagios, unix, programming,
View or add comments (curr. 0)
2006-06-01 00:00:00
This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.
A very simple script that takes a list of processes, instead of a single process name (as is the case with check_process). This should make monitoring a basic list of processes a lot easier. I really should change the script in such a way that it takes the process list from the command line, instead of from the $LIST variable that's defined internally. I'll do that when I have the time.
Until I've made that change, I use the script by copying check_processes to a new file that serves one specific purpose. For example, check_linux_processes and check_solaris_processes check a list of processes that should be up and running on Linux and Solaris respectively.
This check script should work on just about any UNIX OS.
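As a rough idea of what the command-line version could look like, here is a small bash sketch. It is not the plugin as published: the function name missing_processes is invented for this example, and the overridable PSLIST variable is an assumption borrowed from the naming used in check_nsca.

```shell
#!/bin/bash
# Sketch only: missing_processes is a hypothetical helper.
# PSLIST may be overridden; by default it is the same "ps -ef" the plugin uses.
PSLIST="${PSLIST:-ps -ef}"

# Prints the names of all processes passed as arguments that do not
# show up in the process listing, separated by spaces.
missing_processes() {
    local down="" process listing
    # Take one snapshot of the process table, so grep never sees itself
    listing=$($PSLIST)
    for process in "$@"; do
        if ! printf '%s\n' "$listing" | grep -qi -- "$process"; then
            down="$down $process"
        fi
    done
    # Strip the leading space before printing
    echo "${down# }"
}
```

A plugin built on this could run DOWN=$(missing_processes "$@") and exit with $STATE_CRITICAL whenever $DOWN is not empty.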
#!/bin/bash
#
# Process monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide, the Netherlands
# Last Modified: 13-07-2006
#
# Usage: ./check_solaris_processes
#
# Description:
#   This script couldn't be simpler than it is. It just checks to see
#   whether a predefined list of processes is up and running.
#
# Limitations:
#   This script should work properly on all implementations of Linux, Solaris
#   and Mac OS X.
#
# Output:
#   If one of the processes is down, a CRIT is issued.
#

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_linux_processes"
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

### DEFINING THE PROCESS LIST ###
LIST="init"

### REQUISITE NAGIOS COMMAND LINE STUFF ###
print_usage() {
    echo "Usage: $PROGNAME"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Basic processes list monitor plugin for Nagios"
    echo ""
    echo "This plugin was not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
    case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) print_usage; exit $STATE_UNKNOWN;;
    esac
done

### FINALLY THE MAIN ROUTINE ###
COUNT="0"
DOWN=""

for PROCESS in `echo $LIST`
do
    if [ `ps -ef | grep -i $PROCESS | grep -v grep | wc -l` -lt 1 ]
    then
        let COUNT=$COUNT+1
        DOWN="$DOWN $PROCESS"
    fi
done

if [ $COUNT -gt 0 ]
then
    echo "NOK - $COUNT processes not running: $DOWN"
    exit $STATE_CRITICAL
fi

# Nothing caused us to exit early, so we're okay.
echo "OK - All requisite processes running."
exit $STATE_OK
kilala.nl tags: nagios, unix, programming,
View or add comments (curr. 2)
2006-06-01 00:00:00
This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.
As far as I know there was no Nagios plugin that allowed you to really check your NTP client configuration. I mean, it would be nice to know for sure that all your systems are syncing against the proper server... Wouldn't it?
The script was tested on Redhat ES3, Mac OS X and Solaris. Its basic requirement is the bash shell.
EDIT:
Oh! Just like my other recent Nagios scripts, check_ntp_config comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution.
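The scripts repeat the same "[ $DEBUG -gt 0 ] && echo ... >> $DEBUGFILE" line all over the place. A tidier way to get the same effect would be a tiny helper function, sketched below in bash. The function name debug is my own invention here and does not appear in the published scripts; DEBUG and DEBUGFILE keep the meaning and defaults they have in the plugins.

```shell
#!/bin/bash
# Sketch only: a hypothetical debug() helper wrapping the DEBUG/DEBUGFILE
# convention used by the plugins.
DEBUG="${DEBUG:-0}"
DEBUGFILE="${DEBUGFILE:-/tmp/foobar}"

debug() {
    # Append the message to the debug file only when DEBUG is above zero
    [ "$DEBUG" -gt 0 ] && echo "$*" >> "$DEBUGFILE"
    # Always succeed, so a disabled debug line never looks like a failure
    return 0
}
```

With this in place, a line such as debug "Gather_config: currently configured server is $REAL_SERVER." would replace the longer form.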
#!/usr/bin/bash
#
# NTP client configuration monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide, the Netherlands
# Last Modified: 10-07-2006
#
# Usage: ./check_ntp_config
#
# Description:
#   Well, there's not much to tell. We have no way of making sure that our
#   NTP clients are all configured in the right way, so I thought I'd make
#   a Nagios check for it. ^_^
#   You can change the NTP config at the top of this script, to match your
#   own situation.
#
# Limitations:
#   This script should work properly on all implementations of Linux, Solaris
#   and Mac OS X.
#
# Output:
#   If the NTP client config does not match what has been defined at the
#   top of this script, the script will return a WARN.
#
# Other notes:
#   If you ever run into problems with the script, set the DEBUG variable
#   to 1. I'll need the output the script generates to do troubleshooting.
#   See below for details.
#   I realise that all the debugging commands strewn throughout the script
#   may make things a little harder to read. But in the end I'm sure it was
#   well worth adding them. It makes troubleshooting so much easier. :3
#

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_ntp_config"
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

### DEFINING THE NTP CLIENT CONFIGURATION AS IT SHOULD BE ###
NTP_SERVER="ntp.wxs.nl"

### DEBUGGING SETUP ###
# Cause you never know when you'll need to squash a bug or two
DEBUG="0"
if [ $DEBUG -gt 0 ]
then
    DEBUGFILE="/tmp/foobar"
    rm $DEBUGFILE >/dev/null 2>&1
fi

### REQUISITE NAGIOS COMMAND LINE STUFF ###
print_usage() {
    echo "Usage: $PROGNAME"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "NTP client configuration monitor plugin for Nagios"
    echo ""
    echo "This plugin was not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
    case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) print_usage; exit $STATE_UNKNOWN;;
    esac
done

### DEFINING SUBROUTINES ###
function gather_config() {
    case `uname` in
        Linux) CFGFILE="/etc/ntp.conf"; IP_SERVER=`host $NTP_SERVER | awk '{print $4}'` ;;
        SunOS) CFGFILE="/etc/inet/ntpd.conf"; IP_SERVER=`getent hosts $NTP_SERVER | awk '{print $2}'` ;;
        Darwin) CFGFILE="/etc/ntp.conf"; IP_SERVER=`host $NTP_SERVER | awk '{print $4}'` ;;
        *) ;;
    esac

    REAL_SERVER=`cat $CFGFILE | grep ^server | awk '{print $2}'`

    [ $DEBUG -gt 0 ] && echo "Gather_config: Host name for required server is $NTP_SERVER." >> $DEBUGFILE
    [ $DEBUG -gt 0 ] && echo "Gather_config: IP address for required server is $IP_SERVER." >> $DEBUGFILE
    [ $DEBUG -gt 0 ] && echo "Gather_config: currently configured server is $REAL_SERVER." >> $DEBUGFILE
}

function check_config() {
    # Quoted, so that an empty or multi-word value doesn't break the test
    if [ "$REAL_SERVER" != "$NTP_SERVER" ]
    then
        if [ "$REAL_SERVER" != "$IP_SERVER" ]
        then
            echo "NOK - NTP client is not configured to speak to $NTP_SERVER"
            exit $STATE_WARNING
        fi
    fi
}

### FINALLY, THE MAIN ROUTINE ###
gather_config
check_config

# Nothing caused us to exit early, so we're okay.
echo "OK - NTP client configured correctly."
exit $STATE_OK
kilala.nl tags: nagios, unix, programming,
View or add comments (curr. 0)
2006-06-01 00:00:00
This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.
At $CLIENT we've often run into problems with the NSCA daemon, where the daemon would not crash per se, but where it would also not process incoming service checks. The nsca process was still running, but it simply wasn't transferring the incoming results to the Nagios command file.
I was amazed to find that nobody else had written a script to check for this! So I quickly wrote one.
#!/usr/bin/bash
#
# NSCA Nagios service results monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide, the Netherlands
# Last Modified: 16-08-2006
#
# Usage: ./check_nsca
#
# Description:
#   Aside from checking whether the NSCA process is still running, this script
#   also attempts to insert a message into the Nagios queue. After sending a
#   message to the NSCA daemon, it will verify that the message is received by
#   Nagios, by checking the nagios.log file.
#
# Limitations:
#   This script should work properly on all implementations of Linux, Solaris
#   and Mac OS X.
#
# Output:
#   If the NSCA daemon, or something along the message path, is borked, a
#   CRIT message will be issued.
#

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_nsca"
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
NAGIOSHOME="/usr/local/nagios"
LIBEXEC="$NAGIOSHOME/libexec"
NAGVAR="$NAGIOSHOME/var"
NAGBIN="$NAGIOSHOME/bin"
NAGETC="$NAGIOSHOME/etc"
. $LIBEXEC/utils.sh

### REQUISITE NAGIOS COMMAND LINE STUFF ###
print_usage() {
    echo "Usage: $PROGNAME"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "NSCA Nagios service results monitor plugin for Nagios"
    echo ""
    echo "This plugin was not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
    case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) print_usage; exit $STATE_UNKNOWN;;
    esac
done

### PLATFORM INDEPENDENCE ###
case `uname` in
    Linux) PSLIST="ps -ef";;
    SunOS) PSLIST="ps -ef";;
    Darwin) PSLIST="ps -ajx";;
    *) ;;
esac

### CHECKING FOR THE NSCA PROCESS ###
# Braces instead of parentheses: an exit inside ( ) only leaves the subshell.
[ `$PSLIST | grep nsca | grep -v grep | wc -l` -lt 1 ] && { echo "NSCA process not running."; exit $STATE_CRITICAL; }

### INSERTING A TEST MESSAGE ###
DATE=`date +%Y%m%d%H%M`
STRING="`hostname`\tFOOBAR\t0\t$DATE This is a test of the emergency broadcast system.\n"
echo -e "$STRING" | $NAGBIN/send_nsca -H localhost -c $NAGETC/send_nsca.cfg >/dev/null 2>&1

### CHECKING THE NAGIOS LOG FILE ###
sleep 10
if [ `tail -1000 $NAGVAR/nagios.log | grep "emergency broadcast system" | grep $DATE | wc -l` -lt 1 ]
then
    # Giving it a second try
    sleep 10
    if [ `tail -5000 $NAGVAR/nagios.log | grep "emergency broadcast system" | grep $DATE | wc -l` -lt 1 ]
    then
        echo "NSCA daemon not processing check results."
        exit $STATE_CRITICAL
    fi
fi

### EXITING NORMALLY ###
echo "OK - NSCA working like it should."
exit $STATE_OK
kilala.nl tags: nagios, unix, programming,
View or add comments (curr. 2)
2006-06-01 00:00:00
This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.
There really isn't much to say... This script is so fscking basic that it shames me to even put it up here among all the other projects.
#!/usr/bin/bash
#
# NFS stale mounts monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide, the Netherlands
# Last Modified: 13-07-2006
#
# Usage: ./check_nfs_stale
#
# Description:
#   This script couldn't be simpler than it is. It just checks to see
#   whether there are any stale NFS mounts present on the system.
#
# Limitations:
#   This script should work properly on all implementations of Linux, Solaris
#   and Mac OS X.
#
# Output:
#   If there are stale NFS mounts, a CRIT is issued.
#

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_nfs_stale"
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

### REQUISITE NAGIOS COMMAND LINE STUFF ###
print_usage() {
    echo "Usage: $PROGNAME"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "NFS stale mounts monitor plugin for Nagios"
    echo ""
    echo "This plugin was not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
    case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) print_usage; exit $STATE_UNKNOWN;;
    esac
done

# Braces instead of parentheses: an exit inside ( ) only leaves the subshell.
# df usually reports stale handles on stderr, hence the 2>&1.
[ `df -k 2>&1 | grep "Stale NFS file handle" | wc -l` -gt 0 ] && { echo "NOK - Stale NFS mounts."; exit $STATE_CRITICAL; }

# Nothing caused us to exit early, so we're okay.
echo "OK - No stale NFS mounts."
exit $STATE_OK
kilala.nl tags: nagios, unix, programming,
View or add comments (curr. 0)
2006-06-01 00:00:00
This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.
I couldn't find an easy way to check whether all interfaces of a host are up and running from the -inside-, so I wrote a Nagios plugin to do this.
Naturally you could also try to ping all of the IP addresses of all of these network cards, but this isn't always possible. Lord knows how many routing issues I had to fight through to get our current IP set monitored. I guess using this script is a bit easier :)
The script was tested on Redhat ES3, Mac OS X and Solaris. Its basic requirement is the Korn shell (due to some conversions happening inside the script). On Linux/RH you'll need mii-tool (and sudo) and on Solaris you'll need Perl (for one lousy piece of math :p ).
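The conversions in question boil down to one piece of arithmetic: take the IP address, mask off the host bits and add one, which gives the first usable address of the subnet (assumed to be the router). The script does this with binary strings in ksh. The sketch below does the same calculation with bash integer arithmetic instead, so it is a different technique from the one in the plugin. The function names ip_to_int, int_to_ip and first_host are made up for the example, and it only handles plain dotted-quad addresses and netmasks.

```shell
#!/bin/bash
# Sketch only: hypothetical helpers, using bash arithmetic rather than the
# ksh base conversions the plugin itself relies on.

# Convert a dotted-quad address (e.g. 192.168.1.57) to a single integer
ip_to_int() {
    local a b c d
    IFS=. read -r a b c d <<< "$1"
    echo $(( (a << 24) | (b << 16) | (c << 8) | d ))
}

# Convert an integer back into dotted-quad notation
int_to_ip() {
    local n=$1
    echo "$(( (n >> 24) & 255 )).$(( (n >> 16) & 255 )).$(( (n >> 8) & 255 )).$(( n & 255 ))"
}

# First usable address of the subnet: (IP AND netmask) + 1
first_host() {
    local ip mask
    ip=$(ip_to_int "$1")
    mask=$(ip_to_int "$2")
    int_to_ip $(( (ip & mask) + 1 ))
}
```

For instance, "first_host 10.20.30.200 255.255.255.192" yields the address that the check would then try to ping for that interface.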
EDIT:
Oh! Just like my other recent Nagios scripts, check_networking comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution.
#!/usr/bin/ksh
#
# Basic UNIX networking check script.
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide SYS, the Netherlands
# Last Modified: 22-06-2006
#
# Usage: ./check_networking
#
# Description:
#   This plugin determines whether the local host's network interfaces
#   are all up and running like they should. It uses the following
#   questions to determine this.
#   * Does /sbin/mii-tool report any problems? (Linux only)
#   * Are the gateways for each subnet pingable?
#
# Limitations:
#   * I have no clue whether mii-tool is something specific to Redhat ES3,
#     or whether all Linii have it.
#   * Sudo access to mii-tool is required for the nagios account.
#   * Perl is required on Solaris, to do just a tiny bit of math.
#   * KSH is required.
#   * The script assumes that the first available IP from a subnet is the
#     router.
#
# Output:
#   The script returns a CRIT when one of the criteria mentioned
#   above is not matched.
#
# Other notes:
#   I wish I'd learn Perl. I'm sure that doing all of this stuff in Perl
#   would have cut down on the size of this script tremendously. Ah well.
#   If you ever run into problems with the script, set the DEBUG variable
#   to 1. I'll need the output the script generates to do troubleshooting.
#   See below for details.
#   I realise that all the debugging commands strewn throughout the script
#   may make things a little harder to read. But in the end I'm sure it was
#   well worth adding them. It makes troubleshooting so much easier. :3
#

# Enabling the following dumps information into DEBUGFILE at various
# stages during the execution of this script.
DEBUG="0"
DEBUGFILE="/tmp/foobar"

### REQUISITE NAGIOS USER INTERFACE STUFF ###

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_networking"
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

[ $DEBUG -gt 0 ] && rm $DEBUGFILE

print_usage() {
    echo "Usage: $PROGNAME"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Basic UNIX networking check plugin for Nagios"
    echo ""
    echo "This plugin was not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
    case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) print_usage; exit $STATE_UNKNOWN;;
    esac
done

### SETTING UP THE ENVIRONMENT ###

# Host OS check and warning message
MIITOOL="0"
if [ -f /sbin/mii-tool ]
then
    MIITOOL="1"
    sudo /sbin/mii-tool >/dev/null 2>&1
    if [ $? -gt 0 ]
    then
        echo "ERROR: sudo permissions"
        echo ""
        echo "This script requires that the Nagios user account has"
        echo "sudo permissions for the mii-tool command. Currently it"
        echo "does not have these permissions. Please fix this."
        echo ""
        exit $STATE_UNKNOWN
    fi
fi

### SUB-ROUTINE DEFINITIONS ###

function convert_base
{
    typeset -i${2:-16} x
    x=$1
    echo $x
}

function subnet_router
{
    [ $DEBUG -gt 0 ] && echo "- Starting subnet_router -" >> $DEBUGFILE
    first="0"; second="0"; third="0"; fourth="0"

    first=`echo $1 | cut -c 1-8`; FIRST=`convert_base 2#$first 10`
    [ $DEBUG -gt 0 ] && echo "First: $first $FIRST" >> $DEBUGFILE
    second=`echo $1 | cut -c 9-16`; SECOND=`convert_base 2#$second 10`
    [ $DEBUG -gt 0 ] && echo "Second: $second $SECOND" >> $DEBUGFILE
    third=`echo $1 | cut -c 17-24`; THIRD=`convert_base 2#$third 10`
    [ $DEBUG -gt 0 ] && echo "Third: $third $THIRD" >> $DEBUGFILE
    fourth=`echo $1 | cut -c 25-32`
    [ `echo $fourth|wc -c` -gt 1 ] || fourth="0"

    TEMPCOUNT=`echo $fourth | wc -c | awk '{print $1}'`
    let PADDING=9-$TEMPCOUNT
    [ $DEBUG -gt 0 ] && echo "Fourth: padding fourth with $PADDING zeroes" >> $DEBUGFILE
    i=1
    while ((i <= $PADDING)); do
        fourth=$fourth"0"
        let i=$i+1
    done
    FOURTH=`convert_base 2#$fourth 10`; let FOURTH=$FOURTH+1
    [ $DEBUG -gt 0 ] && echo "Fourth: $fourth $FOURTH" >> $DEBUGFILE

    echo "$FIRST.$SECOND.$THIRD.$FOURTH"
}

gather_interfaces_linux() {
    [ $DEBUG -gt 0 ] && echo "- Starting gather_interfaces_linux -" >> $DEBUGFILE

    for INTF in `ifconfig -a | grep ^[a-z] | grep -v ^lo | awk '{print $1}'`
    do
        if [ `echo $INTF | grep : | wc -l` -gt 0 ]
        then
            export INTERFACES="`echo $INTF|awk -F: '{print $1}'` $INTERFACES"
        else
            export INTERFACES="$INTF $INTERFACES"
        fi
    done

    INTFCOUNT=`echo $INTERFACES | wc -w`
    [ $DEBUG -gt 0 ] && echo "Interfaces: There are $INTFCOUNT interfaces: $INTERFACES." >> $DEBUGFILE

    if [ $INTFCOUNT -lt 1 ]
    then
        echo "NOK - No active network interfaces."
        exit $STATE_CRITICAL
    fi
}

gather_interfaces_darwin() {
    [ $DEBUG -gt 0 ] && echo "- Starting gather_interfaces_darwin -" >> $DEBUGFILE

    for INTF in `ifconfig -a | grep ^[a-z] | grep -v ^gif | grep -v ^stf | grep -v ^lo | awk '{print $1}'`
    do
        [ `echo $INTF | grep : | wc -l` -gt 0 ] && INTF=`echo $INTF|awk -F: '{print $1}'`
        [ `ifconfig $INTF | grep "status: inactive" | wc -l` -gt 0 ] && break
        INTERFACES="$INTF $INTERFACES"
    done

    INTFCOUNT=`echo $INTERFACES | wc -w`
    [ $DEBUG -gt 0 ] && echo "Interfaces: There are $INTFCOUNT interfaces: $INTERFACES." >> $DEBUGFILE

    if [ $INTFCOUNT -lt 1 ]
    then
        echo "NOK - No active network interfaces."
        exit $STATE_CRITICAL
    fi
}

gather_gateway_linux() {
    [ $DEBUG -gt 0 ] && echo "- Starting gather_gateway_linux for interface $1 -" >> $DEBUGFILE

    MASKBIN=""
    MASK=`ifconfig $1 | grep Mask | awk '{print $4}' | awk -F: '{print $2}'`
    for PART in `echo $MASK | awk -F. '{print $1" "$2" "$3" "$4}'`
    do
        MASKBIN="$MASKBIN`convert_base $PART 2 | awk -F# '{print $2}'`"
    done
    [ $DEBUG -gt 0 ] && echo "Mask: $MASK $MASKBIN" >> $DEBUGFILE

    BITCOUNT=`echo $MASKBIN | grep -o 1 | wc -l | awk '{print $1}'`
    [ $DEBUG -gt 0 ] && echo "Bitcount: $BITCOUNT" >> $DEBUGFILE

    IPBIN=""
    IP=`ifconfig $1 | grep "inet addr" | awk '{print $2}' | awk -F: '{print $2}'`
    for PART in `echo $IP | awk -F. '{print $1" "$2" "$3" "$4}'`
    do
        TEMPBIN=`convert_base $PART 2 | awk -F# '{print $2}'`
        TEMPCOUNT=`echo $TEMPBIN | wc -c | awk '{print $1}'`
        let PADDING=9-$TEMPCOUNT
        i=1
        while ((i <= $PADDING)); do
            IPBIN=$IPBIN"0"
            let i=$i+1
        done
        IPBIN=$IPBIN$TEMPBIN
    done
    [ $DEBUG -gt 0 ] && echo "IP address: $IP $IPBIN" >> $DEBUGFILE

    CUT="1-$BITCOUNT"
    [ $DEBUG -gt 0 ] && echo "Cutting: Cutting chars $CUT" >> $DEBUGFILE
    NETBIN=`echo $IPBIN | cut -c $CUT`
    [ $DEBUG -gt 0 ] && echo "Netbin: $NETBIN" >> $DEBUGFILE

    ROUTER=`subnet_router $NETBIN`
    [ $DEBUG -gt 0 ] && echo "Router: $ROUTER" >> $DEBUGFILE
    echo $ROUTER
}

gather_gateway_darwin() {
    [ $DEBUG -gt 0 ] && echo "- Starting gath_gateway_darwin for interface $1 -" >> $DEBUGFILE

    MASKBIN=""
    [ `uname` == "Darwin" ] && MASK=`ifconfig $1 | grep netmask | awk '{print $4}' | awk -Fx '{print $2}'`
    [ `uname` == "SunOS" ] && MASK=`ifconfig $1 | grep netmask | awk '{print $4}'`
    for PART in `echo 1 3 5 7`
    do
        let PLUSPART=$PART+1
        MASKPART=`echo $MASK | cut -c $PART-$PLUSPART`
        MASKBIN="$MASKBIN`convert_base 16#$MASKPART 2 | awk -F# '{print $2}'`"
    done
    [ $DEBUG -gt 0 ] && echo "Mask: $MASK $MASKBIN" >> $DEBUGFILE

    BITCOUNT=`echo $MASKBIN | grep -o 1 | wc -l | awk '{print $1}'`
    [ $DEBUG -gt 0 ] && echo "Bitcount: $BITCOUNT" >> $DEBUGFILE

    IPBIN=""
    IP=`ifconfig $1 | grep "inet " | awk '{print $2}'`
    for PART in `echo $IP | awk -F. '{print $1" "$2" "$3" "$4}'`
    do
        TEMPBIN=`convert_base $PART 2 | awk -F# '{print $2}'`
        TEMPCOUNT=`echo $TEMPBIN | wc -c | awk '{print $1}'`
        let PADDING=9-$TEMPCOUNT
        i=1
        while ((i <= $PADDING)); do
            TEMPBIN="0"$TEMPBIN
            let i=$i+1
        done
        IPBIN=$IPBIN$TEMPBIN
    done
    [ $DEBUG -gt 0 ] && echo "IP address: $IP $IPBIN" >> $DEBUGFILE

    CUT="1-$BITCOUNT"
    [ $DEBUG -gt 0 ] && echo "Cutting: cutting chars $CUT" >> $DEBUGFILE
    NETBIN=`echo $IPBIN | cut -c $CUT`
    [ $DEBUG -gt 0 ] && echo "Netbin: $NETBIN" >> $DEBUGFILE

    ROUTER=`subnet_router $NETBIN`
    [ $DEBUG -gt 0 ] && echo "Router: $ROUTER" >> $DEBUGFILE
    echo $ROUTER
}

gather_gateway_sunos() {
    [ $DEBUG -gt 0 ] && echo "- Starting gath_gateway_solaris for interface $1 -" >> $DEBUGFILE

    MASKBIN=""
    [ `uname` == "Darwin" ] && MASK=`ifconfig $1 | grep netmask | awk '{print $4}' | awk -Fx '{print $2}'`
    [ `uname` == "SunOS" ] && MASK=`ifconfig $1 | grep netmask | awk '{print $4}'`
    for PART in `echo 1 3 5 7`
    do
        let PLUSPART=$PART+1
        MASKPART=`echo $MASK | cut -c $PART-$PLUSPART`
        MASKBIN="$MASKBIN`convert_base 16#$MASKPART 2 | awk -F# '{print $2}'`"
    done
    [ $DEBUG -gt 0 ] && echo "Mask: $MASK $MASKBIN" >> $DEBUGFILE

# This piece of kludge also requires that all tabs are removed from the beginning of each line.
# Additional character needed to trick the counter below
# Shitty thing is that it doesn't work. Stupid "let" aryth engine...
#MASKBIN="$MASKBIN-"
#[ $DEBUG -gt 0 ] && echo "Bitcount: kludged binmask is $MASKBIN" >> $DEBUGFILE
#
#IFS="1"
#read TEMP << EOT
#echo $MASKBIN
#EOT
#let "BITCOUNT=(${#TEMP[@]} - 1)"
#IFS=" "

    # The kludge above was replaced by this one line of Perl.
    BITCOUNT=`echo $MASKBIN | perl -ne 'while(/1/g){++$count}; print "$count"'`
    [ $DEBUG -gt 0 ] && echo "Bitcount: $BITCOUNT" >> $DEBUGFILE

    IPBIN=""
    IP=`ifconfig $1 | grep "inet " | awk '{print $2}'`
    for PART in `echo $IP | awk -F. '{print $1" "$2" "$3" "$4}'`
    do
        [ $DEBUG -gt 0 ] && echo "IP part: converting part $PART" >> $DEBUGFILE
        TEMPBIN=`convert_base $PART 2 | awk -F# '{print $2}'`
        [ $DEBUG -gt 0 ] && echo "IP part: converted part is $TEMPBIN" >> $DEBUGFILE
        TEMPCOUNT=`echo $TEMPBIN | wc -c | awk '{print $1}'`
        [ $DEBUG -gt 0 ] && echo "IP part: this part is $TEMPCOUNT chars long." >> $DEBUGFILE
        let PADDING=9-$TEMPCOUNT
        [ $DEBUG -gt 0 ] && echo "IP part: will be padded with $PADDING zeroes" >> $DEBUGFILE
        i=1
        while ((i <= $PADDING)); do
            TEMPBIN="0"$TEMPBIN
            let i=$i+1
        done
        IPBIN=$IPBIN$TEMPBIN
    done
    [ $DEBUG -gt 0 ] && echo "IP address: $IP $IPBIN" >> $DEBUGFILE

    CUT="1-$BITCOUNT"
    [ $DEBUG -gt 0 ] && echo "Cutting: cutting chars $CUT" >> $DEBUGFILE
    NETBIN=`echo $IPBIN | cut -c $CUT`
    [ $DEBUG -gt 0 ] && echo "Netbin: $NETBIN" >> $DEBUGFILE

    ROUTER=`subnet_router $NETBIN`
    [ $DEBUG -gt 0 ] && echo "Router: $ROUTER" >> $DEBUGFILE
    echo $ROUTER
}

check_miitool() {
    [ $DEBUG -gt 0 ] && echo "- Starting check_miitool -" >> $DEBUGFILE
    COUNT="0"

    for INTF in `echo $INTERFACES`
    do
        [ `sudo /sbin/mii-tool $INTF | head -1 | grep -c ok` -gt 0 ] || let COUNT=$COUNT+1
        [ `sudo /sbin/mii-tool $INTF | head -1 | grep -c 100baseTx-FD` -gt 0 ] || let COUNT=$COUNT+1
        [ `sudo /sbin/mii-tool $INTF | head -1 | grep -c 1000baseTx-FD` -gt 0 ] || let COUNT=$COUNT+1
    done

    # Braces instead of parentheses: an exit inside ( ) only leaves the subshell.
    [ $COUNT -gt $INTFCOUNT ] && { echo "NOK - Problem with one of the interfaces"; exit $STATE_CRITICAL; }
}

check_ping() {
    [ $DEBUG -gt 0 ] && echo "- Starting check_ping -" >> $DEBUGFILE
    INTF=""

    for INTF in `echo $INTERFACES`
    do
        case `uname` in
            Linux) GATEWAY=`gather_gateway_linux $INTF`;;
            Darwin) GATEWAY=`gather_gateway_darwin $INTF`;;
            SunOS) GATEWAY=`gather_gateway_sunos $INTF`;;
            *) echo "OS not supported by this check."; exit 1;;
        esac
        [ $DEBUG -gt 0 ] && echo "Gateway: $GATEWAY" >> $DEBUGFILE

        ping -c 3 $GATEWAY >/dev/null 2>&1
        if [ $? -gt 0 ]
        then
            echo "NOK - Problem pinging gateway $GATEWAY"; exit $STATE_CRITICAL
        fi
    done
}

### THE MAIN ROUTINE FINALLY STARTS ###

case `uname` in
    Linux) gather_interfaces_linux;;
    Darwin) gather_interfaces_darwin;;
    #SunOS) gather_interfaces_sunos;;
    SunOS) gather_interfaces_linux;;
    *) echo "OS not supported by this check."; exit 1;;
esac

[ $MIITOOL -eq 1 ] && check_miitool
check_ping

# None of the other subroutines forced us to exit 1 before here, so let's quit with a 0.
echo "OK - Everything running like it should"
exit $STATE_OK
kilala.nl tags: nagios, unix, programming,
View or add comments (curr. 0)
2006-06-01 00:00:00
This script was written at the time I was hired by UPC / Liberty Global.
Basic monitor to check whether BIND is up and running. It checks for a number of processes and tries to perform a basic lookup using the localhost.
This script was quickly hacked together for my current customer, as a Q&D solution for their monitoring needs. It's no beauty, but it works. Written in ksh and tested with:
A Critical is sent if:
A) one or more of the required processes is not running, or
B) the script is unable to perform a basic lookup using the localhost.
UPDATE 19/06/2006:
Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!
#!/usr/bin/bash
#
# DNS / Named process monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 19-06-2006
#
# Usage: ./check_named
#
# Description:
#   This plugin determines whether the named DNS server
#   is running properly. It will check the following:
#   * Are all required processes running?
#   * Is it possible to make DNS requests?
#
# Limitations:
#   Currently this plugin will only function correctly on Solaris systems.
#
# Output:
#   The script returns a CRIT when the abovementioned criteria are
#   not matched.
#

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
    echo "WARNING:"
    echo "This script was originally written for use on Solaris."
    echo "You may run into some problems running it on this host."
    echo ""
    echo "Please verify that the script works before using it in a"
    echo "live environment. You can easily disable this message after"
    echo "testing the script."
    echo ""
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PROGNAME="check_named"
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

print_usage() {
    echo "Usage: $PROGNAME"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Named DNS monitor plugin for Nagios"
    echo ""
    echo "This plugin was not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
    case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) print_usage; exit $STATE_UNKNOWN;;
    esac
done

check_processes() {
    PROCESS="0"
    if [ `ps -ef | grep named | grep -v grep | grep -v nagios | wc -l` -lt 1 ]; then
        echo "NAMED NOK - One or more processes not running"
        exitstatus=$STATE_CRITICAL
        exit $exitstatus
    fi
}

check_service() {
    SERVICE=0
    nslookup www.google.com localhost >/dev/null 2>&1
    if [ $? -eq 1 ]; then SERVICE=1; fi

    if [ $SERVICE -eq 1 ]; then
        echo "NAMED NOK - Unable to perform a DNS lookup using localhost."
        exitstatus=$STATE_CRITICAL
        exit $exitstatus
    fi
}

check_processes
check_service

echo "NAMED OK - Everything running like it should"
exitstatus=$STATE_OK
exit $exitstatus
kilala.nl tags: nagios, unix, programming,
2006-06-01 00:00:00
This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.
Today I made an improved version of the Nagios monitor "check_log2", which is now aptly called "check_log3". It includes all the improvements I originally added to "check_log2", so you can simply use this as a drop-in replacement.
Version 3 of this script gives you the option to add a second query to the monitor.
The previous two incarnations of the script only allowed you to search for one query and would return a Critical if it was found. Now you can also add a second query, which will result in a Warning message. Goody! :3
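To make the two-query idea concrete, here is a minimal sketch of how a set of new log lines could be sorted into Critical, Warning or OK. It is not lifted from check_log3: the function name classify_lines and the demo file are made up for illustration, and the exit codes simply follow the usual Nagios plugin convention (0, 1, 2).

```shell
#!/bin/bash
# Exit codes following the usual Nagios plugin convention.
STATE_OK=0
STATE_WARNING=1
STATE_CRITICAL=2

# classify_lines FILE CRITQUERY WARNQUERY   (hypothetical helper)
# Returns CRITICAL if CRITQUERY occurs in FILE, otherwise WARNING if
# WARNQUERY occurs, otherwise OK. The Critical query always wins.
classify_lines() {
    local file=$1 critquery=$2 warnquery=$3
    if grep "$critquery" "$file" >/dev/null 2>&1; then
        return $STATE_CRITICAL
    elif grep "$warnquery" "$file" >/dev/null 2>&1; then
        return $STATE_WARNING
    fi
    return $STATE_OK
}

# Small demo: a file with one line that matches the Warning query only.
demo=$(mktemp)
printf 'all is fine\nbla happened\n' > "$demo"
status=0
classify_lines "$demo" neko bla || status=$?
echo "exit status: $status"
rm -f "$demo"
```

With "neko" as the Critical query and "bla" as the Warning query, the demo file only trips the Warning branch. A file containing both would come back as Critical.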
1st of Feb, 2006:
Kyle Tucker pointed out that he had problems running this script with bash on Solaris. The changes he suggested have been worked into the newer version. Thanks Kyle :)
5th of Mar, 2006:
I finally got round to fixing the script according to all the changes Kyle (and others) suggested. So here's another try! Right now I've tested the script on Red Hat, Mac OS X and Solaris, so it should be much better than before.
19th of June, 2006:
Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!
Also stomped out a few horrendous bugs! I'm very sorry for putting out such a buggy script earlier... If you've started using the script in your environment, please download the latest version. Thanks to Ali Khan for pointing out these mistakes.
#!/bin/bash
#
# Log file pattern detector plugin for Nagios
# Written by Ethan Galstad (nagios@nagios.org)
# Last Modified: 07-31-1999
# Heavily modified by Thomas Sluyter (nagiosATkilalaDOTnl)
# Last Modified: 19-06-2006
#
# Usage: ./check_log3 -F log_file -O old_log_file -C crit-pattern -W warn-pattern
#
# Description:
#
# This plugin will scan a log file (specified by the log_file option)
# for specific patterns (specified by the XXX-pattern options). Successive
# calls to the plugin script will only report *new* pattern matches in the
# log file, since a copy of the log file from the previous run is saved
# to old_log_file.
#
# Output:
#
# On the first run of the plugin, it will return an OK state with a message
# of "Log check data initialized". On successive runs, it will return an OK
# state if *no* pattern matches have been found in the *difference* between the
# log file and the older copy of the log file. If the plugin detects any
# pattern matches in the log diff, it will return a CRITICAL state and print
# out a message in the following format: "(x) last_match", where "x" is the
# total number of pattern matches found in the file and "last_match" is the
# last entry in the log file which matches the pattern.
#
# Notes:
#
# If you use this plugin make sure to keep the following in mind:
#
# 1. The "max_attempts" value for the service should be 1, as this
#    will prevent Nagios from retrying the service check (the
#    next time the check is run it will not produce the same results).
#
# 2. The "notify_recovery" value for the service should be 0, so that
#    Nagios does not notify you of "recoveries" for the check. Since
#    pattern matches in the log file will only be reported once and not
#    the next time, there will always be "recoveries" for the service, even
#    though recoveries really don't apply to this type of check.
#
# 3. You *must* supply a different old_file_log for each service that
#    you define to use this plugin script - even if the different services
#    check the same log_file for pattern matches. This is necessary
#    because of the way the script operates.
#
# 4. Changes to the script were made by Thomas Sluyter (cailin@kilala.nl).
#    * The first set of changes will allow the script to run properly on Solaris, which
#      it did not do by default. The second set of changes will allow the following:
#    * State retention. In the original script, if a NOK was put into the log file
#      at point A in time and it is not repeated at A+1, then an OK is sent to Nagios.
#      Not something that you would like to happen.
#      I've added the $oldlog.STATE trigger file which retains the last exitstatus. Should
#      there be no new lines added to the log, check_log will simply repeat the last state
#      instead of give an OK.
#      In order for this state retention to work properly your client system MUST
#      HAVE THE DIRECTORY /USR/LOCAL/NAGIOS/VAR.
#    * Two queries. In the original script you could only enter one query which, when
#      found, would result in a Critical message being sent to Nagios. I've added the
#      possibility to add another query, which will result in a Warning message.
#    * Bugfix: changed all instances of "crit-count" and "warn-count" to "critcount" and
#      "warncount" after a tip from Kyle Tucker who ran into problems running this script
#      with bash on Solaris.
#

# Paths to commands used in this script. These
# may have to be modified to match your system setup.

PATH="/usr/bin:/usr/sbin:/bin:/sbin"

PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`

#. $PROGPATH/utils.sh
. /usr/local/nagios/libexec/utils.sh

print_usage() {
    echo "Usage: $PROGNAME -F logfile -O oldlog -C CRITquery -W WARNquery"
    echo "Usage: $PROGNAME --help"
    echo "Usage: $PROGNAME --version"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Log file pattern detector plugin for Nagios"
    echo ""
    support
}

# Make sure the correct number of command line
# arguments have been supplied

if [ $# -lt 8 ]; then
    print_usage
    exit $STATE_UNKNOWN
fi

# Grab the command line arguments

exitstatus=$STATE_WARNING #default
while test -n "$1"; do
    case "$1" in
        --help)
            print_help
            exit $STATE_OK
            ;;
        -h)
            print_help
            exit $STATE_OK
            ;;
        -F)
            logfile=$2
            shift
            ;;
        -O)
            oldlog=$2
            shift
            ;;
        -C)
            CRITquery=$2
            shift
            ;;
        -W)
            WARNquery=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit $STATE_UNKNOWN
            ;;
    esac
    shift
done

# If the source log file doesn't exist, exit

if [ ! -e $logfile ]; then
    echo "Log check error: Log file $logfile does not exist!"
    # Save the state before exiting; in the earlier version this line
    # came after the exit and was never reached.
    echo $STATE_UNKNOWN > $oldlog.STATE
    exit $STATE_UNKNOWN
fi

# If the dump/temp log file doesn't exist, this must be the first time
# we're running this test, so copy the original log file over to
# the old diff file and exit

if [ ! -e $oldlog ]; then
    cat $logfile > $oldlog
    TEMPcount=0
    let TEMPcount=$TEMPcount+$(tail -1 $logfile | grep -i $WARNquery | wc -l | awk '{print $1}')
    let TEMPcount=$TEMPcount+$(tail -1 $logfile | grep -i $CRITquery | wc -l | awk '{print $1}')

    if [ $TEMPcount -gt 0 ]
    then
        echo "Log check data initialized... Last line contained error message."
        echo $STATE_WARNING > $oldlog.STATE
        exit $STATE_WARNING
    else
        echo "Log check data initialized..."
        echo $STATE_OK > $oldlog.STATE
        exit $STATE_OK
    fi
fi

# A bug which was caught very late:
# If newlog is shorter than oldlog, the diff used below will return
# false positives for the query because they will be in $oldlog. Why?
# Because $oldlog is not rolled over / rotated, like $newlog. I need
# to fix this in a kludgy way.

if [ `wc -l $logfile|awk '{print $1}'` -lt `wc -l $oldlog|awk '{print $1}'` ]
then
    rm $oldlog
    cat $logfile > $oldlog
    TEMPcount=0
    let TEMPcount=$TEMPcount+$(tail -1 $logfile | grep -i $WARNquery | wc -l | awk '{print $1}')
    let TEMPcount=$TEMPcount+$(tail -1 $logfile | grep -i $CRITquery | wc -l | awk '{print $1}')

    if [ $TEMPcount -gt 0 ]
    then
        echo "Log check data initialized... Last line contained error message."
        echo $STATE_WARNING > $oldlog.STATE
        exit $STATE_WARNING
    else
        echo "Log check data initialized..."
        echo $STATE_OK > $oldlog.STATE
        exit $STATE_OK
    fi
fi

# The oldlog file exists, so compare it to the original log now

# The temporary file that the script should use while
# processing the log file.
# Look mktemp up in the PATH; "-x mktemp" only tested for a file
# called mktemp in the current directory.
if type mktemp >/dev/null 2>&1; then
    tempdiff=`mktemp /tmp/check_log.XXXXXXXXXX`
else
    tempdate=`/bin/date '+%H%M%S'`
    tempdiff="/tmp/check_log.${tempdate}"
    touch $tempdiff
fi

diff $logfile $oldlog > $tempdiff

if [ `wc -l $tempdiff | awk '{print $1}'` -eq 0 ]
then
    rm $tempdiff
    touch $oldlog.STATE
    exitstatus=`cat $oldlog.STATE`
    echo "LOG FILE - No status change detected. Status = $exitstatus"
    exit $exitstatus
fi

# Count the number of matching log entries we have
CRITcount=`grep -c "$CRITquery" $tempdiff`
WARNcount=`grep -c "$WARNquery" $tempdiff`

# Get the last matching entry in the diff file
CRITlastentry=`grep "$CRITquery" $tempdiff | tail -1`
WARNlastentry=`grep "$WARNquery" $tempdiff | tail -1`

rm $tempdiff
cat $logfile > $oldlog

if [ "$CRITcount" -gt 0 ]; then
    echo "($CRITcount) $CRITlastentry"
    echo $STATE_CRITICAL > $oldlog.STATE
    exit $STATE_CRITICAL
fi

if [ "$WARNcount" -gt 0 ]; then
    echo "($WARNcount) $WARNlastentry"
    echo $STATE_WARNING > $oldlog.STATE
    exit $STATE_WARNING
fi

echo "Log check ok - 0 pattern matches found"
exit $STATE_OK
echo "Starting clean"
rm /tmp/foobar /usr/local/nagios/var/foobar*
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "Starting normally"
echo "baka"
echo "normal" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "baka"
echo "normal" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "warning"
echo "bla" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "critical"
echo "neko" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "warning"
echo "bla" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "Log rotation with crit"
rm /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "critical"
echo "neko" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "Log rotation with warn"
rm /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "warning"
echo "bla" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "Normal log rotation"
rm /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""

echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log3 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -C neko -W bla
echo $?
echo ""
kilala.nl tags: nagios, unix, programming,
2006-06-01 00:00:00
This script was written at the time I was hired by UPC / Liberty Global.
Improved log checker for Solaris, with state retention.
I found that the version of check_log included in the default monitor package doesn't work perfectly on Solaris: it needs a bit of tweaking... Which is what I've done for the script.
Also, I've added state retention. It's a bit of a hack, but hey! I needed a quick solution.
The original script sends a Critical when it detects the string you've queried the log file for, but it clears that same Critical immediately if the same message is not repeated once the monitor runs again. Meaning that, if there are no updates to your log file, the Critical will only be around until the next time the monitor runs.
Not very handy if the Critical occurs during the night.
This new version of the script creates a file called $oldlog.STATE in /usr/local/nagios/var (which should be 755, nagios:nagios), which contains the exit status for the last detected _changed_ status... If there are no changes detected in your log file, this old exit state is repeated.
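Stripped of everything else, the state retention trick boils down to something like the sketch below. This is only an illustration, not the plugin code itself: the function name retained_state and its "yes"/"no" argument are invented for the example, and falling back to 0 (OK) when no state has been stored yet is my own assumption.

```shell
#!/bin/bash
# retained_state STATEFILE CHANGED [NEWSTATE]   (hypothetical helper)
# CHANGED=yes: the log changed, so store NEWSTATE and report it.
# CHANGED=no:  nothing new in the log, so repeat the stored state.
retained_state() {
    local statefile=$1 changed=$2 newstate=${3:-}
    if [ "$changed" = "yes" ]; then
        echo "$newstate" > "$statefile"
    fi
    if [ -s "$statefile" ]; then
        cat "$statefile"
    else
        # Assumption: no stored state yet counts as OK.
        echo 0
    fi
}

# Demo: a Critical (2) is stored once and repeated on a quiet run.
statefile=$(mktemp)
retained_state "$statefile" yes 2
retained_state "$statefile" no
rm -f "$statefile"
```

The second call stands for a monitor run during which the log file did not change: instead of falling back to OK, the stored Critical is reported again.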
The script has been tested on Solaris 8, Mac OS X 10.4 and Redhat ES3.
UPDATE 19/06/2006:
Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!
Also stomped out a few horrendous bugs! I'm very sorry for putting out such a buggy script earlier... If you've started using the script in your environment, please download the latest version. Thanks to Ali Khan for pointing out these mistakes.
#!/bin/bash
#
# Log file pattern detector plugin for Nagios
# Written by Ethan Galstad (nagios@nagios.org)
# Last Modified: 07-31-1999
# Updated by Thomas Sluyter (nagiosATkilalaDOTnl)
# Last Modified: 19-06-2006
#
# Usage: ./check_log2 -F log_file -O old_log_file -Q pattern
#
# Description:
#
# This plugin will scan a log file (specified by the log_file option)
# for a specific pattern (specified by the pattern option). Successive
# calls to the plugin script will only report *new* pattern matches in the
# log file, since a copy of the log file from the previous run is saved
# to old_log_file.
#
# Output:
#
# On the first run of the plugin, it will return an OK state with a message
# of "Log check data initialized". On successive runs, it will return an OK
# state if *no* pattern matches have been found in the *difference* between the
# log file and the older copy of the log file. If the plugin detects any
# pattern matches in the log diff, it will return a CRITICAL state and print
# out a message in the following format: "(x) last_match", where "x" is the
# total number of pattern matches found in the file and "last_match" is the
# last entry in the log file which matches the pattern.
#
# Notes:
#
# If you use this plugin make sure to keep the following in mind:
#
# 1. The "max_attempts" value for the service should be 1, as this
#    will prevent Nagios from retrying the service check (the
#    next time the check is run it will not produce the same results).
#
# 2. The "notify_recovery" value for the service should be 0, so that
#    Nagios does not notify you of "recoveries" for the check. Since
#    pattern matches in the log file will only be reported once and not
#    the next time, there will always be "recoveries" for the service, even
#    though recoveries really don't apply to this type of check.
#
# 3. You *must* supply a different old_file_log for each service that
#    you define to use this plugin script - even if the different services
#    check the same log_file for pattern matches. This is necessary
#    because of the way the script operates.
#
# 4. Changes to the script were made by Thomas Sluyter (nagios@kilala.nl).
#    The first set of changes will allow the script to run properly on Solaris, which
#    it did not do by default. The second set of changes will allow the following:
#    * State retention. If a NOK was generated at point A in time and it is not repeated
#      at A+1, then an OK is sent to Nagios. Not something that you would like to happen.
#      I've added the $oldlog.STATE trigger file which retains the last exitstatus. Should
#      there be no new lines added to the log, check_log will simply repeat the last state
#      instead of give an OK.
#
# Examples:
#
# Check for login failures in the syslog...
#
#   check_log -F /var/log/messages -O /usr/local/nagios/var/check_log.badlogins.old -Q "LOGIN FAILURE"
#
# Check for port scan alerts generated by Psionic's PortSentry software...
#
#   check_log -F /var/log/messages -O /usr/local/nagios/var/check_log.portscan.old -Q "attackalert"
#

# Paths to commands used in this script. These
# may have to be modified to match your system setup.

PATH="/usr/bin:/usr/sbin:/bin:/sbin"

PROGNAME=`basename $0`
PROGPATH=`echo $0 | sed -e 's,[\\/][^\\/][^\\/]*$,,'`

#. $PROGPATH/utils.sh
. /usr/local/nagios/libexec/utils.sh

print_usage() {
    echo "Usage: $PROGNAME -F logfile -O oldlog -Q query"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Log file pattern detector plugin for Nagios"
    echo ""
    support
}

# Make sure the correct number of command line
# arguments have been supplied

if [ $# -lt 6 ]; then
    print_usage
    exit $STATE_UNKNOWN
fi

# Grab the command line arguments

exitstatus=$STATE_WARNING #default
while test -n "$1"; do
    case "$1" in
        --help)
            print_help
            exit $STATE_OK
            ;;
        -h)
            print_help
            exit $STATE_OK
            ;;
        -F)
            logfile=$2
            shift
            ;;
        -O)
            oldlog=$2
            shift
            ;;
        -Q)
            query=$2
            shift
            ;;
        *)
            echo "Unknown argument: $1"
            print_usage
            exit $STATE_UNKNOWN
            ;;
    esac
    shift
done

# If the source log file doesn't exist, exit

if [ ! -e $logfile ]; then
    echo "Log check error: Log file $logfile does not exist!"
    # Save the state before exiting; in the earlier version this line
    # came after the exit and was never reached.
    echo $STATE_UNKNOWN > $oldlog.STATE
    exit $STATE_UNKNOWN
fi

# If the oldlog file doesn't exist, this must be the first time
# we're running this test, so copy the original log file over to
# the old diff file and exit

if [ ! -e $oldlog ]; then
    cat $logfile > $oldlog
    if [ `tail -1 $logfile | grep -i $query | wc -l` -gt 0 ]
    then
        echo "Log check data initialized... Last line contained error message."
        echo $STATE_CRITICAL > $oldlog.STATE
        exit $STATE_CRITICAL
    else
        echo "Log check data initialized..."
        echo $STATE_OK > $oldlog.STATE
        exit $STATE_OK
    fi
fi

# A bug which was caught very late:
# If newlog is shorter than oldlog, the diff used below will return
# false positives for the query because they will be in $oldlog. Why?
# Because $oldlog is not rolled over / rotated, like $newlog. I need
# to fix this in a kludgy way.

if [ `wc -l $logfile|awk '{print $1}'` -lt `wc -l $oldlog|awk '{print $1}'` ]
then
    rm $oldlog
    cat $logfile > $oldlog
    if [ `tail -1 $logfile | grep -i $query | wc -l` -gt 0 ]
    then
        echo "Log check data re-initialized... Last line contained error message."
        echo $STATE_CRITICAL > $oldlog.STATE
        exit $STATE_CRITICAL
    else
        echo "Log check data re-initialized..."
        echo $STATE_OK > $oldlog.STATE
        exit $STATE_OK
    fi
fi

# Everything seems fine, so compare it to the original log now

# The temporary file that the script should use while
# processing the log file.
# Look mktemp up in the PATH; "-x mktemp" only tested for a file
# called mktemp in the current directory.
if type mktemp >/dev/null 2>&1; then
    tempdiff=`mktemp /tmp/check_log.XXXXXXXXXX`
else
    tempdate=`/bin/date '+%H%M%S'`
    tempdiff="/tmp/check_log.${tempdate}"
    touch $tempdiff
fi

diff $logfile $oldlog > $tempdiff

if [ `wc -l $tempdiff|awk '{print $1}'` -eq 0 ]
then
    rm $tempdiff
    touch $oldlog.STATE
    exitstatus=`cat $oldlog.STATE`
    echo "LOG FILE - No status change detected. Status = $exitstatus"
    exit $exitstatus
fi

# Count the number of matching log entries we have
count=`grep -c "$query" $tempdiff`

# Get the last matching entry in the diff file
lastentry=`grep "$query" $tempdiff | tail -1`

rm -f $tempdiff
cat $logfile > $oldlog

if [ "$count" = "0" ]; then # no matches, exit with no error
    echo "Log check ok - 0 pattern matches found"
    exitstatus=$STATE_OK
else # Print total match count and the last entry we found
    # echo "($count) $lastentry"
    echo "Log check NOK - $lastentry"
    exitstatus=$STATE_CRITICAL
    echo $STATE_CRITICAL > $oldlog.STATE
fi

exit $exitstatus
echo "Starting clean"
rm /tmp/foobar /usr/local/nagios/var/foobar*
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "Starting normally"
echo "normal"
echo "normal" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "normal"
echo "normal" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "critical"
echo "neko" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "Log rotation with crit"
rm /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "critical"
echo "neko" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "Normal log rotation"
rm /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""

echo "normal"
echo "baka" >> /tmp/foobar
/usr/local/nagios/libexec/check_log2 -F /tmp/foobar -O /usr/local/nagios/var/foobar.archive -Q neko
echo $?
echo ""
kilala.nl tags: nagios, unix, programming,
2005-07-01 00:00:00
This script was written at the time I was hired by UPC / Liberty Global.
Basic monitor that checks if Postfix is up and running. It checks for a number of processes and ports.
This script was quickly hacked together for my current customer, as a Q&D solution for their monitoring needs. It's no beauty, but it works. Written in ksh and tested with:
The script sends a Critical if:
A) One or more processes are not running, or
B) One or more ports are not available for connections.
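One thing to keep in mind with the port check: simply grepping the netstat output for a port number also matches longer numbers (25 also matches 2525). A stricter match could look like the sketch below. This is not part of the plugin: the function name port_listening and the canned sample lines are made up for illustration, and on Solaris you would probably have to swap awk for nawk, since the old /usr/bin/awk does not understand -v.

```shell
#!/bin/bash
# port_listening PORT   (hypothetical helper)
# Reads "netstat -an" style output on stdin and succeeds only if a
# LISTEN line has an address field ending in ".PORT" or ":PORT".
port_listening() {
    awk -v port="$1" '
        /LISTEN/ {
            for (i = 1; i <= NF; i++) {
                if ($i ~ ("[.:]" port "$")) { found = 1 }
            }
        }
        END { exit(found ? 0 : 1) }
    '
}

# Canned sample in Solaris-like layout, so the demo runs anywhere.
sample='      *.2525               *.*                0      0 49152      0 LISTEN
      *.80                 *.*                0      0 49152      0 LISTEN'

printf '%s\n' "$sample" | port_listening 2525 && echo "2525: listening"
printf '%s\n' "$sample" | port_listening 25 || echo "25: not listening"
```

On a live system you would feed it the real thing, as in "netstat -an | port_listening 25".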
UPDATE 19/06/2006:
Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!
#!/usr/bin/bash
#
# Postfix process monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 19-06-2006
#
# Usage: ./check_postfix
#
# Description:
# This plugin determines whether the Postfix SMTP server
# is running properly. It will check the following:
# * Are all required processes running?
# * Are all the required TCP/IP ports open?
#
# Limitations:
# Currently this plugin will only function correctly on Solaris systems.
#
# Output:
# Script returns a CRIT when one of the abovementioned criteria is
# not matched
#

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
    echo "WARNING:"
    echo "This script was originally written for use on Solaris."
    echo "You may run into some problems running it on this host."
    echo ""
    echo "Please verify that the script works before using it in a"
    echo "live environment. You can easily disable this message after"
    echo "testing the script."
    echo ""
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

# Name of this script, used by print_usage
PROGNAME=`basename $0`

print_usage() {
    echo "Usage: $PROGNAME"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Postfix monitor plugin for Nagios"
    echo ""
    echo "This plugin is not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
    case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) print_usage; exit $STATE_UNKNOWN;;
    esac
done

check_processes() {
    PROCESS="0"
    PROCLIST="smtpd qmgr pickup master sendmail"
    for PROC in `echo $PROCLIST`; do
        if [ `ps -ef | grep $PROC | grep -v grep | wc -l` -lt 1 ]; then
            # A missing smtpd is acceptable as long as proxymap is running
            if [ $PROC == "smtpd" ]; then
                if [ `ps -ef | grep proxymap | grep -v grep | wc -l` -lt 1 ]; then
                    PROCESS=1
                else
                    PROCESS=0
                fi
            else
                PROCESS=1
            fi
        fi
    done

    if [ $PROCESS -eq 1 ]; then
        echo "SMTP-S NOK - One or more processes not running"
        exitstatus=$STATE_CRITICAL
        exit $exitstatus
    fi
}

check_ports() {
    PORTS="0"
    PORTLIST="25"
    for NUM in `echo $PORTLIST`; do
        if [ `netstat -an | grep LISTEN | grep $NUM | grep -v grep | wc -l` -lt 1 ]; then PORTS=1;fi
    done

    if [ $PORTS -eq 1 ]; then
        echo "SMTP-S NOK - One or more TCP/IP ports not listening."
        exitstatus=$STATE_CRITICAL
        exit $exitstatus
    fi
}

check_processes
check_ports

echo "SMTP-S OK - Everything running like it should"
exitstatus=$STATE_OK
exit $exitstatus
kilala.nl tags: nagios, unix, programming,
2005-07-01 00:00:00
This script was written at the time I was hired by UPC / Liberty Global.
The text I wrote on Nagios Exchange about this script has been lost. I guess it speaks for itself :)
#!/usr/bin/bash
#
# Squid process monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 19-06-2006
#
# Usage: ./check_squid
#
# Description:
# This plugin determines whether the Squid proxy server
# is running properly. It will check the following:
# * Are all required processes running?
# * Are all the required TCP/IP ports open?
#
# Limitations:
# Currently this plugin will only function correctly on Solaris systems.
#
# Output:
# The script returns a CRIT when the abovementioned criteria are
# not matched
#

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
    echo "WARNING:"
    echo "This script was originally written for use on Solaris."
    echo "You may run into some problems running it on this host."
    echo ""
    echo "Please verify that the script works before using it in a"
    echo "live environment. You can easily disable this message after"
    echo "testing the script."
    echo ""
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

# Name of this script, used by print_usage
PROGNAME=`basename $0`

print_usage() {
    echo "Usage: $PROGNAME"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Squid monitor plugin for Nagios"
    echo ""
    echo "This plugin is not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
    case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) print_usage; exit $STATE_UNKNOWN;;
    esac
done

check_processes() {
    PROCESS="0"
    if [ `ps -ef | grep squid | grep -v grep | grep -v nagios | wc -l` -lt 2 ]; then
        echo "SQUID NOK - One or more processes not running"
        exitstatus=$STATE_CRITICAL
        exit $exitstatus
    fi
}

check_ports() {
    PORTS=0
    PORTLIST="8080 3128 3130"
    for NUM in `echo $PORTLIST`; do
        if [ `netstat -an | grep LISTEN | grep $NUM | grep -v grep | wc -l` -lt 1 ]; then PORTS=1;fi
    done

    if [ $PORTS -eq 1 ]; then
        echo "SQUID NOK - One or more TCP/IP ports not listening."
        exitstatus=$STATE_CRITICAL
        exit $exitstatus
    fi
}

check_processes
check_ports

echo "SQUID OK - Everything running like it should"
exitstatus=$STATE_OK
exit $exitstatus
kilala.nl tags: nagios, unix, programming,
2005-07-01 00:00:00
This script was written at the time I was hired by UPC / Liberty Global.
Basic monitor that checks if the Retrospect client is up and running.
This script was quickly hacked together for my current customer, as a Q&D solution for their monitoring needs. It's no beauty, but it works. Written in ksh and tested with:
The script sends a Critical if the required process is not running.
UPDATE 19/06/2006:
Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!
#!/usr/bin/bash
#
# Retrospect Backup Client monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 19-06-2006
#
# Usage: ./check_retro_client
#
# Description:
# This plugin determines whether the Retrospect backup client
# is running properly. It will check the following:
# * Are all required processes running?
#
# Limitations:
# Currently this plugin will only function correctly on Solaris systems.
#
# Output:
# The script returns a CRIT when the abovementioned criteria are
# not matched
#

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
    echo "WARNING:"
    echo "This script was originally written for use on Solaris."
    echo "You may run into some problems running it on this host."
    echo ""
    echo "Please verify that the script works before using it in a"
    echo "live environment. You can easily disable this message after"
    echo "testing the script."
    echo ""
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

# Name of this script, used by print_usage
PROGNAME=`basename $0`

print_usage() {
    echo "Usage: $PROGNAME"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Retrospect Backup Client monitor plugin for Nagios"
    echo ""
    echo "This plugin is not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
    case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) print_usage; exit $STATE_UNKNOWN;;
    esac
done

check_processes() {
    PROCESS="0"
    if [ `ps -ef | grep retroclient | grep -v grep | grep -v nagios | wc -l` -lt 1 ]; then
        echo "RETROSPECT NOK - One or more processes not running"
        exitstatus=$STATE_CRITICAL
        exit $exitstatus
    fi
}

check_processes

echo "RETROSPECT OK - Everything running like it should"
exitstatus=$STATE_OK
exit $exitstatus
kilala.nl tags: nagios, unix, programming,
2005-07-01 00:00:00
This script was written at the time I was hired by UPC / Liberty Global.
Basic monitor that checks if the NTP server is up and running. It checks for a process and whether the server has drifted from its higher level Stratum server.
This script was quickly hacked together for my current customer, as a Q&D solution for their monitoring needs. It's no beauty, but it works. Written in ksh and tested with:
The script sends a Critical if:
A) One or more processes are not running, or
B) The server's clock has drifted too far from its higher level Stratum server.
Requires the "check_ntp" plugin which is part of the default monitor package.
UPDATE 19/06/2006:
Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!
#!/usr/bin/bash
#
# NTP server process monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 19-06-2006
#
# Usage: ./check_ntp_s
#
# Description:
# This plugin determines whether the Nagios client is functioning
# properly as an NTP server. It does this by checking:
# * Are all required processes running?
# * Is the server's time up to scratch with its higher stratum server?
#
# Limitations:
# Currently this plugin will only function correctly on Solaris systems.
#
# Output:
# The script returns a CRIT when one of the abovementioned criteria
# is not matched.
#

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
    echo "WARNING:"
    echo "This script was originally written for use on Solaris."
    echo "You may run into some problems running it on this host."
    echo ""
    echo "Please verify that the script works before using it in a"
    echo "live environment. You can easily disable this message after"
    echo "testing the script."
    echo ""
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

# Name of this script, used by print_usage
PROGNAME=`basename $0`

print_usage() {
    echo "Usage: $PROGNAME"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "NTP server plugin for Nagios"
    echo ""
    echo "This plugin is not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
    case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) print_usage; exit $STATE_UNKNOWN;;
    esac
done

check_processes() {
    PROCESS="0"
    if [ `ps -ef | grep xntpd | grep -v grep | grep -v nagios | wc -l` -lt 1 ]; then PROCESS=1;fi

    if [ $PROCESS -eq 1 ]; then
        echo "NTP-S NOK - One or more processes not running"
        exitstatus=$STATE_CRITICAL
        exit $exitstatus
    fi
}

check_time() {
    TIME="0"
    #SERVERS="ntp0.nl.net ntp1.nl.net ntp2.nl.net"
    SERVERS="nl-ams99z-a02-01"
    # One server reporting OK is enough, so stop at the first success
    for SERV in `echo $SERVERS`; do
        if [ `/usr/local/nagios/libexec/check_ntp -H $SERV | awk '{print $2}'` != "OK:" ]; then
            TIME=1
        else
            TIME=0
            break
        fi
    done

    if [ $TIME -eq 1 ]; then
        echo "NTP-S NOK - Time not in synch with higher Stratum."
        exitstatus=$STATE_CRITICAL
        exit $exitstatus
    fi
}

check_processes
check_time

echo "NTP-S OK - Everything running like it should"
exitstatus=$STATE_OK
exit $exitstatus
kilala.nl tags: nagios, unix, programming,
2005-07-01 00:00:00
This script was written at the time I was hired by KPN i-Diensten. It is reproduced/shared here with their permission.
We are currently in the process of distributing a standard set of Nagios monitoring scripts to over 300 client systems. One of the metrics we would like to monitor is the three load averages (or as Dr. Gunther calls them: the LaLaLa triplets).
Since these 300 servers aren't all alike, we are bound to run into systems with one, two, four, eight or more processors. That means there is no nice way of making one standard configuration, since you'd have to define separate load average levels for WARN and CRIT per processor count. Why? Because a quad-processor system can take much more load than a single-processor system.
One way to get around this would be to define separate host groups, based on the number of processors in a system. You could then define a unique check_load command for each CPU host group.
I've gone the other way around though...
My workaround is to replace check_load with check_load2. This script takes no command line parameters and works on the basis of standard multipliers. We are of the opinion that the number of processors multiplied by a certain factor (150%? 200%? and so on) is a good enough way to define these WARN and CRIT levels. These multipliers can easily be modified (at the top of the script) to fit what -you- think is a worrying level of activity.
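To make the multiplier idea concrete, here is a minimal sketch of the arithmetic on its own, separate from the actual script. The helper names scale_threshold and load_exceeds are made up for this example, and it uses awk for the floating point math where check_load2 itself relies on bc.

```shell
#!/usr/bin/env bash
# Sketch only: scale_threshold and load_exceeds are hypothetical helpers,
# they do not appear in check_load2.

# scale_threshold NUMPROCS FACTOR
# Prints NUMPROCS * FACTOR with two decimals.
scale_threshold() {
    awk -v n="$1" -v f="$2" 'BEGIN { printf "%.2f\n", n * f }'
}

# load_exceeds LOAD LIMIT
# Succeeds (exit status 0) when LOAD is greater than or equal to LIMIT.
load_exceeds() {
    awk -v l="$1" -v t="$2" 'BEGIN { exit !((l + 0) >= (t + 0)) }'
}

# Example: a quad-processor box with a WARN factor of 1.50
# and a measured load average of 6.25.
warn_level=$(scale_threshold 4 1.50)
if load_exceeds 6.25 "$warn_level"; then
    echo "WARN level $warn_level reached"
else
    echo "below WARN level $warn_level"
fi
```

With a factor of 1.50, the same load average of 6.25 would trip the warning on a quad-processor system (level 6.00) but stay well below it on an eight-processor system (level 12.00).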
This script was tested on Redhat ES3, Solaris 8 and Mac OS X 10.4. It should run on other versions of these OSes as well.
EDIT:
Oh! Just like my other recent Nagios scripts, check_load2 comes with a debugging option. Set $DEBUG at the top of the file to anything larger than zero and the script will dump information at various stages of its execution.
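As a rough illustration of that debugging pattern: check_load2 repeats the $DEBUG test inline on every debug line, but the same idea could be wrapped in one small helper. The function name debug and the default file path below are assumptions for this sketch, not something taken from the script.

```shell
#!/usr/bin/env bash
# Sketch only: debug() and the default DEBUGFILE path are hypothetical.

# Anything larger than zero switches the debug output on.
DEBUG="${DEBUG:-0}"
DEBUGFILE="${DEBUGFILE:-/tmp/check_load2.debug}"

# debug MESSAGE...
# Appends the message to $DEBUGFILE, but only when $DEBUG is above zero.
debug() {
    if [ "$DEBUG" -gt 0 ]; then
        echo "$*" >> "$DEBUGFILE"
    fi
}

# With DEBUG left at 0 this call is a no-op.
debug "Numprocs: Number of processors detected is 4."
```

Keeping the test in one place means the plugin's regular output to Nagios stays untouched, whether debugging is on or off.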
#!/usr/bin/bash
#
# CPU load monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of KPN-IS, i-Provide, the Netherlands
# Last Modified: 22-06-2006
#
# Usage: ./check_load2
#
# Description:
# Ethan's original version of the check_load script is very flexible.
# It allows you to specifically set WARN and CRIT levels regarding
# the CPU load of the system you're monitoring.
# However: flexibility is not always a good thing. Say for example that
# you want to monitor the CPU load across a few hundred systems having
# various CPU configurations. You -could- define host groups for single, dual,
# quad (and so on) processor systems and assign unique check_load command
# definitions to each group.
# Or you could write a script which checks the number of active CPUs and
# then makes an educated guess at the WARN and CRIT levels for the system.
# In most cases this should really be enough.
#
# Limitations:
# This script should work properly on all implementations of Linux, Solaris
# and Mac OS X.
#
# Output:
# Depending on the levels defined at the top of the script,
# the script returns an OK, WARN or CRIT to Nagios based on CPU load.
#
# Other notes:
# If you ever run into problems with the script, set the DEBUG variable
# to 1. I'll need the output the script generates to do troubleshooting.
# See below for details.
# I realise that all the debugging commands strewn throughout the script
# may make things a little harder to read. But in the end I'm sure it was
# well worth adding them. It makes troubleshooting so much easier. :3
#

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

### DEBUGGING SETUP ###
# Cause you never know when you'll need to squash a bug or two
DEBUG="1"
DEBUGFILE="/tmp/foobar"
# -f keeps rm quiet when the file does not exist yet
rm -f $DEBUGFILE

### REQUISITE NAGIOS COMMAND LINE STUFF ###
print_usage() {
    echo "Usage: $PROGNAME"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Semi-intelligent CPU load monitor plugin for Nagios"
    echo ""
    echo "This plugin was not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
    case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) print_usage; exit $STATE_UNKNOWN;;
    esac
done

### SETTING UP THE WARN AND CRIT FACTORS ###
# Please be aware that these are -factors- and not real load average values.
# The numbers below will be multiplied by the number of processors to come
# to the desired WARN and CRIT levels. Feel free to adjust these factors, if
# you feel the need to tweak them.
WARN_1min="2.00"
WARN_5min="1.50"
WARN_15min="1.50"
[ $DEBUG -gt 0 ] && echo "Factors: warning factors are at $WARN_1min, $WARN_5min, $WARN_15min." >> $DEBUGFILE
CRIT_1min="3.00"
CRIT_5min="2.00"
CRIT_15min="2.00"
[ $DEBUG -gt 0 ] && echo "Factors: critical factors are at $CRIT_1min, $CRIT_5min, $CRIT_15min." >> $DEBUGFILE

### DEFINING SUBROUTINES ###
function gather_procs_linux() {
    NUMPROCS=`cat /proc/cpuinfo | grep ^processor | wc -l`
    [ $DEBUG -gt 0 ] && echo "Numprocs: Number of processors detected is $NUMPROCS." >> $DEBUGFILE
}

function gather_procs_sunos() {
    NUMPROCS=`/usr/bin/mpstat | grep -v CPU | wc -l`
    [ $DEBUG -gt 0 ] && echo "Numprocs: Number of processors detected is $NUMPROCS." >> $DEBUGFILE
}

function gather_procs_darwin() {
    NUMPROCS=`/usr/bin/hostinfo | grep "Default processor set" | awk '{print $8}'`
    [ $DEBUG -gt 0 ] && echo "Numprocs: Number of processors detected is $NUMPROCS." >> $DEBUGFILE
}

function gather_load_linux() {
    REAL_1min=`cat /proc/loadavg | awk '{print $1}'`
    REAL_5min=`cat /proc/loadavg | awk '{print $2}'`
    REAL_15min=`cat /proc/loadavg | awk '{print $3}'`
    [ $DEBUG -gt 0 ] && echo "Gather_load: Detected load averages are $REAL_1min, $REAL_5min, $REAL_15min." >> $DEBUGFILE
}

function gather_load_sunos() {
    REAL_1min=`w | grep "load average" | awk -F, '{print $4}' | awk '{print $3}'`
    REAL_5min=`w | grep "load average" | awk -F, '{print $5}'`
    REAL_15min=`w | grep "load average" | awk -F, '{print $6}'`
    [ $DEBUG -gt 0 ] && echo "Gather_load: Detected load averages are $REAL_1min, $REAL_5min, $REAL_15min." >> $DEBUGFILE
}

function gather_load_darwin() {
    REAL_1min=`sysctl -n vm.loadavg | awk '{print $1}'`
    REAL_5min=`sysctl -n vm.loadavg | awk '{print $2}'`
    REAL_15min=`sysctl -n vm.loadavg | awk '{print $3}'`
    [ $DEBUG -gt 0 ] && echo "Gather_load: Detected load averages are $REAL_1min, $REAL_5min, $REAL_15min." >> $DEBUGFILE
}

function check_load() {
    WARN="0"; CRIT="0"

    # bc prints 1 when the load is at or above (processors * factor), else 0
    [ `echo "if(($NUMPROCS * $WARN_1min) > $REAL_1min) 0; if(($NUMPROCS * $WARN_1min) <= $REAL_1min) 1" | bc` -gt 0 ] && let WARN=$WARN+1
    [ `echo "if(($NUMPROCS * $WARN_5min) > $REAL_5min) 0; if(($NUMPROCS * $WARN_5min) <= $REAL_5min) 1" | bc` -gt 0 ] && let WARN=$WARN+1
    [ `echo "if(($NUMPROCS * $WARN_15min) > $REAL_15min) 0; if(($NUMPROCS * $WARN_15min) <= $REAL_15min) 1" | bc` -gt 0 ] && let WARN=$WARN+1
    [ $DEBUG -gt 0 ] && echo "Check_load: warning levels are `echo "$NUMPROCS * $WARN_1min"|bc`, `echo "$NUMPROCS * $WARN_5min"|bc`, `echo "$NUMPROCS * $WARN_15min"|bc`." >> $DEBUGFILE

    [ `echo "if(($NUMPROCS * $CRIT_1min) > $REAL_1min) 0; if(($NUMPROCS * $CRIT_1min) <= $REAL_1min) 1" | bc` -gt 0 ] && let CRIT=$CRIT+1
    [ `echo "if(($NUMPROCS * $CRIT_5min) > $REAL_5min) 0; if(($NUMPROCS * $CRIT_5min) <= $REAL_5min) 1" | bc` -gt 0 ] && let CRIT=$CRIT+1
    [ `echo "if(($NUMPROCS * $CRIT_15min) > $REAL_15min) 0; if(($NUMPROCS * $CRIT_15min) <= $REAL_15min) 1" | bc` -gt 0 ] && let CRIT=$CRIT+1
    [ $DEBUG -gt 0 ] && echo "Check_load: critical levels are `echo "$NUMPROCS * $CRIT_1min"|bc`, `echo "$NUMPROCS * $CRIT_5min"|bc`, `echo "$NUMPROCS * $CRIT_15min"|bc`." >> $DEBUGFILE

    # Check CRIT first, so a critical load is not reported as a mere warning.
    # Use { } instead of ( ): an exit inside a subshell would not end the script.
    [ $CRIT -gt 0 ] && { echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min"; exit $STATE_CRITICAL; }
    [ $WARN -gt 0 ] && { echo "NOK: load averages are at $REAL_1min, $REAL_5min, $REAL_15min"; exit $STATE_WARNING; }
}

### FINALLY, THE MAIN ROUTINE ###
NUMPROCS="0"

case `uname` in
    Linux) gather_procs_linux; gather_load_linux; check_load;;
    Darwin) gather_procs_darwin; gather_load_darwin; check_load;;
    SunOS) gather_procs_sunos; gather_load_sunos; check_load;;
    *) echo "OS not supported by this check."; exit 1;;
esac

# Nothing caused us to exit early, so we're okay.
echo "OK - load averages are at $REAL_1min, $REAL_5min, $REAL_15min"
exit $STATE_OK
kilala.nl tags: nagios, unix, programming,
2005-07-01 00:00:00
This script was written at the time I was hired by UPC / Liberty Global.
Basic monitor that checks if the Checkpoint Firewall-1 Management software is up and running. It checks for a number of processes and ports.
This script was quickly hacked together for my current customer, as a Q&D solution for their monitoring needs. It's no beauty, but it works. Written in ksh and tested with:
The script sends a Critical if:
A) One or more processes are not running, or
B) One or more ports are not available for connections.
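The port criterion can be sketched as follows. This is a possible refinement rather than what the script below does: the script greps the netstat output for the bare number, which would also accept port 18256 when looking for 256. The helper name port_is_listening is made up, and the sketch assumes the local address sits in the first column (Solaris) or the fourth column (Linux, Mac OS X) of `netstat -an` output.

```shell
#!/usr/bin/env bash
# Sketch only: port_is_listening is a hypothetical helper.

# port_is_listening PORT
# Reads `netstat -an` style output on stdin. Succeeds only when a LISTEN
# line has a local address that ends in exactly PORT.
port_is_listening() {
    awk -v port="$1" '
        /LISTEN/ {
            # Assumed layouts: "*.256" in column 1 on Solaris,
            # "0.0.0.0:256" or "*.256" in column 4 elsewhere.
            for (i = 1; i <= 4; i += 3) {
                n = split($i, parts, /[.:]/)
                if (n > 1 && parts[n] == port) { found = 1 }
            }
        }
        END { exit !found }
    '
}

# Example with canned output; on a live system you would pipe in
# the real thing: netstat -an | port_is_listening 256
sample='tcp 0 0 0.0.0.0:18256 0.0.0.0:* LISTEN
*.257 *.* 0 0 49152 0 LISTEN'

printf '%s\n' "$sample" | port_is_listening 256 || echo "FWM NOK - port 256 not listening"
printf '%s\n' "$sample" | port_is_listening 257 && echo "port 257 is listening"
```

Reading from stdin keeps the parsing separate from the netstat call itself, which makes it easy to try the check against saved output from another host.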
UPDATE 19/06/2006:
Cleaned up the script a bit and added some checks that are considered the Right Thing to do. Should have done this -way- earlier!
#!/usr/bin/bash
#
# Firewall-1 process monitor plugin for Nagios
# Written by Thomas Sluyter (nagiosATkilalaDOTnl)
# By request of DTV Labs, Liberty Global, the Netherlands
# Last Modified: 19-06-2006
#
# Usage: ./check_fwm
#
# Description:
# This plugin determines whether the Firewall-1 management
# software is running properly. It will check the following:
# * Are all required processes running?
# * Are all the required TCP/IP ports open?
#
# Limitations:
# Currently this plugin will only function correctly on Solaris systems.
#
# Output:
# The script returns a CRIT when one of the criteria mentioned
# above is not matched.
#

# Host OS check and warning message
if [ `uname` != "SunOS" ]
then
    echo "WARNING:"
    echo "This script was originally written for use on Solaris."
    echo "You may run into some problems running it on this host."
    echo ""
    echo "Please verify that the script works before using it in a"
    echo "live environment. You can easily disable this message after"
    echo "testing the script."
    echo ""
fi

# You may have to change this, depending on where you installed your
# Nagios plugins
PATH="/usr/bin:/usr/sbin:/bin:/sbin"
LIBEXEC="/usr/local/nagios/libexec"
. $LIBEXEC/utils.sh

print_usage() {
    echo "Usage: $PROGNAME"
    echo "Usage: $PROGNAME --help"
}

print_help() {
    echo ""
    print_usage
    echo ""
    echo "Firewall-1 monitor plugin for Nagios"
    echo ""
    echo "This plugin was not developed by the Nagios Plugin group."
    echo "Please do not e-mail them for support on this plugin, since"
    echo "they won't know what you're talking about :P"
    echo ""
    echo "For contact info, read the plugin itself..."
}

while test -n "$1"
do
    case "$1" in
        --help) print_help; exit $STATE_OK;;
        -h) print_help; exit $STATE_OK;;
        *) print_usage; exit $STATE_UNKNOWN;;
    esac
done

check_processes() {
    PROCESS="0"
    # PROCLIST="cpd fwd fwm cpwd cpca cpmad cplmd cpstat cpshrd cpsnmpd"
    PROCLIST="cpd fwd fwm cpwd cpca cpmad cpstat cpsnmpd"
    # Flag a problem as soon as any process in the list is missing
    for PROC in `echo $PROCLIST`; do
        if [ `ps -ef | grep $PROC | grep -v grep | wc -l` -lt 1 ]; then PROCESS=1;fi
    done
    if [ $PROCESS -eq 1 ]; then
        echo "FWM NOK - One or more processes not running"
        exitstatus=$STATE_CRITICAL
        exit $exitstatus
    fi
}

check_ports() {
    PORTS="0"
    PORTLIST="256 257 18183 18184 18187 18190 18191 18192 18196 18264"
    # Flag a problem as soon as any port in the list is not in LISTEN state
    for NUM in `echo $PORTLIST`; do
        if [ `netstat -an | grep LISTEN | grep $NUM | grep -v grep | wc -l` -lt 1 ]; then PORTS=1;fi
    done
    if [ $PORTS -eq 1 ]; then
        echo "FWM NOK - One or more TCP/IP ports not listening."
        exitstatus=$STATE_CRITICAL
        exit $exitstatus
    fi
}

check_processes
check_ports

echo "FWM OK - Everything running like it should"
exitstatus=$STATE_OK
exit $exitstatus
kilala.nl tags: nagios, unix, programming,
All content, with exception of "borrowed" blogpost images, or unless otherwise indicated, is copyright of Tess Sluijter. The character Kilala the cat-demon is copyright of Rumiko Takahashi and used here without permission.