Below are some of the more useful tidbits about Nagios: what things mean, how they work, and so on.
Sometimes you want to write a simple script that checks a few things and works with Nagios. One of the first things you need to understand is the return codes. Below are the basic return codes your script should return.
Read the following if you really want to start writing your own plugins. Let me know too, and I will even test them. Ref: http://nagiosplug.sourceforge.net/developer-guidelines.html
| Numeric | Service Status | Status Description |
|---|---|---|
| 0 | OK | Plugin executed and results are within parameters |
| 1 | WARNING | Plugin executed and results are in warning range, or are not functioning properly |
| 2 | CRITICAL | Plugin detected the service is not running, or the service is in the critical threshold |
| 3 | UNKNOWN | Typically invalid check parameters, missing values, missing plugin components, etc. |
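As a minimal sketch of how a plugin might hand these codes back to Nagios, the Python below maps a reading to an exit code and a one-line message. The script name `check_stuff.py`, the `evaluate` and `main` helpers, and the simple "higher is worse" thresholds are all assumptions made for illustration, not part of any real plugin.

```python
# Exit codes from the table above.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a numeric reading to an (exit_code, message) pair.

    Hypothetical helper: assumes plain "higher is worse" thresholds.
    """
    if value is None:
        return UNKNOWN, "UNKNOWN - no value could be read"
    if value > crit:
        code = CRITICAL
    elif value > warn:
        code = WARNING
    else:
        code = OK
    return code, f"{LABELS[code]} - stuff is {value}"


def main(argv):
    """Hypothetical usage: check_stuff.py VALUE WARN CRIT"""
    try:
        value, warn, crit = (float(arg) for arg in argv[1:4])
    except ValueError:
        # Missing or non-numeric arguments are reported as UNKNOWN.
        print("UNKNOWN - usage: check_stuff.py VALUE WARN CRIT")
        return UNKNOWN
    code, message = evaluate(value, warn, crit)
    print(message)  # status text for Nagios to display
    return code


# To use this as a real plugin, end the file with:
#     import sys; sys.exit(main(sys.argv))
```

The important part is that the process exit status, not the printed text, is what Nagios uses to decide the state.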
When creating alerts, I did not realize at first that there is a range syntax that can define thresholds more narrowly and flexibly, as in the examples that follow. There are a few cases where you want to trigger an alert when a value is too low, that is, below a certain level.
Ref: http://nagiosplug.sourceforge.net/developer-guidelines.html
Example ranges
| Range definition | Generate an alert if x... |
|---|---|
| 10 | < 0 or > 10, (outside the range of {0 .. 10}) |
| 10: | < 10, (outside {10 .. ∞}) |
| ~:10 | > 10, (outside the range of {-∞ .. 10}) |
| 10:20 | < 10 or > 20, (outside the range of {10 .. 20}) |
| @10:20 | ≥ 10 and ≤ 20, (inside the range of {10 .. 20}) |
Command line examples
| Command line | Meaning |
|---|---|
| check_stuff -w10 -c20 | Critical if "stuff" is over 20, else warn if over 10 (will be critical if "stuff" is less than 0) |
| check_stuff -w~:10 -c~:20 | Same as above. Negative "stuff" is OK |
| check_stuff -w10: -c20 | Critical if "stuff" is over 20, else warn if "stuff" is below 10 (will be critical if "stuff" is less than 0) |
| check_stuff -c1: | Critical if "stuff" is less than 1 |
| check_stuff -w~:0 -c10 | Critical if "stuff" is above 10; Warn if "stuff" is above zero |
| check_stuff -c5:6 | The only noncritical range is 5:6 |
| check_stuff -c@10:20 | Critical if "stuff" is 10 to 20 |
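To make the range rules concrete, here is a rough Python sketch of how a plugin could interpret the range syntax and combine a warning and a critical range into a status. The function names `parse_range`, `alerts`, and `status` are made up for this example, and the sketch follows the tables above rather than the code of any particular plugin.

```python
import math

OK, WARNING, CRITICAL = 0, 1, 2


def parse_range(spec):
    """Parse a range definition into (start, end, inside).

    "10" means 0..10, "10:" means 10..infinity, "~:10" means
    -infinity..10, "10:20" means 10..20. A leading "@" sets
    inside=True, meaning the alert fires inside the range.
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        start_text, end_text = spec.split(":", 1)
    else:
        start_text, end_text = "0", spec  # a bare number implies start = 0
    if start_text == "~":
        start = -math.inf
    else:
        start = float(start_text) if start_text else 0.0
    end = float(end_text) if end_text else math.inf
    if start > end:
        raise ValueError(f"start is greater than end in range {spec!r}")
    return start, end, inside


def alerts(spec, value):
    """Return True if value should raise an alert for this range."""
    start, end, inside = parse_range(spec)
    in_range = start <= value <= end
    return in_range if inside else not in_range


def status(value, warn=None, crit=None):
    """Check the critical range first, then the warning range."""
    if crit is not None and alerts(crit, value):
        return CRITICAL
    if warn is not None and alerts(warn, value):
        return WARNING
    return OK
```

For example, `status(-1, warn="10", crit="20")` comes out critical because a bare `20` means the range 0 to 20, while `status(-1, warn="~:10", crit="~:20")` is OK because `~` removes the lower bound.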
NOTE: sometimes you need to escape the special characters. See the example below...
./check_mssql -H hostname -p 1433 -U username -P password -D database -w 2.0 -c 3.5 \
    -q "exec database.dbo.StoredProcedure variable1, variable2" -W32\: -C10\: -s
The above will WARN when the value my stored procedure returns is LESS than 32 and CRITICAL when LESS than 10.
Nagios object inheritance can be a confusing and tricky topic. First, you should read the official documentation on Nagios object inheritance. Below is a very simple and straightforward example to illustrate a common application of inheritance you may want to use in your environment.
Your host/service definition entry uses a template...
use app-server
The app-server template has two contact groups configured...
contact_groups it,dev
Let's see what we can do in the host/service definition. Here are some examples and how they should work.
contact_groups +support
Using the "+" sign, the host/service definition uses the data in the template -and- adds the support group to the alerts.
contact_groups !dev
Using the "!" sign, the host/service definition uses the data in the template -and- will remove the dev group from the alerts. If dev was not specified in the template, this would have no effect.
contact_groups support
Using no modifier will override the template values and only the support group will receive these alerts.
contact_groups +support,!dev
You can use combinations of modifiers to get the desired results as well. Remember too that if you use multiple templates, they are applied in order. The official Nagios documentation has a very useful flow chart to help you understand this.
Some may ask why not simply list every contact group explicitly in the host/service definition, like this:
contact_groups it,dev,finance
The downside of the above is that when you want to add another contact group to ALL systems using the template, you then have to remember which systems have explicitly defined values.
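The modifier rules above can be summed up in a small Python sketch. This is only a toy model of the behavior as described on this page, not Nagios's own resolver, and the function name `resolve_contact_groups` is made up for illustration.

```python
def resolve_contact_groups(template_groups, value):
    """Apply a host/service value to the groups inherited from a template.

    Toy model of the rules described on this page, not Nagios's resolver.
    """
    items = [item.strip() for item in value.split(",") if item.strip()]
    uses_modifiers = any(item[0] in "+!" for item in items)
    # A plain value overrides the template; a modified one starts from it.
    groups = list(template_groups) if uses_modifiers else []
    for item in items:
        if item.startswith("!"):
            name = item[1:]
            if name in groups:  # removing an absent group has no effect
                groups.remove(name)
        else:
            name = item.lstrip("+")
            if name not in groups:
                groups.append(name)
    return groups
```

With the app-server template groups `["it", "dev"]`, the value `+support,!dev` resolves to `["it", "support"]`, while a plain `support` resolves to just `["support"]`.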