Common Errors & Fixes

From my experience with Nagios, NagiosGraph, WebInject, and various other plugins and modules.
Please comment or email me with any errors and fixes you run into, and I will add them and link back to your site. Or, if you want to go for the gold, ask to become a contributor to the site.

ERROR: GD, PNG, and/or JPEG libraries could not be located

I was running through a new Nagios 3.2.0 install on my new VPS. I had downloaded all the components I needed and installed gcc, php, and a few other packages I knew I would need. However, when I ran ./configure I got the following error:

*** GD, PNG, and/or JPEG libraries could not be located... *********

I had forgotten to add gd-devel to my install checklist. Installing it and rerunning configure cleared the error; that was an easy one.
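If you want to catch this before running configure, the sketch below looks for the headers that the GD, PNG, and JPEG development packages normally install. It assumes the RHEL/CentOS package names gd-devel, libpng-devel, and libjpeg-devel and the default /usr/include location; the function name check_gd_headers is hypothetical, and a missing header is only a quick proxy for a missing -devel package.

```shell
# check_gd_headers [INCLUDE_DIR]
# Hypothetical pre-flight check to run before Nagios' ./configure.
# Assumes the headers live in /usr/include unless another directory is given,
# and that they come from gd-devel, libpng-devel and libjpeg-devel
# (RHEL/CentOS package names). Prints one line per missing header.
check_gd_headers() {
    incdir="${1:-/usr/include}"
    status=0
    for hdr in gd.h png.h jpeglib.h; do
        if [ ! -f "$incdir/$hdr" ]; then
            echo "missing: $incdir/$hdr"
            status=1
        fi
    done
    return "$status"
}
```

For example, `check_gd_headers || yum install gd-devel libpng-devel libjpeg-devel`, then rerun ./configure.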

ERROR: "undefined symbol: art_alloc"

Having a problem getting graphs to generate? I found the following error in my /var/log/httpd/error_log:
[Mon Oct 06 16:54:56 2008] [error] [client 204.9.220.36] /usr/bin/perl: symbol lookup error: /usr/local/rrdtool/lib/librrd.so.2: undefined symbol: art_alloc, referer: http://monitor1.server.com/nagiosgraph/show.cgi?host=pg1&service=WebInject&geom=700x200

I had previously installed several versions of libart_lgpl in my initial attempts to get NagiosGraph and RRDTool working. It turns out "art_alloc" is indeed undefined in libart_lgpl 2.3.17, but it is defined in 2.3.19. The 2.3.17 copy was probably the default RPM I had installed. Downloading and installing 2.3.19 fixed the issue once I sorted out the library path problems. As a shortcut, you could also symlink the old files to the new install instead of configuring paths, which always seems to get messy for me.

Here's how I checked the art_alloc symbol:


[root@monitor1 lib]# nm -D /usr/local/lib/libart_lgpl_2.so.2.3.19 |grep art_alloc
000033d0 T art_alloc
[root@monitor1 lib]# nm -D /usr/lib/libart_lgpl_2.so.2.3.17 |grep art_alloc
[root@monitor1 lib]#
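To repeat that check against every copy of the library, the `nm -D | grep` test can be wrapped in a couple of shell functions. This is a sketch under stated assumptions: it expects GNU nm's default (BSD-style) output, where undefined references such as `U art_alloc` are printed without an address, and the function names has_defined_symbol and check_lib_symbol are hypothetical.

```shell
# has_defined_symbol SYMBOL
# Reads `nm -D` output (default BSD format) on stdin and succeeds only if
# SYMBOL is defined there. Undefined references ("U") are printed without an
# address, so they have two fields instead of three and are skipped.
has_defined_symbol() {
    awk -v sym="$1" '
        NF == 3 {
            name = $3
            sub(/@.*/, "", name)   # drop any symbol version suffix, e.g. @@VER
            if (name == sym) found = 1
        }
        END { exit(found ? 0 : 1) }'
}

# check_lib_symbol LIBRARY SYMBOL
# Prints whether LIBRARY exports SYMBOL; returns non-zero if it does not.
check_lib_symbol() {
    if nm -D "$1" 2>/dev/null | has_defined_symbol "$2"; then
        echo "$1: $2 defined"
    else
        echo "$1: $2 NOT defined"
        return 1
    fi
}
```

For example: `for lib in /usr/lib/libart_lgpl_2.so* /usr/local/lib/libart_lgpl_2.so*; do check_lib_symbol "$lib" art_alloc; done`. Since the real problem was which copy got loaded, it is also worth running `ldd /usr/local/rrdtool/lib/librrd.so.2 | grep libart` to see which libart_lgpl_2.so.2 the dynamic loader actually resolves for librrd.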

ERROR: libart-2.0 - Missing libart when installing RRDtool

configure: WARNING:
----------------------------------------------------------------------------
* I found a copy of pkgconfig, but there is no libart-2.0.pc file around.
  You may want to set the PKG_CONFIG_PATH variable to point to its
  location.
----------------------------------------------------------------------------

The issue is that the RRDtool configure script looks for the libart files in /usr/include instead of /usr/local/include, where I had installed libart. There are a few ways to work around this:

  1. Symlink the files from where configure is looking to where they actually are. This can be very messy to get working, but you may unwittingly head off future issues with other software that needs these files.
  2. Reinstall libart so that it does not use /usr/local (for example, by passing --prefix=/usr to its configure script).
  3. Set the CPPFLAGS environment variable so the libart headers are found:
    export CPPFLAGS='-I/usr/local/include/libart-2.0'

That solved my RRDTool installation issue.
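The configure warning itself suggests PKG_CONFIG_PATH, so a belt-and-braces variant is to set both that variable and CPPFLAGS for the same prefix. This sketch assumes libart was built with the default /usr/local prefix, so its libart-2.0.pc file lands in /usr/local/lib/pkgconfig and its headers in /usr/local/include/libart-2.0; the function name use_libart_prefix is hypothetical.

```shell
# use_libart_prefix [PREFIX]
# Hypothetical helper: call it in your shell, then run RRDtool's ./configure
# from that same shell so it inherits the exported variables.
use_libart_prefix() {
    prefix="${1:-/usr/local}"
    # Let configure's pkg-config lookup find libart-2.0.pc,
    # keeping any directories that were already on the path.
    PKG_CONFIG_PATH="$prefix/lib/pkgconfig${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}"
    # Point the preprocessor at libart's headers (same idea as option 3).
    CPPFLAGS="-I$prefix/include/libart-2.0${CPPFLAGS:+ $CPPFLAGS}"
    export PKG_CONFIG_PATH CPPFLAGS
}
```

After `use_libart_prefix /usr/local`, running `pkg-config --cflags libart-2.0` should print the -I flag for the libart headers. If it still cannot find the package, the .pc file is somewhere else; `find / -name libart-2.0.pc` will show where.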

FIX: check_esx3.pl Script

During my Nagios testing I was trying to use the check_esx3.pl script to run checks against the ESX hosts in our environment. I installed the required VMware vSphere Perl SDK (the latest from VMware's site) and was able to run "./check_esx3.pl" without any errors. However, when I ran an actual check against a host, I received:

CHECK_ESX3.PL CRITICAL - Server version unavailable at 'https://172.16.0.81:443/sdk/vimService.wsdl' at /usr/lib/perl5/5.8.8/VMware/VICommon.pm line 545.

I ran across several threads explaining that newer versions of LWP reject self-signed certificates by default, so I added this line to the top of the Perl script:
$ENV{PERL_LWP_SSL_VERIFY_HOSTNAME} = 0;

That did not seem to fix the problem, although it turned out to be one of several issues. When I rebuilt the VMware vSphere Perl SDK, I noticed this warning:

The following Perl modules were found on the system but may be too old to work
with vSphere CLI:

Compress::Zlib 2.005 or newer
HTML::Parser 3.60 or newer
URI 1.37 or newer
XML::SAX 0.16 or newer

I updated those Perl modules as well, without seeing a difference. What I eventually did was edit the VICommon.pm module by hand. Line 545 is where it tries to parse the response data, so I added a line just before that step to print out the raw data. That's when I saw the message about a proxy error: I still had http_proxy, ftp_proxy, and https_proxy environment variables set from another idea I had been toying with. I removed those environment variables and I was off and running!

So I actually had two issues: the self-signed certificates and the bad proxy environment variables.
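If you would rather not edit your login environment, one option is a small wrapper that rules out both problems for a single run. This is a sketch, not part of check_esx3.pl: the function name run_esx_check and the ESX_CHECK variable (the plugin path, defaulting to ./check_esx3.pl) are hypothetical, and it relies only on `env -u` to drop variables and on LWP reading PERL_LWP_SSL_VERIFY_HOSTNAME from the environment.

```shell
# run_esx_check [ARGS...]
# Hypothetical wrapper around check_esx3.pl that:
#   - strips any leftover proxy variables (both lower- and upper-case
#     spellings, to be safe), and
#   - sets PERL_LWP_SSL_VERIFY_HOSTNAME=0 so LWP accepts the self-signed cert.
# ESX_CHECK (hypothetical) holds the plugin path; all arguments pass through.
run_esx_check() {
    env -u http_proxy -u https_proxy -u ftp_proxy \
        -u HTTP_PROXY -u HTTPS_PROXY -u FTP_PROXY \
        PERL_LWP_SSL_VERIFY_HOSTNAME=0 \
        "${ESX_CHECK:-./check_esx3.pl}" "$@"
}
```

Call it with your usual check_esx3.pl arguments. Running `env | grep -i proxy` beforehand is also a quick way to spot stray proxy variables like the ones described above.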

ERROR: NSClient Counter Errors

***** Nagios *****

Notification Type: PROBLEM

Service: CPU Load
Host: BALLYs DBA-1
Address: ballys.hubteam.local
State: UNKNOWN

Date/Time: Mon Dec 28 12:53:20 CST 2009
Duration: 0d 0h 2m 1s
Additional Info:
NSClient - ERROR: Could not get data for 5 perhaps we dont collect data this far back?

-------------------------------------------------
-------------------------------------------------
***** Nagios *****

Notification Type: PROBLEM

Service: Memory Usage
Host: BALLYs DBA-1
Address: ballys.hubteam.local
State: UNKNOWN

Date/Time: Mon Dec 28 12:52:20 CST 2009
Duration: 0d 0h 1m 31s
Additional Info:
NSClient - ERROR: Failed to get PDH value.

I received several odd errors from the Memory, CPU, and Uptime monitors on one of our new servers running NSClient++. A little quick research led me to the fix: rebuild the Windows performance counters by running the command below from an administrator command prompt on that server. Don't forget to restart the NSClient++ service after resetting the counters.

lodctr /R

** Make sure the "/R" is an upper case R.
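Once the counters are rebuilt and the service restarted, it can help to re-run the affected checks from the Nagios server instead of waiting for the next scheduled run. This sketch assumes the services use the standard check_nt plugin against NSClient++'s default port 12489; the function name recheck_nsclient, the NAGIOS_PLUGINS variable, and the thresholds are hypothetical placeholders, so copy the real values from your own service definitions. Judging by the wording of the original CPU error ("perhaps we dont collect data this far back?"), the 5-minute CPU check may keep failing until NSClient++ has been running for at least that long.

```shell
# recheck_nsclient HOST
# Hypothetical helper: re-run the CPU, memory and uptime checks that were
# returning UNKNOWN and return the highest plugin exit code seen.
# NAGIOS_PLUGINS defaults to the usual source-install plugin directory.
recheck_nsclient() {
    host="$1"
    plugins="${NAGIOS_PLUGINS:-/usr/local/nagios/libexec}"
    worst=0
    # 12489 is NSClient++'s default port (check_nt itself defaults to 1248).
    # Thresholds are placeholders.
    for args in "-v CPULOAD -l 5,80,90" "-v MEMUSE -w 80 -c 90" "-v UPTIME"; do
        rc=0
        # $args is intentionally unquoted so it splits into separate options.
        "$plugins/check_nt" -H "$host" -p 12489 $args || rc=$?
        if [ "$rc" -gt "$worst" ]; then
            worst=$rc
        fi
    done
    return "$worst"
}
```

For example, `recheck_nsclient ballys.hubteam.local` prints one line of plugin output per check.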