Monitor ESXi Free datastores

bmekler · Post by **bmekler** » Tue Apr 17, 2012 5:01 am

I have a number of ESXi 4.1 Free servers at various locations where I'd like to monitor datastore utilization using HostMonitor RMA. This being ESXi, I can't install anything on the service console, and with the free version, I don't have access to SNMP. However, I can enable SSH access to the console and run commands remotely. For instance, the following command returns a simple two-digit integer that gives the utilization percentage of a specific datastore:

df | grep '4f460e19' | grep -o [0-9][0-9]% | grep -o [0-9][0-9]

Where '4f460e19' is the datastore ID.
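
Note that the pipeline above only matches two-digit values: a datastore at 5% returns nothing, and one at 100% returns 00. A minimal sketch of a width-independent alternative follows, assuming awk is available in the ESXi shell and that the usage column is the only field of the form digits followed by %. The function name ds_usage is made up for illustration.

```shell
#!/bin/sh
# ds_usage: read df output on stdin and print the usage percentage (digits only)
# from the first line that contains the given datastore ID.
# Assumption: the usage column is the only field that looks like "<digits>%".
ds_usage() {
    awk -v id="$1" '
        index($0, id) {
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^[0-9]+%$/) {
                    sub(/%$/, "", $i)   # strip the percent sign
                    print $i
                    exit                # stop at the first matching line
                }
            }
        }'
}
```

It would be called as df | ds_usage 4f460e19, either on the host itself or on any machine that receives the df output.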

The question is: how do I get RMA to execute this command against the ESXi host via SSH and then evaluate the result against a specific value, for instance raising an alert if the returned value is higher than 80? Is it even possible?

KS-Soft · Post by **KS-Soft** » Tue Apr 17, 2012 6:25 am

Quote from the manual
===============
Check for
SSH test allows you to check result of the command execution in 3 different ways:
1) ...
2) ...
3) Check for Shell Script result
HostMonitor checks output text for specially formatted result string. I.e. result of the command execution should follow the same rules as Shell Script
===============

Requirements to the script:
...
The script or external program must write to stdout (standard output stream) single result string. This string should contain 3 parts separated by colon (:).
First obligatory part - marker "scriptres" tells to HostMonitor or RMA that this string is the result string.
...
Second obligatory part represents the test status, it can take one of the following values (case insensitive):
...
Third optional part contains Reply value, HostMonitor displays this value in Reply field, writes to log files, uses to displays charts (Log Analyzer), etc. If you want Log Analyzer to process Reply values correctly and display charts, use one of the following formats for Reply value:
- decimal_number (like "123", "0", "6456.45". as decimal separator use symbol that is specified on your system, usually a dot (.) or a comma (,))
- decimal_number + space + "Kb" (like "512 Kb", "64 Kb")
- decimal_number + space + "Mb" (like "1024 Mb", "5 Mb")
- decimal_number + space + "Gb" (like "12 Gb", "4 Gb")
- decimal_number + space + "%" (like "50 %", "99 %")
- decimal_number + space + "ms" (like "100 ms", "5400 ms")

In your case the script should return a string like
scriptres:Ok:50 %
or
scriptres:Bad:88 %
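
The result string described above can be produced by a small helper. This is only a sketch in POSIX shell: the function name scriptres_percent is hypothetical, and using Unknown as a status assumes it is among the accepted values, which are elided in the quote above.

```shell
#!/bin/sh
# scriptres_percent: print a result line in the "scriptres:<status>:<reply>" form.
# $1 = measured value (integer percent), $2 = threshold (integer percent).
# Status is Bad above the threshold, Ok otherwise,
# and Unknown when the measured value is empty or not a number.
scriptres_percent() {
    case "$1" in
        ''|*[!0-9]*)
            echo "scriptres:Unknown:no value"
            return
            ;;
    esac
    if [ "$1" -gt "$2" ]; then
        echo "scriptres:Bad:$1 %"
    else
        echo "scriptres:Ok:$1 %"
    fi
}
```

For example, scriptres_percent 88 80 prints scriptres:Bad:88 % and scriptres_percent 50 80 prints scriptres:Ok:50 %.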

Please check the manual or help file for details
- SSH test
http://www.ks-soft.net/hostmon.eng/mfra ... htm#chkssh
- Shell Script test
http://www.ks-soft.net/hostmon.eng/mfra ... m#chkShell

Regards
Alex

bmekler · Post by **bmekler** » Tue Apr 17, 2012 8:12 am

I'm not sure I can run shell scripts on the ESXi remote tech support console. However, I wrote this batch file that runs on the Windows VM where RMA is installed:

set STATUS=Unknown
set DSUTIL=
for /f %%a in ('c:\scripts\plink.exe -ssh root@esx1.domain.local "df | grep '4f460e19' | grep -o [0-9][0-9]%% | grep -o [0-9][0-9]"') do set DSUTIL=%%a
echo %DSUTIL%

if /i %DSUTIL% GTR 80 set STATUS=Bad
if /i %DSUTIL% LEQ 80 set STATUS=Ok
set ERRORLEVEL=scriptres:%STATUS%:%DSUTIL% %%

When I run this batch file from command line, then echo %errorlevel%, it returns the following:

scriptres:Ok:63 %

However, when I plug it into an "External test" check in HostMonitor, the status returned is always 'Ok' (even if the script returns 'Bad'), and the value is always 0.

I tried plugging it into shell script test module in various ways, but that simply gives me errors "Unable to read from standard input: The handle is invalid."

KS-Soft Europe · Post by **KS-Soft Europe** » Tue Apr 17, 2012 8:59 am

It looks like plink.exe does not work correctly with the Windows pipe mechanism.
Could you try adding or modifying the "SScript_UseWindowsPipe=0" line in the [Misc] section of the hostmon.ini file and restarting HostMonitor?

Also, you should change your script so that it can be used with the Shell Script test.
E.g.

@echo off
set STATUS=Unknown
set DSUTIL=
for /f %%a in ('c:\scripts\plink.exe -ssh root@esx1.domain.local "df | grep '4f460e19' | grep -o [0-9][0-9]%% | grep -o [0-9][0-9]"') do set DSUTIL=%%a

if /i %DSUTIL% GTR 80 set STATUS=Bad
if /i %DSUTIL% LEQ 80 set STATUS=Ok
echo scriptres:%STATUS%:%DSUTIL% %%

Start CMD: cmd /c %Script% %Params%

KS-Soft · Post by **KS-Soft** » Tue Apr 17, 2012 9:06 am

scriptres:Ok:63 %
However, when I plug it into an "External test" check with Hostmonitor, the status is always returned is 'Ok' (even if the script returns 'Bad'), and the value is always 0

The "External" test method has different requirements. It does not check text output; it checks the ErrorLevel code returned by the program, which has nothing to do with the "errorlevel" string.

I tried plugging it into shell script test module in various ways,

What exact command did you specify as the "Start cmd" parameter?
Something like cmd /c c:\pathtoBATfile\batfilename.bat %Params%?

Have you added the echo scriptres:%STATUS%:%DSUTIL% line to your script?
I do not see this command in your sample.
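
The difference between text output and an exit code, mentioned above for the External test, can be illustrated in POSIX shell. This is an analogy only, not the batch file from this thread: assigning a string to a variable named ERRORLEVEL leaves the exit status at 0, while a check that reports through its exit status can be read by the caller. The function names and the 0/1 mapping are hypothetical.

```shell
#!/bin/sh
# check_by_variable: stores its result in a variable named ERRORLEVEL.
# The variable lives only inside the subshell, and the exit status stays 0.
check_by_variable() (
    ERRORLEVEL="scriptres:Bad:88 %"
    : "$ERRORLEVEL"
)

# check_by_exit_code: reports through the exit status instead.
# $1 = usage in percent, $2 = threshold in percent.
# Exits 0 when usage is at or below the threshold and 1 otherwise.
check_by_exit_code() (
    [ "$1" -le "$2" ]
)
```

A caller that only inspects the exit status sees 0 from check_by_variable regardless of the string it assigned, and sees 1 from check_by_exit_code 88 80.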

Regards
Alex

KS-Soft Europe · Post by **KS-Soft Europe** » Tue Apr 17, 2012 9:35 am

Could you try to add/modify "SScript_UseWindowsPipe=0" line in [Misc] section of hostmon.ini file and restart HostMonitor.

If the test is performed by an RMA agent, then add or modify the "SScript_UseWindowsPipe=0" line in the [Misc] section of the rma.ini file and restart the RMA agent.

bmekler · Post by **bmekler** » Tue Apr 17, 2012 11:50 pm

Thank you, it's working now. This is the final version of the shell script I'm running:

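@echo off
rem Parameters in order: hostname, username, password, datastore ID, usage threshold in percent
set STATUS=Unknown
set DSUTIL=
for /f %%a in ('c:\scripts\plink.exe -ssh -pw %3 %2@%1 "df | grep '%4' | grep -o [0-9][0-9]%% | grep -o [0-9][0-9]"') do set DSUTIL=%%a
rem Quoted operands are compared as strings, which matches numeric order only while both values have two digits
if /i "%DSUTIL%" GTR "%5" set STATUS=Bad
if /i "%DSUTIL%" LEQ "%5" set STATUS=Ok
echo scriptres:%STATUS%:%DSUTIL% %%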

Starts with cmd /c %Script% %Params%

Parameters passed are hostname, username, password, datastore ID and usage threshold (percentage). The script returns current utilization percentage allowing for tracking utilization trends over time. And yes, adding "SScript_UseWindowsPipe=0" to rma.ini on the agents is indeed crucial - without it the PuTTYLink (plink.exe) command doesn't work correctly.

KS-Soft · Post by **KS-Soft** » Wed Apr 18, 2012 6:02 am

Great

Thanks for the notice

Regards
Alex

bmekler · Post by **bmekler** » Wed Apr 18, 2012 1:38 pm

Okay, I spent some time on it today and ended up with much better checks. There is a very capable Perl script developed by op5, available here:

http://www.op5.com/how-to/monitoring-vm ... er-server/

It uses the vSphere API to access performance counters and status checks.

On a Windows system, it is missing a number of dependencies, but I was able to retrieve all of them from cpan.org. After installing the VMware vSphere SDK for Perl and placing all the extra Perl scripts under C:\Program Files (x86)\VMware\VMware vSphere CLI\Perl\site\lib\, I can now run scripts such as this (using cmd /c perl %script% %params%):

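#!/usr/bin/perl
use strict;
use warnings;

# Arguments: ESXi hostname, root password, VM name, threshold in MB/s
my ($host, $password, $vm, $threshold) = @ARGV;

my $response = `c:/scripts/check_esx3.pl -H $host -u root -p $password -N $vm -l io`;
chomp $response;

# Extract the io_read value from the plugin output
my ($load) = ($response =~ /io_read=(\d+\.\d\d)MB/);

# Report Unknown when the value cannot be parsed
if (!defined $load) {
	print "scriptres:Unknown:no io_read value in plugin output\n";
	exit;
}

my $status = ($load >= $threshold) ? "Bad" : "Ok";

print "scriptres:$status:$load Mb\n";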

This checks the current disk read throughput (the io_read value) of a specific virtual machine. Parameters are the ESXi hostname, root password, VM name and throughput threshold in MB/s.

Another example: this one checks the CPU load of the host. Parameters are the hostname, root password and load threshold:

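#!/usr/bin/perl
use strict;
use warnings;

# Arguments: ESXi hostname, root password, CPU load threshold in percent
my ($host, $password, $threshold) = @ARGV;

my $response = `c:/scripts/check_esx3.pl -H $host -u root -p $password -l cpu -s usage`;
chomp $response;

# Capture the integer part of cpu_usage; \d+ also matches values of 10% and above
my ($load) = ($response =~ /cpu_usage=(\d+)\.\d\d\%/);

# Report Unknown when the value cannot be parsed
if (!defined $load) {
	print "scriptres:Unknown:no cpu_usage value in plugin output\n";
	exit;
}

my $status = ($load >= $threshold) ? "Bad" : "Ok";

print "scriptres:$status:$load %\n";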
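
Both scripts pull one value out of the plugin's performance data with a regular expression. The same extraction can be sketched in POSIX shell with sed. The function name perf_value and the sample line in the usage note are illustrative only, not real plugin output.

```shell
#!/bin/sh
# perf_value: read plugin output on stdin and print the number that follows
# "<label>=" (integer or decimal), taken from the first line that contains one.
# $1 = label such as cpu_usage or io_read. It is used as-is inside a sed
# pattern, so it should contain only letters, digits and underscores.
perf_value() {
    sed -n "s/.*$1=\([0-9][0-9]*\(\.[0-9][0-9]*\)\{0,1\}\).*/\1/p" | head -n 1
}
```

For example, echo 'cpu_usage=45.67%;;' | perf_value cpu_usage prints 45.67. Comparing a decimal value against a threshold would still need a tool with floating-point support, such as awk or the Perl scripts above.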