periodic reminder of a failed test

Steven
Posts: 44
Joined: Thu Feb 16, 2006 5:42 pm

Post by Steven »

I'd like to set up HostMonitor to periodically remind me when something is still down. Right now I'm just telling it to send a 2nd e-mail after 36 bad results, then a 3rd one after 96 bad results and a final one after 255. After each e-mail I lengthen the testing interval to stretch the time between reminders.

I would prefer to keep the original testing interval and just trigger actions periodically, depending on how long ( "<nnn> number of seconds/minutes/hours" ) the test has been in the current status. Perhaps there is a scripted approach? Even better would be to trigger the action "after every <nnn> consecutive bad results".

Or perhaps you could just increase the max number of consecutively failed tests? ( from: "Start when <nnn> consecutive "bad" results occur" ...increase <nnn> to a 2-byte value with a max of 65535? )

I don't remember seeing this option in the manual (although this is a year and 5 upgrades later) or in the action profile settings.
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

If you want to start the action on the 36th, 72nd, 108th... iterations while the test is still down, you should use an Advanced mode action with the expression ("SimpleStatus" == DOWN) AND (%RECURRENCES% MOD 36==0)

You can find more information about advanced actions here: http://www.ks-soft.net/hostmon.eng/mfra ... ncedaction
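
As a rough illustration of what that rule does, here is a minimal Python sketch (not HostMonitor syntax). The function name `should_remind` is made up for the example, and it assumes %RECURRENCES% counts consecutive results in the same status starting from 1.

```python
def should_remind(simple_status: str, recurrences: int, every: int = 36) -> bool:
    """Python stand-in for: status is DOWN AND (recurrences MOD every == 0)."""
    return simple_status == "DOWN" and recurrences % every == 0


# Walk through 120 consecutive bad results and collect the ones that fire.
fired = [n for n in range(1, 121) if should_remind("DOWN", n)]
print(fired)
```

With a 1 minute test interval this would mean a reminder roughly every 36 minutes for as long as the test stays down.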

Regards,
Max
Steven
Posts: 44
Joined: Thu Feb 16, 2006 5:42 pm

Post by Steven »

Thanx for your fast reply!
And that would work quite well. But I just noticed I mistyped my request :). What I meant was not
trigger action "after every <nnn> consecutive bad results".
but rather:
trigger action "after every <nnn> number of seconds/minutes/hours"

Using advanced I could also use the %CurrentStatusDuration_sec% variable to create my desired outcome, so thanx for the pointer.
Your offered solution would be fine, though I would still prefer an option in the drop-down list to the effect of the above example. Any chance that might make your "2 do" list?
Steven
Posts: 44
Joined: Thu Feb 16, 2006 5:42 pm

Post by Steven »

Code:

("SimpleStatus" == DOWN) AND ((%CurrentStatusDuration_sec% MOD 600==3) AND (%CurrentStatusDuration_sec% <1800)) OR (%CurrentStatusDuration_sec%==3600) OR (%CurrentStatusDuration_sec%==36000) OR ((%CurrentStatusDuration_sec% MOD 86400==0) AND (%CurrentStatusDuration_sec% >60))
I'm just providing this as an example to anyone else who wants to do this. Have not tested it yet, so please let me know if you see anything immediate that needs to be edited.

This is what it should do:

Do "action" if down when:
1) every 10 minutes if under half an hour (including once right away [3 sec delay] )
2) after 1 hour
3) after 10 hours
4) once a day after that ( ">60" is just to ensure it doesn't go off twice right off the bat)

Here's my action:

Code:

Action: Send e-mail
From: hostmonitor
To  : <destination address>
Subj: %testname% => %status% for %CurrentStatusDuration%
Mail template: Mail to admin
Condition to start: <see above>
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

Well, it seems to be mostly correct, but I think there are two small mistakes.
1. If you are using a 3 sec delay, you should adjust the 1800 sec value to 1810. Otherwise we would miss the last 10-minute action execution.
2. I think you should wrap the %CurrentStatusDuration_sec% related part of the expression in additional brackets, to separate it from the ("SimpleStatus" == DOWN) part.

So, outcome expression should be like this:
("SimpleStatus" == DOWN) AND (((%CurrentStatusDuration_sec% MOD 600==3) AND (%CurrentStatusDuration_sec% <1810)) OR (%CurrentStatusDuration_sec%==3600) OR (%CurrentStatusDuration_sec%==36000) OR ((%CurrentStatusDuration_sec% MOD 86400==0) AND (%CurrentStatusDuration_sec% >60)))
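
To see why the extra brackets matter, here is a small Python sketch of the two groupings. It assumes AND binds tighter than OR, as it does in Python; whether HostMonitor's expression parser uses the same precedence is an assumption, not something confirmed in this thread. The function names are invented for the example.

```python
def original(down: bool, d: int) -> bool:
    # Grouping as first posted: the DOWN check only guards the first AND term.
    return (down and (d % 600 == 3 and d < 1800)
            or d == 3600
            or d == 36000
            or (d % 86400 == 0 and d > 60))


def corrected(down: bool, d: int) -> bool:
    # Extra brackets: the DOWN check guards every duration term.
    return down and ((d % 600 == 3 and d < 1810)
                     or d == 3600
                     or d == 36000
                     or (d % 86400 == 0 and d > 60))


# Test is NOT down, but has been in its current status for exactly one hour.
print(original(False, 3600), corrected(False, 3600))
```

Under that precedence assumption, the first version could fire for a test that has simply been in any status for an hour, while the bracketed version only fires when the test is down.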

Regards,
Max
Steven
Posts: 44
Joined: Thu Feb 16, 2006 5:42 pm

Post by Steven »

I actually only want it to tell me 3 times on that schedule.
So:
after 3 seconds
after 603 seconds
after 1203 seconds
I could have that number anywhere from 1204 - 1802 I guess :)

But yes, you are definitely correct with the brackets... thanx for pointing that out. I will fix it... and restart my "trial-test"
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

Steven wrote:I could have that number anywhere from 1204 - 1802 I guess
Yes, you are correct. It's my fault. :roll:

Regards,
Max
Steven
Posts: 44
Joined: Thu Feb 16, 2006 5:42 pm

Post by Steven »

It's not working. I have a question: using these expressions in the "bad" action list, will it work using "exact" times? or does it only check the expression on a "bad" hit? (at that specific point in time?)
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

Steven wrote:It's not working.
Yes. Really. It does not work.
Steven wrote:I have a question: using these expressions in the "bad" action list, will it work using "exact" times? or does it only check the expression on a "bad" hit? (at that specific point in time?)
Action execution depends on the test interval. Only after a test has been performed does HostMonitor start executing the actions in the action profile. So the action is not evaluated every second, and we are not able to use the %CurrentStatusDuration_sec% variable in such circumstances :-(
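
A quick Python sketch of the problem, assuming a 60 second test interval and that the duration seen by the expression is an exact multiple of that interval (both are assumptions for illustration; real timings will drift):

```python
INTERVAL = 60  # assumed test interval in seconds


def wanted(duration_sec: int) -> bool:
    """The "every 10 minutes, 3 seconds in, for the first half hour" rule."""
    return duration_sec % 600 == 3 and duration_sec < 1800


# If the expression were evaluated every second, it would match three times.
every_second = [d for d in range(0, 1800) if wanted(d)]

# Evaluated only when a test runs, the exact values are never seen.
at_test_time = [d for d in range(0, 1800, INTERVAL) if wanted(d)]

print(every_second)
print(at_test_time)
```

The same reasoning applies to the `== 3600` style checks: they only match if a test happens to run at exactly that second.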

Regards,
Max
Steven
Posts: 44
Joined: Thu Feb 16, 2006 5:42 pm

Post by Steven »

I came to the same conclusion testing on our system. :-? So how bout that "wish list" ? :D
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

Let's return to the %RECURRENCES% variable. If your particular test is performed every 1 minute, then %RECURRENCES% == 1 corresponds to 1 minute of downtime, %RECURRENCES% == 10 to 10 minutes, etc. So you can compose an expression using the %RECURRENCES% variable to trigger the action every <nnn> seconds/minutes/hours. It might help.
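
The conversion can be sketched in Python as below. The helper name `recurrences_for` is hypothetical, and the sketch assumes a fixed test interval that divides the desired period evenly.

```python
def recurrences_for(period_seconds: int, test_interval_seconds: int) -> int:
    """Number of consecutive test results that span the given period."""
    if period_seconds % test_interval_seconds != 0:
        raise ValueError("period must be a multiple of the test interval")
    return period_seconds // test_interval_seconds


# With a 1 minute test interval:
ten_minutes = recurrences_for(600, 60)
one_day = recurrences_for(86400, 60)
print(ten_minutes, one_day)

# The resulting number goes into the MOD part of the expression.
print(f"(%Recurrences% MOD {one_day} == 0)")
```

Note that this breaks down if the test interval changes while the test is down, since the recurrence count then no longer maps to a fixed amount of time.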

Regards,
Max
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

There is another mistake: DOWN without quotes. It should be ('%SimpleStatus%' == 'DOWN')
And yes, using %CurrentStatusDuration_sec% with the MOD operation will not work reliably.
Alternatively, you could use a single action to check all test items: HostMonitor provides the %HM_BadItems% macro variable. You can create an additional test item and start an alert every N hours if %HM_BadItems% > 0

Regards
Alex
Steven
Posts: 44
Joined: Thu Feb 16, 2006 5:42 pm

Post by Steven »

Good suggestions.
Here's what I've been working on (and yes, I noticed the missing ' s too :) ):

('%SimpleStatus%' == 'DOWN') AND (((%Recurrences% MOD 120==1) AND (%CurrentStatusDuration_sec% <1800)) OR (%Recurrences%==280) OR (%Recurrences%==820) OR (((%Recurrences%-240) MOD 1440==0) AND (%CurrentStatusDuration_sec% >1800)))

I also dynamically change the interval to test more often immediately after a test fails until 20 minutes down, then restore to 1 minute interval. This is getting pretty complicated.

I'll probably leave it like this, if it works, but I'd really appreciate it if the above option made the "2 do" list, understanding however that it may not be possible. :-?
Steven
Posts: 44
Joined: Thu Feb 16, 2006 5:42 pm

Post by Steven »

hmmm, seems when I do:

((%Recurrences%-240) MOD 1440==0)

the expression fails.

I just don't get anything.. although this is in an OR part..
Steven
Posts: 44
Joined: Thu Feb 16, 2006 5:42 pm

Post by Steven »

I guess when I saw
arithmetic operators div, mod
in the manual, I misread it thinking "+ - * /" would work too. Nevermind, I'll try it like this:
((%Recurrences%) MOD 1440==240)
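
For what it's worth, the rewritten form selects the same recurrence counts as the subtraction version. A short Python check of that equivalence (plain modular arithmetic, nothing HostMonitor-specific):

```python
# (r - 240) MOD 1440 == 0 and r MOD 1440 == 240 pick the same values of r.
mismatches = [
    r for r in range(0, 10000)
    if ((r - 240) % 1440 == 0) != (r % 1440 == 240)
]
print(mismatches)

# Recurrence counts below 5000 that would trigger the daily reminder.
hits = [r for r in range(1, 5000) if r % 1440 == 240]
print(hits)
```

So if subtraction really is unavailable in the expression, nothing is lost by moving the offset to the right-hand side of the comparison.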