periodic reminder of a failed test
I'd like to set up HostMonitor to periodically remind me when something is still down. Right now I just tell it to send a 2nd e-mail after 36 bad results, a 3rd one after 96 bad results and a final one after 255. After each e-mail I reduce the testing frequency (lengthen the interval) to stretch the time.
I would prefer to keep the original testing interval and just trigger actions periodically, depending on how long ( "<nnn> number of seconds/minutes/hours" ) the test has been in the current status. Perhaps there is a scripted approach? Even better would be to trigger the action "after every <nnn> consecutive bad results".
Or perhaps you could just increase the max number of consecutive failed tests? ( from: "Start when <nnn> consecutive "bad" results occur" ...increase <nnn> to a 2-byte value with a max of 65535? )
I don't remember seeing this option in the manual (although this is a year and 5 upgrades later) or in the action profile settings.
-
- Posts: 2832
- Joined: Tue May 16, 2006 4:41 am
- Contact:
If you want to start the action on the 36th, 72nd, 108th... iterations while the test is still down, you should use an Advanced mode action with the expression ("SimpleStatus" == DOWN) AND (%RECURRENCES% MOD 36==0)
You can find more information about advanced actions here: http://www.ks-soft.net/hostmon.eng/mfra ... ncedaction
Regards,
Max
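As a rough illustration of what the MOD condition does (a Python sketch, not HostMonitor syntax; `fires` is a made-up helper name, and the counter is assumed to start at 1 on the first bad result):

```python
def fires(recurrences: int, every: int = 36) -> bool:
    """Mimic (%RECURRENCES% MOD 36 == 0) for a test that is still down."""
    return recurrences % every == 0

# Consecutive bad results 1..120: the action would start on these iterations.
firing = [n for n in range(1, 121) if fires(n)]
print(firing)  # [36, 72, 108]
```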
Thanks for your fast reply!
And that would work quite well. But I just noticed I mistyped my request. What I meant was not:
trigger action "after every <nnn> consecutive bad results"
but rather:
trigger action "after every <nnn> number of seconds/minutes/hours"
Using Advanced mode I could also use the %CurrentStatusDuration_sec% variable to create my desired outcome, so thanks for the pointer.
Your offered solution would be fine. Though I would still prefer an option in the drop-down list to the effect of the above example. Any chance that might make your "2 do" list?
Code:
("SimpleStatus" == DOWN) AND ((%CurrentStatusDuration_sec% MOD 600==3) AND (%CurrentStatusDuration_sec% <1800)) OR (%CurrentStatusDuration_sec%==3600) OR (%CurrentStatusDuration_sec%==36000) OR ((%CurrentStatusDuration_sec% MOD 86400==0) AND (%CurrentStatusDuration_sec% >60))
This is what it should do:
Do "action" if down when:
1) every 10 minutes if under half an hour (including once right away [3 sec delay] )
2) after 1 hour
3) after 10 hours
4) once a day after that ( ">60" is just to ensure it doesn't go off twice right off the bat)
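To make the intended schedule concrete, here is a small Python sketch that lists the durations at which the four rules above would match. It assumes, purely for illustration, that the condition is checked once per second and that the test stays down the whole time; `wanted` is a hypothetical helper, not anything HostMonitor provides.

```python
def wanted(d: int) -> bool:
    """Schedule from the list above, with d = seconds in the current status."""
    return (
        (d % 600 == 3 and d < 1800)      # 1) every 10 min in the first half hour
        or d == 3600                     # 2) after 1 hour
        or d == 36000                    # 3) after 10 hours
        or (d % 86400 == 0 and d > 60)   # 4) once a day after that
    )

two_days = 2 * 86400
schedule = [d for d in range(two_days + 1) if wanted(d)]
print(schedule)  # [3, 603, 1203, 3600, 36000, 86400, 172800]
```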
Here's my action:
Code:
Action: Send e-mail
From: hostmonitor
To : <destination address>
Subj: %testname% => %status% for %CurrentStatusDuration%
Mail template: Mail to admin
Condition to start: <see above>
Well, it seems to be correct. But there are two small mistakes, I suppose.
1. If you are using a 3 sec delay, you should adjust the 1800 sec value to 1810. Otherwise the last 10-minute action execution would be missed.
2. I think you should wrap the %CurrentStatusDuration_sec% related part of the expression in additional brackets, to separate it from the ("SimpleStatus" == DOWN) part.
So the resulting expression should look like this:
("SimpleStatus" == DOWN) AND (((%CurrentStatusDuration_sec% MOD 600==3) AND (%CurrentStatusDuration_sec% <1810)) OR (%CurrentStatusDuration_sec%==3600) OR (%CurrentStatusDuration_sec%==36000) OR ((%CurrentStatusDuration_sec% MOD 86400==0) AND (%CurrentStatusDuration_sec% >60)))
Regards,
Max
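The reason for the extra brackets can be sketched in Python, assuming HostMonitor's expression evaluator gives AND a higher precedence than OR, as Python and most languages do (that precedence is an assumption here, not something confirmed from the manual). The daily clause is left out to keep the sketch short, and both function names are made up.

```python
def unbracketed(down: bool, d: int) -> bool:
    # `and` binds tighter than `or`, so `down` only guards the first clause.
    return down and (d % 600 == 3 and d < 1810) or d == 3600 or d == 36000

def bracketed(down: bool, d: int) -> bool:
    # The extra brackets make `down` guard every clause.
    return down and ((d % 600 == 3 and d < 1810) or d == 3600 or d == 36000)

# A test that is NOT down but has been in its current status for 3600 sec:
print(unbracketed(False, 3600))  # True
print(bracketed(False, 3600))    # False
```

Under that assumption, the unbracketed form could start the action for a test that is not down at all.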
I actually only want it to tell me 3 times on that schedule.
So:
after 3 seconds
after 603 seconds
after 1203 seconds
So I could set that upper limit anywhere from 1204 to 1802, I guess.
But yes, you are definitely correct with the brackets... thanx for pointing that out. I will fix it... and restart my "trial-test"
Steven wrote: It's not working.
Yes. Really. It does not work.
Steven wrote: I have a question: using these expressions in the "bad" action list, will it work using "exact" times? or does it only check the expression on a "bad" hit? (at that specific point in time?)
Action execution depends on the test interval. When a test has been performed, HostMonitor starts executing the actions in the action profile. So the action does not run every second, and we are not able to use the %CurrentStatusDuration_sec% variable in such circumstances.

Regards,
Max
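A small Python sketch of why exact-second conditions are fragile when the expression is only evaluated after each test. The 60 sec interval and the starting offsets are hypothetical numbers chosen for illustration, and both helper names are made up.

```python
def sampled_durations(interval: int, first: int, horizon: int) -> list[int]:
    """Durations (sec) seen when the expression is only evaluated after each test."""
    return list(range(first, horizon + 1, interval))

def condition(d: int) -> bool:
    """Mimic (%CurrentStatusDuration_sec% MOD 600==3) AND (... <1810)."""
    return d % 600 == 3 and d < 1810

# Evaluations land on 0, 60, 120, ... sec: the condition never matches.
hits_unaligned = [d for d in sampled_durations(60, 0, 1810) if condition(d)]
print(hits_unaligned)  # []

# Only if evaluations happen to land on 3, 63, 123, ... sec does it match.
hits_aligned = [d for d in sampled_durations(60, 3, 1810) if condition(d)]
print(hits_aligned)  # [3, 603, 1203, 1803]
```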
Let's return to the %RECURRENCES% variable. If your particular test is performed every minute, then %RECURRENCES% == 1 corresponds to 1 minute, %RECURRENCES% == 10 corresponds to 10 minutes, etc. So you may compose an expression using the %RECURRENCES% variable to trigger the action every <nnn> number of seconds/minutes/hours. It might help.
Regards,
Max
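The conversion from a reminder period to a recurrence count is simple arithmetic; a minimal Python sketch, assuming a fixed test interval (`recurrences_for` is a hypothetical helper name):

```python
def recurrences_for(period_sec: int, test_interval_sec: int) -> int:
    """How many consecutive results span roughly period_sec of wall time."""
    if period_sec % test_interval_sec != 0:
        raise ValueError("period should be a multiple of the test interval")
    return period_sec // test_interval_sec

# Hourly reminder for a test that runs every 60 sec.
n = recurrences_for(period_sec=3600, test_interval_sec=60)
print(f"(%RECURRENCES% MOD {n} == 0)")  # (%RECURRENCES% MOD 60 == 0)
```

The mapping only holds while the test interval stays constant; if the interval changes while the test is down, the same count covers a different amount of time.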
There is another mistake - DOWN without quotes. Should be ('%SimpleStatus%' == 'DOWN')
And yes, using %CurrentStatusDuration_sec% with the MOD operation will not work reliably.
You could probably use a single action to check all test items - HostMonitor provides the %HM_BadItems% macro variable. You may create an additional test item and start an alert every N hours if %HM_BadItems% > 0
Regards
Alex
Good suggestions.
Here's what I've been working on (and yes, I noticed the missing 's too):
('%SimpleStatus%' == 'DOWN') AND (((%Recurrences% MOD 120==1) AND (%CurrentStatusDuration_sec% <1800)) OR (%Recurrences%==280) OR (%Recurrences%==820) OR (((%Recurrences%-240) MOD 1440==0) AND (%CurrentStatusDuration_sec% >1800)))
I also dynamically change the interval to test more often immediately after a test fails, until it has been down for 20 minutes, then restore the 1 minute interval. This is getting pretty complicated.
I'll probably leave it like this, if it works, but I'd really appreciate it if the above option made the "2 do" list, understanding however that it may not be possible.
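For anyone checking the arithmetic, here is a Python sketch of which recurrences that expression would match. The 5 sec fast interval is an assumption inferred from the numbers (120 recurrences per 10 minutes); only the 20 minute switch-over and the 1 minute interval are stated above. `elapsed` is a rough model of %CurrentStatusDuration_sec%, not how HostMonitor actually computes it.

```python
FAST, SLOW, FAST_COUNT = 5, 60, 240  # assumed: 5 sec for 240 results, then 60 sec

def elapsed(r: int) -> int:
    """Approximate seconds in the bad status at the r-th consecutive bad result."""
    return FAST * min(r, FAST_COUNT) + SLOW * max(0, r - FAST_COUNT)

def fires(r: int) -> bool:
    """Mimic the duration/recurrence part of the expression for a DOWN test."""
    d = elapsed(r)
    return (
        (r % 120 == 1 and d < 1800)
        or r == 280
        or r == 820
        or ((r - 240) % 1440 == 0 and d > 1800)
    )

firing = [(r, elapsed(r)) for r in range(1, 2001) if fires(r)]
print(firing)
# [(1, 5), (121, 605), (241, 1260), (280, 3600), (820, 36000), (1680, 87600)]
```

Under these assumptions the reminders land right away, at about 10 and 21 minutes, at 1 hour, at 10 hours, and roughly once a day after that.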
