KS-Soft. Network Management Solutions

Wait for master triggers bad action?

 
xcentric



Joined: 23 Oct 2010
Posts: 176

Posted: Sat Nov 26, 2011 8:30 pm

I was attempting to receive alerts for both UNKNOWN and DOWN SimpleStatus by using ('%SimpleStatus%'<>'UP') and (%Recurrences% mod 60==5).
Now bad actions are being triggered on "Wait for master" statuses. This also causes good actions to trigger when the status goes from "Wait for master" to "Ok" with a PreviousStatus of "Warning".

I do not want to enable "Treat unknown status as bad" because this will increment the dead counters instead of the unknown counters.

Since "Wait for master" produces an UNKNOWN SimpleStatus, this is expected behavior. I don't understand why.

In the manual for "Status groups" it reads:

Code:
UNKNOWN        INACTIVE
---------------------------------------
Unknown        Disabled
Unknown host   Wait for master
               Out of schedule
               Paused

And for "SimpleStatus":

Code:
UNKNOWN
--------------------------------------
Unknown
Unknown host
Wait for master
Out of schedule
Paused

It would seem to benefit this type of scenario, AND be consistent with the status groups, to add an additional SimpleStatus called INACTIVE and move "Wait for master", "Out of schedule" and "Paused" from UNKNOWN to INACTIVE.

If this were the case, I could specify "DOWN" and "UNKNOWN" SimpleStatus in my bad action. This would leave out "Wait for master", "Out of schedule" and "Paused", which would now have an INACTIVE SimpleStatus. The appropriate counters would be incremented for Down and Unknown SimpleStatus. In fact, you could now add an INACTIVE counter as well. I don't know what use that would have, but I'm just saying. Most important, you would not have to use "Treat unknown as bad" to trigger a bad action from an UNKNOWN SimpleStatus.

Now I am convinced that Alex is going to find something wrong with this. Maybe there is another way that I have not considered. In any case an INACTIVE SimpleStatus for those particular statuses seems like a good idea.

Regards
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

Posted: Sun Nov 27, 2011 12:10 pm

Quote:
Now I am convinced that Alex is going to find something wrong with this. Maybe there is another way that I have not considered. In any case an INACTIVE SimpleStatus for those particular statuses seems like a good idea.

Maybe it's a good idea, but changing the behavior of the variable may lead to problems for other customers who already use various expressions with this variable.
I think you may use %Status% or %StatusID% variable instead.

Regards
Alex
xcentric



Joined: 23 Oct 2010
Posts: 176

Posted: Mon Nov 28, 2011 12:02 am

Quote:
Maybe it's a good idea, but changing the behavior of the variable may lead to problems for other customers who already use various expressions with this variable.

I understand completely. Thank you for listening.

Quote:
I think you may use %Status% or %StatusID% variable instead.


I am still having difficulty with certain behavior.
Maybe I should explain and see what you come up with because I am striking out here.

BAD
Set warning status below 4 recurrences and treat as bad.
To be alerted for unknown and bad status at 5 recurrences and repeat every hour.
To increment bad counter for bad status.
To increment unknown counter for unknown status.
Do not treat unknown as bad.

GOOD
To be alerted when bad or unknown changes to good status.

The problem occurs when a test goes into a warning or unknown status and then to wait for master.
After some time the test goes from wait for master to ok or host is alive, and because the previous state was warning or unknown, the good action triggers.
The test did not complete 5 recurrences before going to wait for master. So how can I overcome this?

These are the actions I thought would work. They do get rid of the good actions being fired by warning statuses, but now the same thing happens for unknown, which I do not want to get rid of.

Code:
Bad   = (('%SimpleStatus%'<>'UP') and not ('%Status%'=='Wait for Master')) and  (%Recurrences% mod 60==5)
Good   = (('%SimpleStatus%'=='UP') and not ('%LastStatus%'=='Warning')) and (%PreviousStatusDuration_sec% div %Interval_sec% >= 5) and (%Recurrences% == 5)
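
As a rough illustration of the `(%Recurrences% mod 60==5)` term in the Bad expression above, the Python sketch below lists the recurrence counts at which that term is true. The function name and parameters are hypothetical and only mirror the arithmetic; this is not HostMonitor code. The "repeat every hour" reading additionally assumes a one-minute test interval, which the thread does not state.

```python
def bad_alert_due(recurrences, repeat_every=60, first_alert_at=5):
    """Mirror of (%Recurrences% mod 60 == 5); names are illustrative only."""
    return recurrences % repeat_every == first_alert_at


# Recurrence counts (out of the first 199) at which the term is true.
due_at = [r for r in range(1, 200) if bad_alert_due(r)]
print(due_at)  # [5, 65, 125, 185]
```

With this term the alert first fires on the 5th consecutive recurrence and then every 60 recurrences after that, for as long as the rest of the expression stays true.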


How does hm process wait for master?

This is what happens currently.

ok
ok
ok
warning or unknown
wait for master
ok = Good actions are triggered because of the previous warning or unknown status.

This is how I would like it to happen.

ok
ok
ok
warning or unknown
wait for master
ok = Last status was wait for master, use current status and go from here.

I cannot be the only one who has seen this behavior.

Regards
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

Posted: Mon Nov 28, 2011 8:26 am

Quote:
How does hm process wait for master?

HostMonitor does not do much when it sets this status. It does not reset Recurrences, it does not start standard actions, and it does not record the event in the log file.

Quote:
ok = Good actions are triggered because of the previous warning or unknown status

Are you sure it is triggered at this moment? Recurrences should be 1 and action could not be triggered.
Maybe the test results look like this?
ok
ok
ok
warning or unknown
wait for master
ok
ok
ok
ok
ok = Good actions are triggered because of the previous warning or unknown status.

Quote:
ok = Last status was wait for master, use current status and go from here.
I cannot be the only one who has seen this behavior

Actually, not ('%LastStatus%'=='Warning') does not make much sense in this expression, because ('%SimpleStatus%'=='UP') and (%Recurrences% == 5) appear in the same expression, which means LastStatus is a "good" status.

I assume you want to use %PreviousStatus% variable, probably (('%SimpleStatus%'=='UP') and ('%PreviousStatus%'<>'Warning')) and (%PreviousStatusDuration_sec% div %Interval_sec% >= 5) and (%Recurrences% == 5)

Quote from the manual
====================
To explain the following variables we need to explain difference between terms "PreviousStatus" and "LastStatus". "PreviousStatus" is status which test had before current status. "LastStatus" is status which test had after previous check. For example for the last 5 probes test had following statuses: #1-Bad, #2-Unknown, #3-Ok, #4-Ok, #5-Ok (current status is #5-Ok). In this case "PreviousStatus" is #2-Unknown but "LastStatus" is #4-Ok.
====================
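
The distinction in the manual quote above can be sketched in a few lines of Python. The helper below is hypothetical (it is not a HostMonitor API); it only reproduces the manual's own example, where the history is Bad, Unknown, Ok, Ok, Ok.

```python
def previous_and_last_status(history):
    """Return (previous_status, last_status) for a list of probe results.

    history[-1] is the current status. Illustrative helper only.
    """
    if len(history) < 2:
        return None, None
    current = history[-1]
    # LastStatus: the result of the check immediately before the current one.
    last_status = history[-2]
    # PreviousStatus: the most recent result that differs from the current status.
    previous_status = None
    for status in reversed(history[:-1]):
        if status != current:
            previous_status = status
            break
    return previous_status, last_status


history = ["Bad", "Unknown", "Ok", "Ok", "Ok"]
print(previous_and_last_status(history))  # ('Unknown', 'Ok')
```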

Regards
Alex
xcentric



Joined: 23 Oct 2010
Posts: 176

Posted: Mon Nov 28, 2011 9:57 am

Quote:
Are you sure it is triggered at this moment? Recurrences should be 1 and action could not be triggered.
Maybe the test results look like this?

I'm sorry, it was a typo. Yes, the action triggered at 5 recurrences. It does not matter though. It could be 1, 2, 3 or 4. The good action will be triggered at whatever recurrence is set in the expression when returning from wait for master.

Quote:
I assume you want to use %PreviousStatus% variable, probably (('%SimpleStatus%'=='UP') and ('%PreviousStatus%'<>'Warning')) and (%PreviousStatusDuration_sec% div %Interval_sec% >= 5) and (%Recurrences% == 5)

This is a better way to write it out, and it does take care of false goods on a single warning status. But now I have to find out how to prevent false goods on a single unknown status. I could modify the warning expression to include unknown, but then I am right back where I started: treating unknown as bad and incrementing bad counters instead of unknown counters. And if I disable treating warning as bad, my bad recurrences will not increment to fire the bad action.

It seems that if I include previous status unknown in the good expression, this will work just like with warning. I won't receive good actions on unknowns coming back online, but my unknown counters are preserved.

Wow. This is a vicious circle man.

I guess I cannot have my cake and eat it too?

Regards
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

Posted: Mon Nov 28, 2011 2:37 pm

Quote:
But now I have to find out how to prevent false goods on a single unknown status

This expression will not trigger the "good" action after a single unknown test result because of (%PreviousStatusDuration_sec% div %Interval_sec% >= 5).
If HostMonitor sets Unknown test status, then WaitForMaster for some period of time, then again Unknown or maybe Ok status, that's a different story.
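
The arithmetic behind the `%PreviousStatusDuration_sec% div %Interval_sec% >= 5` term can be sketched in Python as follows. The function name and the 60-second interval are assumptions made for illustration; the thread does not state the actual test interval, and it does not say how the duration is counted across a Wait for master period.

```python
def previous_status_lasted_long_enough(previous_status_duration_sec,
                                       interval_sec, min_checks=5):
    """Mirror of (%PreviousStatusDuration_sec% div %Interval_sec% >= 5)."""
    # Python's // is integer division, standing in for the expression's "div".
    return previous_status_duration_sec // interval_sec >= min_checks


INTERVAL_SEC = 60  # assumed test interval; not given in the thread

print(previous_status_lasted_long_enough(60, INTERVAL_SEC))   # False (about one check)
print(previous_status_lasted_long_enough(299, INTERVAL_SEC))  # False (299 // 60 == 4)
print(previous_status_lasted_long_enough(300, INTERVAL_SEC))  # True  (300 // 60 == 5)
```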

Regards
Alex
xcentric



Joined: 23 Oct 2010
Posts: 176

Posted: Mon Nov 28, 2011 4:32 pm

Quote:
If HostMonitor sets Unknown test status, then WaitForMaster for some period of time, then again Unknown or maybe Ok status, that's a different story.

This is exactly what I have been trying to avoid. Imagine hundreds of tests that depend on a single master test, like a Google ping. Some of those tests are in an unknown state when they go to wait for master. When the master test comes back online then BOOM, all those tests fire the good action.

I'm trying to use expressions to prevent this behavior but it is difficult.

I have decided now that I don't need to receive good actions on unknown or warning at all. It doesn't seem possible. I only need to preserve the unknown counter and the unknown bad alert. So, to exclude Warning, Unknown and Unknown host, how can this be written out? Like this?
Code:
(('%SimpleStatus%'=='UP') and not (('%PreviousStatus%'=='Warning') or ('%PreviousStatus%'=='Unknown') or ('%PreviousStatus%'=='Unknown host'))) and (%PreviousStatusDuration_sec% div %Interval_sec% >= 5) and (%Recurrences% == 1)
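
To check the logic of the expression above outside HostMonitor, it can be modeled as a plain Python predicate. This is only a sketch: the function, its parameter names and the sample values are hypothetical, and it says nothing about what HostMonitor reports as PreviousStatus after a Wait for master period.

```python
# Previous statuses that should NOT lead to a good action.
EXCLUDED_PREVIOUS = {"Warning", "Unknown", "Unknown host"}


def good_action_due(simple_status, previous_status,
                    previous_status_duration_sec, interval_sec, recurrences):
    """Model of the Good expression above; names are illustrative only."""
    return (
        simple_status == "UP"
        and previous_status not in EXCLUDED_PREVIOUS
        and previous_status_duration_sec // interval_sec >= 5
        and recurrences == 1
    )


# Recovery after a long bad period: the good action is due.
print(good_action_due("UP", "Bad", 300, 60, 1))      # True
# Recovery after Unknown: suppressed, however long it lasted.
print(good_action_due("UP", "Unknown", 600, 60, 1))  # False
```

The set membership test replaces the chain of `or` comparisons in the original expression; the two forms are logically equivalent.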


In fact, since this expression would only process the good action after 5 DOWN recurrences, I can get rid of the previous status duration expression.

Regards
All times are GMT - 6 Hours