"Summary Tests" to reduce duplicate alerts

villox
Posts: 5
Joined: Thu Nov 20, 2003 8:33 pm


Post by villox »

Hi,

There may be other ways to do this, and I would appreciate guidance if there are. We have multiple related tests that do not depend on each other but are likely to fail at the same time. Any of them could fail independently, so we need to monitor each one, but we don't want to be notified more than once when several fail together.

For example, we may monitor both CPU and Free memory on a box. If either one exceeds the threshold, we need an alert. But often, both might fire at the same time. We don't want to get both notifications.

Note that it's not enough to make CPU a master test of Memory or vice versa, because there is no direct relationship of that kind. Ideally, there would be a way to associate these two tests in a "Summary" alert whose status reflects the worst state among all the other tests (preferably all tests in a given folder), especially if its status could also include the states of those tests. It would send a "down" when the first subtest fails and an "up" when all of the subtests are up again.

Plus, the summary alert would not need to actually monitor anything, except the other tests.
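To make the request concrete, here is a minimal Python sketch of the behaviour described above. It is not HostMonitor code: the `SummaryTest` class, its `update` method, and the two-level "UP"/"DOWN" severity table are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Hypothetical severity ranking used to pick the "worst" sub-test state.
SEVERITY = {"UP": 0, "DOWN": 1}


@dataclass
class SummaryTest:
    """Monitors nothing itself; it only mirrors the worst sub-test state."""
    states: Dict[str, str] = field(default_factory=dict)
    summary: str = "UP"

    def update(self, test_name: str, status: str) -> Optional[str]:
        """Record one sub-test result; return an event only on a summary change."""
        self.states[test_name] = status
        worst = max(self.states.values(), key=SEVERITY.__getitem__)
        if worst == self.summary:
            return None  # no transition, so no duplicate notification
        self.summary = worst
        if worst == "DOWN":
            failed = sorted(n for n, s in self.states.items() if s == "DOWN")
            return "DOWN: " + ", ".join(failed)
        return "UP: all sub-tests recovered"


if __name__ == "__main__":
    summary = SummaryTest()
    for name, status in [("CPU", "UP"), ("Memory", "DOWN"), ("CPU", "DOWN"),
                         ("Memory", "UP"), ("CPU", "UP")]:
        event = summary.update(name, status)
        if event is not None:
            print(event)
```

In this sketch the first failing sub-test produces the only "DOWN" event, later failures are absorbed silently, and the "UP" event is held back until every sub-test has recovered.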

Alternatively, I know that with the current feature set we could create a test that is a master of the others and make it the one that generates events, by being active only when the other tests are bad (at all other times, "Waiting for Master"). But that test actually needs to monitor something, and it can't send an "Up" event when the other tests are okay again.

I'd like to not have to maintain complex scripts to do this.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

You may use "advanced" actions: http://www.ks-soft.net/hostmon.eng/mfra ... ncedaction

E.g. if you have 2 tests and you want to receive a single alert when either or both tests fail, use an expression like (('%::TestA::SimpleStatus%'=='DOWN') and (%::TestA::Recurrences%==1)) xor (('%::TestB::SimpleStatus%'=='DOWN') and (%::TestB::Recurrences%==1))

If you need a single message when either or both tests return to "good" status, use a similar expression: (('%::TestA::SimpleStatus%'=='UP') and (%::TestA::Recurrences%==1)) xor (('%::TestB::SimpleStatus%'=='UP') and (%::TestB::Recurrences%==1))
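For readers who want to check how such an expression evaluates, the following Python sketch mirrors its Boolean structure. It makes two assumptions that are not confirmed in this thread: that Recurrences counts consecutive results with the same status, and that both tests' variables are read at the same moment. The function name `alert_fires` is hypothetical.

```python
def alert_fires(a_status: str, a_recurrences: int,
                b_status: str, b_recurrences: int,
                target: str = "DOWN") -> bool:
    """Mirror of: (A is target and A.Recurrences == 1) xor (same for B)."""
    a_just_changed = a_status == target and a_recurrences == 1
    b_just_changed = b_status == target and b_recurrences == 1
    return a_just_changed != b_just_changed  # != on booleans is xor


if __name__ == "__main__":
    # Tabulate a few combinations of (status, recurrences) for TestA and TestB.
    cases = [("DOWN", 1, "UP", 5), ("DOWN", 1, "DOWN", 1),
             ("DOWN", 3, "DOWN", 1), ("UP", 4, "UP", 2)]
    for case in cases:
        print(case, alert_fires(*case))
```

Passing `target="UP"` gives the corresponding mirror of the recovery expression.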

Regards
Alex
villox
Posts: 5
Joined: Thu Nov 20, 2003 8:33 pm

Post by villox »

The problem is that this gets very complicated. It may be fine for CPU and memory, but that is just a simple example. In reality we have dozens of interrelated URLs that could all fail at the same time.

I'd like to have a single event for, say, a whole folder of alert statuses.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Sorry, this option cannot be implemented very soon. Maybe in version 5.60 or later...

Regards
Alex