What about "severity level" ?

JuergenF · Post by **JuergenF** » Sun Nov 12, 2006 4:32 pm

FLynch wrote:Please please do not lose the core notion and functionality of 'warning' status.

Most monitoring systems have granularity in their status levels - 'warning' brings your attention to a potential problem before it becomes critical.

Very straightforward and will be a significant step forward for AHM.

Exactly what I think

Here are some of my thoughts http://www.ks-soft.net/cgi-bin/phpBB/vi ... 4146#14146

JuergenF · Post by **JuergenF** » Sun Nov 12, 2006 4:55 pm

AntonyP wrote:I believe that the warning status should not be added, simply because it would be easier to set the alarm trigger at an earlier stage.

E.g. on a disk usage test
hard disk has 2gb free space
I set 1gb free space alarm for HM

Now, what would be the meaning of having the warning status at 1.2gb? I can set the alarm at 1.2gb instead...

The difference is that you get a phone call early in the morning for every red status.
So there is a need to separate warnings from real problems.
A warning means you have to do something to avoid a real problem, but not immediately.

genasea · Post by **genasea** » Mon Nov 13, 2006 8:39 am

Alex,

I had posed this question last year, and it is similar to the requests being made here. The vast majority of the tests that we have are performance-based tests (not fault-based tests). It would be very handy to have a qualification for when these types of tests actually go into alarm or are marked as 'Bad'. As some of the other users on this forum have pointed out, many of our IT staff are no longer looking at the alarms since there are so many performance-based tests in alarm at any given time. We have about 15 alarms at any given time, mostly performance-based alarms (CPU, disk time, page faults, etc.). Most come from different servers each minute (so the test is only in alarm during a single test interval), but we typically only average a few faults a day.

My thought is to add a feature where the person setting up the tests could decide how many consecutive test results would need to be 'bad' before the test is set to bad, while keeping the same test interval. For example, if a server's CPU was at 100% for 10 minutes in a row (or 10 test cycles at 60-second intervals), then set the test to 'bad', rather than going bad the first time it gets a 100% test result.
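To make the request concrete, here is a minimal Python sketch of the idea, not anything HostMonitor actually provides. The names `ConsecutiveBadGate` and `classify_cpu_samples` are hypothetical, and the 100% threshold and 10-cycle count are just the figures from the example above.

```python
class ConsecutiveBadGate:
    """Report 'bad' only after `required` failing probes in a row (hypothetical helper)."""

    def __init__(self, required: int) -> None:
        if required < 1:
            raise ValueError("required must be at least 1")
        self.required = required
        self.streak = 0  # failing probes seen in a row so far

    def update(self, probe_failed: bool) -> str:
        # A passing probe resets the streak; a failing probe extends it.
        self.streak = self.streak + 1 if probe_failed else 0
        return "bad" if self.streak >= self.required else "ok"


def classify_cpu_samples(samples, threshold=100.0, required=10):
    """Classify one CPU reading per test cycle, keeping the test interval unchanged."""
    gate = ConsecutiveBadGate(required)
    return [gate.update(cpu >= threshold) for cpu in samples]


if __name__ == "__main__":
    # Nine busy cycles, one quiet cycle, then ten busy cycles:
    # only the final cycle completes a run of ten.
    readings = [100.0] * 9 + [50.0] + [100.0] * 10
    print(classify_cpu_samples(readings))
```

With this kind of gate a single-interval spike never raises an alarm, which is the behaviour being asked for with performance-based tests.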

I know that you stated that this would require additional coding to the core functionality. However, my organization is thinking of moving away from HM, and our two licenses, because they feel the product does not deal effectively with performance-based tests.

Thank you for your consideration,

Scott

KS-Soft · Post by **KS-Soft** » Mon Nov 13, 2006 1:12 pm

Probably we should keep "basic" scheme as is and provide ability to set additional statuses using expressions. Just like we did with "standard" and "advanced" actions.
E.g. implement 2 new statuses and 2 options
[x] Use expression to set Warning status
[ ] Use expression to set Normal status
So you will be able to use expressions like ('%Reply%'>'70 %') and ('%Reply%'<'90 %') and ('%MainRouter::SimpleStatus%'=='UP')
Warning/Normal statuses will be handled just like other bad/good statuses for statistics purposes. But such items can be displayed in different color, HostMonitor may apply different sorting order, generate separate HTML reports.
This way we keep "basic" setup simple enough and provide great flexibility when you really need that.

Regards
Alex

JuergenF · Post by **JuergenF** » Wed Nov 15, 2006 12:43 am

Dear Alex,

that sounds good to me.

- and it will be possible to set the test to bad after the 5th test between 70 and 90 % CPU ?
- So we can have an HTML report only showing warning and bad tests in different colours ?

KS-Soft · Post by **KS-Soft** » Wed Nov 15, 2006 8:54 pm

- and it will be possible to set the test to bad after the 5th test between 70 and 90 % CPU ?

H'm..
- an expression like "('%Reply%'>'70 %') and ('%Reply%'<='90 %')" will set Warning status when CPU Usage is between 70 and 90 % (Bad status if CPU Usage is over 90 %)
- an expression like "('%SimpleStatus%'=='DOWN') and (%Recurrences%<5)" will set Warning status for the 1st..4th failed probe (the 5th failed probe will use Bad status)
- you may combine conditions, e.g. "('%Reply%'>'70 %') and ('%Reply%'<='90 %') and ('%SimpleStatus%'=='DOWN') and (%Recurrences%<5)".
But it's impossible to combine them in your way (HostMonitor does not keep a history of all previous Reply values, except the log of course), unless Warning status resets Recurrences. In that case we would need to redesign action-related behaviour (we don't really want to do that until version 8 or so).
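As a rough illustration only, the first two rules can be written out in Python like this. This is not HostMonitor code: the function names are hypothetical, "Ok" stands in for whatever the good status would be, and `recurrences` is assumed to be 1 on the first failed probe, which is what "1st..4th failed probe" implies.

```python
def cpu_status(reply_percent: float,
               warn_above: float = 70.0,
               bad_above: float = 90.0) -> str:
    """Map a CPU usage reading to a status using the 70/90 bands above."""
    if reply_percent > bad_above:
        return "Bad"
    if reply_percent > warn_above:
        return "Warning"  # above 70 and at most 90
    return "Ok"


def failed_probe_status(simple_status: str, recurrences: int, limit: int = 5) -> str:
    """Warning for the 1st..(limit-1)th consecutive DOWN result, Bad from then on."""
    if simple_status != "DOWN":
        return "Ok"
    return "Warning" if recurrences < limit else "Bad"


if __name__ == "__main__":
    for usage in (65.0, 80.0, 90.0, 95.0):
        print(usage, cpu_status(usage))
    for count in (1, 4, 5):
        print(count, failed_probe_status("DOWN", count))
```

Both functions look only at the current reading and the current counter, which is why "bad after the 5th reading between 70 and 90 %" does not fit: that would need a separate count of Warning results.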

- So we can have an HTML report only showing warning and bad tests in different colours ?

Sure

Regards
Alex

KS-Soft · Post by **KS-Soft** » Fri Dec 08, 2006 4:56 pm

Done. Version 6.50 Beta available at http://www.ks-soft.net/hostmon.eng/downpage.htm
What's new: http://www.ks-soft.net/hostmon.eng/news.htm

Regards
Alex

FLynch · Post by **FLynch** » Sat Dec 09, 2006 11:53 am

Downloaded and tried this functionality out and it is spot on - works well.

Many thanks for introducing this, it takes AHM to a new level!

Cheers

KS-Soft · Post by **KS-Soft** » Sat Dec 09, 2006 12:54 pm

You are welcome

Regards
Alex
