What about "severity level"?

JuergenF · Post by **JuergenF** » Thu Oct 06, 2005 12:53 am

Dear all,

I hope this hasn't been asked too many times before (the search function didn't help me).

As far as I can see, there are currently two status conditions, Good and Bad (plus Ok and "Unknown").

I have some tests that should report a status of "Warning".

For example, a disk space test should not go straight from "green" to "red" (Good to Bad), but first to "yellow" and maybe later to "red".

Is that possible today?
Or is it planned?

Thanks a lot

Juergen

KS-Soft · Post by **KS-Soft** » Thu Oct 06, 2005 7:01 pm

It's not possible today, and it requires significant changes in HostMonitor. It will probably be implemented in version 6.50 or later.

Regards
Alex

JuergenF · Post by **JuergenF** » Sun Jun 04, 2006 12:54 pm

Dear Alex,

As you are working on 6.x, please keep the wish for a "yellow" state on the to-do list.

Many thanks

Juergen

KS-Soft Europe · Post by **KS-Soft Europe** » Sun Jun 04, 2006 1:06 pm

We have this task on our to-do list, but I don't know when it will be implemented.

Regards,
Max

JuergenF · Post by **JuergenF** » Sun Oct 08, 2006 2:15 pm

Any news on when that feature may be available?

Many thanks and keep on working.
HM is a great product and beats all of the tools I tested so far.

Just reading the forum a bit fulfilled two of my wishes today!
- Reports & Statistics: Display Alive/Dead ratio of alive/dead time (instead of tests)
- disable "Show folder names" option

KS-Soft · Post by **KS-Soft** » Mon Oct 09, 2006 4:36 pm

The answer is the same: approximately in version 6.50.

> Many thanks and keep on working.
> HM is a great product and beats all of the tools I tested so far.

You are welcome

Regards
Alex

Snucke · Post by **Snucke** » Tue Oct 24, 2006 6:51 am

We also have a "problem" with this, and unfortunately it makes some people "ignorant" when using HM. It is almost like getting false alerts.

For example, if we monitor 25 fairly busy servers for CPU usage, it will happen on occasion (too often, maybe?) that one hits 100% and the test turns red/failed. At the next test, 1 minute later, it goes back to OK again. If this happens too often, the operator will start to ignore the alerts because "it's the CPU again, it will be OK soon". And when something really bad happens, we are too slow to react because of this.

I think the main problem is that we use the web interface as a "visual alert" indicator, but since it doesn't use the same criteria as an alert profile, it's useless from that point of view because it creates too much overhead work.

Perhaps an alert action that says "make red in GUI/web interface" would do the trick?

Of course we are using other alert methods as well, but we'd rather not rely on a working mail server or GSM network and ignore the console instead.

The other problem is with generating statistics. If an application is not responding at a given time but responds a couple of seconds later, it's not "dead", but from a statistical view it is. That makes us look like we don't live up to our SLA, because everything is black or white.

But HM is still VERY useful

KS-Soft · Post by **KS-Soft** » Tue Oct 24, 2006 8:24 pm

I see your point. Maybe the "Warning" status will be implemented before the New Year.

Regards
Alex

thomasschmeidl · Post by **thomasschmeidl** » Mon Oct 30, 2006 2:37 pm

That's good news

Appreciating your work and looking forward to this New Year's present

Kind regards

Thomas

KS-Soft · Post by **KS-Soft** » Fri Nov 03, 2006 5:54 pm

I think HostMonitor should not update the bad and good statistics counters (Alive%, Dead%, Alive Time, etc.) while a test is in Warning status. However, HostMonitor should keep "warning" statistics and add them to the appropriate counters when the test changes back to good or finally to bad...
E.g.
1) good -> good (increment good counters) -> warning (initiate warning counters) -> good (good_counters+=warning_counters)
2) good -> good (increment good counters) -> warning (initiate warning counters) -> bad (bad_counters+=warning_counters)
The question is how to handle "Unknown" statuses.
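
A minimal sketch of the counter idea above, just to make the two cases concrete. This is not HostMonitor code: the class name `CheckStats`, the field names, and the status strings are all made up for illustration. Treating "unknown" as a no-op is only a placeholder, since that question is still open.

```python
from dataclasses import dataclass


@dataclass
class CheckStats:
    """Hypothetical per-test counters; not part of HostMonitor."""
    good: int = 0
    bad: int = 0
    pending_warning: int = 0  # warning results not yet assigned to good or bad

    def record(self, status: str) -> None:
        if status == "warning":
            # Hold the result back until the test resolves one way or the other.
            self.pending_warning += 1
        elif status == "good":
            # good_counters += warning_counters
            self.good += 1 + self.pending_warning
            self.pending_warning = 0
        elif status == "bad":
            # bad_counters += warning_counters
            self.bad += 1 + self.pending_warning
            self.pending_warning = 0
        elif status == "unknown":
            # Placeholder assumption: leave every counter untouched.
            pass
        else:
            raise ValueError(f"unexpected status: {status!r}")


def replay(statuses):
    """Feed a sequence of statuses into a fresh CheckStats and return it."""
    stats = CheckStats()
    for status in statuses:
        stats.record(status)
    return stats
```

With this sketch, `replay(["good", "good", "warning", "good"])` ends with all four results counted as good, and `replay(["good", "good", "warning", "bad"])` ends with two good and two bad.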

Probably Warning status should be used for "bad" conditions only...

And probably we should use Warning status when a test goes from Bad to Ok as well: good -> good -> warning(bad) -> bad -> bad -> warning(good) -> good.

Or maybe we don't really need a new status? Maybe we can just add new color items for tests that recently changed status from good to bad and vice versa.

Regards
Alex

AntonyP · Post by **AntonyP** » Wed Nov 08, 2006 2:53 am

I believe that the warning status should not be added, simply because it would be easier to set the alarm trigger at an earlier stage.

E.g. on a disk usage test:

The hard disk has 2 GB of free space.

I set a 1 GB free space alarm in HM.

Now, what would be the point of having a warning status at 1.2 GB? I can set the alarm at 1.2 GB instead...

KS-Soft · Post by **KS-Soft** » Wed Nov 08, 2006 11:52 am

Snucke needs a Warning status for tests that fail just once (or twice) and then return to the "good" state.
So maybe we should implement 2 new statuses (like warning and pre-positive). These statuses will be handled just like the other bad/good statuses (Bad, No answer, Bad content / Ok, Host is alive) for statistics and alerting purposes. But such tests can be displayed in a different color, and HostMonitor may apply a different sorting order and generate separate HTML reports.
How does this sound?
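
A rough sketch of how the two transitional statuses could be derived, assuming a test is shown as "warning" or "pre-positive" only for the first result after a change. The function names and status strings are hypothetical and nothing here reflects actual HostMonitor behavior.

```python
# Hypothetical mapping: the transitional statuses still count as bad/good
# for statistics and alerting, they only differ in how they are displayed.
COUNTS_AS = {
    "good": "good",
    "pre-positive": "good",
    "bad": "bad",
    "warning": "bad",
}


def display_statuses(results):
    """Turn raw good/bad results into display statuses.

    The first result after a change is shown as a transitional status:
    good -> bad shows "warning", bad -> good shows "pre-positive".
    """
    shown = []
    previous = None
    for result in results:
        if result not in ("good", "bad"):
            raise ValueError(f"unexpected result: {result!r}")
        if previous is not None and result != previous:
            shown.append("warning" if result == "bad" else "pre-positive")
        else:
            shown.append(result)
        previous = result
    return shown


def statistics_class(display_status):
    """Return which counter ("good" or "bad") a display status belongs to."""
    return COUNTS_AS[display_status]
```

For a test that fails only once, `display_statuses(["good", "bad", "good"])` never shows a plain "bad" at all, which is the case Snucke describes.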

Regards
Alex

FLynch · Post by **FLynch** » Thu Nov 09, 2006 8:17 am

Please, please do not lose the core notion and functionality of a 'warning' status.

Most monitoring systems have granularity in their status levels: 'warning' brings your attention to a potential problem before it becomes critical.

Very straightforward and will be a significant step forward for AHM.

FLynch · Post by **FLynch** » Thu Nov 09, 2006 8:49 am

Sorry, my previous post was not very clear. Can I give a real-world example of why having a warning status is such an important feature?

Disk fragmentation: to run properly, volumes need 15% free space. The problem with having a single alert state is where to set it. If it is set at 15%, it is too late; if it is set at, say, 20%, IT Ops look at it, do nothing (!) and then don't get alerted when it is a real problem.

Having a 'warning' status set at 20% and a 'down/alert' status at 15% resolves this issue. There are hundreds of circumstances of this type that occur when managing and monitoring systems.
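
To make the two-threshold idea concrete, here is a minimal sketch using the 20% and 15% figures from the example. The function name, parameter names, and the choice to treat the boundary values as inclusive are assumptions for illustration, not anything HostMonitor provides.

```python
def disk_status(free_percent, warn_at=20.0, alert_at=15.0):
    """Classify a volume by its free space percentage.

    Hypothetical helper: "warning" at or below warn_at, "bad" at or below
    alert_at, otherwise "good".
    """
    if not 0.0 <= free_percent <= 100.0:
        raise ValueError("free_percent must be between 0 and 100")
    if warn_at <= alert_at:
        raise ValueError("warn_at must be higher than alert_at")
    if free_percent <= alert_at:
        return "bad"
    if free_percent <= warn_at:
        return "warning"
    return "good"
```

A volume with 18% free would come back as "warning", giving IT Ops a chance to act before it drops to 15% and turns "bad".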

Cheers
Fergus

KS-Soft · Post by **KS-Soft** » Thu Nov 09, 2006 2:00 pm

Hey, we are talking about different options in the same topic.
Snucke's "warning" option can be implemented much more easily than FLynch's "warning" option.

I may agree to implement both options. But... there will be too many different statuses and overly complicated alert conditions. This will lead to configuration problems, and maybe you will spend more time managing HostMonitor than managing the target systems.

Regards
Alex
