What about "severity level" ?
Dear all,
I hope this hasn't been asked too many times before (the search function didn't help me).
As far as I can see, there are currently only two status conditions, Good and Bad (plus Ok and "Unknown").
I have some tests that should report a status of "Warning".
For example, a disk space test should not go straight from "green" to "red" (Good to Bad), but first to "yellow" and maybe later to "red".
Is that possible today?
Or planned?
Thanks a lot
Juergen
Any news on when that feature may be available?
Many thanks and keep on working.
HM is a great product and beats all of the tools I tested so far.
Even reading a bit in the forum solved two of my wishes today !!
- Reports & Statistics: Display Alive/Dead ratio of alive/dead time (instead of tests)
- disable "Show folder names" option
Need this too...
We also have a "problem" with this, and unfortunately it makes some people "ignorant" when using HM. Almost like false alerts.
For example, if we monitor 25 fairly busy servers for CPU usage, it will happen on occasion (too often, maybe?) that one of them hits 100% and the test turns red/failed. At the next test, 1 minute later, it goes back to OK again. If this happens too often, the operator will start to ignore the alerts because "it's the CPU again, it will be OK soon". And when something really bad happens, we are too slow to react because of this.
I think the main problem is that we use the web interface as a "visual alert" indicator, but since it doesn't use the same criteria as an alert profile, it's useless from that point of view because it creates too much overhead work.
Perhaps an alert action that says "make red in GUI/web interface" would do the trick?
Of course we are using other alert methods as well, but we'd rather not rely on a working mail server or GSM network and ignore the console instead.
The other problem is with generating statistics: if an application is not responding at a given moment but responds a couple of seconds later, it's not "dead", yet from a statistics point of view it is. That makes us look like we don't live up to our SLA, because everything is black or white.
But HM is still VERY useful

I think HostMonitor should not update the bad and good statistics counters (Alive%, Dead%, Alive Time, etc.) while a test is in Warning status. Instead, HostMonitor should keep separate "warning" statistics and add them to the appropriate counters when the test changes back to good or finally to bad...
E.g.
1) good -> good (increment good counters) -> warning (initiate warning counters) -> good (good_counters+=warning_counters)
2) good -> good (increment good counters) -> warning (initiate warning counters) -> bad (bad_counters+=warning_counters)
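The two sequences above can be sketched roughly as follows. This is only an illustration of the idea, not HostMonitor code: the `Counters` class, the `pending` field and the `tally` helper are hypothetical names, the counters count checks rather than time, and "Unknown" is deliberately left unhandled because that question is still open.

```python
class Counters:
    """Hypothetical per-test statistics; not HostMonitor's real data model."""

    def __init__(self):
        self.good = 0
        self.bad = 0
        self.pending = 0  # checks seen while in "warning", not yet attributed

    def record(self, status):
        if status == "warning":
            # Hold the check back until the test settles one way or the other.
            self.pending += 1
        elif status == "good":
            # Held checks are credited to "good", together with this check.
            self.good += self.pending + 1
            self.pending = 0
        elif status == "bad":
            # Held checks are charged to "bad", together with this check.
            self.bad += self.pending + 1
            self.pending = 0
        else:
            # "Unknown" and anything else is intentionally not decided here.
            raise ValueError(f"unhandled status: {status!r}")


def tally(statuses):
    """Feed a sequence of statuses through a fresh Counters object."""
    counters = Counters()
    for status in statuses:
        counters.record(status)
    return counters
```

With this sketch, a warning that resolves to good never shows up in the bad counters, and a warning that ends in bad is charged to bad in full.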
The question is how to handle "Unknown" statuses.
Probably the Warning status should be used for "bad" conditions only...
And probably we should use the Warning status when a test goes from Bad to Ok as well: good -> good -> warning(bad) -> bad -> bad -> warning(good) -> good.
Or maybe we don't really need a new status? Maybe we can just add new color items for tests that recently changed status from good to bad and vice versa.
Regards
Alex
I believe that the warning status should not be added, simply because it would be easier to set the alarm trigger at an earlier stage.
E.g. on a disk usage test:
The hard disk has 2 GB of free space.
I set a 1 GB free space alarm in HM.
Now, what would be the point of having a warning status at 1.2 GB? I can set the alarm at 1.2 GB instead...
Snucke needs a Warning status for tests that fail just once (or twice) and then return to the "good" state.
So maybe we should implement 2 new statuses (like warning and pre-positive). These statuses would be handled just like the other bad/good statuses (Bad, No answer, Bad content / Ok, Host is alive) for statistics and alerting purposes. But such tests could be displayed in a different color, and HostMonitor could apply a different sorting order and generate separate HTML reports.
How does this sound?
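To make the proposal a bit more concrete, here is a minimal sketch of how such display-only statuses could be derived from the raw good/bad results. Everything in it is an assumption for illustration: the function name `display_statuses` and the `settle` parameter (how many consecutive identical results a test needs after a flip before it shows its plain status) are hypothetical and are not HostMonitor options.

```python
def display_statuses(results, settle=2):
    """Map raw "good"/"bad" results to display statuses.

    A test that recently flipped to bad is shown as "warning"; a test that
    recently flipped back to good is shown as "pre-positive". `settle` is a
    hypothetical setting, not an existing HostMonitor option.
    """
    shown = []
    previous = None   # raw result of the previous check
    streak = 0        # consecutive checks with the current raw result
    flipped = False   # True once the raw result has changed at least once
    for result in results:
        if result == previous:
            streak += 1
        else:
            flipped = previous is not None
            streak = 1
        previous = result
        if flipped and streak <= settle:
            shown.append("warning" if result == "bad" else "pre-positive")
        else:
            shown.append(result)
    return shown
```

In this sketch only the display changes: for statistics and alerting, "warning" would still count as bad and "pre-positive" as good, as described above.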
Regards
Alex
Warning status...
Please, please do not lose the core notion and functionality of a 'warning' status.
Most monitoring systems have granularity in their status levels: 'warning' brings your attention to a potential problem before it becomes critical.
Very straightforward and will be a significant step forward for AHM.
Sorry, my previous post was not very clear... Can I give a real-world example of why having a warning status is such an important feature:
Disk fragmentation: to run properly, volumes need 15% free space. The problem with having a single alert state is where to set it, i.e. if it is set at 15% it is too late; if it is set at, say, 20%, IT Ops look at it, do nothing (!) and then don't get alerted when it becomes a real problem.
Having a 'warning' status set at 20% and a 'down/alert' status at 15% resolves this issue. There are hundreds of circumstances of this type that occur when managing and monitoring systems.
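As a rough sketch of the two-threshold idea, using the 20% and 15% figures from the example above: the function name and parameters are hypothetical, and treating a value exactly at a threshold as already past it is an assumption, not something HostMonitor defines.

```python
def classify_free_space(free_percent, warn_at=20.0, alert_at=15.0):
    """Return "good", "warning" or "bad" for a volume's free space in percent.

    Hypothetical helper: a value exactly at a threshold is treated as
    having reached it (an assumption made for this sketch).
    """
    if not 0.0 <= free_percent <= 100.0:
        raise ValueError("free_percent must be between 0 and 100")
    if alert_at > warn_at:
        raise ValueError("alert_at must not be above warn_at")
    if free_percent <= alert_at:
        return "bad"
    if free_percent <= warn_at:
        return "warning"
    return "good"
```

A volume at 18% free would show as "warning" here, giving IT Ops a prompt to act before it drops to the 15% level where the real alert fires.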
Cheers
Fergus
Hey, we are talking about different options in the same topic.
Snucke's "warning" option can be implemented much more easily than FLynch's "warning" option.
I may agree to implement both options. But... there will be too many different statuses and overly complicated alert conditions. This will lead to configuration problems, and maybe you will spend more time managing HostMonitor than managing the target systems.
Regards
Alex