How to alert on High Latency?

All questions related to installation, configuration and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
sneader
Posts: 90
Joined: Thu Dec 22, 2005 3:32 pm

How to alert on High Latency?

Post by sneader »

I have read the tests documentation, and also searched for the word LATENCY here on the forum (only 7 matches, none of which describe how to test for latency).

So... will ask the community... how does one test for High Latency? Let's say that I want to be alerted when the link is showing 300ms or more latency for 30 seconds (as an example)?

My guess is that I can use the Ping test, alter the Timeout setting, maybe use a larger number of packets... but maybe someone has a good "how-to" formula for doing this. And then it will be here for generations to come... :D

- Scott
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Yes, you can use the Ping test with a specified Timeout.
You can also use the Trace test with the "maximum reply time longer than.." option.

Regards
Alex
sneader
Posts: 90
Joined: Thu Dec 22, 2005 3:32 pm

Post by sneader »

Hi Alex. Thanks for the quick reply -- as always, very impressive and very much appreciated.

Can I beg of you to spell it out in more detail, with regards to using the ping test? Given my example, how would you do it?

TIA!

- Scott
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

For example, you can set up HostMonitor to perform a ping test with a 2 sec timeout every 3 min, and add an "advanced" action to the alert profile assigned to this test.
www.ks-soft.net/hostmon.eng/mframe.htm# ... properties

Use a condition like (%AverageReply% > 300) and (%recurrences% mod 10 == 1) to start the action every 30 min while the average reply time is over 300 ms.
You also need another "advanced" mode action that launches an "Execute HMScript" action with the ResetTest command, to reset the statistics (AverageReply) for this test item. Use a condition like (%recurrences% mod 10 == 1) to start this action.
You probably also need a "standard" action to raise an alert if the ping gets no response at all.
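
To see why the (%recurrences% mod 10 == 1) condition works out to "every 30 min" with a 3 min test interval, here is a minimal Python sketch of the expression. It is an illustration only, not HostMonitor code: the function name should_alert and its parameters are hypothetical, and it assumes %recurrences% counts consecutive checks with the same status, starting at 1.

```python
def should_alert(average_reply_ms, recurrences, threshold_ms=300, every_n=10):
    """Mirror of: (%AverageReply% > 300) and (%recurrences% mod 10 == 1)."""
    return average_reply_ms > threshold_ms and recurrences % every_n == 1


INTERVAL_MIN = 3  # test interval from the post above

# Assume the average stays at 350 ms for 30 consecutive checks.
fired = [r for r in range(1, 31) if should_alert(350.0, r)]

# Minutes elapsed since the first check, for each check that fired.
minutes = [(r - 1) * INTERVAL_MIN for r in fired]

print(fired)    # [1, 11, 21]
print(minutes)  # [0, 30, 60]
```

The action fires on the 1st, 11th and 21st check, which is 30 min apart at a 3 min interval. Changing the modulus or the test interval changes the repeat period in the same way.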

Regards
Alex
sneader
Posts: 90
Joined: Thu Dec 22, 2005 3:32 pm

Post by sneader »

Hi Alex. OK, I have done something similar and it appears to be meeting my needs... except... I can't get it to actually alert me. The alert goes RED, but no action takes place. Obviously I'm doing something wrong, but I have looked at it a dozen times before realizing I'm never going to figure it out without some help.

I've exported the test, as follows (edited IP address for privacy)

;-----------------------------------------------------------------------------
;- HostMonitor`s export/import file -
;- Generated by HostMonitor at 5/8/2006 10:27:21 AM -
;- Source file: C:\Program Files\HostMonitor5\Examples\example1.hml -
;- Generation mode: Selected_Tests -
;-----------------------------------------------------------------------------


; ------- Test #01 -------


Method = Ping
;--- Common properties ---
;DestFolder = Root\Internet\
Title = Internet Latency
Comment = 6mb ATM IMA
RelatedURL =
ScheduleMode= Regular
Schedule = 7 Days, 24 Hours
Interval = 180
Alerts = Email Scott Internet Latency
ReverseAlert= No
UnknownIsBad= Yes
UseCommonLog= No
PrivateLog = C:\HostMon Logs\Internet.htm
PrivLogMode = Default
CommLogMode = Default
SyncCounters= No
SyncAlerts = No
DependsOn = list
MasterTest-Alive = Juniper 1
MasterTest-Alive = Juniper 2
;--- Test specific properties ---
Host = 123.123.123.123
Timeout = 2000
Retries = 10
MaxLostRatio= 100
DisplayMode = time

;-----------------------------------------------------------------------------
; Exported 1 tests



Here is the Action Profile:

"Bad" status actions:

Action type: Send e-mail
Action name: e-mail Scott Desk
Execute by: HostMonitor
Condition to Start Action - Advanced mode selected
Logical Expression: (%AverageReply% > 100)
Time Restriction: (none)
Action Parameters:
From: (from e-mail -- deleted for privacy)
To: (to e-mail -- deleted for privacy)
Subject: %TestName% High %AverageReply%ms
Body Template: Email Detail Latency
Attach File (not selected)

"Good" status actions

none created yet -- I can't see how to make a Good one set to "depends on bad one" -- I'm confused on this as well.


Thanks for any pointers you can provide.

- Scott
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

sneader wrote:The alert goes RED, but no action takes place.
...
Logical Expression: (%AverageReply% > 100)
Your expression checks AverageReply, it does not check the status.
sneader wrote:none created yet -- I can't see how to make a Good one set to "depends on bad one" -- I'm confused on this as well.
This option is available for "standard" actions only.

Regards
Alex
sneader
Posts: 90
Joined: Thu Dec 22, 2005 3:32 pm

Post by sneader »

Thanks for the quick reply -- do you ever sleep? :D

Gosh, I'm still lost on this. I'm determined to get this to work, but I have to ask, and please take this only in a constructive way... Should checking and alerting on high latency be so difficult? I would think this is a basic test that everyone would want to use. If normal latency is 50ms, and I am seeing > 500ms latency for some period of time (30 second average perhaps), that is a problem that I want to know about. Am I being crazy to want this to be fairly simple to implement?

Anyway, sorry for rambling...

In the previous instructions, you had me do a ping test with 2 seconds timeout. So, unless the pings all time out, this is never going to show a Status of "down", correct? So, testing for the Status isn't going to work, right?

Maybe I'm making this too difficult. What if I do this...


Ping Test
Timeout 500ms
Packets 10
Status is bad when 80% or more of packets lost
Display: Reply Time

This will alert me when 8 of 10 packets are 500ms or more of latency, correct? And I can perform this test every few minutes and when two tests are bad, I can fire up the pagers.
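
For anyone following along, the idea above can be sketched in a few lines of Python. This is only a model of the logic, not how HostMonitor implements it: lost_ratio and is_bad are hypothetical names, a reply of None stands for a packet that never came back, and treating a reply of exactly 500 ms as lost is an assumption.

```python
from typing import Optional, Sequence


def lost_ratio(reply_times_ms: Sequence[Optional[float]],
               timeout_ms: float = 500.0) -> float:
    """Percent of packets with no reply inside the timeout."""
    lost = sum(1 for t in reply_times_ms if t is None or t >= timeout_ms)
    return 100.0 * lost / len(reply_times_ms)


def is_bad(reply_times_ms: Sequence[Optional[float]],
           timeout_ms: float = 500.0,
           max_lost_percent: float = 80.0) -> bool:
    """Bad when the lost ratio reaches the configured percentage."""
    return lost_ratio(reply_times_ms, timeout_ms) >= max_lost_percent


# 8 of 10 packets slower than the timeout (or missing): test is bad.
slow_link = [620, 710, None, 540, 900, 650, None, 580, 120, 95]
# 7 of 10: still under the 80% limit, so the test stays good.
borderline = [620, 710, None, 540, 900, 650, None, 120, 110, 95]

print(lost_ratio(slow_link), is_bad(slow_link))    # 80.0 True
print(lost_ratio(borderline), is_bad(borderline))  # 70.0 False
```

The point is that the timeout doubles as the latency threshold: a packet slower than 500 ms simply counts as lost.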

I think this is going to work fine for my needs. But I would still like to learn more about the Advanced Actions. I've read the manual of course, but it's not making sense to me.

One question... on Action Properties, it says "Action will be executed when parameters meet specified conditions". Since I have "('%AverageReply%' > 100) " -- if that condition is met, why isn't the action executed? Why do I also need to test for the status? Sorry for being such a blockhead.

- Scott
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

sneader wrote:Gosh, I'm still lost on this. I'm determined to get this to work, but I have to ask, and please take this only in a constructive way... Should checking and alerting on high latency be so difficult? I would think this is a basic test that everyone would want to use. If normal latency is 50ms, and I am seeing > 500ms latency for some period of time (30 second average perhaps), that is a problem that I want to know about. Am I being crazy to want this to be fairly simple to implement?
If you want to make it simple, you may set timeout to 500ms and use simple standard mode action. E.g. if you perform test every 10 sec and want to start alert after 30 sec, then use "Start when 3 consecutive bad results occur". That's it.
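
The "3 consecutive bad results" rule in the reply above amounts to a streak counter. Here is a rough Python sketch of that idea, under the assumption that any good result resets the count to zero; first_alert_index is a hypothetical helper, not a HostMonitor function.

```python
from typing import Optional, Sequence


def first_alert_index(results: Sequence[bool], needed: int = 3) -> Optional[int]:
    """Index of the check at which `needed` bad results in a row is first reached.

    `results` holds one entry per check: True means bad, False means good.
    Returns None if the streak never gets long enough.
    """
    streak = 0
    for i, bad in enumerate(results):
        streak = streak + 1 if bad else 0  # a good result resets the streak
        if streak == needed:
            return i
    return None


# One check every 10 sec; a brief blip (two bad results) does not alert.
checks = [False, True, True, False, True, True, True, True]

print(first_alert_index(checks))                 # 6
print(first_alert_index([True, True, False]))    # None
```

A short spike is ignored, and the alert only starts once the problem has persisted across three checks in a row.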
sneader wrote:One question... on Action Properties, it says "Action will be executed when parameters meet specified conditions". Since I have "('%AverageReply%' > 100) " -- if that condition is met, why isn't the action executed? Why do I also need to test for the status?
In your previous post you said "The alert goes RED". I assume this means the test failed with status "Bad" or "No answer". "No answer" is not equivalent to "AverageReply>100", because in that case you simply do not have any reply.
sneader wrote:Timeout 500ms
Packets 10
Status is bad when 80% or more of packets lost
Display: Reply Time
This will alert me when 8 of 10 packets are 500ms or more of latency, correct?
Correct

Regards
Alex
sneader
Posts: 90
Joined: Thu Dec 22, 2005 3:32 pm

Post by sneader »

KS-Soft wrote:In previous post you said "The alert goes RED". I assume this means test failed with status "Bad" or "No answer". "No answer" does not equivalent to "AverageReply>100" because in such case you simply do not have any reply.
Whoops, you got me on that. As it was, not only was latency high, but the circuit was DOWN. So, the 2 second pings you had me create were failing, causing the test to be "Bad" / "No answer".

Anyway, the simple test that I thought of, and you confirmed, will work perfectly fine for my needs. I hope this is helpful to others that wander by here looking for help with testing for latency.

I would still like to better understand this one question I posed, though...
sneader wrote:One question... on Action Properties, it says "Action will be executed when parameters meet specified conditions". Since I have "('%AverageReply%' > 100) " -- if that condition is met, why isn't the action executed? Why do I also need to test for the status?
If the average reply was greater than 100ms, shouldn't it have executed my action, which was to send me an e-mail? (that is exactly how it reads above, so I'm just trying to understand what went wrong)

Thanks for the great support and great product, Alex!!!!

- Scott
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

sneader wrote:If the average reply was greater than 100ms, shouldn't it have executed my action, which was to send me an e-mail? (that is exactly how it reads above, so I'm just trying to understand what went wrong)
It should.
Please note: %AverageReply% is not what you see in the Reply field of the test. It's what you see in the "Average reply" field: the average result of previous checks.
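
A small Python sketch may help show the difference between the latest reply and the average. It is an illustration under assumptions: it uses a plain arithmetic mean over all checks since the last reset, which may not match exactly how HostMonitor computes its "Average reply" field, and the reply values are made up.

```python
# Reply times (ms) from ten checks since the statistics were last reset:
# nine normal replies followed by one slow one.
replies_ms = [50.0] * 9 + [400.0]

latest_reply = replies_ms[-1]
average_reply = sum(replies_ms) / len(replies_ms)

print(latest_reply)         # 400.0
print(average_reply)        # 85.0
print(average_reply > 100)  # False
```

Here the latest reply is well over 100 ms, but an expression like (%AverageReply% > 100) would still be false because the earlier fast replies pull the average down. This is also why resetting the statistics periodically matters if you alert on the average.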
sneader wrote:Thanks for the great support and great product, Alex!!!!
You are welcome :)

Regards
Alex