KS-Soft. Network Management Solutions

Status reset after RMA connection error

 
Kapz



Joined: 06 Dec 2004
Posts: 216
Location: Denmark

Posted: Mon Mar 21, 2005 4:33 am    Post subject: Status reset after RMA connection error

Hi!

HM 5.10 on Win2003, but also seen in HM 4.86:

* A test has status Bad
* HM executes the alert profile as expected
* HM for some reason loses the connection to the remote agent, resulting in the status RMA Connection Error
* The connection to the remote agent is reestablished
* HM detects that the test still has status Bad
* HM executes the alert profile again

Is there a way to tell HM that an RMA Connection Error should not reset the test's state?
This behavior only occurs on tests with status Bad. If a test with status Ok gets status RMA Connection Error and then reconnects, the profile isn't triggered.

Note that we believe HM 4.70 and earlier did not behave this way; we suspect it wasn't until 4.86 that an RMA Connection Error triggered a reset of a Bad test's state.

Thanks!

Kasper :O)
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

Posted: Mon Mar 21, 2005 11:50 am

I think you have changed the "Treat Unknown status as Bad" option (Test Properties dialog). If this option is enabled, HostMonitor will not reset the "Recurrences" counter when the status changes Unknown<->Bad.

Regards
Alex
Kapz



Joined: 06 Dec 2004
Posts: 216
Location: Denmark

Posted: Mon Mar 21, 2005 6:07 pm

Alex,

> I think you have changed the "Treat Unknown status as Bad" option (Test Properties dialog). If this option is enabled, HostMonitor will not reset the "Recurrences" counter when the status changes Unknown<->Bad.

Yes, that could very well be it.
On all our tests that depend on an agent, I have disabled the "Treat Unknown status as Bad" option, as we got quite a few yellow Unknown statuses when nothing was really wrong.

Is there a hidden switch somewhere that can prevent HM from resetting the Recurrences counter even when "Treat Unknown status as Bad" isn't enabled, or would this break some logic somewhere?

Thanks!

Kasper :O)
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

Posted: Tue Mar 22, 2005 12:38 pm

Quote:
On all our tests that depend on an agent, I have disabled the "Treat Unknown status as Bad" option


You should ENABLE this option. If you want to keep the Recurrences counter when the test status changes from Unknown to Bad, this option must be enabled.

Quote:
we got quite a few yellow Unknown statuses when nothing was really wrong


Usually, if HostMonitor cannot perform a test, it displays an error message in the "Reply" field. What do you see in the "Reply" field?

Quote:
Is there a hidden switch somewhere that can prevent HM from resetting the Recurrences counter even when "Treat Unknown status as Bad" isn't enabled, or would this break some logic somewhere?


It would break the logic. When this option is disabled, HostMonitor does not start alerts for Unknown status. What actions should be executed when a test changes status to Unknown and then (e.g. after 2 Unknown statuses) to Bad without resetting the Recurrences counter? HM would launch the actions that should be executed after the 3rd, 4th... consecutive bad result. This means the 1st and 2nd actions would never be started.
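
The argument above can be made concrete with a small model. This is only a sketch based on the description in this thread, not HostMonitor's actual code: the function name `actions_fired` and the flag `keep_counter_across_unknown` (standing in for the requested "hidden switch") are hypothetical. It assumes "Treat Unknown status as Bad" is disabled, so an Unknown result starts no action, and it reports which recurrence numbers the Bad results would carry.

```python
def actions_fired(statuses, keep_counter_across_unknown):
    """Return the Recurrences value seen at each Bad result.

    statuses: sequence of "Ok", "Bad" or "Unknown" check results.
    keep_counter_across_unknown: hypothetical switch. When True, Unknown
    and Bad share one counter, but Unknown still starts no action.
    """
    fired = []
    recurrences = 0
    previous_group = None
    for status in statuses:
        # With the hypothetical switch, Unknown falls into the same
        # counting group as Bad, so the counter is not reset between them.
        if keep_counter_across_unknown and status == "Unknown":
            group = "Bad"
        else:
            group = status
        recurrences = recurrences + 1 if group == previous_group else 1
        previous_group = group
        # Only a real Bad result starts an action in this model.
        if status == "Bad":
            fired.append(recurrences)
    return fired


sequence = ["Unknown", "Unknown", "Bad", "Bad"]
print(actions_fired(sequence, keep_counter_across_unknown=False))  # [1, 2]
print(actions_fired(sequence, keep_counter_across_unknown=True))   # [3, 4]
```

With the switch on, the first Bad result already carries recurrence number 3, so actions configured for the 1st and 2nd consecutive bad result are skipped in this model.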

Regards
Alex
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

Posted: Tue Mar 22, 2005 12:41 pm

BTW: If you want to keep status/statistics for some tests (TestA, TestB) that depend on RMA functionality, add a TCP test to check the HostMonitor<->RMA connection and set this test up as the Master test for TestA and TestB.
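
The idea behind a Master test can be sketched roughly as follows. This is an illustration under stated assumptions, not how HostMonitor implements it: `tcp_check` and `run_dependents` are hypothetical names, and the host and port in the usage comment are placeholders. The dependent tests only run when the master check passes; otherwise their previous results are kept unchanged.

```python
import socket


def tcp_check(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def run_dependents(master_ok, dependents, last_results):
    """Run dependent tests only if the master check passes.

    master_ok: callable returning True when the master test is Ok.
    dependents: dict mapping test name -> callable returning a status.
    last_results: dict mapping test name -> previously recorded status.
    """
    if not master_ok():
        # Master failed: skip the dependents and keep their old results.
        return dict(last_results)
    return {name: check() for name, check in dependents.items()}


# Usage (placeholder host and port, adjust to your agent):
# results = run_dependents(
#     lambda: tcp_check("agent.example.com", 12345),
#     {"TestA": check_a, "TestB": check_b},
#     previous_results,
# )
```

As noted later in the thread, a passing TCP check does not guarantee that a test performed by the agent will succeed, so this only covers the case where the agent is unreachable.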

Regards
Alex
Kapz



Joined: 06 Dec 2004
Posts: 216
Location: Denmark

Posted: Tue Mar 22, 2005 1:29 pm

Alex,

>> On all our tests that depend on an agent, I have disabled the "Treat Unknown status as Bad" option
> You should ENABLE this option. If you want to keep the Recurrences counter when the test status changes from Unknown to Bad, this option must be enabled.
Yes, that's what I learned from your answers.

>> we got quite a few yellow Unknown statuses when nothing was really wrong
> Usually, if HostMonitor cannot perform a test, it displays an error message in the "Reply" field. What do you see in the "Reply" field?
Sorry for not being accurate in my post; by yellow Unknowns I simply meant RMA Connection Errors.

>> Is there a hidden switch somewhere that can prevent HM from resetting the Recurrences counter even when "Treat Unknown status as Bad" isn't enabled, or would this break some logic somewhere?
> It would break the logic. When this option is disabled, HostMonitor does not start alerts for Unknown status. What actions should be executed when a test changes status to Unknown and then (e.g. after 2 Unknown statuses) to Bad without resetting the Recurrences counter? HM would launch the actions that should be executed after the 3rd, 4th... consecutive bad result. This means the 1st and 2nd actions would never be started.
Point taken. We operate with a single action for all our tests, triggered after the third consecutive Bad result, so I didn't take multiple actions into consideration.

> BTW: If you want to keep status/statistics for some tests (TestA, TestB) that depend on RMA functionality, add a TCP test to check the HostMonitor<->RMA connection and set this test up as the Master test for TestA and TestB.
We already do that, but even when the RMA agent answers on a given TCP port, there is no guarantee that a test performed by the agent won't result in an RMA Connection Error. This is actually the reason why we unchecked "Treat Unknown status as Bad" in the first place.
While we can always count on a simple TCP test to the agent, RMA Connection Errors aren't that rare on tests performed *by* the agent.

So, bottom line: unless I enable "Treat Unknown status as Bad", my Bad profile will still be triggered again after an RMA Connection Error on a test that already had status Bad.
Enabling it, on the other hand, will result in an SMS saying "TestA Bad, RMA Connection Error" whenever a test with status Ok experiences three consecutive RMA Connection Errors, perhaps simply because the remote server is busy rather than because the test is actually Bad.
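
The trade-off in this conclusion can be checked with a simplified model. It is a sketch built from the behavior described in this thread, not HostMonitor's real logic: the function name `alert_checks` and its parameters are made up for illustration, and a single action is assumed to start on the third consecutive Bad result.

```python
def alert_checks(statuses, treat_unknown_as_bad, alert_after=3):
    """Return the 1-based check numbers at which the single alert starts.

    statuses: sequence of "Ok", "Bad" or "Unknown" (RMA Connection Error).
    treat_unknown_as_bad: models the "Treat Unknown status as Bad" option.
    alert_after: the alert starts on this many consecutive Bad results.
    """
    fired = []
    recurrences = 0
    previous = None
    for number, status in enumerate(statuses, start=1):
        # With the option enabled, Unknown counts as Bad and therefore
        # does not reset the counter of an already Bad test.
        if treat_unknown_as_bad and status == "Unknown":
            effective = "Bad"
        else:
            effective = status
        recurrences = recurrences + 1 if effective == previous else 1
        previous = effective
        if effective == "Bad" and recurrences == alert_after:
            fired.append(number)
    return fired


bad_test = ["Bad", "Bad", "Bad", "Unknown", "Bad", "Bad", "Bad"]
ok_test = ["Ok", "Unknown", "Unknown", "Unknown", "Ok"]

print(alert_checks(bad_test, treat_unknown_as_bad=False))  # [3, 7]
print(alert_checks(bad_test, treat_unknown_as_bad=True))   # [3]
print(alert_checks(ok_test, treat_unknown_as_bad=False))   # []
print(alert_checks(ok_test, treat_unknown_as_bad=True))    # [4]
```

In this model, disabling the option repeats the alert for a test that was already Bad, while enabling it raises an alert for an Ok test after three consecutive connection errors.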

Can you verify my conclusion?
Please note that I'm not trying to sound grumpy; I'm simply trying to get this right in my head.

Kasper :O)
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

Posted: Tue Mar 22, 2005 4:06 pm

Quote:
We already do that, but even when the RMA agent answers on a given TCP port, there is no guarantee that a test performed by the agent won't result in an RMA Connection Error. This is actually the reason why we unchecked "Treat Unknown status as Bad" in the first place.
While we can always count on a simple TCP test to the agent, RMA Connection Errors aren't that rare on tests performed *by* the agent.


So, sometimes you see "RMA Connection Error" and then the next check returns a "good" (or "bad") result? It looks like you should increase the timeout specified for the agent (Agent Connection Parameters dialog).

Regards
Alex
Kapz



Joined: 06 Dec 2004
Posts: 216
Location: Denmark

Posted: Wed Mar 23, 2005 3:27 am

Alex,

> So, sometimes you see "RMA Connection Error" and then the next check returns a "good" (or "bad") result?
Yes, that's what happens.

> It looks like you should increase the timeout specified for the agent (Agent Connection Parameters dialog).
Our agents already have a timeout value of 120 seconds, so it shouldn't be a timeout issue (also, the RMA Connection Error reply occurs well before 120 seconds have passed).
The trouble is that I cannot detect what exactly is causing these errors, as the reply "RMA Connection Error" is quite generic and the agent doesn't put anything in its bad log. I'd be happy to help track down what exactly goes wrong, so if some logs could potentially reveal anything, let me know where to look; or if you have some switch that enables logging for development purposes, I can enable it.

Kasper :O)
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

Posted: Wed Mar 23, 2005 12:01 pm

Quote:
and the agent doesn't put anything in its bad log.


It looks like the agent does not receive the connection. What system is running the agent? A Windows workstation or a server edition?
How many test items are performed by the agent?

This is probably the problem described here: http://www.ks-soft.net/hostmon.eng/rma-win/index.htm#problems

Regards
Alex
All times are GMT - 6 Hours