Possibility to keep current status

Need new test, action, option? Post request here.
Kapz
Posts: 216
Joined: Mon Dec 06, 2004 2:33 pm
Location: Denmark


Post by Kapz »

Hi !

I have a problem that seems to be chasing its own tail no matter how I turn it around, so perhaps someone here is cleverer than me :wink:

Initially what we experienced was this:

A test carried out by an agent fails with an RMA connection error as its reply, and its status changes to Bad (please forget about *why* the connection errors happen for now).
If this happens three times in a row, the alert profile's Bad actions are triggered and we're alerted that a test has status Bad due to connectivity errors with the agent. Often the connection with the agent is okay again one or two minutes later, so the test changes status to OK and the alert profile's Good actions are triggered.
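The alerting behaviour described above can be modelled roughly as follows. This is a simplified sketch, not HostMonitor's actual implementation: it assumes the alert profile fires its Bad actions once a test has been Bad for three consecutive checks, and fires its Good actions when the test returns to OK after a Bad alert was sent. The function name `alert_events` is hypothetical.

```python
# Simplified, hypothetical model of the alert profile described above:
# Bad actions fire after `threshold` consecutive Bad checks, and Good actions
# fire when the test returns to OK after a Bad alert was sent.
def alert_events(statuses, threshold=3):
    events = []
    bad_run = 0        # number of consecutive Bad checks so far
    alerted = False    # True once Bad actions have fired for the current outage
    for i, status in enumerate(statuses):
        if status == "Bad":
            bad_run += 1
            if bad_run == threshold and not alerted:
                events.append((i, "Bad actions"))
                alerted = True
        else:
            bad_run = 0
            if status == "OK" and alerted:
                events.append((i, "Good actions"))
                alerted = False
    return events


# Three RMA connection errors in a row, then the agent answers again.
print(alert_events(["OK", "Bad", "Bad", "Bad", "OK"]))
# → [(3, 'Bad actions'), (4, 'Good actions')]
```

A short-lived agent outage is therefore enough to produce a Bad/Good alert pair, which is the "false" alarm the post is about.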

To avoid having these "false" alarms sent, I ticked 'Use "Warning" status if:' and entered '('RMA:' in '%SuggestedReply%')' as the condition. I also unticked 'Treat Warning status as Bad' so that the Warning status would not trigger the Bad actions.

This does the job and we receive no more "false" alarms telling about RMA connection errors - as long as the test has status OK.

The trouble is that this setup does exactly the opposite if a test already has status Bad, e.g. a drive with 10 GB free and a threshold on the test saying 20 GB.
In this case the test already has status Bad when the RMA connection errors occur, and it changes status to Warning as it is supposed to. But as described above, Warning is *not* treated as Bad, so the test goes from Bad to Warning, and once the connection errors are gone and the test changes status back to Bad, the alert profile's Bad actions are triggered once again.
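To see why the alert fires twice, here is a rough simulation of the Bad, Warning, Bad sequence. It is a hypothetical model, not HostMonitor code: `classify` mimics the 'Use "Warning" status if' condition together with the 'Treat Warning status as Bad' option, and `bad_alert_indices` assumes Bad actions fire whenever a test completes a run of three consecutive Bad checks. The reply strings are made up; only the "RMA:" prefix matters.

```python
# Hypothetical model of the setup described above: a check whose reply
# contains "RMA:" becomes Warning (or Bad, if Warning is treated as Bad);
# otherwise the usual OK/Bad result applies.
def classify(reply, test_ok, treat_warning_as_bad=False):
    if "RMA:" in reply:
        return "Bad" if treat_warning_as_bad else "Warning"
    return "OK" if test_ok else "Bad"


def bad_alert_indices(statuses, threshold=3):
    """Checks at which a run of `threshold` consecutive Bad statuses is completed."""
    fired = []
    run = 0
    for i, status in enumerate(statuses):
        run = run + 1 if status == "Bad" else 0
        if run == threshold:
            fired.append(i)
    return fired


# Drive with 10 GB free against a 20 GB threshold: the test itself is always Bad.
replies = ["10 GB free"] * 3 + ["RMA: connection error"] * 2 + ["10 GB free"] * 3
statuses = [classify(r, test_ok=False) for r in replies]
print(statuses)
print(bad_alert_indices(statuses))  # the second index is the repeated Bad alert
```

Passing `treat_warning_as_bad=True` removes the repeated alert for this test, but it brings back the false alarms for tests that are otherwise OK, which is exactly the trade-off described in the post.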

Now, there is no doubt that everything within HM does exactly what it should do with the setup mentioned above.
However, as this does not solve my problem, I've been trying to find a way to tell HM to simply keep the test's current status in case of an RMA connection failure, so far without any luck.
What I actually need is something like a checkbox under 'Optional status processing' labeled 'Keep current status if', where I could enter '('RMA:' in '%SuggestedReply%')' as the condition. But this checkbox doesn't exist 8)
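The requested option could behave like the sketch below. Note that 'Keep current status if' is not an existing HostMonitor setting; `next_status` and `run_checks` are hypothetical names that only illustrate the idea of carrying the previous status over whenever the reply matches the condition.

```python
# Hypothetical "Keep current status if" rule (illustration only, not a real
# HostMonitor option): when the reply contains "RMA:", the previous status
# is carried over unchanged.
def next_status(previous, reply, test_ok):
    if "RMA:" in reply:
        return previous
    return "OK" if test_ok else "Bad"


def run_checks(replies, test_ok, initial="Unknown"):
    """Apply next_status to a sequence of replies and return the status history."""
    status = initial
    history = []
    for reply in replies:
        status = next_status(status, reply, test_ok)
        history.append(status)
    return history


replies = ["10 GB free"] * 3 + ["RMA: connection error"] * 2 + ["10 GB free"] * 3
print(run_checks(replies, test_ok=False))  # Bad throughout: no second Bad transition
print(run_checks(replies, test_ok=True))   # OK throughout: no false alarm
```

One design question such an option would have to answer: if the very first check after startup already fails with an RMA error, there is no previous status to keep, so in this sketch the test simply stays "Unknown".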

Perhaps the answer to the problem is quite obvious (I hope!) and can actually be achieved through some regular expression somewhere whose logic I just can't find. Can you give me some advice?

Thanks in advance !

/Kasper
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

You may set up one master test item for each RMA, e.g. a Ping test of localhost performed by the agent. Then set up all other test items performed by this agent as dependent tests, so HostMonitor will not execute them when the connection to the agent cannot be established.
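A minimal sketch of the master/dependent idea, assuming a dependent test simply runs only while all of its masters have status OK. HostMonitor's real dependency options are richer than this; `CheckItem` and `should_execute` are hypothetical names used for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CheckItem:
    name: str
    status: str = "Unknown"
    # Master tests that must be OK before this test is executed.
    masters: list = field(default_factory=list)


def should_execute(test):
    """Run a dependent test only if every master test currently has status OK."""
    return all(master.status == "OK" for master in test.masters)


# One master per agent: a Ping of localhost performed by the RMA itself.
agent_ping = CheckItem("Ping localhost via RMA")
disk_c = CheckItem("Drive C: free space via RMA", masters=[agent_ping])

agent_ping.status = "Bad"      # HM could not get a reply from the agent
print(should_execute(disk_c))  # False: the disk test is skipped
agent_ping.status = "OK"
print(should_execute(disk_c))  # True
```

Because the master is itself performed by the agent, it fails for the same reason the dependent tests would, so the dependents are held back instead of reporting RMA connection errors.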

http://www.ks-soft.net/hostmon.eng/mfra ... htm#Master

Regards,
Tom
Kapz
Posts: 216
Joined: Mon Dec 06, 2004 2:33 pm
Location: Denmark

Post by Kapz »

Hi Tom !

Thanks for your reply.

Although I see the idea in your suggestion, it probably won't do the job, as we already use dependencies widely.

Our standard method for monitoring a server is this:

1) Can HM itself ping the server at all ?
2) If yes, can HM itself make a TCP connection to the RMA's port ?
3) If yes, let the RMA perform the rest of the tests

The individual tests (3) already depend on HM's ability to connect to the RMA's port (2) - and the test that connects to the RMA's port depends on HM's ability to ping the server (1).
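Step 2 amounts to a plain TCP connect to the agent's port. The sketch below shows such a check using only Python's standard library; it is not how HostMonitor implements its own TCP test, and the host name in the commented usage line is a placeholder. A successful connect only shows that some process is listening on the port, not that the RMA behind it will answer test requests.

```python
import socket


def port_accepts(host, port, timeout=3.0):
    """Return True if something on host:port accepts a TCP connection.

    This only proves that *some* process is listening; it does not prove
    that the listener is the RMA or that the RMA will serve test requests.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example (placeholder host name): step 2 of the chain for an RMA on port 3456.
# print(port_accepts("server01.example.com", 3456))
```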

When we receive the RMA connection errors, we have no problems pinging the server the RMA runs on, and the RMA's port also answers. Thus the individual tests are being performed by the RMA, and under certain circumstances we risk receiving RMA connection errors.

Hope I was able to clarify the "path" in our concept - I can post a text dump from HM with examples if needed.

Thanks in advance and best regards

Kasper
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

Hi Kasper,

In your concept I can see a possible problem in point #2. Is there any way you can check that it is the RMA that responds to your TCP connection request? I mean, there may be some other software using the same port, so point #2 will give a "Good" result, but you still won't be able to connect to the RMA. The method I suggested avoids this problem: if there's no connection between the RMA and HM (I mean the applications/services, not the hosts), then HM will not receive the ping test results from the RMA and won't start the dependent tests.

Regards,
Tom
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Probably you just need to decrease the value of the "Consider status of the master test obsolete after N seconds" parameter. This option is located on the Behavior page of the Options dialog.
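How this parameter helps can be sketched as follows. The model is an assumption, not documented HostMonitor behaviour: before a dependent test runs, a master status older than N seconds is not trusted and the master test is re-executed first. `plan_dependent_run` is a hypothetical name.

```python
# Assumed model of "Consider status of the master test obsolete after N seconds":
# a master status older than `obsolete_after` seconds is re-checked before the
# dependent test is allowed to run.
def plan_dependent_run(master_status, master_checked_at, now, obsolete_after):
    if now - master_checked_at > obsolete_after:
        return "recheck master"
    return "run dependent" if master_status == "OK" else "skip dependent"


# Master (agent ping) was OK at t=90 s; the agent link drops at t=100 s;
# the dependent test is due at t=120 s.
print(plan_dependent_run("OK", 90, 120, obsolete_after=60))  # run dependent
print(plan_dependent_run("OK", 90, 120, obsolete_after=5))   # recheck master
```

With a 60-second window the 30-second-old OK status is still trusted, so the dependent test runs and hits the RMA connection error; a shorter window makes a fresh master check more likely, at the cost of executing master tests more often.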

Regards
Alex
Kapz
Posts: 216
Joined: Mon Dec 06, 2004 2:33 pm
Location: Denmark

Post by Kapz »

Hi !

Sorry for not getting back before now :oops:

Tom:
> Is there any way you can check that it is RMA is the responder to your
> TCP connection request ?
I am pretty sure that it is in fact an RMA that answers on our chosen TCP port 3456.
I do see the point in your suggestion about other applications interfering on the port, but my issue has been seen on so many different servers and setups that it can't be a question of other applications interfering.

Alex:
> Probably you just need to decrease value of "Consider status of the
> master test obsolete after N seconds" parameter. Option located on
> Behavior page in the Options dialog.
This could be a workaround. I've decreased the value from 60 seconds to 30 and will see if that makes a difference.

Thanks both for your inputs so far ! :P

/Kasper
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

60 -> 30 sec is not a big difference.
If you do not have too many master-dependent relations, you may set 3-5 sec.
If you are not sure about this, you may try setting some value (e.g. 5 sec), then use Auditing Tools to check for possible problems. Auditing Tools will warn you when you have too many relations and HostMonitor will not be able to perform all master tests within the specified time.

Regards
Alex