Execute external program through RMA

Kapz · Post by **Kapz** » Fri Aug 05, 2005 6:02 am

Hi !

I have an Alert profile that contains an action that executes an external program through the agent performing tests for this particular server ("Execute by Test Performer").

This profile is used on some 50+ servers as they all have the same directory structure thus making it possible for me to use the same command line for the action.

However on one of the servers the command line (that starts a .bat file on the server) isn't executed. Nothing seems to happen and when I monitor the server with NTFileMon nothing happens on the drive that contains the .bat file so there shouldn't be a problem with the path to the .bat file.
Nothing is written to the "Bad" log but a line with Connection established is added to the "Good" log.

The agent is configured to allow connections from our main monitor server and from itself. All test methods are marked but for actions only Execute external command is marked.

Running HM 5.38 on Win2003 (no SP1) and agent 3.14 (both agent and configurator) on Win2003 (no SP1).

Can I enable some extended logging options somewhere or do you have an idea of where I could begin and what I should look for in order to debug on this one ?

Thanks !

Kasper :O)

KS-Soft · Post by **KS-Soft** » Fri Aug 05, 2005 10:19 am

You should check HostMonitor's system log (log specified on Advanced Logs page in the Options dialog)

Regards
Alex

Kapz · Post by **Kapz** » Fri Aug 05, 2005 2:09 pm

Alex,

> You should check HostMonitor's system log (log specified on Advanced Logs page in the Options dialog)
My System log was set to use the common log file, but when I searched the common log all I saw was that the test failed, as expected in HM.
I tried changing the setting so that the System log logged to an independent log file but nothing is logged for this particular test.

Kasper :O)

KS-Soft · Post by **KS-Soft** » Fri Aug 05, 2005 4:41 pm

Could you enable "success" log as well?
If both options ("Record info about successful actions" and "Record info about failed actions") are enabled and you do not see any records in the log, it means HostMonitor "thinks" action should not be executed.

But you said this profile is used for many test items and all work except one... Maybe the problem is somewhere else? What do you have in the BAT file? Probably RMA starts the BAT file but the BAT file cannot execute the specified commands, e.g. due to lack of permissions... Could you add a simple command like "time /t >>c:\testlog.txt" to the BAT file? This way you will know whether the script was started or not.

Regards
Alex

Kapz · Post by **Kapz** » Fri Aug 05, 2005 10:21 pm

Hi Alex,

> Could you enable "success" log as well?
Done

> If both options ("Record info about successful actions" and "Record info
> about failed actions") are enabled and you do not see any records in
> the log, it means HostMonitor "thinks" action should not be executed.
Hum, I don't see any records although the test goes red in HM and gets status Bad.

> But you said this profile is used for many test items, all works except
> one..
That's the scenario, yes.

> May be problem is somewhere else? What you have in BAT file?
During all this testing all I have is:

echo > D:\SystemMaintenance\SystemMaintenanceDone.txt

> Probably RMA starts BAT file but BAT file cannot execute specified
> commands, e.g. due to lack of permissions...
I don't think so. I haven't tweaked the permissions on this server, and I should be able to see any permission errors in NTFileMon, but no access whatsoever is attempted to the directory D:\SystemMaintenance, which is also the directory where the .bat file RynSystemMaintenance.bat resides.
Also, during testing I tried to get RMA to execute the file C:\a.bat instead but with same results.

> Could you add some simple command like "time /t >>c:\testlog.txt"
> into BAT file? This way you will know whether the script was started or not.
According to the above testing with my System log it seems like HM gives the test status Bad but never executes the profile so I guess the error is somewhere in HM on our Monitor server.

Any ideas on what the next move could be ?

Thanks !

Kasper :O)

KS-Soft · Post by **KS-Soft** » Sun Aug 07, 2005 9:43 pm

> Hum, I don't see any records although the test goes red in HM and gets status Bad.

What start conditions do you use? Should the alert be started when the test changes status from Ok to Bad? Or are you using more complicated conditions? Are you using "standard" or "advanced" mode for this action?

Regards
Alex

Kapz · Post by **Kapz** » Mon Aug 08, 2005 12:17 am

Alex,

> What start conditions do you use?
> Alert should be started when test changes status from Ok to Bad?

Yes, when status changes from Ok to Bad.
Here is the export of the test that is supposed to run the .bat file.
It tests if TCP port 3456 (Remote agents port - we don't use 1055) answers but the test has been reversed so a positive answer results in status Bad which then triggers the .bat file.
As master test it has the test "SuperBest02 Port 3456 (RMA)" that checks if TCP port 3456 (Remote agents port) answers. With this master test I'm pretty sure that my test will go bad as long as TCP port 3456 does in fact answer on the master test.

Method = Tcp
;--- Common properties ---
;DestFolder = Root\DC Kunder\SuperBest\SuperBest02\SystemState\
RMAgent = SuperBest02
Title = SuperBest02 Action: Run SystemMaintenance.bat
Comment =
RelatedURL =
ScheduleMode= Regular
Schedule = KasperTest
Interval = 60
Alerts = KasperTest
ReverseAlert= Yes
UnknownIsBad= No
UseCommonLog= Yes
PrivLogMode = Default
CommLogMode = Default
SyncCounters= Yes
SyncAlerts = No
DependsOn = list
MasterTest-Alive = SuperBest02 Port 3456 (RMA)
;--- Test specific properties ---
Host = 195.41.139.129
Port = 3456
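
As a rough illustration of how the reversed TCP check exported above behaves, here is a minimal Python sketch. It is not HostMonitor code: the function names and the simplified "Ok"/"Bad" statuses are assumptions made only for this example.

```python
import socket


def tcp_port_answers(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def reversed_status(answers, reverse_alert=True):
    """Map a probe result to a simplified status (hypothetical model).

    Without reverse alert an answering port is "Ok"; with reverse alert
    the meaning is flipped, so an answering port becomes "Bad".
    """
    if reverse_alert:
        return "Bad" if answers else "Ok"
    return "Ok" if answers else "Bad"
```

With these helpers, `reversed_status(tcp_port_answers("195.41.139.129", 3456))` would return "Bad" whenever the agent port answers, which is the status that is meant to trigger the .bat file.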

> Or you are using more complicated
> conditions?
I shouldn't think so.

> You are using "standard" or "advanced" mode for this action?
Hmm - I don't even know how to switch between these two modes so probably it's standard. Where can I see this ?

Thanks !

Kasper :O)

KS-Soft · Post by **KS-Soft** » Mon Aug 08, 2005 12:36 pm

> It tests if TCP port 3456 (Remote agents port - we don't use 1055) answers but the test has been reversed so a positive answer results in status Bad which then triggers the .bat file.
> As master test it has the test "SuperBest02 Port 3456 (RMA)" that checks if TCP port 3456 (Remote agents port) answers. With this master test I'm pretty sure that my test will go bad as long as TCP port 3456 does in fact answer on the master test.
> ...
> SyncCounters= Yes
> SyncAlerts = No
> DependsOn = list
> MasterTest-Alive = SuperBest02 Port 3456 (RMA)

Master and dependant tests check the same agent???
Of course HostMonitor does not execute the action, because the dependant test never has "Bad" status (unless the test interval is shorter than the "up-to-date" interval specified for master tests). It will have "WaitForMaster" or "Host is alive" status.

If you set the "Sync status&alerts" option for the dependant test, it should help.
But to tell the truth, I do not understand why you need that dependant test at all. Why don't you use this action as a "good" action assigned to the master test?

Regards
Alex

Kapz · Post by **Kapz** » Mon Aug 08, 2005 1:33 pm

Alex,

> Master and dependant test checks the same agent???
Sure - what's the problem ?

1) I want to see if IP 1.2.3.4 answers on TCP port 3456. Actually this is not the real purpose of the test - the real purpose is to make the test go Bad so my external script execution can take place as part of the "Bad" actions for the alert profile

2) Before my test fires I have a Master test that checks if IP 1.2.3.4 answers on TCP port 3456. If it does answer, my test is fired.

3) When fired, the test determines that IP 1.2.3.4 answers on TCP port 3456. No surprise, as the Master test revealed the same.

4) As my test is reversed the positive answer from 1.2.3.4 on port 3456 is turned into a Bad result and my test goes red.

5) On some 50+ servers this works like a charm - but on a single one it doesn't although the test gets status Bad and turns red.

> Of course HostMonitor does not execute action.
> Because dependant test never has "Bad" status (unless time interval is
> shorter than "up-to-date" interval specified for master tests).
> It will have "WaitForMaster" or "Host is alive" status.
No, it gets status Bad - no doubt about that.
I don't understand why HM will not launch execute action when the test goes red and gets status Bad - and why it works on all other servers then ?

> But truth to say I do not understand why do you need that dependant
> test at all. Why don't you use this action as good action assigned for
> master test?
The master test is solely used for testing if the agent answers, and it runs once every minute, 24 hours a day. I don't think launching my script that often would be a success.

The dependant test that executes the action runs only once every 24 hours and is actually meant as a replacement for Windows' own quite unreliable Scheduler that used to fire the .bat file. But sure - the Master test isn't really needed.
However, in order to use the same alert profile across multiple servers using "Execute by Test Performer" in the alert profile, I have to involve the remote agent one way or another, and I just chose to let it test if it ran itself.

But it seems like I'm missing something obvious here or ... ?

Thanks for your help !

Kasper :O)

KS-Soft · Post by **KS-Soft** » Mon Aug 08, 2005 2:01 pm

> I don't understand why HM will not launch execute action when the test goes red and gets status Bad - and why it works on all other servers then ?

If the agent responds, HostMonitor sets "Host is alive" status for the master test. Then it checks the dependant test and sets "Bad" status. Right? So far so good.
What happens if the agent does not respond? The master test changes status to "No answer", and the dependant test changes status to "WaitForMaster".
If the agent responds again after a while, the statuses will be changed to "Host is alive" and "Bad" again. But the action will not be started because the dependant test never had "good" status. It never changed status from "good" to "bad".
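
The scenario above can be sketched in a few lines of Python. This is only a simplified model of the behaviour described in this post, not HostMonitor's actual logic: the function name, the status sets, and the rule that "WaitForMaster" is simply skipped are assumptions made for illustration.

```python
GOOD = {"Ok", "Host is alive"}
BAD = {"Bad", "No answer"}


def bad_action_starts(history):
    """Return the indexes of the checks at which a "Bad" action would start.

    Assumed rule: the action starts only when the status changes from a
    good status to a bad one. Statuses that are neither good nor bad
    (such as "WaitForMaster") are skipped, and a test with no earlier
    status counts as never having been good.
    """
    starts = []
    last_real = None  # last status that was clearly good or bad
    for i, status in enumerate(history):
        if status in BAD and last_real in GOOD:
            starts.append(i)
        if status in GOOD or status in BAD:
            last_real = status
    return starts


# Dependant test that was never "good": the agent drops out and comes back.
never_good = ["Bad", "WaitForMaster", "WaitForMaster", "Bad"]
# Test that had a good status once before it went "Bad".
was_good = ["Ok", "Bad", "WaitForMaster", "Bad"]

print(bad_action_starts(never_good))  # []
print(bad_action_starts(was_good))    # [1]
```

Under these assumptions the first history never starts the action, while the second starts it exactly once, at the check where the status changed from good to bad.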

It's just one scenario. I still don't know which action settings you are using. If you are using the "Repeat: until status changes" action option, the action should be started.

Solution? Set the "Sync status&alerts" test option, or set the "Repeat: until status changes" action option, or use an "advanced" mode action with a condition like "('%SimplyStatus%'=='DOWN')".

> and why it works on all other servers then ?

Maybe the dependant tests had "good" status before you assigned the "master" test, or maybe the dependant test was checked without the master test being checked because the "up-to-date" time had not expired yet....
It's very difficult to find out what is going on on your system when I don't have access to your screen.

> The dependant test that executes the action runs only one time once every 24 hours and is actually ment as a replacement for Windows own quite unreliable Scheduler that used to fire the .bat file. But sure - the Master test isn't really needed.

Ok, now I know what you need. My recommendations:
1) remove link to master test
2) unmark "reverse alert" option
3) remove "bad" action from profile
4) add the same "good" action to the profile
5) use "repeat: until status changes" option for the action
Actually steps 2, 3 and 4 are not necessary, but I think it's better to see a "good" test when everything works fine.

Regards
Alex

Kapz · Post by **Kapz** » Mon Aug 08, 2005 2:48 pm

Alex,

> If agent responds, HostMonitor sets "Host is alive" status for master
> test. Then it checks dependant test and set "Bad" status. Right? So far
> so good.
Yes, I agree.

> What happens if agent does not respond? Master test changes status
> to "No answer", dependant test changes status to "WaitForMaster".
Agree too.

> If after a while agent responds, statuses will be changed to "Host is
> alive" and "Bad" again.
Yes.

> But action will not be started because
> dependant test never had "good" status.
> It never changed status from "good" to "bad".
But in my alert profile for the dependant test the first "Bad" action is a HMScript that runs the line "ResetRecurrencesTest %TestName%".
Shouldn't this have the effect that the dependant test doesn't need a last status of "Good" to execute its "Bad" actions again, as its recurrences counter is set to 0 ?

> Its just one scenario. I still don't know action settings that you are
> using. If you are using "Repeat: until status changes" action option,
> action should be started.
I have two "Bad" actions:

1) The HMScript containing "ResetRecurrencesTest %TestName%"
2) The one executing the external program - my .bat file.

For both, the condition to start the action is set to Standard mode, Start when 1 consecutive "Bad" results occur and Repeat 1 time(s). "until status changes" is not marked for either.
I don't have any "Good" actions.

>> and why it works on all other servers then ?
> May be dependant tests had "good" status before you
> assigned "master" test or may be dependant test was checked without
> master test checking because "up-to-date" time was not expired yet....
Dunno. I just created the test once for all and after having tested it over a few days it has simply been copied again and again to the rest of the servers.

> It's very difficult to find what is going on on your system when I don't
> have access to your screen.
Absolutely !
Would it be of any help if I sent you my entire HostMonitor directory (9 MB zipped)? If you are interested, that is, of course.

> Ok, now I know what you need. My recommendations:
> 1) remove link to master test
> 2) unmark "reverse alert" option
> 3) remove "bad" action from profile
> 4) add the same "good" action to the profile
> 5) use "repeat: until status changes" option for the action
> Actually steps 2, 3 and 4 are not necessary, but I think it's better to
> see "good" test when everything works fine.
Yes, I can clean up a lot of stuff and also avoid many tests going red - when they in fact only do the job they are supposed to. Good idea.

Thanks !

Kasper :O)

KS-Soft · Post by **KS-Soft** » Mon Aug 08, 2005 7:32 pm

> But in my alert profile for the dependant test the first "Bad" action is a HMScript that runs the line "ResetRecurrencesTest %TestName%".
> Shouldn't this have the effect that the dependant test doesn't need a last status of "Good" to execute its "Bad" actions again as its recurrences counter is set to 0 ?

Yes, this should do the trick.
But I think you don't need that script. You may use "Repeat: until status changes" option instead.
Let's simplify your configuration: remove unnecessary master tests, unmark reverse alert, remove script...

If the action is still not started after these changes, I would like to see all your settings. I will need the HML file with tests, all *.LST files, and the hostmon.ini file.

Regards
Alex

Kapz · Post by **Kapz** » Tue Aug 16, 2005 7:08 am

Alex,

First of all - everything works now

I've simplified the setup, so this is how it looks now:

* The schedule used only performs checks for one minute (= one check) once every 24 hours

* Let remote agent at IP 1.2.3.4 port 3456 test if ip 1.2.3.4 port 3456 answers

* If it answers - which is expected - launch Profile A

* Profile A doesn't contain any "Bad" actions but two "Good" actions:
1) ResetReccurences %TESTNAME%
2) Execute external script on Test Performer
Both actions are set to repeat until status changes.

I have a question though.

If I set Repeat to 1 time(s) for both actions - which I would believe was the setting to use - instead of "until status changes", my script doesn't seem to be executed. Why is that?
I guess that the only reason that I can use "until status changes" is that the schedule used for both actions only runs for one minute every day, meaning that HM never gets the chance to perform a second test - which would eventually result in my ResetReccurences %TESTNAME% and my external script being executed once more, right ?

Thanks !

Kasper :O)

KS-Soft · Post by **KS-Soft** » Tue Aug 16, 2005 11:09 am

> 1) ResetReccurences %TESTNAME%
> 2) Execute external script on Test Performer
> Both actions are set to repeat until status changes.

I think you could remove "ResetRecurrences" as well

> If I set Repeat to 1 time(s) for both actions - which I would believe was the setting to use - instead of "until status changes" my script doesn't seem to be executed. Why is that?

Actions will be executed when the test status CHANGES from "bad" to "good". The same goes for the ResetRecurrences action - it will be executed when the test status changes. If the test never changed status...
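
A small Python sketch of the two repeat settings, as they are described in this thread. It is a simplified model rather than HostMonitor's real implementation; the function name, its parameters, and the plain "good"/"bad" statuses are hypothetical.

```python
def good_action_runs(history, repeat_until_status_changes):
    """Return the indexes of the checks at which a "Good" action would run.

    history is a list of simplified statuses, "good" or "bad".
    Assumed rules: with a fixed repeat count of 1 the action runs only
    when the status changes from "bad" to "good"; with "until status
    changes" it runs on every check that returns "good". A test with no
    earlier status is treated as not having changed.
    """
    runs = []
    previous = None  # no earlier status known
    for i, status in enumerate(history):
        if status == "good":
            changed = previous == "bad"
            if changed or repeat_until_status_changes:
                runs.append(i)
        previous = status
    return runs


# One check per day, and the agent port always answers.
daily_checks = ["good", "good", "good"]

print(good_action_runs(daily_checks, repeat_until_status_changes=False))  # []
print(good_action_runs(daily_checks, repeat_until_status_changes=True))   # [0, 1, 2]
print(good_action_runs(["bad", "good", "good"], repeat_until_status_changes=False))  # [1]
```

In this model a test that is always "good" never runs the action with Repeat set to 1 time, but runs it on every daily check with "until status changes", which matches what was observed above.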

Regards
Alex