Unreliable Process test results and process test vs WMI

ironcurtain
Posts: 34
Joined: Mon Apr 28, 2008 10:15 am

Post by ironcurtain »

We have a Host Monitor deployment we use for monitoring our production environment.

Host Monitor is running on a domain member server in the corporate interior.

One of the tests monitors processes on a machine in the DMZ. The DMZ machine is not in the domain; it is in a workgroup (of one).

We are trying to use Host Monitor to determine whether a process on the DMZ machine is running.

The Process test method in Host Monitor yields unreliable results. When we use a straightforward count of the number of instances of the process, 97% of the results over the period we have been tracking this test have 'Unknown' status. The test returns 'Unknown' status even when the process is running continuously. We are not clear on the reasons for this.

As an alternative we have tried a WMI test that checks whether the number of threads in the process is > 1. This reliably comes back either OK if the process is running, or Unknown if the process has stopped. This at least enables us to test whether the process is running and report on it. These tests have returned 100% 'Ok' over the same period in which we have been tracking the Process test.

Can anyone suggest why we might be having a problem with the Process test method?
Is there a WMI test that can be used to test for the instances of a process running, that returns a "bad" test result (not "unknown") if the process has stopped?
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Quote: Can anyone suggest why we might be having a problem with the Process test method?

The Process test method uses Performance Counter technology. This means a lot of components are involved in data processing: HostMonitor <-> Windows API <-> PDH.DLL <-> RPC service <-> Remote Registry Service <-> PerfProc.DLL on remote system. Actually, the Process test does not use the most unreliable link of this chain, PDH.DLL: HostMonitor uses our own code, and in most cases this test works quite reliably.

What version of HostMonitor do you use?
Windows on local system? Service Pack?
Windows on remote system? Service Pack?
To investigate this problem you may set up a CPU Usage test against the same remote system. It works in a similar way; however, this test will provide some diagnostic information in the Reply field of the test.

Quote: Is there a WMI test that can be used to test for the instances of a process running, that returns a "bad" test result (not "unknown") if the process has stopped?

Sure, use the following options of the test:
- Alert if row count is < than 1
- If instance is not available, set Bad status
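
The effect of these two options can be sketched in Python. This is only an illustration of the decision logic, not HostMonitor's own code: the function name `wmi_row_count_status`, the status strings, and the assumption that a failed query is reported as "Unknown" are all made up for the example.

```python
from typing import Optional, Sequence


def wmi_row_count_status(rows: Optional[Sequence[dict]], min_rows: int = 1) -> str:
    """Map a WMI query result to a test status (hypothetical helper).

    rows is None when the query itself could not be executed
    (for example, access denied); an empty sequence means the
    query ran but matched no process instance.
    """
    if rows is None:
        return "Unknown"  # assumption: a failed query gives no usable answer
    if len(rows) < min_rows:
        return "Bad"      # process not running: alert instead of Unknown
    return "Ok"


# Hypothetical results for a query such as:
#   SELECT Name FROM Win32_Process WHERE Name = 'ServiceName.exe'
print(wmi_row_count_status([{"Name": "ServiceName.exe"}]))
print(wmi_row_count_status([]))
print(wmi_row_count_status(None))
```

The point of counting rows instead of reading ThreadCount is that an empty result set is still a valid answer, so a stopped process can be reported as "Bad" rather than "Unknown".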

Regards
Alex
ironcurtain
Posts: 34
Joined: Mon Apr 28, 2008 10:15 am

Post by ironcurtain »

HostMonitor Version: 7.22C
Local Host Windows OS: Server 2003 Standard Edition, Service Pack 2
Remote Host Windows OS: Server 2003 R2 Enterprise Edition, SP2

Note: Since Tuesday this week I have not been getting any "Access is denied" errors from the WMI test with the following query:

SELECT ThreadCount FROM Win32_Process WHERE Name = 'ServiceName.exe'


I have created a CPU Usage test against the same target host, as suggested, and will monitor the results over the next few days to check for any diagnostic information in the Reply value.

I also have a performance counter test against the same target host:
\\{ServerName}.{DOMAIN}.{NAME}.COM\Memory\Available MBytes

I also have WMI Tests to check drive free space on the target host:
select FreeSpace from Win32_LogicalDisk where DeviceID = 'C:'
ironcurtain
Posts: 34
Joined: Mon Apr 28, 2008 10:15 am

Post by ironcurtain »

My results came back sooner than expected :)

In the Test properties I have not set anything in the "Connect as" field for either of these tests.
I am using Connection Manager to store the details of the server I am connecting to:
- Resource (UNC): \\{hostname}
- Server or Domain: {hostname}
- Login: User with admin rights
- Use the account as default for test methods:
Process, Service, NT Event Log, CPU Usage, Performance Counter, WMI, Dominant Process.



CPU Usage test method Results
------------------------------------

The CPU Usage test method is returning a reply "Cannot connect to remote Registry. Code #5".
I understand that Code #5 means Access is denied, is that correct?


Memory Performance Counter Results
--------------------------------------------

The Memory Performance Counter returns the following error:

Error: Unable to access the desired machine or service. Check the permissions and authentication of the log service or the interactive user session against those on the machine or service being monitored.


WMI free space Results
--------------------------------------------

The WMI tests for free disk space have not yet returned "Unknown" status.


I have read in other postings some suggestions:
1. Try to use the IP address instead of the hostname, it sometimes helps.
2. TCP ports above 1023 should be opened on firewall
3. Use RMA
4. Install the RPC service. What is the RPC service and where can I get this to install on the target server, please?

Please would you comment on the above and advise what I can do to resolve this problem?

Thank you
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Quote:
1. Try to use the IP address instead of the hostname, it sometimes helps.
2. TCP ports above 1023 should be opened on firewall
3. Use RMA
4. Install the RPC service. What is the RPC service and where can I get this to install on the target server, please?

It's not related to item #2 or #4. If you can perform the test (at least sometimes), or the test returns "Access denied", this means the ports are accessible and the RPC service is running. BTW: RPC stands for Remote Procedure Call; this service is included and started by default on every Windows machine.
Why do the tests work at some times and return "access denied" at others? Hard to explain :( Windows is not our product.
I would recommend checking the security event log on the target system. Any error messages? Please make sure failure auditing is enabled on that system.
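
The reasoning above (an "Access denied" reply means the request reached the remote system, so the problem is with permissions, not with ports or the RPC service) can be sketched as a small lookup in Python. The numeric values are standard Win32 error codes; the function name `classify_error` and the category labels are only illustrative and are not part of HostMonitor.

```python
# Standard Win32 error codes; the category labels below are illustrative only.
ERROR_ACCESS_DENIED = 5
ERROR_BAD_NETPATH = 53
ERROR_LOGON_FAILURE = 1326
RPC_S_SERVER_UNAVAILABLE = 1722

# The remote system answered and refused: check the account and its rights.
AUTH_CODES = {ERROR_ACCESS_DENIED, ERROR_LOGON_FAILURE}
# The remote system could not be reached: check name resolution, firewall, RPC.
NETWORK_CODES = {ERROR_BAD_NETPATH, RPC_S_SERVER_UNAVAILABLE}


def classify_error(code: int) -> str:
    """Return a rough hint about where to look for a given Win32 error code."""
    if code in AUTH_CODES:
        return "permissions"
    if code in NETWORK_CODES:
        return "network"
    return "other"


# "Cannot connect to remote Registry. Code #5" falls in the first group.
print(classify_error(5))
```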

#1 Yes, I have heard that using an IP address instead of the hostname can sometimes help. Comments? We cannot explain this; you should ask Microsoft.

#3 RMA? Sure, RMA installed on the target system should help. You may set up HostMonitor to use RMA to perform the tests directly on the target system.

Regards
Alex