Issues with Performance Counter tests

All questions related to installation, configuration and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
MSC
Posts: 11
Joined: Mon Nov 28, 2005 3:36 pm

Issues with Performance Counter tests

Post by MSC »

Alex,

I am having some problems with the Performance Counter tests.

With the test mode set to "External", I am getting the following reply:
"Error: script timed out.||"
This seems to affect about 10 servers.

With the test mode set to "Smart", I am getting the following reply:
"Error: Unable to connect to specified machine or machine is off line.||"
or:
"Error The wait operation timed out.||"
This seems to affect all servers.

With the test mode set to "OneByOne", I am getting the following reply:
"Error: The specified object is not found on the system.||"
or:
"Error: Unable to connect to specified machine or machine is off line.||"
This seems to affect 2, 3, 4 or 5 servers, but usually 2.

With the test mode set to "MultiThread", I am getting the following reply:
"Error: The wait operation timed out.||"
This seems to affect the majority of servers.


In addition to many other tests, I am running around 400 Performance Counter tests against 50 servers. The "Estimate load" window tells me that the current load is 2.8 tests/sec and that the system is able to perform this without significant load. I am noticing as well that it seems to take a while to perform the test. Here are some examples, with the test mode set to "OneByOne":

Server A: It takes nearly 3 minutes from status "Checking..." to get the reply "Error: Unable to connect to specified machine or machine is off line.||".
Server B: It takes nearly 2.5 minutes from status "Checking..." to get the "OK" status.

For example, UNC tests against both Server A and Server B take less than a second.

As stated above, in test mode "OneByOne" only a few servers are affected. Server A is one of the affected servers; Server B is one of the servers that is not affected. Both Server A and Server B are on the same subnet so network connectivity to the servers should be equal.

I am running HM 5.66 on Windows Server 2003 Enterprise Edition with a total of 1535 active tests. Your expert advice on this matter would be greatly appreciated.

Regards, /|/|arc.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

What is the difference between ServerA (that doesn't work) and ServerB (that works fine)?
Different Windows versions? The same Windows but a different Service Pack? The same Windows and SP but different 3rd party software?

Usually the Performance Counter test returns a result within several seconds (when Windows and all necessary services and DLLs work correctly) or returns an error within 30 sec (if something doesn't work).
In your case the servers do not respond at all (within the RPC timeout, which is several minutes, I think). Why does it happen? I don't know. Maybe a heavily loaded network? Heavily loaded servers? A malfunctioning router? I cannot say, it's your network.
What about other kinds of tests, like Service, Processes, CPU Usage, UNC? How do these tests work?
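
To see whether the delay comes from Windows rather than from HostMonitor, the same counter can be read outside HostMonitor and timed. The following is only a minimal sketch, assuming Python 3.7+ is installed on the monitoring machine and that the Windows `typeperf` command-line tool is available there. The function names, the default counter path and the 300-second timeout are illustrative choices, not anything HostMonitor uses.

```python
import csv
import io
import subprocess
import time


def parse_typeperf(output):
    """Return the first numeric sample found in typeperf CSV output, or None."""
    for row in csv.reader(io.StringIO(output)):
        # Skip blank lines, status messages and the "(PDH-CSV ...)" header row.
        if len(row) >= 2 and not row[0].startswith("(PDH-CSV"):
            try:
                return float(row[1])
            except ValueError:
                continue
    return None


def timed_counter_query(host, counter=r"\Processor(_Total)\% Processor Time",
                        timeout=300):
    """Collect one sample of a remote counter; return (value, seconds taken).

    Windows only. The counter path and timeout are assumptions for this sketch.
    """
    path = "\\\\" + host + counter  # e.g. \\ServerA\Processor(_Total)\% Processor Time
    started = time.monotonic()
    result = subprocess.run(
        ["typeperf", path, "-sc", "1"],  # -sc 1: collect a single sample
        capture_output=True, text=True, timeout=timeout,
    )
    elapsed = time.monotonic() - started
    return parse_typeperf(result.stdout), elapsed
```

Calling `timed_counter_query("ServerA")` and `timed_counter_query("ServerB")` from the HostMonitor machine gives two timings to compare. If Server A also takes minutes here, the delay is probably not specific to HostMonitor.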

Regards
Alex
MSC
Posts: 11
Joined: Mon Nov 28, 2005 3:36 pm

More info...

Post by MSC »

Alex,

Both servers are running Windows 2000 Server, SP4. As for installed software, both are running Lotus Domino and MS SQL Server. As for available memory, Server A has 4 GB and Server B has 2 GB. The memory commit charge on Server A is 3.3 GB and on Server B is 2.4 GB. So, if anything, Server B is running low on memory.

Both servers are on the same subnet and the path over the network from the HM server to Server A and Server B is the same; packets travel through the same routers and switches. I therefore do not think that the network is the problem.

Both servers are identical hardware (so, the same network cards, drivers, etc). In addition to the Performance Counter tests, I am running 12 UNC tests, 2 NT Event Log tests and 1 ping test against each machine. All these tests are running fine against both Server A and Server B.

I also have one CPU Usage test for each server, which seems to fail against Server A without any explanation (Reply field stays blank). The CPU Usage test against Server B works without any problems.

Regards, /|/|arc.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

So the network works fine. The NT Event Log test works fine, which means RPC calls are processed by the remote system. The CPU Usage test against SystemA fails, which means the problem is not related to pdh.dll.

It looks like there is some problem with performance counters on SystemA. Could you check the status of the performance DLLs on SystemA? Use the Microsoft Extensible Counter List utility, available at http://www.microsoft.com/downloads/deta ... laylang=en
Normally a system with disabled DLLs should return an error right away, but who knows, let's check this.

Also, check if the Remote Registry service is started on SystemA.
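
The service state can also be checked remotely from the monitoring machine. The following is a rough sketch, assuming Python 3.7+ and the standard Windows `sc` command. `RemoteRegistry` is the usual service name of the Remote Registry service, while the helper names and the 30-second timeout are made up for this example.

```python
import re
import subprocess


def parse_sc_state(sc_output):
    """Return the state name (e.g. 'RUNNING') from `sc query` output, or None."""
    # The relevant line looks like: "STATE              : 4  RUNNING"
    match = re.search(r"STATE\s*:\s*\d+\s+(\w+)", sc_output)
    return match.group(1) if match else None


def remote_registry_state(host, timeout=30):
    """Query the Remote Registry service on a remote host (Windows only).

    Returns the state name, or None if sc could not report one
    (for example when the machine cannot be reached).
    """
    result = subprocess.run(
        ["sc", "\\\\" + host, "query", "RemoteRegistry"],
        capture_output=True, text=True, timeout=timeout,
    )
    return parse_sc_state(result.stdout)
```

For a started service, `remote_registry_state("ServerA")` should return `RUNNING`. A result of `STOPPED` or `None` would be worth following up before looking further at the counters themselves.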

Regards
Alex
MSC
Posts: 11
Joined: Mon Nov 28, 2005 3:36 pm

Post by MSC »

Alex,

I have downloaded and installed the tool on Server A. I started the program and every extensible counter listed had "Performance Counter Enabled" ticked.
KS-Soft wrote: So the network works fine. The NT Event Log test works fine, which means RPC calls are processed by the remote system. The CPU Usage test against SystemA fails, which means the problem is not related to pdh.dll.
The CPU Usage test returns no reply, whereas the Performance Counters are currently returning: "Error: Unable to connect to specified machine or machine is off line.||"

Do you have any other ideas?

Regards, /|/|arc.
ericm
Posts: 40
Joined: Tue Feb 10, 2004 6:29 am

Similar Issue

Post by ericm »

I am having a similar issue, which I also thought was a network problem but have found that it is not. My problem is that if I have \\servername in the test method and a test fails with "unable to connect to specified machine", I go in and put in the IP address and the test works. Or vice versa: if I have the IP address and the test fails, I change the IP to the computer name and it works.

But recently I have had no luck there. I have had to go into the test, click on the address book icon, reselect the same computer, and reselect the test. I am not changing anything; the test is exactly the same as the original, so I do not understand what is going on. It looks like maybe the file that stores the test methods is not being read correctly by HM, and when you redo the settings the file gets rewritten correctly again. (That is just an assumption.)

In my case it only happens on the memory tests. All other server tests, regardless of whether they are performance counter tests or not, work fine.
This is an everyday issue: I have to go in and reset these tests at least 2 to 3 times a day.
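
Since swapping the name and the IP address changes the outcome, one thing that can be ruled out cheaply is inconsistent name resolution on the monitoring machine. The following is only a sketch, assuming Python 3 is available there. `name_matches_ip` is a hypothetical helper, and the host name and address in the usage note are placeholders. It does not explain the behaviour described above; it only shows whether the name currently resolves to the address used in the test.

```python
import socket


def name_matches_ip(name, expected_ip, resolve=socket.gethostbyname_ex):
    """Check whether expected_ip is among the IPv4 addresses that name resolves to.

    Returns (matches, addresses). The resolve parameter exists only so the
    lookup can be replaced in tests; by default the system resolver is used.
    """
    try:
        _, _, addresses = resolve(name)
    except OSError:
        # Resolution failed (socket.gaierror and socket.herror are OSError subclasses).
        return False, []
    return expected_ip in addresses, addresses
```

For example, `name_matches_ip("servername", "192.0.2.10")` returns `(False, [...])` when the name resolves to something other than the address configured in the test, which would point at DNS or WINS rather than at the performance counters.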
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

>MSC

Have you checked Remote Registry service?

Regards
Alex
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

>Ericm

Unfortunately the Windows Performance Counters implementation is very unreliable. IMHO it's the worst part of the system. We spent a lot of time trying to find some workaround, and often this "workaround" works. As you can see, HostMonitor offers 4 options for the Performance Counter test:
- MultiThread mode;
- OneByOne mode;
- Smart mode;
- External mode
(see Misc page in the Options dialog).
But some systems just don't work regardless of anything. We can redesign HostMonitor but we cannot fix Windows :(

That's why we are now testing WMI; it looks like it works stably. We want to implement a WMI test method in one of the next versions.

Regards
Alex
MSC
Posts: 11
Joined: Mon Nov 28, 2005 3:36 pm

Post by MSC »

Alex,

The Remote Registry service on Server A is started and set to automatic startup. I also (previously) experienced something similar to what ericm describes. I am running all my tests against the IP address. Whenever a test would become unknown, I would edit the test and reselect the same performance counter (I would get a message saying that a connection to the machine could not be established); then I would manually type the IP address again, and after a while I was able to select the same performance counter and the test would work again. I noticed that this would clear up any issues with other unknown performance counter tests as well.

This method no longer seems to work. The interesting thing is that reselecting the performance counter now does not give me the message saying "unable to connect to remote machine".

I must add that this reselection behaviour worked for me in HostMonitor 3.78 and I believe in 5.38. I have since upgraded to 5.66.

>ericm: have you also upgraded?

Regards, /|/|arc.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

It's not because you have updated HostMonitor. We have not changed anything in this test method for a year. And I believe this problem has nothing to do with HostMonitor; it's how Microsoft implemented Perf Counters.

Regards
Alex
MSC
Posts: 11
Joined: Mon Nov 28, 2005 3:36 pm

Post by MSC »

Alex,

From what I hear, we will have to wait for the WMI implementation to be available in HostMonitor, which hopefully is a more reliable way of testing than the Performance Counters.

When do you expect the WMI implementation to be available?

Regards, /|/|arc.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Probably January 2006

Regards
Alex
MSC
Posts: 11
Joined: Mon Nov 28, 2005 3:36 pm

Post by MSC »

Alex,

Excellent! I will be looking forward to it!

Regards, /|/|arc.