Bogus Error Messages?

sadek76 · Post by **sadek76** » Wed Aug 03, 2005 11:02 am

I have two machines running HM 5.38. One runs Server 2000 SP4 with two 450 MHz CPUs and 512 MB of physical memory, and runs about 2000 tests. The other runs XP Professional SP2 with dual 2.9 GHz CPUs and 1 GB of physical memory, and runs 35 tests.

The machine running the 35 tests had no problems with any of the tests all night. The machine running the 2000 tests, which include the exact same 35 tests from the other machine, reported that these tests fail every few minutes.

The last Reply shows "An unexpected network error occurred" or "Timed out". The reply time on the faster machine is around 1000ms; the reply time on the slower, more heavily loaded machine is around 3000000ms. These are UNC tests on both machines.

I put a sniffer on the slower machine to try and catch these error messages. None of them showed up in my capture files.

The slower machine is scheduled to be upgraded to faster hardware soon, but my question is: are these messages generated by the HM application, and why do they show up on only a few tests?

KS-Soft · Post by **KS-Soft** » Wed Aug 03, 2005 12:07 pm

but my question is are these messages generated from the HM application

How it works:
1) HostMonitor sends a request to the Windows API: retrieve information about the UNC resource
2) Windows sends the request to the network client
3) The network client tries to establish communication with the remote host and returns either information about the resource or an error CODE (e.g. a timeout error, which is the interesting case here)
4) Windows returns the error CODE to HostMonitor
5) HostMonitor asks Windows for the description of that error code and displays the test result together with the error description.

This means a sniffer can never show you the "Timeout" error text. That error happens when NO packets are received.
For some other errors (e.g. an authentication error) the sniffer will show some packets, but usually you will not see any text in them (it depends on the network protocol).
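
The steps above can be sketched in a few lines. This is only an illustration of the idea that a numeric code travels back from the network client and the text is looked up locally afterwards; it is not HostMonitor's code. The message texts are the ones quoted in this thread, the numeric values are the standard Win32 system error codes for them, and `describe_error` and `format_test_result` are hypothetical names.

```python
# Illustrative subset of Win32 system error codes. The network client
# returns only the number; the English text is looked up locally.
WIN32_ERROR_TEXT = {
    59: "An unexpected network error occurred.",               # ERROR_UNEXP_NET_ERR
    64: "The specified network name is no longer available.",  # ERROR_NETNAME_DELETED
    1326: "Logon failure: unknown user name or bad password.", # ERROR_LOGON_FAILURE
}


def describe_error(code: int) -> str:
    """Return a locally stored description for a numeric error code.

    On Windows, ctypes.FormatError(code) asks the operating system for
    this kind of text; the table keeps the sketch portable.
    """
    return WIN32_ERROR_TEXT.get(code, f"Unknown error {code}")


def format_test_result(code: int) -> str:
    # Hypothetical result line; the real HostMonitor format is not shown here.
    return f"Bad (error {code}): {describe_error(code)}"


if __name__ == "__main__":
    print(format_test_result(64))
```

Because the text is produced only at the last step, on the monitoring machine, a capture on the wire can contain at most the numeric status, never the sentence shown in the Reply field.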

Regards
Alex

sadek76 · Post by **sadek76** » Wed Aug 03, 2005 1:30 pm

Another question. I can complete a UNC mount from the cmd prompt in a few seconds, but the HM application takes almost 10 minutes (1010610ms), why is this?

And where does the message: "The specified network name is no longer available." come from? I use the IP address on all of my UNC tests.

KS-Soft · Post by **KS-Soft** » Wed Aug 03, 2005 2:53 pm

And where does the message: "The specified network name is no longer available." come from?

I think Windows reads this message from a resource file.
If you are asking about the error code, I think it's returned by the remote system.

Another question. I can complete a UNC mount from the cmd prompt in a few seconds, but the HM application takes almost 10 minutes (1010610ms), why is this?

Maybe HostMonitor performs many requests at the same time and the network client cannot process that many requests correctly...
Try switching "UNC test mode" to "OnePerServer" or "OneByOne". The option is located on the Misc page in the Options dialog.
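
To make the idea behind a one-per-server mode concrete, here is a minimal sketch that assumes the mode simply serializes checks aimed at the same server while letting checks for different servers overlap. How HostMonitor implements it internally is not described in this thread; `run_one_per_server` and the `check` callback are hypothetical.

```python
import threading
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# One lock per server: checks for different servers may run in parallel,
# checks for the same server run strictly one after another.
_server_locks = defaultdict(threading.Lock)
_locks_guard = threading.Lock()


def _lock_for(server):
    # Guard the dictionary so two threads never create two locks for one server.
    with _locks_guard:
        return _server_locks[server]


def run_one_per_server(tests, check, max_workers=8):
    """Run check(server, share) for each (server, share) pair in tests.

    Results come back in the same order as tests.
    """
    def worker(test):
        server, share = test
        with _lock_for(server):
            return check(server, share)

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(worker, tests))
```

With a single global lock instead of one lock per server, the same sketch would behave like a one-by-one mode, which is why that mode becomes too slow once there are thousands of tests.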

Regards
Alex

sadek76 · Post by **sadek76** » Thu Aug 11, 2005 2:25 pm

I upgraded my machine to a Dell PowerEdge 1850 with four 3.0 GHz CPUs and 3 GB of memory, running Server 2003 and HM 5.38. Fairly often I receive one of two error messages from a UNC test: "The specified network name is no longer available" or "Logon failure: unknown user name or bad password". The device that HM is checking is a filer running Data ONTAP 6.4.4P7. This filer has a share set up on it called hostmon$. In my UNC test I have the UNC box set to \\165.168.25.217\hostmon$ and the "Connect as" box checked, with my domainname\username and a password in the password box. Again, I have a sniffer set up on the LAN and I don't see either error message coming back from the filer.
I also have the UNC test mode under Misc set to Normal. I'm running over 1500 tests, and running OnePerServer or OneByOne did not cut it. The UNC test retries setting is 1.

For some strange reason out of the 1500 tests, I only receive these error messages for four tests and they just started about a month ago. Any ideas?

One other thing: I set up a batch job that runs once a minute. Its command line is:
net use \\165.168.25.217\hostmon$ /user:domainname\username password
sleep 2
net use \\165.168.25.217\hostmon$ /del
The results are sent to a log file. These tests never fail, even when the HM test fails.
Is this the same command that HM runs in its UNC test?

KS-Soft · Post by **KS-Soft** » Thu Aug 11, 2005 2:51 pm

Again I have a sniffer setup on the LAN and I don't see either error message coming back from the filer.

Please read my previous posts. Quote
AK>>This means a sniffer can never show you the "Timeout" error text. That error happens when NO packets are received.
AK>>For some other errors (e.g. an authentication error) the sniffer will show some packets, but usually you will not see any TEXT in them, because the network client returns an error CODE, not text.

I also have the UNC tests under Misc set to Normal. I'm running over 1500 tests and running OnePerServer or OneByOne did not cut it, also the UNC test retries is set to 1.

Yes, OneByOne cannot be useful in your case. But OnePerServer is probably the appropriate setting, unless you have 100 UNC tests for one server.

net use \\165.168.25.217\hostmon$ /del
The results are sent to a log file. These tests never fail, even when the HM test fails.
Is this the same command that HM runs in its UNC test?

HostMonitor does not use command line utilities; it sends requests to the network client through the Windows API.
Try the OnePerServer option.
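
For comparison with the batch job, a rough sketch of an API-style check is shown below. It is not what HostMonitor does internally (the thread does not say which API calls it uses), and it does not supply "Connect as" credentials. It only shows that a check made through an OS call yields an elapsed time and a numeric error code rather than text; `timed_check` is a hypothetical name.

```python
import os
import time


def timed_check(path):
    """Probe a path through an OS call and return (ok, elapsed_ms, error_code)."""
    start = time.perf_counter()
    try:
        os.listdir(path)  # goes through the operating system API, not net.exe
        ok, code = True, 0
    except OSError as exc:
        ok = False
        # On Windows exc.winerror carries the Win32 code; elsewhere fall back to errno.
        code = getattr(exc, "winerror", None) or exc.errno
    elapsed_ms = int((time.perf_counter() - start) * 1000)
    return ok, elapsed_ms, code


if __name__ == "__main__":
    # The path is the share from this thread; substitute any reachable path.
    print(timed_check(r"\\165.168.25.217\hostmon$"))
```

Running one such probe per minute on its own, as the batch job does, places a very different load on the network client than many probes issued at the same moment.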

As I understand it, you experience the problem with a single device. In this case there is another solution:
- move the tests that check the problem device into a separate folder
- select this folder, click the "Properties" button and mark "Non-simultaneously test execution" on the "Specials" page.

Regards
Alex

sadek76 · Post by **sadek76** » Thu Aug 11, 2005 3:26 pm

I changed the UNC setting to OnePerServer and created a separate folder with the problem tests (8 total, 2 for each server). I also set "Non-simultaneously test execution" on the "Specials" page. I still see the failures on my slower server, but do not see them on my new faster server. I will let it run overnight.

KS-Soft · Post by **KS-Soft** » Thu Aug 11, 2005 4:09 pm

If you set the "Non-simultaneously test execution" option, you don't need to use the OnePerServer option.

Regards
Alex

sadek76 · Post by **sadek76** » Thu Aug 11, 2005 4:12 pm

OK, set UNC tests back to Normal

sadek76 · Post by **sadek76** » Wed Aug 31, 2005 9:43 am

This change fixed the problem. I have now started to break my main folder up into smaller ones to prevent this from happening again. Sorry for the late reply, and thanks for your help.

KS-Soft · Post by **KS-Soft** » Wed Aug 31, 2005 11:45 am

So my assumption was right: there is some bug in the network client.
Thank you for the feedback.

Regards
Alex