Server crashes after a number of hostmonitor tests

All questions related to installations, configurations and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
e.lalas
Posts: 2
Joined: Wed Apr 02, 2008 9:58 am


Post by e.lalas »

Hello to everyone.

First of all, congratulations on the great software! Here's a strange problem we are facing:

We have a server (Compaq ProLiant ML370 G3) running VMware ESX Server 3.0.1. The ESX server hosts 6 virtual machines, all running Windows Server 2003 R2 and serving various purposes such as file/print services, domain controller, imaging and application distribution. This server is located at a remote site, which is connected to our main site through a 1 Mbps leased line.

At the main site we have installed HostMonitor 7.50, which runs the following tests against the physical and virtual servers mentioned above:
- 15 UNC tests (every minute)
- 7 ping tests (every minute)
- 1 URL request test (every 10 minutes)
- 1 service test (every 10 minutes)

All of the above tests are performed by HostMonitor itself, not by RMA.

The problem is that for over a year we have been experiencing sudden reboots of the physical (Compaq) server, without any visible errors. The reboots were happening almost every day. We replaced some hardware (e.g. memory), with no luck. Then we stopped HostMonitor from performing the UNC tests, and the problem disappeared at once! We had no reboots in the following weeks. To check whether this was really the cause, we activated the UNC tests again, and the problem reappeared. Since we deactivated them again, we haven't seen a single reboot.

Is it possible that the UNC tests are causing the reboots, as the above indications suggest? Have you seen this situation before? Can you suggest any workaround for this problem? For example, would it be better to perform the tests through RMA?

Thanks in advance.

Efthymios Lalas
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

It is pretty strange, because your HostMonitor configuration does not look so heavy. :roll:

I think the problem may be caused by TCP/IP connection settings.
Please read the following post. You may want to experiment with some TCP/IP settings, especially TcpNumConnections, MaxUserPort and TcpTimedWaitDelay.
http://www.ks-soft.net/cgi-bin/phpBB/vi ... php?t=4521
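To put the "not so heavy" remark into numbers, here is a rough Python estimate of how many sockets the test schedule from the first post could leave in TIME_WAIT. It is a sketch under stated assumptions, not a HostMonitor formula: it pessimistically assumes every TCP-based test opens a new connection (UNC tests often reuse an existing SMB session), treats ping tests as ICMP-only, and uses the Windows Server 2003 defaults for the values under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters (MaxUserPort = 5000, i.e. ephemeral ports 1025 to 5000, and TcpTimedWaitDelay = 240 seconds). The function names are hypothetical.

```python
# Back-of-the-envelope check: could this test load exhaust ephemeral TCP ports?
# ASSUMPTIONS (not taken from HostMonitor documentation):
#   * every TCP-based test opens one new connection (pessimistic; SMB
#     sessions used by UNC tests are often reused),
#   * ping tests use ICMP and consume no TCP port,
#   * Windows Server 2003 defaults: MaxUserPort = 5000 (ephemeral ports
#     1025..5000) and TcpTimedWaitDelay = 240 seconds.

# Test schedule from the original post: name -> (number of tests, interval in s)
SCHEDULE = {
    "UNC": (15, 60),
    "ping": (7, 60),
    "URL": (1, 600),
    "service": (1, 600),
}
TCP_TESTS = ("UNC", "URL", "service")


def tcp_connections_per_second(schedule, tcp_tests=TCP_TESTS):
    """New TCP connections per second, one per TCP-based test run."""
    return sum(count / interval
               for name, (count, interval) in schedule.items()
               if name in tcp_tests)


def ephemeral_ports(max_user_port=5000, first_port=1025):
    """Size of the ephemeral port range (inclusive)."""
    return max_user_port - first_port + 1


def time_wait_sockets(rate_per_s, time_wait_delay_s=240):
    """Steady-state sockets held in TIME_WAIT (Little's law: L = rate * W)."""
    return rate_per_s * time_wait_delay_s


def max_sustained_rate(ports, time_wait_delay_s=240):
    """Highest new-connection rate before every ephemeral port sits in TIME_WAIT."""
    return ports / time_wait_delay_s


if __name__ == "__main__":
    rate = tcp_connections_per_second(SCHEDULE)
    ports = ephemeral_ports()
    print(f"{rate:.3f} new connections/s")
    print(f"~{time_wait_sockets(rate):.0f} of {ports} ports in TIME_WAIT")
    print(f"ceiling: ~{max_sustained_rate(ports):.1f} connections/s")
```

Run as a script, it reports about 0.253 new connections per second, roughly 61 of 3976 ephemeral ports held in TIME_WAIT, and a ceiling of about 16.6 connections per second. If those assumptions hold, port exhaustion from these tests alone looks unlikely; the same arithmetic shows how lowering TcpTimedWaitDelay or raising MaxUserPort would widen the margin on a busier system.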

Can you check the system's resource usage? You may use the standard Windows Task Manager to check Handles, GDI and USER objects. What is the total resource usage on your system? How many Handles, GDI and USER objects are used by the hostmon.exe process? Which process uses the most resources?

Do you see any error messages in Event Viewer (Start > Settings > Control Panel > Administrative Tools > Event Viewer applet)?

Do you see any error messages in the System Log (the file is specified in menu "Options" -> "System Log")? You can view the System Log using menu "View" -> "System Log".


Regards,
Max
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Could you please check your action profiles? Maybe you have assigned a profile with a "Reboot local machine" or "Reboot remote machine" action to some of these UNC tests?

Also, I recall a similar problem caused by an old Novell NetWare network client. What network clients do you have installed? Maybe some VMware network software leads to the problem :roll:

Regards
Alex
e.lalas
Posts: 2
Joined: Wed Apr 02, 2008 9:58 am

Post by e.lalas »

Max, Alex, thanks for your answers!

The machine that reboots is the remote ESX server, the one being monitored. We don't have any "Reboot local machine" action in our action profiles. However, we do have a NetWare client installed on one of the virtual machines on the ESX server that reboots. The version is "NetWare client for Windows 4.91 SP2". I will try to narrow the problem down to this machine, to see whether the NetWare client causes it.

I will also monitor the resource usage Max mentioned. I will get back to you as soon as I have some results.

Thanks again!
greyhat64
Posts: 246
Joined: Fri Mar 14, 2008 9:10 am
Location: USA

Post by greyhat64 »

Regarding the potential Novell problem, look into the Advanced Settings in the Novell Client properties, in particular the UNC Path Filter setting. Its description reads:
When enabled, UNC path queries sent to the Microsoft Redirector will first be filtered by the Novell Client to see if the server name is known by the Novell Client. If it is known, then a name resolve will not be attempted by the Microsoft Redirector. If the server name is not known, then the usual name resolution process will occur.
You might want to turn this off first and see if that makes a difference.
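The quoted description boils down to a small decision flow on the machine where the Novell Client is installed. The Python sketch that follows models only that flow as quoted; it is not Novell's implementation, and `resolve_unc` and its parameters are hypothetical names chosen for illustration.

```python
from typing import Iterable, Tuple

# Conceptual model of the Novell Client "UNC Path Filter" as described in the
# quoted help text. NOT Novell's implementation; the function and parameter
# names are hypothetical and only illustrate the decision flow.


def resolve_unc(path: str, novell_known_servers: Iterable[str],
                filter_enabled: bool) -> Tuple[str, bool]:
    """Return (component that resolves the server name, whether Novell filtered it)."""
    if not path.startswith("\\\\"):
        raise ValueError(f"not a UNC path: {path!r}")
    # \\SERVER\share\... -> "server" (server names compared case-insensitively)
    server = path[2:].split("\\", 1)[0].lower()
    known = {name.lower() for name in novell_known_servers}

    if not filter_enabled:
        # Filter off: the Microsoft Redirector resolves the name as usual.
        return ("microsoft", False)
    if server in known:
        # Known to the Novell Client: the Redirector skips its own name resolve.
        return ("novell", True)
    # Unknown to Novell: after the check, the usual name resolution occurs.
    return ("microsoft", True)


if __name__ == "__main__":
    print(resolve_unc(r"\\NW-FS1\VOL1", {"nw-fs1"}, True))
    print(resolve_unc(r"\\winfs01\share", {"nw-fs1"}, True))
    print(resolve_unc(r"\\NW-FS1\VOL1", {"nw-fs1"}, False))
```

The point the model makes is that, with the filter enabled, every UNC query passes through an extra Novell check before normal name resolution; turning the filter off removes that step, which is why it is a useful first thing to rule out.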
Also related, you might try modifying the Protocol Preference to eliminate unnecessary protocols and to set the default protocol to IP instead of IPX (if it's currently set that way). I've had problems with those settings in the past, typically performance related, but you never know. :wink:

Good luck and let us know what you find out!