hostmonitor service hangs randomly every 5 to 7 days.

All questions related to installations, configurations and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
zendesigner
Posts: 40
Joined: Thu Mar 03, 2005 3:03 am

Post by zendesigner »

The KS Advanced Host Monitor service quits at random every few days, logging the following description:


The KS Advanced Host Monitor service terminated unexpectedly. It has done this 1 time(s).

with Event ID 7034.


Both my primary and my backup server (the backup runs no tests of its own; it only checks that the primary is up) exhibit this error every 5 to 7 days.

The result is either
that the service still shows as started, but the web service says it is busy and can't connect to it,
or
that the whole server crashes with memory errors.

Any help is appreciated :)
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

We know several possible reasons:
1) HostMonitor is running on Windows 2003 and you are using the SNMP Get or Traffic Monitor test method. The error is caused by mgmtapi.dll; Microsoft offers a patch: http://www.ks-soft.net/cgi-bin/phpBB/vi ... php?t=1301
2) You have installed the Norton Antivirus or McAfee monitoring module. There is some error in these monitors that causes the application to crash. The scanner modules work fine.
3) If you are using ODBC Logging or the ODBC Query test, some ODBC drivers may crash the application.

What version of HostMonitor do you have installed?
Windows version?
Service Pack?

Regards
Alex
zendesigner
Posts: 40
Joined: Thu Mar 03, 2005 3:03 am

Post by zendesigner »

HostMonitor 5.30
Windows 2003, no service pack
McAfee v8.0 with on-access scanning and regular scans.

HostMonitor only reports to the event log; no logging or other actions take place.

It looks like a memory leak: HostMonitor starts to raise alerts because it is itself becoming more and more clogged, until the whole server locks up completely. Even access through Terminal Services or on the console is then useless, and the server has to be cold booted.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

We are probably talking about the same problem as in a nearby topic: http://www.ks-soft.net/cgi-bin/phpBB/vi ... php?t=2205
Could you check how many Handles, GDI objects and User objects are used by HostMonitor and by other applications? You can use the standard Windows Task Manager.

Regards
Alex
zendesigner
Posts: 40
Joined: Thu Mar 03, 2005 3:03 am

Post by zendesigner »

Thanks Alex, I'll keep monitoring regularly until they fail again. I currently have 4 HostMonitors installed at different sites with the same kit, so if both primaries fail identically I have a better chance of troubleshooting.
zendesigner
Posts: 40
Joined: Thu Mar 03, 2005 3:03 am

Post by zendesigner »

It failed again this weekend. I don't have a report on GDI objects and handles from the time of the failure, but while running they look OK. For the hostmon.exe process:

Handles between 200-300
GDI objects = 0
Threads between 10-100

memory = 322.200 kb on a server that has been running for a week.
memory = 26.000 kb on a server rebooted 4 hours ago.
memory = 12.000 kb on a server rebooted just now.
It appears to go up slowly, suggesting a memory leak.
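
To put a rough number on that growth, here is a minimal sketch that fits a straight line through the three readings. It assumes the figures are in KB with the period used as a thousands separator, that "a week" means 168 hours, and that readings taken on different servers are comparable, which they may not be. The function name is made up for illustration.

```python
def growth_rate_kb_per_hour(samples):
    """Least-squares slope of memory (KB) against uptime (hours)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_m = sum(m for _, m in samples) / n
    # Covariance of time and memory, divided by variance of time.
    num = sum((t - mean_t) * (m - mean_m) for t, m in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    if den == 0:
        raise ValueError("samples must cover more than one point in time")
    return num / den


# (uptime in hours, memory in KB); 168 hours for "a week" is an assumption.
samples = [(0, 12000), (4, 26000), (168, 322200)]
rate = growth_rate_kb_per_hour(samples)
print(f"approx. {rate:.0f} KB per hour")
```

A steady positive slope over several days is consistent with a leak, but on its own it does not prove one, since the Mem Usage figure also depends on how Windows manages the process's pages.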

These are the event log entries from when it crashed:

Event Type: Error
Event Source: hostmon.exe
Event Category: None
Event ID: 0
Date: 26/08/2005
Time: 00:39:36
User: N/A
Computer: SCTW0046
Description:
The description for Event ID ( 0 ) in Source ( hostmon.exe ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: List index out of bounds (19).


then

Event Type: Error
Event Source: hostmon.exe
Event Category: None
Event ID: 0
Date: 26/08/2005
Time: 00:39:41
User: N/A
Computer: SCTW0046
Description:
The description for Event ID ( 0 ) in Source ( hostmon.exe ) cannot be found. The local computer may not have the necessary registry information or message DLL files to display messages from a remote computer. You may be able to use the /AUXSOURCE= flag to retrieve this description; see Help and Support for details. The following information is part of the event: Access violation at address 004022EC in module 'hostmon.exe'. Write of address 00000000.

then

Event Type: Warning
Event Source: Userenv
Event Category: None
Event ID: 1524
Date: 26/08/2005
Time: 00:40:06
User: TECHNP\IPHM001X
Computer: SCTW0046
Description:
Windows cannot unload your classes registry file - it is still in use by other applications or services. The file will be unloaded when it is no longer in use.



then

Event Type: Warning
Event Source: Userenv
Event Category: None
Event ID: 1517
Date: 26/08/2005
Time: 00:40:07
User: NT AUTHORITY\SYSTEM
Computer: SCTW0046
Description:
Windows saved user TECHNP\IPHM001X registry while an application or service was still using the registry during log off. The memory used by the user's registry has not been freed. The registry will be unloaded when it is no longer in use.

This is often caused by services running as a user account, try configuring the services to run in either the LocalService or NetworkService account.

For more information, see Help and Support Center at http://go.microsoft.com/fwlink/events.asp.
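
To line up entries like the ones above by time, a small sketch can pull the header fields out of pasted Event Viewer text and print the offset of each event from the first one. It assumes the dd/mm/yyyy date format used in this post; the function name and the shortened sample text are only for illustration.

```python
from datetime import datetime

FIELDS = ("Event Type", "Event Source", "Event ID", "Date", "Time")


def parse_events(text):
    """Return one dict per event, with a parsed 'when' timestamp added."""
    events = []
    current = None
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        key = key.strip()
        if not sep or key not in FIELDS:
            continue  # description text and blank lines are skipped
        if key == "Event Type":  # every entry starts with this field
            current = {}
            events.append(current)
        if current is not None:
            current[key] = value.strip()
    for ev in events:
        stamp = ev["Date"] + " " + ev["Time"]
        ev["when"] = datetime.strptime(stamp, "%d/%m/%Y %H:%M:%S")
    return events


sample = """\
Event Type: Error
Event Source: hostmon.exe
Event ID: 0
Date: 26/08/2005
Time: 00:39:36

Event Type: Error
Event Source: hostmon.exe
Event ID: 0
Date: 26/08/2005
Time: 00:39:41

Event Type: Warning
Event Source: Userenv
Event ID: 1524
Date: 26/08/2005
Time: 00:40:06
"""

events = parse_events(sample)
first = events[0]["when"]
for ev in events:
    delta = (ev["when"] - first).total_seconds()
    print(f'+{delta:.0f}s  {ev["Event Source"]}  ID {ev["Event ID"]}')
```

For the entries in this post, the access violation follows the "List index out of bounds" error by 5 seconds, and the first Userenv warning comes 30 seconds after it.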
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

zendesigner wrote:Handles between 200-300
GDI objects = 0
Threads between 10-100

Handles and Threads are OK. But 0 GDI objects? That is impossible. Maybe you looked at some other process?

zendesigner wrote:memory = 322.200 kb on a server that has been running for a week.
memory = 26.000 kb on a server rebooted 4 hours ago.
memory = 12.000 kb on a server rebooted just now.
It appears to go up slowly, suggesting a memory leak.

Maybe, maybe not. You checked the Mem Usage counter, right?
Quote from Microsoft:
"Working Set is the current number of bytes in the Working Set of this process. The Working Set is the set of memory pages touched recently by the threads in the process. If free memory in the computer is above a threshold, pages are left in the Working Set of a process even if they are not in use. When free memory falls below a threshold, pages are trimmed from Working Sets. If they are needed they will then be soft-faulted back into the Working Set before they leave main memory."
It means HostMonitor may have released the memory, but Windows has not trimmed the pages because the system has enough free memory. They will be released when some other application requests memory...

zendesigner wrote:Event Source: hostmon.exe
Event Category: None
List index out of bounds (19).

It looks like some error in HostMonitor. But where? This error appeared at 00:39:36, and the next error at 00:39:41. Did HostMonitor crash right after that? Was nobody working with HostMonitor (locally or remotely) at that time?

zendesigner wrote:Access violation at address 004022EC in module 'hostmon.exe'. Write of address 00000000.

It looks like the error happened when HostMonitor tried to release some resource. This error was probably caused by the first error (above)...

Could you upgrade to version 5.38?

Regards
Alex
zendesigner
Posts: 40
Joined: Thu Mar 03, 2005 3:03 am

Post by zendesigner »

I'll try to upgrade to 5.38 and will get back to you after that. For now my kit is going into production anyway, so I have to make a new release.

Bart
zendesigner
Posts: 40
Joined: Thu Mar 03, 2005 3:03 am

Post by zendesigner »

KS-Soft wrote:Handles and Threads are OK. But 0 GDI objects? That is impossible. Maybe you looked at some other process?

No, it's because HostMonitor is running as a service under a technical account, so there is no screen display.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

Right.
I just wonder why Windows allocates handles when a service is running under the local system account with the "Allow service to interact with desktop" option disabled...

Regards
Alex
zendesigner
Posts: 40
Joined: Thu Mar 03, 2005 3:03 am

Post by zendesigner »

Hi Alex,

I have one test server running HM 5.38 now, which unfortunately shows the same problems.

I'm doing performance monitoring on all of my servers now. If you want, I can send the logs to you.
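
For anyone wanting to skim such logs, here is a minimal sketch that reads a Performance Monitor style CSV export and reports the first and last value of each counter. The host name, the "hostmon" instance name and all sample values are invented for illustration; only the general layout (timestamp in the first column, one counter per following column) is assumed.

```python
import csv
import io


def counter_changes(csv_text):
    """Return {counter_name: (first_value, last_value)} for each counter column."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    changes = {}
    for col in range(1, len(header)):  # column 0 holds the timestamp
        values = []
        for row in data:
            try:
                values.append(float(row[col]))
            except (ValueError, IndexError):
                continue  # blank or non-numeric sample
        if values:
            changes[header[col]] = (values[0], values[-1])
    return changes


# Invented sample data, not taken from the servers in this thread.
sample = r'''"(PDH-CSV 4.0) (UTC)(0)","\\HOST\Process(hostmon)\Handle Count","\\HOST\Process(hostmon)\Private Bytes"
"01/01/2000 00:00:00.000","250","12000000"
"01/01/2000 01:00:00.000","260","50000000"
"01/01/2000 02:00:00.000","255","90000000"
'''

for name, (first, last) in counter_changes(sample).items():
    print(f"{name}: {first:.0f} -> {last:.0f}")
```

Logging Private Bytes next to Working Set may help here, because Private Bytes is not affected by working set trimming and so gives a somewhat cleaner view of whether the process itself keeps allocating memory.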

The service runs under a technical account from another domain (same forest); it's not a system account. The web service runs under the local system account.

It just slowly accumulates memory until the server locks up. Then you either get a blue screen, or the service just quits without any event notification.

I'll try your 5.6 beta later.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

zendesigner wrote:I'll try your 5.6 beta later.

I am afraid it will not help.
Could you send your configuration files to support@ks-soft.net? We need the HML file with tests and all *.LST and *.INI files.
If we are lucky, we will reproduce the problem on our system. Yes, I know we will not be able to access your systems, but we still have some chance...

Regards
Alex
zendesigner
Posts: 40
Joined: Thu Mar 03, 2005 3:03 am

Post by zendesigner »

Mail is on its way :wink: