hostmonitor freezing

All questions related to installations, configurations and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
losisoft
Posts: 43
Joined: Fri Mar 21, 2008 4:02 am

Post by losisoft »

Hi,

Our old hardware is reaching end of life, so I started moving hostmon to a new server.
The old server was a 2x dual-core AMD Opteron 275 HP (385 G1).
The new one is a 2x dual-core AMD Opteron 2218 HP (385 G2).

The operating system is Windows Server 2003 SP2 Standard.

Originally we installed only hostmon 7.18, but it kept freezing.
We then tried 7.42 beta and now 7.50, and both show the same problem.

HostMonitor is running as a service. A one-minute delay is configured for testing so we don't get problems with the Active RMA agents.
That part works fine, but hostmon usually freezes within 1-3 hours.
I notice it when the RCC client disconnects and I cannot connect again.

Stopping the service is not possible; I have to kill the process via Process Explorer. Then I can start it again and it works fine for another 1-3 hours.

I tried restarting the server, but the same issue occurs.

So far these are the times when I restarted it:
5:49, 9:37, 12:03, 12:51, 14:56, 17:04
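
For reference, the gaps between those restarts can be worked out with a short script. This is only a sketch: it assumes all six times fall on the same day and use a 24-hour clock, and the helper names are made up for illustration. Each gap is an upper bound on how long hostmon ran before freezing, since a restart happens some time after the freeze.

```python
from datetime import datetime

# Restart times quoted in the post; assumed same-day, 24-hour clock.
RESTARTS = ["5:49", "9:37", "12:03", "12:51", "14:56", "17:04"]


def restart_intervals(times):
    """Return the gaps between consecutive restart times as timedeltas."""
    parsed = [datetime.strptime(t, "%H:%M") for t in times]
    return [later - earlier for earlier, later in zip(parsed, parsed[1:])]


def format_gap(gap):
    """Format a timedelta as H:MM."""
    minutes = int(gap.total_seconds()) // 60
    return f"{minutes // 60}:{minutes % 60:02d}"


if __name__ == "__main__":
    for start, gap in zip(RESTARTS, restart_intervals(RESTARTS)):
        print(f"started {start}, next restart after {format_gap(gap)}")
```

The gaps come out as 3:48, 2:26, 0:48, 2:05 and 2:08.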

There is nothing in the event log and no memory leak; hostmon is using only 60 MB of memory.

In the hostmon syslog there is no entry about freezing, just the entries from when I start it.

On the RMA agent side I see only this in the log file:
Connection error


Any idea how to debug what is happening inside the program? Why is it freezing?

PS: I am currently trying a BIOS upgrade, but I doubt that it will help.

Thanks and regards,
Joe
losisoft
Posts: 43
Joined: Fri Mar 21, 2008 4:02 am

Post by losisoft »

The BIOS upgrade did not help. Currently we are restarting hostmon every 2 hours.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

- Do you use ODBC logging or the ODBC test method? If yes, what ODBC driver do you use? Could you try to disable ODBC tests and ODBC logging?
- Do you have any antivirus monitors, personal firewall, or content monitoring software installed? Any non-standard Winsock components?
- What other test methods do you use? Could you send your configuration files to support@ks-soft.net? We need the HML file with tests plus the *.LST and *.INI files.

Regards
Alex
losisoft
Posts: 43
Joined: Fri Mar 21, 2008 4:02 am

Post by losisoft »

> - Do you use ODBC logging or the ODBC test method? If yes, what ODBC driver do you use? Could you try to disable ODBC tests and ODBC logging?
Logging to ODBC - MS-SQL native client ver 2005.90.1399
Query via ODBC - Oracle 10.02.0.003

> - Do you have any antivirus monitors, personal firewall, or content monitoring software installed? Any non-standard Winsock components?
Symantec antivirus.

> - What other test methods do you use? Could you send your configuration files to support@ks-soft.net? We need the HML file with tests plus the *.LST and *.INI files.
Sure

But the software components above are the same as on the original server.
-------------------------------------------------

We have narrowed the problem down to Active RMA - backup agents.
Since we switched off the backup agents on the active rma agent the system is stable.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

> We have narrowed the problem down to Active RMA - backup agents.
> Since we switched off the backup agents on the active rma agent the system is stable.
Hm, :roll: we will check our code.

Regards
Alex
hsq
Posts: 15
Joined: Thu Jun 28, 2007 8:23 am

Post by hsq »

Hi Alex,

Following the thread...

Unfortunately, once we removed the backup agents from the active RMAs, the system froze again. It happened later (after 6-7 hours instead of the previous 1-3), but I think there is still something going on around the active agents.

Don't you have some kind of debug build we could use, one that logs very verbosely what the system is actually doing, so we can dig deeper into this?

It would help us a lot, as we are slowly running out of ideas.

Regards,
Gabor
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

I don't see exactly what data the software could log to help in such a case :( Log everything, every data manipulation, every system call? It's impossible.
If we can reproduce the problem, then we will be able to fix it.

What version of the agent do you use? Can you use Passive RMA instead? Passive RMA has worked fine for years, while HostMonitor uses relatively new code to work with Active RMA.

Also, could you try to disable ODBC logging and ODBC tests? Uninstall or disable the antivirus monitor?

Can you check resource usage for each process? You may use the standard Windows Task Manager to check Handles, GDI and USER objects. How many handles and objects are used by HostMonitor? What is the total resource usage on the system?

Regards
Alex
hsq
Posts: 15
Joined: Thu Jun 28, 2007 8:23 am

Post by hsq »

Hi Alex,

The Active RMA version is 3.33 everywhere. In theory we can go with Passive RMA; that is not a problem. But the question is which one puts more load on HostMonitor, the active or the passive? I'm asking network- and CPU-wise. We thought that the active agent would do more work on its own and take some load off hostmon. Isn't that true?

Switching off ODBC is not possible, as more than half of our tests are ODBC based and this is the foundation of our system. However, I don't think ODBC is the problem, since the old system is 99% the same and we are not experiencing any issues there. The only 1% difference is that the new system uses purely Active RMA, while the old one relies more on the passive ones.

Some technical info about hostmon:
Handles 544 (245 GDI /162 user)
Average context switching delta ~600 / sec
Memory ~60MB private (peak 61968k)
~180MB virtual

About the debug build...

It is clear that it is impossible to log EVERYTHING, but at least something that would help pin down the freeze more precisely. For example, logging the RMA activities and ODBC calls (your favourite :)), so that we can compare several freezes. Do they happen at the same moment / event, or are there any similarities between them? With the current logging level I cannot see anything, and you cannot easily reproduce our systems and testing behavior, so it feels like a bit of a catch-22 currently :) Correct me if I'm wrong.
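
The kind of comparison described above could look roughly like the sketch below. It is purely illustrative: the log format (a list of timestamp and event-name pairs), the event names and all function names are hypothetical, and nothing here reflects logging that HostMonitor actually provides. The idea is only to collect what was logged in a short window before each freeze and see which events show up every time.

```python
from collections import Counter
from datetime import datetime, timedelta


def events_before(freeze_time, log, window):
    """Names of events logged in the `window` leading up to one freeze."""
    start = freeze_time - window
    return {name for stamp, name in log if start <= stamp <= freeze_time}


def events_per_freeze(freeze_times, log, window=timedelta(minutes=5)):
    """Count, for each event name, how many freezes it preceded."""
    counts = Counter()
    for freeze_time in freeze_times:
        counts.update(events_before(freeze_time, log, window))
    return counts


def suspects(freeze_times, log, window=timedelta(minutes=5)):
    """Event names seen before every freeze, sorted alphabetically."""
    counts = events_per_freeze(freeze_times, log, window)
    return sorted(name for name, n in counts.items() if n == len(freeze_times))
```

An event that precedes every freeze is only a candidate for closer inspection, not proof of a cause; a frequent, harmless event would show up in every window too.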

Regards,
Gabor
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

> The question is which one puts more load on HostMonitor, the active or the passive? I'm asking network- and CPU-wise. We thought that the active agent would do more work on its own and take some load off hostmon. Isn't that true?
Well, the CPU load and network traffic produced by HM<->Passive RMA and HM<->Active RMA are almost the same. Active RMA is a little more efficient; however, the basic principle is the same: HostMonitor encrypts and sends the test parameters, RMA decrypts the test settings, performs the test, encrypts and sends the results to HostMonitor, HostMonitor decrypts the results...

> Some technical info about hostmon:
> Handles 544 (245 GDI /162 user)
> Average context switching delta ~600 / sec
> Memory ~60MB private (peak 61968k)
> ~180MB virtual
Looks OK.

> It is clear that it is impossible to log EVERYTHING, but at least something that would help pin down the freeze more precisely. For example, logging the RMA activities and ODBC calls (your favourite), so that we can compare several freezes.
Usually such logging does not help. Most problems/bugs are caused by 2 basic reasons:
- a bug in 3rd party software, like an ODBC driver or antivirus monitor.
- a bug in our code, some elementary operation like a wrong pointer, usage of an "already released" memory block, etc. It's impossible to log each operation. Instead, the software and Windows use exception handlers that should provide information about the problem. Sometimes the exception handler cannot be called...
BTW: Do you see any errors in the system log (the system log file specified on the System Log page in the Options dialog)? Any errors in the NT Event Log?

Regards
Alex
hsq
Posts: 15
Joined: Thu Jun 28, 2007 8:23 am

Post by hsq »

Nope, I see nothing suspicious.

In the syslog there are only some executed events (reports created, RMA connected, RCC session opened, etc.), not a word about any errors.

In the event log there is absolutely nothing but "Informational" entries regarding hostmonitor (Service stopped / started / Paging lib inited, etc.).

Any other ideas then?

Regards,
Gabor
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

I think we need to narrow down the problem. Can you do the following:
- replace Active RMA with Passive RMA
- uninstall antivirus

If this does not help, we can return to ODBC and other test methods. E.g. could you do the following:
- copy the entire HostMonitor folder (e.g. "c:\program files\hostmonitor" -> "c:\test\hostmonitor")
- start the copy of HostMonitor using the "hostmon.exe /stop" command line parameter
- disable logging (Options dialog)
- disable actions (menu Monitoring -> Disable)
- load copied HML file (e.g. c:\test\hostmonitor\main.hml)
- disable ODBC tests
- start monitoring
As a result you will have 2 instances of HostMonitor: a production copy and a testing copy that can be modified at any time without problems.
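
The first two steps of that list (copying the folder and starting the copy with "/stop") could be scripted along these lines. This is a minimal sketch, not an official procedure: the function name is made up, the paths are the examples from the post, and the remaining steps (disabling logging, actions and ODBC tests, loading the HML file) still have to be done by hand in the copy.

```python
import shutil
from pathlib import Path


def make_test_copy(production_dir, test_dir):
    """Copy the whole HostMonitor folder and return the start command.

    The returned command launches the copy with the "/stop" parameter
    mentioned in the post. copytree raises FileExistsError if test_dir
    already exists, so an earlier test copy is never overwritten.
    """
    src = Path(production_dir)
    dst = Path(test_dir)
    shutil.copytree(src, dst)
    return [str(dst / "hostmon.exe"), "/stop"]


# Example use on the server (paths taken from the post):
#   import subprocess
#   cmd = make_test_copy(r"c:\program files\hostmonitor", r"c:\test\hostmonitor")
#   subprocess.Popen(cmd)
```

The production folder is only read, never modified, so the running instance is not touched by the copy.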

Regards
Alex
mos-eisley
Posts: 76
Joined: Wed Mar 21, 2007 5:51 am
Location: Klarup (Aalborg), Denmark

Post by mos-eisley »

Just my few cents....

ODBC is the place to start, I am pretty sure of that - I have some experience with Oracle ODBC - but will keep my opinion to myself :o

My last attempt to use a version 10 ODBC driver became pretty nasty, so I am on a stable v9 driver and that works fine with v10 databases.

So start looking into the ODBC...
hsq
Posts: 15
Joined: Thu Jun 28, 2007 8:23 am

Post by hsq »

Alex,

We changed the test execution agents to passive everywhere, but the actives were still configured in RMA_MGR and connected to the Hostmonitor. We had another freeze 5 hours after the change.

I finally disabled the Active RMA server completely in HostMonitor; since then it has run perfectly without any freezes (uptime is now more than a day...)

So the problem is definitely around activeRMA as I suggested in the very beginning.

Regards,
Gabor
losisoft
Posts: 43
Joined: Fri Mar 21, 2008 4:02 am

Post by losisoft »

Hi Alex,

Unfortunately it was only a short-lived victory. The system is freezing again, with no active agents. I have removed all active agents and am using only passive ones at the moment.

Regards,
Jozsef
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

:-?
We made some changes in the code in order to retrieve more information about the "active rma" problem and started several copies of HostMonitor using 45 Active RMA agents for each HostMonitor. Unfortunately(?) all copies of HostMonitor are still working fine...

So, we return to the beginning? :(
What about ODBC and the antivirus monitor?

Thank you for your support and understanding. We cannot check all the source code; it's just too big. So we need to narrow down the problem somehow.

Regards
Alex