losisoft
Joined: 21 Mar 2008 Posts: 43
Posted: Sat Aug 09, 2008 9:48 am Post subject: hostmonitor freezing
Hi,
Our old hardware is expiring, so I started moving hostmon to a new server.
The old server was an HP 385 G1 with 2x dual-core AMD Opteron 275.
The new one is an HP 385 G2 with 2x dual-core AMD Opteron 2218.
The operating system is Windows Server 2003 SP2 Standard.
Originally we installed only hostmon 7.18, but it kept freezing.
We tried the 7.42 beta and now 7.50, and they produce the same problem.
HostMonitor is running as a service. There is a one-minute delay configured for testing, so we don't get problems with the Active RMA agents.
That works fine. But usually within 1-3 hours hostmon freezes.
I notice that the RCC client disconnects, and I cannot connect back.
If I try to stop the service, it's not possible; I need to kill the process with Process Explorer. Then I can start it again, and it works fine for another 1-3 hours.
Restarting the server did not help; it produces the same issue.
So far, these are the times when I restarted it:
5:49, 9:37, 12:03, 12:51, 14:56, 17:04
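The restart times above are easier to compare as intervals. A quick Python sketch, assuming all six times fall on the same day; `restart_intervals` is a hypothetical helper, not part of HostMonitor:

```python
from datetime import datetime


def restart_intervals(times: list[str]) -> list[int]:
    """Minutes between consecutive HH:MM restart times (assumes one calendar day)."""
    parsed = [datetime.strptime(t, "%H:%M") for t in times]
    return [int((later - earlier).total_seconds() // 60)
            for earlier, later in zip(parsed, parsed[1:])]


restarts = ["5:49", "9:37", "12:03", "12:51", "14:56", "17:04"]
gaps = restart_intervals(restarts)
print(gaps)                   # [228, 146, 48, 125, 128]
print(sum(gaps) / len(gaps))  # 135.0
```

The gaps range from 48 minutes to almost 4 hours (mean 135 minutes), so there is no obvious fixed period.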
There is nothing in the event log and no memory leak; hostmon uses only 60 MB of memory.
The hostmon system log has no entry about the freeze, just the entries from when I start it.
On the RMA agent side, the log file shows only this:
Connection error
Any idea how to debug what is happening inside the program? Why is it freezing?
PS: I'm currently trying a BIOS upgrade, but I doubt that it will help.
Thanks and regards,
Joe

losisoft
Joined: 21 Mar 2008 Posts: 43
Posted: Mon Aug 11, 2008 2:18 am
BIOS upgrade did not help. Currently we are restarting hostmon every 2 hours.
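Because a normal service stop hangs (as described in the first post), a scheduled restart has to kill the process instead of stopping the service. A minimal Python sketch of that kill-then-start step using the standard Windows `taskkill` and `sc` tools; the service name `HostMonitor` is an assumption (check the real name in the Services console), and the pause length is arbitrary:

```python
import subprocess
import sys
import time

# Assumed names: verify the service name in the Services console and the
# process name in Task Manager before relying on this.
SERVICE_NAME = "HostMonitor"
IMAGE_NAME = "hostmon.exe"


def restart_commands(service: str = SERVICE_NAME,
                     image: str = IMAGE_NAME) -> list[list[str]]:
    """Kill the hung process forcibly (a normal stop hangs), then start the service."""
    return [
        ["taskkill", "/F", "/IM", image],  # /F: terminate forcibly, /IM: by image name
        ["sc", "start", service],
    ]


def restart_hung_service(pause_s: float = 5.0) -> None:
    if sys.platform != "win32":
        raise RuntimeError("taskkill and sc are Windows tools")
    kill, start = restart_commands()
    subprocess.run(kill, check=True)   # raises CalledProcessError if taskkill fails
    time.sleep(pause_s)                # give the Service Control Manager time to notice
    subprocess.run(start, check=True)
```

Run every two hours (e.g. from Windows Task Scheduler), this is the same blind restart as above; it does not detect the freeze itself.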
KS-Soft
Joined: 03 Apr 2002 Posts: 12792 Location: USA
Posted: Mon Aug 11, 2008 5:38 am
- Do you use ODBC logging or the ODBC test method? If yes, which ODBC driver do you use? Could you try disabling ODBC tests and ODBC logging?
- Do you have any antivirus monitor, personal firewall, or content-monitoring software installed? Any non-standard Winsock components?
- What other test methods do you use? Could you send your configuration files to support@ks-soft.net? We need the HML file with the tests, plus the *.LST and *.INI files.
Regards
Alex

losisoft
Joined: 21 Mar 2008 Posts: 43
Posted: Mon Aug 11, 2008 6:35 am
> Do you use ODBC logging or ODBC test method? If yes, what ODBC driver do you use? Could you try to disable ODBC tests and ODBC logging?
Logging via ODBC: MS SQL Native Client, version 2005.90.1399
Queries via ODBC: Oracle driver 10.02.0.003
> Do you have installed some antivirus monitors, personal firewall, content monitoring software? Non standard winsock components?
Symantec antivirus.
> What other test methods do you use? Could you send your configuration files to support@ks-soft.net? We need HML file with tests + *.LST and *.INI files
Sure.
But the above software components are the same as on the original server.
-------------------------------------------------
We have narrowed the problem down to the Active RMA backup agents.
Since we switched off the backup agents on the Active RMA agents, the system has been stable.

KS-Soft
Joined: 03 Apr 2002 Posts: 12792 Location: USA
Posted: Mon Aug 11, 2008 6:39 am
> We have narrowed the problem down to Active RMA - backup agents.
> Since we switched off the backup agents on the active rma agent the system is stable
Hmm, we will check our code.
Regards
Alex

hsq
Joined: 28 Jun 2007 Posts: 15
Posted: Tue Aug 12, 2008 7:00 am
Hi Alex,
Following the thread...
Unfortunately, after we removed the backup agents from the Active RMAs, the system froze again. It happened later (after 6-7 hours instead of the previous 1-3), but I still think something around the active agents is involved.
Don't you have some kind of debug build we could use that logs verbosely what the system is actually doing, so we can dig deeper into this?
It would help us a lot, as we're slowly running out of ideas.
Regards,
Gabor

KS-Soft
Joined: 03 Apr 2002 Posts: 12792 Location: USA
Posted: Tue Aug 12, 2008 7:18 am
I don't see what data the software could log that would help in such a case. Log everything: every data manipulation, every system call? That's impossible.
If we can reproduce the problem, we will be able to fix it.
Which version of the agent do you use? Can you use Passive RMA instead? Passive RMA has worked fine for years, while HostMonitor uses relatively new code to work with Active RMA.
Also, could you try disabling ODBC logging and ODBC tests? And uninstalling or disabling the antivirus monitor?
Can you check the resource usage of each process? You can use the standard Windows Task Manager to check Handles, GDI objects and USER objects. How many handles and objects does HostMonitor use? What is the total resource usage on the system?
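A single Task Manager reading shows the current level, not the trend; a resource leak shows up as steady growth between readings. A small Python sketch, assuming the Handles (or GDI/USER) counts are noted by hand at known elapsed times: it fits a least-squares line and reports growth per hour. The 50-per-hour threshold is an arbitrary example, not a HostMonitor or Windows figure.

```python
def growth_per_hour(samples: list[tuple[float, int]]) -> float:
    """Least-squares slope of (elapsed_hours, object_count) samples.

    Needs at least two samples taken at different times."""
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_c = sum(c for _, c in samples) / n
    num = sum((t - mean_t) * (c - mean_c) for t, c in samples)
    den = sum((t - mean_t) ** 2 for t, _ in samples)
    return num / den


def looks_like_leak(samples: list[tuple[float, int]],
                    threshold_per_hour: float = 50.0) -> bool:
    # Hypothetical threshold: tune it to what is normal on your server.
    return growth_per_hour(samples) > threshold_per_hour


steady = [(0.0, 544), (1.0, 546), (2.0, 543), (3.0, 545)]
leaking = [(0.0, 544), (1.0, 644), (2.0, 744)]
print(looks_like_leak(steady), looks_like_leak(leaking))  # False True
```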
Regards
Alex

hsq
Joined: 28 Jun 2007 Posts: 15
Posted: Tue Aug 12, 2008 7:48 am
Hi Alex,
The Active RMA version is 3.33 everywhere. In theory we can switch to Passive RMA; that is not a problem. But the question is which one puts more load on HostMonitor, active or passive, in terms of network and CPU. We thought the active agents work more on their own and take some load off hostmon. Isn't that true?
Switching off ODBC is not possible, as more than half of our tests are ODBC-based and they are the foundation of our system. However, I don't think ODBC is the problem, since the old system is 99% the same and we are not experiencing any issues there. The only 1% difference is that the new system uses Active RMA exclusively, while the old one relies mostly on passive agents.
Some technical info about hostmon:
Handles: 544 (245 GDI / 162 USER objects)
Average context switch delta: ~600/sec
Memory: ~60 MB private (peak 61968 KB), ~180 MB virtual
About the debug build...
It is clear that it is impossible to log EVERYTHING, but at least something useful for catching the freeze more precisely. For example, logging RMA activity and ODBC calls (your favourite), so that we can compare several freezes: do they happen at the same moment or event, or are there any similarities between them? With the current logging level I cannot see anything, and you cannot easily reproduce our systems and testing behavior, so it currently feels like a catch-22. Correct me if I'm wrong.
Regards,
Gabor

KS-Soft
Joined: 03 Apr 2002 Posts: 12792 Location: USA
Posted: Tue Aug 12, 2008 9:42 am
> The question is which one puts more load on the Hostmonitor the active or the passive. I'm asking it network and CPU wise. We thought that the active will work more on its own and takes away some load from the hostmon. Isn't it true?
Well, the CPU load and network traffic produced by HM<->PassiveRMA and HM<->ActiveRMA are almost the same. Active RMA is a little more efficient, but the basic principle is the same: HostMonitor encrypts and sends the test parameters; RMA decrypts the test settings, performs the test, encrypts the results and sends them to HostMonitor; HostMonitor decrypts the results...
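The cycle Alex describes can be sketched as one request/response round trip. This is only an illustration: the XOR transform is a placeholder so the data visibly changes on the wire, not HostMonitor's real encryption (which the post does not describe), and the `run_test` callable and the `"ping host=..."` parameter string are hypothetical. What it shows is that each test costs two encryptions and two decryptions whichever RMA mode carries it.

```python
from collections import Counter
from typing import Callable

KEY = 0x5A  # placeholder key for a placeholder transform; NOT real encryption


def xor_bytes(data: bytes) -> bytes:
    """Reversible toy transform standing in for the real cipher."""
    return bytes(b ^ KEY for b in data)


def round_trip(test_params: str, run_test: Callable[[str], str], ops: Counter) -> str:
    # HostMonitor encrypts and sends the test parameters.
    wire = xor_bytes(test_params.encode())
    ops["encrypt"] += 1
    # The RMA agent decrypts the settings and performs the test...
    params = xor_bytes(wire).decode()
    ops["decrypt"] += 1
    result = run_test(params)
    # ...then encrypts the result and sends it back.
    wire = xor_bytes(result.encode())
    ops["encrypt"] += 1
    # HostMonitor decrypts the result.
    ops["decrypt"] += 1
    return xor_bytes(wire).decode()


ops = Counter()
status = round_trip("ping host=10.0.0.1", lambda p: "OK: " + p, ops)
print(status)     # OK: ping host=10.0.0.1
print(dict(ops))  # {'encrypt': 2, 'decrypt': 2}
```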
> Some technical info about hostmon:
> Handles 544 (245 GDI /162 user)
> Average context switching delta ~600 / sec
> Memory ~60MB private (peak 61968k)
> ~180MB virtual
Looks OK.
> It is clear that impossible to log EVERYTHING, but at least something which should be useful to catch the freeze more precisely. For example logging the RMA activities, ODBC calls (your favourite ) and we can compare more freezes.
Usually such logging does not help. Most problems/bugs have one of two basic causes:
- a bug in third-party software, such as an ODBC driver or antivirus monitor;
- a bug in our code, in some elementary operation: a wrong pointer, use of an "already released" memory block, etc. It's impossible to log every operation. Instead, the software and Windows use exception handlers that should provide information about the problem. Sometimes, though, the exception handler cannot be called...
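The last point is why a hang is harder to diagnose than a crash: a blocked or deadlocked thread raises no exception, so no handler runs and nothing is logged. One common technique is a watchdog timer that dumps every thread's stack if the monitored work has not finished in time. This is an analogy built on Python's standard library (`faulthandler.dump_traceback_later`), not something HostMonitor offers; the `work` callable and timeouts are illustrative.

```python
import faulthandler
import tempfile
import time
from typing import Callable


def run_with_hang_watchdog(work: Callable[[], None], timeout_s: float) -> str:
    """Run work(); if it is still running after timeout_s seconds, a watchdog
    thread dumps the stack of every thread. Returns the dump ('' if none)."""
    with tempfile.TemporaryFile(mode="w+b") as dump:
        faulthandler.dump_traceback_later(timeout_s, repeat=False, file=dump)
        try:
            work()
        finally:
            faulthandler.cancel_dump_traceback_later()
        dump.seek(0)
        return dump.read().decode(errors="replace")


# A call that "freezes" for 0.5 s trips a 0.1 s watchdog; a quick one does not.
print("Timeout" in run_with_hang_watchdog(lambda: time.sleep(0.5), 0.1))  # True
print(run_with_hang_watchdog(lambda: None, 1.0) == "")                    # True
```

The dump shows where each thread was stuck at the moment of the timeout, which is exactly the information an exception handler never gets to record during a freeze.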
BTW: Do you see any errors in the system log (the file specified on the System Log page of the Options dialog)? Any errors in the NT Event Log?
Regards
Alex

hsq
Joined: 28 Jun 2007 Posts: 15
Posted: Tue Aug 12, 2008 9:53 am
Nope, I see nothing suspicious.
The system log contains only routine events (reports created, RMA connected, RCC session opened, etc.); not a word about any errors.
The Event Log has nothing but "Informational" entries for HostMonitor (service stopped/started, Paging lib inited, etc.).
Any other ideas then?
Regards,
Gabor

KS-Soft
Joined: 03 Apr 2002 Posts: 12792 Location: USA
Posted: Tue Aug 12, 2008 10:54 am
I think we need to narrow the problem down. Can you do the following:
- replace Active RMA with Passive RMA
- uninstall the antivirus
If this does not help, we can return to ODBC and other test methods. E.g., could you do the following:
- copy the entire HostMonitor folder (e.g. c:\program files\hostmonitor -> c:\test\hostmonitor)
- start the HostMonitor copy with the /stop command-line parameter ("hostmon.exe /stop")
- disable logging (Options dialog)
- disable actions (menu Monitoring -> Disable)
- load the copied HML file (e.g. c:\test\hostmonitor\main.hml)
- disable ODBC tests
- start monitoring
As a result you will have 2 instances of HostMonitor: the production copy, and a testing copy that can be modified at any time without problems.
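The first two steps (copying the folder and starting the copy with `/stop`) can be scripted; the remaining ones are done in the copy's own GUI. A minimal Python sketch using the example paths from the list; `prepare_test_copy` and `launch` are hypothetical helpers, not part of HostMonitor:

```python
import shutil
import subprocess
from pathlib import Path


def prepare_test_copy(production_dir: Path, test_dir: Path) -> list[str]:
    """Copy the whole HostMonitor folder and return the command line that
    starts the copy with monitoring stopped ("hostmon.exe /stop")."""
    # copytree refuses to overwrite: test_dir must not exist yet.
    shutil.copytree(production_dir, test_dir)
    return [str(test_dir / "hostmon.exe"), "/stop"]


def launch(cmd: list[str]) -> subprocess.Popen:
    # Start the copy without waiting for it to exit (on the Windows server).
    return subprocess.Popen(cmd)


# Example, on the monitoring server:
# cmd = prepare_test_copy(Path(r"c:\program files\hostmonitor"),
#                         Path(r"c:\test\hostmonitor"))
# launch(cmd)
```

Logging, actions, loading `main.hml`, disabling ODBC tests and starting monitoring are then done by hand in the test instance.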
Regards
Alex

mos-eisley
Joined: 21 Mar 2007 Posts: 76 Location: Klarup (Aalborg), Denmark
Posted: Wed Aug 13, 2008 3:57 am
Just my two cents...
ODBC is the place to start, I am pretty sure of that. I have some experience with Oracle ODBC, but I will keep my opinion to myself.
My last attempt to use a version 10 ODBC driver got pretty nasty, so I am on a stable v9 driver, and that works fine with v10 databases.
So start looking into ODBC...

hsq
Joined: 28 Jun 2007 Posts: 15
Posted: Tue Aug 19, 2008 2:38 am
Alex,
We changed the test-execution agents to passive everywhere, but the active ones were still configured in RMA_MGR and connected to HostMonitor. We had another freeze 5 hours after the change.
Finally I disabled the Active RMA server in HostMonitor completely; since then it has run perfectly without any freezes (uptime is now more than a day...).
So the problem is definitely related to Active RMA, as I suggested at the very beginning.
Regards,
Gabor

losisoft
Joined: 21 Mar 2008 Posts: 43
Posted: Wed Aug 20, 2008 2:50 am
Hi Alex,
Unfortunately it was only a short victory. The system is freezing again, with no active agents. I have removed all active agents and am using only passive ones at the moment.
Regards,
Jozsef

KS-Soft
Joined: 03 Apr 2002 Posts: 12792 Location: USA
Posted: Wed Aug 20, 2008 7:52 am
We made some changes to the code to retrieve more information about the "active rma" problem, and started several copies of HostMonitor with 45 Active RMA agents each. Unfortunately(?) all copies of HostMonitor are still working fine...
So, are we back to the beginning?
What about ODBC and the antivirus monitor?
Thank you for your support and understanding. We cannot check all of the source code; it's just too big. So we need to narrow the problem down somehow.
Regards
Alex