KS-Soft. Network Management Solutions

hostmonitor freezing
losisoft



Joined: 21 Mar 2008
Posts: 43

Posted: Sat Aug 09, 2008 9:48 am    Post subject: hostmonitor freezing

Hi,

Our old hardware is expiring, so I started to move hostmon to a new server.
The old server was a 2x dual-core AMD Opteron 275 HP (385 G1);
the new one is a 2x dual-core AMD Opteron 2218 HP (385 G2).

The operating system is Windows Server 2003 SP2 Standard.

Originally we only installed hostmon 7.18, but that was freezing.
We tried 7.42 beta, and now 7.50, and they produce the same problem.

HostMonitor is running as a service. There is a one-minute delay configured for testing so we don't get problems with the Active RMA agents.
That part works fine, but hostmon usually freezes within 1-3 hours.
I notice that the RCC client disconnects, and I cannot reconnect.

If I try to stop the service, it's not possible; I need to kill it via Process Explorer. Then I can start it again and it works fine for another 1-3 hours.

I tried restarting the server; it produces the same issue.

So far, these are the times when I restarted it:
5:49, 9:37, 12:03, 12:51, 14:56, 17:04
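
For reference, the gaps between these restarts can be worked out with a short sketch. It assumes all six times fall on the same day and use a 24-hour clock, which the post does not state explicitly. The helper name `intervals_minutes` is made up for illustration. Each gap is only an upper bound on the time to freeze, because the restart happened some time after the freeze itself.

```python
from datetime import datetime

# Restart times copied from the post above.
# Assumption: same day, 24-hour clock.
restarts = ["5:49", "9:37", "12:03", "12:51", "14:56", "17:04"]

def intervals_minutes(times):
    """Return the gaps, in minutes, between consecutive H:MM times."""
    parsed = [datetime.strptime(t, "%H:%M") for t in times]
    return [int((b - a).total_seconds() // 60)
            for a, b in zip(parsed, parsed[1:])]

gaps = intervals_minutes(restarts)
for start, gap in zip(restarts, gaps):
    print(f"started {start}: {gap // 60}h {gap % 60:02d}m until the next restart")
```

Under those assumptions the gaps range from 48 minutes to 3 hours 48 minutes, so the "1-3 hours" figure is a rough description rather than a fixed period.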

There is nothing in the event log and there is no memory leak; hostmon is using only 60 MB of memory.

In the hostmon syslog there is no entry about freezing, just the entries from when I start it.

On the RMA agent side I see only this in the log file:
Connection error


Any idea how to debug what is happening inside the program? Why is it freezing?

PS: I am currently trying a BIOS upgrade, but I doubt that it will help.

Thanks and regards,
Joe
losisoft



Joined: 21 Mar 2008
Posts: 43

Posted: Mon Aug 11, 2008 2:18 am

The BIOS upgrade did not help. Currently we are restarting hostmon every 2 hours.
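
A fixed restart schedule could in principle be replaced by a watchdog that restarts only when the service stops answering. The sketch below is a minimal illustration, not something from KS-Soft: the function names, the threshold, and the idea of probing a TCP port are all assumptions, and the port to probe (for example, whichever port the RCC client connects to in your setup) has to be supplied by you. Note that a hung process can still complete a TCP handshake while its listening socket stays open, so a successful connect does not prove the application is responsive.

```python
import socket

def port_accepts_connections(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, or timed out.
        return False

def should_restart(probe_results, threshold=3):
    """Recommend a restart only after `threshold` consecutive failed probes.

    probe_results is a list of booleans, oldest first (hypothetical format).
    """
    recent = probe_results[-threshold:]
    return len(recent) == threshold and not any(recent)
```

Requiring several consecutive failures avoids restarting on a single dropped probe; the value 3 is arbitrary.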
KS-Soft



Joined: 03 Apr 2002
Posts: 12792
Location: USA

Posted: Mon Aug 11, 2008 5:38 am

- Do you use ODBC logging or the ODBC test method? If yes, what ODBC driver do you use? Could you try to disable ODBC tests and ODBC logging?
- Do you have any antivirus monitors, personal firewalls, or content monitoring software installed? Non-standard Winsock components?
- What other test methods do you use? Could you send your configuration files to support@ks-soft.net? We need the HML file with tests + *.LST and *.INI files.

Regards
Alex
losisoft



Joined: 21 Mar 2008
Posts: 43

Posted: Mon Aug 11, 2008 6:35 am

> - Do you use ODBC logging or the ODBC test method? If yes, what ODBC driver do you use? Could you try to disable ODBC tests and ODBC logging?
Logging to ODBC - MS-SQL native client ver 2005.90.1399
Query via ODBC - Oracle 10.02.0.003

> - Do you have any antivirus monitors, personal firewalls, or content monitoring software installed? Non-standard Winsock components?
Symantec antivirus.

> - What other test methods do you use? Could you send your configuration files to support@ks-soft.net? We need the HML file with tests + *.LST and *.INI files.
Sure

But the above software components are the same as on the original server.
-------------------------------------------------

We have narrowed the problem down to Active RMA - backup agents.
Since we switched off the backup agents on the active RMA agent, the system has been stable.
KS-Soft



Joined: 03 Apr 2002
Posts: 12792
Location: USA

Posted: Mon Aug 11, 2008 6:39 am

Quote:
We have narrowed the problem down to Active RMA - backup agents.
Since we switched off the backup agents on the active RMA agent, the system has been stable.

Hm, we will check our code.

Regards
Alex
hsq



Joined: 28 Jun 2007
Posts: 15

Posted: Tue Aug 12, 2008 7:00 am

Hi Alex,

Following the thread...

Unfortunately, once we removed the backup agents from the active RMAs, the system froze again. It happened later (after 6-7 hours instead of the previous 1-3), but I think there is still something going on around the active agents.

Don't you have some kind of debug build we could use, one that logs very verbosely what the system is actually doing, so we can dig deeper into this?

It would help us a lot, as we're slowly running out of ideas with this.

Regards,
Gabor
KS-Soft



Joined: 03 Apr 2002
Posts: 12792
Location: USA

Posted: Tue Aug 12, 2008 7:18 am

I don't see exactly what data the software could log to help in such a case. Log everything, every data manipulation, every system call? It's impossible.
If we reproduce the problem then we will be able to fix it.

What version of the agent do you use? Can you use Passive RMA instead? Passive RMA has worked fine for years, while HostMonitor uses relatively new code to work with Active RMA.

Also, could you try to disable ODBC logging and ODBC tests? Uninstall or disable the antivirus monitor?

Can you check resource usage for each process? You may use the standard Windows Task Manager to check Handles, GDI and USER objects. How many handles and objects are used by HostMonitor? What is the total resource usage on the system?

Regards
Alex
hsq



Joined: 28 Jun 2007
Posts: 15

Posted: Tue Aug 12, 2008 7:48 am

Hi Alex,

The ActiveRMA version is 3.33 everywhere. In theory we can go with passiveRMA; that is not a problem. But the question is which one puts more load on HostMonitor, the active or the passive? I'm asking network- and CPU-wise. We thought that the active agent does more work on its own and takes some load off hostmon. Isn't that true?

Switching off ODBC is not possible, as more than half of our tests are ODBC-based and this is the foundation of our system. However, I don't think ODBC is the problem, since the old system is 99% the same and we are not experiencing any issues there. The only 1% difference is that the new system uses purely activeRMA, while the old one relies more on the passive ones.

Some technical info about hostmon:
Handles 544 (245 GDI /162 user)
Average context switching delta ~600 / sec
Memory ~60MB private (peak 61968k)
~180MB virtual

About the debug build...

It is clear that it is impossible to log EVERYTHING, but at least something that would be useful to catch the freeze more precisely. For example, logging the RMA activities and ODBC calls (your favourite), so we can compare several freezes. Are they happening at the same moment or event, or are there any similarities between them? With the current logging level I cannot see anything, and you cannot easily reproduce our systems and testing behavior, so I feel it is somewhat of a catch-22 currently. Correct me if I'm wrong.

Regards,
Gabor
KS-Soft



Joined: 03 Apr 2002
Posts: 12792
Location: USA

Posted: Tue Aug 12, 2008 9:42 am

Quote:
The question is which one puts more load on HostMonitor, the active or the passive? I'm asking network- and CPU-wise. We thought that the active agent does more work on its own and takes some load off hostmon. Isn't that true?

Well, the CPU load and network traffic produced by HM<->PassiveRMA and HM<->ActiveRMA are almost the same. Active RMA is a little more efficient, however the basic principle is the same: HostMonitor encrypts and sends test parameters, RMA decrypts the test settings, performs the test, encrypts and sends the results to HostMonitor, HostMonitor decrypts the results...

Quote:
Some technical info about hostmon:
Handles 544 (245 GDI /162 user)
Average context switching delta ~600 / sec
Memory ~60MB private (peak 61968k)
~180MB virtual

Looks OK.

Quote:
It is clear that it is impossible to log EVERYTHING, but at least something that would be useful to catch the freeze more precisely. For example, logging the RMA activities and ODBC calls (your favourite), so we can compare several freezes.

Usually such logging does not help. Most problems/bugs are caused by 2 basic reasons:
- a bug in 3rd party software, like an ODBC driver or antivirus monitor.
- a bug in our code, in some elementary operation, like a wrong pointer, usage of some "already released" memory block, etc. It's impossible to log each operation. Instead, the software and Windows use exception handlers that should provide information about the problem. Sometimes the exception handler cannot be called.
BTW: Do you see any errors in the system log (the system log file specified on the System Log page in the Options dialog)? Any errors in the NT Event Log?

Regards
Alex
hsq



Joined: 28 Jun 2007
Posts: 15

Posted: Tue Aug 12, 2008 9:53 am

Nope, I see nothing suspicious.

In the syslog there are only some executed events (reports created, RMA connected, RCC session opened, etc.), not a word about any errors.

In the event log there is absolutely nothing but "Informational" entries regarding hostmonitor (service stopped / started, paging lib inited, etc.).

Any other ideas then?

Regards,
Gabor
KS-Soft



Joined: 03 Apr 2002
Posts: 12792
Location: USA

Posted: Tue Aug 12, 2008 10:54 am

I think we need to narrow down the problem. Can you do the following:
- replace Active RMA with Passive RMA
- uninstall the antivirus

If this does not help, we can return to ODBC and other test methods. E.g. could you do the following:
- copy the entire HostMonitor folder (e.g. "c:\program files\hostmonitor" -> "c:\test\hostmonitor")
- start hostmonitor (the copy) using the "hostmon.exe /stop" command line parameter
- disable logging (Options dialog)
- disable actions (menu Monitoring -> Disable)
- load the copied HML file (e.g. c:\test\hostmonitor\main.hml)
- disable ODBC tests
- start monitoring
As a result you will have 2 instances of HostMonitor: a production copy and a testing copy that can be modified at any time without problems.
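
The first two steps above (copy the folder, start the copy with "/stop") could be scripted roughly as follows. This is only a sketch: the function name `prepare_test_copy` is made up, and the paths are the examples from the list above. The remaining steps (disabling logging, actions and ODBC tests, loading the HML file) are done in the HostMonitor GUI and are not covered here.

```python
import shutil
from pathlib import Path

def prepare_test_copy(production_dir, test_dir):
    """Copy the whole installation folder and return the command line
    that would start the copy with the /stop parameter.

    The command is returned, not executed, so it can be reviewed first.
    """
    src, dst = Path(production_dir), Path(test_dir)
    # copytree creates missing parent folders and raises
    # FileExistsError if the destination already exists.
    shutil.copytree(src, dst)
    return [str(dst / "hostmon.exe"), "/stop"]

# Example usage on the monitoring server (not run here):
#   import subprocess
#   cmd = prepare_test_copy(r"c:\program files\hostmonitor",
#                           r"c:\test\hostmonitor")
#   subprocess.Popen(cmd)
```

Returning the command instead of launching it keeps the copy step separate from starting a second instance next to the production service.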

Regards
Alex
mos-eisley



Joined: 21 Mar 2007
Posts: 76
Location: Klarup (Aalborg), Denmark

Posted: Wed Aug 13, 2008 3:57 am

Just my two cents...

ODBC is the place to start, I am pretty sure of that. I have some experience with Oracle ODBC, but I will keep my opinion to myself.

My last attempt to use a version 10 ODBC driver became pretty nasty, so I am on a stable v9 driver, and that works fine with v10 databases.

So start looking into the ODBC...
hsq



Joined: 28 Jun 2007
Posts: 15

Posted: Tue Aug 19, 2008 2:38 am

Alex,

We changed the test execution agents to passive everywhere, but the active ones were still configured in RMA_MGR and connected to HostMonitor. We had another freeze 5 hours after the change.

I finally disabled the activeRMA server completely in HostMonitor; since then it has been running perfectly without any freezes (uptime is now more than a day...)

So the problem is definitely around activeRMA, as I suggested at the very beginning.

Regards,
Gabor
losisoft



Joined: 21 Mar 2008
Posts: 43

Posted: Wed Aug 20, 2008 2:50 am

Hi Alex,

Unfortunately it was only a short-lived victory. The system is freezing again, with no active agents. I have removed all active agents and am using only passive ones at the moment.

Regards,
Jozsef
KS-Soft



Joined: 03 Apr 2002
Posts: 12792
Location: USA

Posted: Wed Aug 20, 2008 7:52 am


We made some changes in the code in order to retrieve more information about the "active rma" problem and started several copies of HostMonitor using 45 Active RMA agents for each HostMonitor. Unfortunately(?) all copies of HostMonitor are still working fine...

So, we return to the beginning?
What about ODBC and the antivirus monitor?

Thank you for your support and understanding. We cannot check all the source code, it's just too big. So we should narrow down the problem somehow.

Regards
Alex
All times are GMT - 6 Hours
Page 1 of 2

 