hostmonitor freezing
Well, mos-eisley gave his two cents, so I'm gonna toss in mine.
If this were my issue, I'd take a divide-and-conquer approach. On another box I'd install a fresh copy of 7.5, export sections of my test list to that box, and validate, adding more tests until it breaks. If the test box can't handle the full load, swap exports in and out as needed.
If it never breaks, then the focus should be your 'new' environment.
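The incremental loop described above can be sketched in Python. This is only an illustration: the `freezes` predicate is hypothetical and stands for "load these sections on the test box, let it run for a while, and see whether it freezes", which in practice is a manual step.

```python
def find_breaking_section(sections, freezes):
    """Add exported test sections one at a time until the test box breaks.

    sections: ordered list of test-section names exported from production
    freezes:  hypothetical callable(list_of_sections) -> bool that is True
              when the box froze with that set of sections loaded
    Returns the section whose addition first caused a freeze, or None.
    """
    loaded = []
    for section in sections:
        loaded.append(section)
        # Pass a copy so the check cannot be affected by later additions.
        if freezes(list(loaded)):
            return section
    # Never broke: the focus should shift to the 'new' environment.
    return None
```

Because the load is cumulative, the section that tips the box over is not necessarily the sole cause; it only tells you where to look next.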
Meanwhile, depending on your test list, you could start playing with some of the 'performance' options. For instance, there's the "Don't start more than {xx} tests per second" setting under [Options]\Behavior\, as well as more specific limits, such as those for SNMP traps and Performance Counter tests on the \Miscellaneous\ tab. Even if they don't fix the issue, they may reveal something if the frequency of the freezes changes.
Other thoughts:
I know you've checked and rechecked, but ODBC is still a common suspect.
Could the 'freeze' be associated with an action taken rather than a test performed? The randomness may be due to an automated action kicking in. Off the top of my head, I'm not sure how you would debug that, but it's something to think about. Maybe check your logs to see whether any failure times line up with the freezes.
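To check whether failures or actions line up with the freezes, something like the following Python sketch could help. It assumes you have already parsed your log entries into (timestamp, description) pairs and noted the freeze times yourself; the function name and the two-minute default window are arbitrary choices, not anything HostMonitor provides.

```python
from datetime import datetime, timedelta


def events_near_freezes(freezes, events, window=timedelta(minutes=2)):
    """Return, for each freeze time, the logged events within +/- window of it.

    freezes: list of datetime objects (when HostMonitor was seen frozen)
    events:  list of (datetime, description) tuples parsed from your logs;
             the parsing itself depends on your log format (assumed done)
    """
    matches = {}
    for freeze in freezes:
        # Keep every event whose timestamp is close enough to this freeze.
        matches[freeze] = [
            (ts, desc) for ts, desc in events if abs(ts - freeze) <= window
        ]
    return matches
```

If the same action or test keeps showing up next to the freeze times, that is a good candidate to disable first.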
Are you moving from a single-core to a multi-core processor? Could it be an affinity issue? You could try forcing processor affinity.
Alex,
This is likely a moot point, and I'm sure this has been tested in multi-core environments, but to satisfy my own curiosity: do you programmatically select a core (a crap shoot, I know)? And what about multi-threaded tasks, scripts and external calls: are they 'tied' to the application's core, or allowed to run independently and auto-select?
Also, could there be a 'bug' associated with processor make (Intel, AMD) or model? (In case they've changed processor platforms)
I only ask these things because, from the sound of it, the only new thing is the hardware involved.
Quote: This is likely a moot point and I'm sure this has been tested in multi-core environments
Yes, we are testing HostMonitor on such systems.
Quote: do you programmatically select a core
No, HostMonitor does not use any special procedures to manage threads; Windows does this.
Quote: what about multi-threaded tasks, scripts and external calls: are they 'tied' to the application's core or allowed to run independently and auto-select
HostMonitor decides which test should be performed and starts a thread. Each test is performed by a separate thread that returns its results to the main thread.
Quote: Also, could there be a 'bug' associated with processor make (Intel, AMD) or model? (In case they've changed processor platforms)
Everything is possible in this world.
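The thread-per-test model described above can be illustrated with a minimal Python sketch. It is not HostMonitor's code: it only assumes one thread per test and a queue that carries results back to the main thread, and the test names and check functions are hypothetical.

```python
import queue
import threading


def run_test(name, check, results):
    """Worker thread: perform one test and hand the result to the main thread."""
    try:
        status = "ok" if check() else "bad"
    except Exception as exc:  # report a crashing test instead of losing it
        status = f"error: {exc}"
    results.put((name, status))


def run_cycle(tests):
    """Main thread: start one thread per test, then collect every result."""
    results = queue.Queue()
    threads = []
    for name, check in tests.items():
        t = threading.Thread(target=run_test, args=(name, check, results), daemon=True)
        # No core is chosen here; the OS scheduler decides where the thread runs.
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    collected = {}
    while not results.empty():
        name, status = results.get()
        collected[name] = status
    return collected
```

Note that nothing in the sketch pins threads to a core, which matches the point that thread placement is left to Windows.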


Regards
Alex
KS-Soft wrote: PS Just to clarify: are you working in the same team (losisoft and hsq) with the same instance of HostMonitor?
Yes, we are colleagues.

KS-Soft wrote: So, we are back to the beginning? What about ODBC and antivirus monitor?
We are currently moving all the ODBC queries that were running from HostMonitor to agents. I guess that way we can figure out whether it's ODBC-related or not.
Hi Alex,
I really appreciate all your efforts.
In order to narrow down the problem, I'm trying to push all our ODBC tests down to the passive agents and not let HostMonitor itself execute any of them.
If the ODBC driver has a problem/bug, does this step help?
In my logic it can be a "workaround", as the process will not use the local Oracle client. In the worst case, the RMA freezes, but not the HostMonitor process.
All the local antivirus components have been stopped for 3 days, so they are not influencing this game at all.
Regards,
Gabor
Quote: In order to narrow down the problem, I'm trying to push all our ODBC tests down to the passive agents and not let HostMonitor itself execute any of them. If the ODBC driver has a problem/bug, does this step help?
If you are still using ODBC logging, then this will not help.
If you cannot disable ODBC logging, could you start another "test" instance of HostMonitor without ODBC logging?
For example, could you do the following:
- copy the entire HostMonitor folder (e.g. "c:\program files\hostmonitor" -> "c:\test\hostmonitor")
- start the copy of HostMonitor using the "/stop" command-line parameter (hostmon.exe /stop)
- disable logging (Options dialog)
- disable actions (menu Monitoring -> Disable)
- load copied HML file (e.g. c:\test\hostmonitor\main.hml)
- start monitoring
As a result, you will have two instances of HostMonitor: the production copy and a testing copy that can be modified at any time without problems.
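The first two steps could be scripted, for example in Python. This is a sketch under assumptions: the paths are the example paths from the steps, hostmon.exe is assumed to sit directly in the copied folder, and the remaining steps (disabling logging and actions, loading the HML file, starting monitoring) still have to be done in the GUI.

```python
import shutil
from pathlib import Path


def prepare_test_instance(src, dst):
    """Copy the whole HostMonitor folder and return the command to start the copy.

    The returned command uses the /stop parameter, as in the steps above.
    """
    src, dst = Path(src), Path(dst)
    # copytree raises FileExistsError if dst exists, so an old copy is never overwritten.
    shutil.copytree(src, dst)
    return [str(dst / "hostmon.exe"), "/stop"]
```

On the server you would then start the copy with `subprocess.Popen(prepare_test_instance(r"c:\program files\hostmonitor", r"c:\test\hostmonitor"))`.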
Quote: In my logic it can be a "workaround", as the process will not use the local Oracle client. In the worst case, the RMA freezes, but not the HostMonitor process.
Yes, RMA is very helpful for narrowing down a problem when it is caused by a particular test method. It's pretty easy to change the "test by" property for a set of tests.
Regards
Alex
KS-Soft wrote: Do you have Microsoft Windows Search 4.0 installed? Maybe it leads to some problems... we are not sure yet.
Hi Alex,
No, we don't use Windows Search.
We have changed the way we log to MS-SQL, and that has more or less stabilized the system. We still get some hangs, but only about once or twice a day.
Regards,
Jozsef