KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Mon Jul 23, 2012 10:53 am
|
5) check resource usage for each process running on the system. You may use the standard Windows Task Manager to check Handles, GDI and USER objects. What is the total resource usage on the system? How many handles/threads/GDI objects are used by the hostmon.exe process?
Write down some notes and counter values, then restart HostMonitor.
Or you may set up several Performance Counter + Dominant Process tests to check resource usage. At least check Handles and Threads, and record the data into a private log using Full mode.
Regards
Alex
|
apaitoperations
Joined: 24 Feb 2011 Posts: 40
Posted: Wed Jul 25, 2012 6:02 am Post subject: reply
|
Hello Alex,
I have now disabled all logging, private and system. Let's see.
I already sent you the configuration files in March and you didn't find a problem; since then only a few changes were made. Do you mean that you may have missed something the first time? If I send you the config files, are there any credentials saved in them? I ask for security reasons. Which files do you need?
I will switch over to this thread.
wbr
Georg Höllebauer
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Wed Jul 25, 2012 6:27 am
|
We always remove files after testing (for security reasons), so we no longer have them.
Yes, we checked your settings and tried to reproduce the problem without any luck. We also tried setting up a test list with several thousand private log files; again, everything was fine.
Yes, we could have missed something (and we cannot set up exactly the same environment; usually we cannot reach private target hosts from our network).
Unless you use the "connect as" test option, passwords are stored only in the connlist.lst file, so you may skip that file.
Regards
Alex
|
apaitoperations
Joined: 24 Feb 2011 Posts: 40
Posted: Wed Jul 25, 2012 10:03 am Post subject: no luck with logging disabled
|
Hello Alex,
No luck with logging disabled: HM just stopped executing checks.
When I stop and start monitoring, it writes 4 or 5 entries to the system log, but checks still don't resume.
So logging is still working, but HM just doesn't execute any checks anymore.
There must be a serious bug in the HM core code!
wbr
Georg Höllebauer
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Wed Jul 25, 2012 10:19 am
|
Yes, there must be a bug, and we have spent 200 hours looking for it. We could spend 10,000 more hours and still not find anything unless we get some tips that help us understand where exactly to look for the problem.
What exactly do you mean by "4 or 5 log-entries"? What exact messages do you see in the log?
Could you please provide more information:
1) use the Auditing Tool to check for errors/warnings
2) check the HostMonitor system log (specified on the System Log page in the HostMonitor Options dialog) for errors
3) check if HostMonitor can perform tests: try to refresh some simple Ping test that does not have any Master tests and is performed directly by HostMonitor. Check the "Recurrences" and "Last test time" fields
4) check test statuses. Do you see a lot of tests with Unknown or Checking status?
5) check resource usage for each process running on the system. You may use the standard Windows Task Manager to check Handles, GDI and USER objects. What is the total resource usage on the system? How many handles/threads/GDI objects are used by the hostmon.exe process?
Write down some notes and counter values, then restart HostMonitor.
Regards
Alex
|
apaitoperations
Joined: 24 Feb 2011 Posts: 40
Posted: Wed Jul 25, 2012 10:22 am Post subject: another strange thing
|
When HM hangs, I checked HM-Watchdog, and it showed that HM is not completely stuck: it performs a few checks every second, but not nearly as many as when it is working properly (33 checks/sec).
Watchdog shows a very low zig-zag pattern in the left window.
wbr
Georg Höllebauer
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Wed Jul 25, 2012 10:38 am
|
In some cases HostMonitor can slow down monitoring, e.g. when there is a problem with logs or system resources, or when too many test items cannot get results from servers.
That's why we need answers to the other questions.
Regards
Alex
|
apaitoperations
Joined: 24 Feb 2011 Posts: 40
Posted: Wed Jul 25, 2012 12:48 pm Post subject: update
|
Hello Alex,
By 4 to 5 log entries I mean the following:
Monitor stopped
SNMP Trap Listener : UDP Port closed
Monitoring stopped. Operator: 'myusername'
Monitor started
SNMP Trap Listener: UDP Port #162 opened for listening
Monitoring started. Operator: 'myusername'
So logging is working, but the checks don't get executed.
Now to the 5 points:
1) No errors/warnings in the Auditing Tool; there never were any.
2) No errors in the HostMonitor system log; there never were any.
3) I tried to refresh such a test and nothing happens: the "Recurrences" and "Last test time" fields stay the same, as always when HM is stuck.
4) Not many tests with Unknown status, sometimes 2 to 5, and I never saw tests with Checking status when HM is stuck.
5) I will send the results by email as soon as I have them.
wbr
Georg Höllebauer
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Wed Jul 25, 2012 1:08 pm
|
One more possible reason: if you perform most tests using Active RMA agents that were connected but lost the connection, HostMonitor may wait 4 min (I think) for a new connection from the agents without reporting any error.
But a "refresh" operation should set the status right away...
We need to see your settings and resource usage so we can verify some theories.
Also, when the problem comes back, could you open the Auditing Tool, check Performance History for the last 24 or 48 hours using both Actions/Logging modes, and send screenshots to us?
Regards
Alex
|
apaitoperations
Joined: 24 Feb 2011 Posts: 40
Posted: Wed Jul 25, 2012 1:12 pm Post subject: update
|
Hello Alex,
I don't use any Active RMA agents, so that's not the problem.
I will send the settings, resource usage and Auditing Tool screenshot to you as soon as I have them.
wbr
Georg Höllebauer
|
rc
Joined: 01 Aug 2005 Posts: 100
Posted: Thu Jul 26, 2012 5:10 am
|
Hi Alex,
there is something new from my side:
On each of the last 2 days I had to stop our network monitoring manually for 4 hours due to increased maintenance. Because in my experience our HostMonitor always stops after about 16 million tests because of a bug, I expected the stop to be pushed back by about 8 hours this time.
But I was wrong. This morning I had another stop after only 14 million checks. So my guess that the error depends on the number of checks was wrong.
Otherwise, I can only confirm all of Georg's statements.
On the 2nd HostMonitor, I have just looked at the graphs of the HM checks again: unfortunately, as you can see, there are no abnormalities. All curves are normal, without any rise or fall until the error. Then all values suddenly drop to zero.
I mean these values:
=== Information related to HostMonitor requested by the test item ===
HMVersionText : HostMonitor v. 9.22
HMVersionBin : 233E
HMSystemName : <server name>
HMStartedTime : 21.07.2012 13:09:00
HMTestItemsCnt : 12639
HMTestsDone : 0
HMTestsFailed : 0
HMActionsDone : 0
HMLogsDone : 0
HMTestsPerSec : 0,00
HMActionsPerSec : 0,00
HMLogsPerSec : 0,00
HMActionsATC : 504,44
HMLogsATC : 1,15
HMLoggingPoolUsage : 0,0 %
HMStatusString : monitoring started, alerts enabled, modifications not stored
So, unfortunately, there is still no clue where the fault might lie.
Regards,
Enrico
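The status dump above prints numbers with a comma as the decimal separator (for example HMActionsATC : 504,44). If you want to compare these counters over time, e.g. from saved copies of this output, a minimal Python sketch could turn them into numbers. The function names are hypothetical, and the 500 ms limit in actions_too_slow is an assumption based on Alex's remark that 0.5 sec per action is too much; it also assumes HMActionsATC is reported in milliseconds.

```python
def parse_hm_status(text: str) -> dict:
    """Parse 'Key : value' lines from a HostMonitor status dump.

    Values such as '504,44' or '0,0 %' (comma decimal separator, optional
    trailing percent sign) become floats; everything else stays a string.
    """
    result = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")  # split at the first colon only
        if not sep:
            continue  # no 'Key : value' shape, e.g. the '=== ... ===' header
        key, value = key.strip(), value.strip()
        if not key.startswith("HM"):
            continue  # keep only HostMonitor counters
        numeric = value.rstrip("%").strip().replace(",", ".")
        try:
            result[key] = float(numeric)
        except ValueError:
            result[key] = value  # dates, version strings, status text
    return result


def actions_too_slow(status: dict, limit_ms: float = 500.0) -> bool:
    """Hypothetical check: True if the average action time exceeds limit_ms.

    Assumes HMActionsATC is reported in milliseconds.
    """
    atc = status.get("HMActionsATC")
    return isinstance(atc, float) and atc > limit_ms


sample = """HMTestsPerSec : 0,00
HMActionsATC : 504,44
HMLoggingPoolUsage : 0,0 %
HMStatusString : monitoring started, alerts enabled, modifications not stored"""
status = parse_hm_status(sample)
print(status["HMActionsATC"], actions_too_slow(status))  # prints 504.44 True
```

Splitting only at the first colon keeps values that themselves contain colons, such as HMStartedTime, intact as strings.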
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Thu Jul 26, 2012 5:41 am
|
HMActionsATC : 504,44
This can be a problem. If HostMonitor has to start a lot of actions, 0.5 sec per action is too much.
How did you get this value? Are you using the HM Monitor test method with the "display: action average time consumption" option, and do you see this value in the Reply field of the test? Or perhaps this value is displayed by the '%HMActionsATC%' variable?
HMActionsPerSec : 0,00
Is that because HostMonitor stopped tests? What is the "normal" value of this counter on your system?
Could you create several HM Monitor, Performance Counter and Dominant Process tests, set Full logging mode using private log files, and record the following info:
Performance Counter test
- handles for hostmon.exe process
- threads for hostmon.exe process
Dominant Process test
- handles for top process
- threads for top process
HM Monitor test
- tests per second
- actions per sec
- logs per sec
- action average time consumption
- logging average time consumption
- logging pool usage
Could you please send your settings to support@ks-soft.net? We need all *.INI and *.LST files and your HML file with the test list. You may skip the connlist.lst file.
Also, old versions started as a service could create a LogList.lst file in the Windows directory. Please include this file as well.
Apaitoperations, we need your files as well.
Regards
Alex
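Once handles and threads are recorded into a private log in Full mode, one simple way to look for a slow leak before the hang is to fit a straight line to the samples and read off the growth per hour. A minimal Python sketch, assuming you have already extracted (seconds since start, value) pairs from the log; the log format is not shown in this thread, so parsing is left out, and the function name and sample numbers are hypothetical:

```python
from typing import Sequence, Tuple


def growth_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Least-squares slope of (seconds, value) samples, in units per hour.

    A clearly positive result for hostmon.exe handles or threads over many
    hours would hint at a resource leak; a value near zero means the
    counter is stable.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    # Unnormalized covariance of time and value, and variance of time
    cov = sum((t - mean_t) * (v - mean_v) for t, v in samples)
    var = sum((t - mean_t) ** 2 for t, _ in samples)
    if var == 0:
        raise ValueError("samples need at least two distinct timestamps")
    return cov / var * 3600.0  # slope is per second; scale to per hour


# Hypothetical handle counts for hostmon.exe, one sample per hour
handles = [(0, 1200), (3600, 1260), (7200, 1315), (10800, 1380)]
print(f"{growth_per_hour(handles):.1f} handles/hour")  # prints 59.5 handles/hour
```

A fitted slope is less sensitive to a single noisy sample than comparing only the first and last values.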
|
rc
Joined: 01 Aug 2005 Posts: 100
Posted: Thu Jul 26, 2012 6:39 am
|
OK, I sent some comments to support@ks-soft.net.
|
apaitoperations
Joined: 24 Feb 2011 Posts: 40
Posted: Thu Jul 26, 2012 8:06 am Post subject: HM config files sent
|
Alex,
I just sent our config files by email.
It's an 8 MB zip file.
Should I also create these checks with another HM instance and set Full logging:
Performance Counter test
- handles for hostmon.exe process
- threads for hostmon.exe process
Dominant Process test
- handles for top process
- threads for top process
HM Monitor test
- tests per second
- actions per sec
- logs per sec
- action average time consumption
- logging average time consumption
- logging pool usage
wbr
Georg Höllebauer
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Thu Jul 26, 2012 8:39 am
|
Yes, I sent you an e-mail.
Let's use e-mail...
Regards
Alex