KS-Soft. Network Management Solutions

HM freezes after update from V8.86 to V9.06
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

PostPosted: Tue Nov 13, 2012 6:28 am    Post subject: Reply with quote

Quote:
Since the host monitor or the RCC is still responding, one must assume that the monitoring task is waiting, but for what?

It waits until some of these 1000 threads finish execution.
Do you see some "Timed out" or "Checking" test items?

Quote:
Incidentally, there is a small bug in the UDP test method. In the GUI I enter 2 seconds for the timeout, but the variable AlertThreshold shows 2 ms.

You are right, our mistake.
It will be fixed in the next version.

Quote:
For URL checks, the Windows timeout is often registered.

How many URL tests do you have?

Quote:
What we need, in my opinion, is more detailed debug logging. Perhaps it is possible to generate messages in syslog when no tests have been performed within a minute. If a list of the current tests appeared in the log, one could narrow down the error more easily.

I think messages about the pause alone are not enough. We need a list of tests that take too much time to execute.
And you probably already have such a log. "Timed out" items should be recorded in the regular log that stores test results. Do you see such items?

It's more difficult to catch tests that take 3 or 5 min to execute.
You can adjust one internal parameter and tell HostMonitor to mark test as "timed out" after 3 or 5 min (instead of 15 min by default) but then some "normal" tests that need a lot of time for execution will fail...

When the problem appears, could you check for "Timed out" test items (Unknown status, "Timed out" reply string)?
Are there test items that stay in "Checking" status for a while?
Do you see "timed out" items in the log?

Regards
Alex
rc



Joined: 01 Aug 2005
Posts: 100

PostPosted: Tue Nov 13, 2012 7:23 am    Post subject: Reply with quote

I analyzed the complete log from yesterday. I found no evidence before the failure indicating that something was wrong. Of a total of 3.2 million tests, 500 had the result "timeout". All of them were SNMP Get tests.

I have 9130 Snmp Get tests in my configuration:
0001 x 1000 ms timeout
8976 x 2000 ms timeout
0063 x 3000 ms timeout
0049 x 5000 ms timeout
0041 x 10000 ms timeout

To your other question:
I have a total of 103 URL checks and 86 HTTP checks.

At the relevant time there was only one test in the Unknown state.
Checking for the "Checking" status is impossible for me with 13000 tests in total. Can I create a view for it, maybe? At the moment it is simply not visible to me, in the GUI or in the log, what the application is doing.

Quote:
You can adjust one internal parameter and tell HostMonitor to mark test as "timed out" after 3 or 5 min (instead of 15 min by default) but then some "normal" tests that need a lot of time for execution will fail...


I found no parameter for this in hostmon.ini.

Perhaps it would be nice to have an indicator in the Auditing Tool that displays tests that pose a potential danger.
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

PostPosted: Tue Nov 13, 2012 11:34 am    Post subject: Reply with quote

Quote:
I analyzed the complete log from yesterday. I found no evidence before the failure indicating that something was wrong. Of a total of 3.2 million tests, 500 had the result "timeout". All of them were SNMP Get tests.

There are 2 different messages:
"Timeout" is not a problem (unless the timeout specified for the test is very long).
"Timed out" indicates a problem regardless of test settings.

Quote:
I have 9130 Snmp Get tests in my configuration:
0001 x 1000 ms timeout
8976 x 2000 ms timeout
0063 x 3000 ms timeout
0049 x 5000 ms timeout
0041 x 10000 ms timeout

That should not be a problem, unless the Retries value is high...
Could you send the HML file with the tests to support@ks-soft.net? Maybe we can find something...

Quote:
Checking for the "Checking" status is impossible for me with 13000 tests in total. Can I create a view for it, maybe? At the moment it is simply not visible to me, in the GUI or in the log, what the application is doing.

There is no such option.

Quote:
Perhaps it would be nice to have an indicator in the Auditing Tool that displays tests that pose a potential danger.

It's not easy to tell...
HostMonitor should probably collect some statistics; this would be more useful than checking test settings.

Regards
Alex
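
Editor's note: the remark about Retries can be made concrete with a little arithmetic on the SNMP Get timeout distribution posted above. The following is a minimal sketch, not HostMonitor code. It assumes (this is not confirmed anywhere in the thread) that a test makes one initial attempt plus `retries` further attempts and that each attempt waits for the full timeout. The function names are hypothetical.

```python
# Timeout distribution as posted in the thread:
# {timeout in ms: number of SNMP Get tests with that timeout}
SNMP_TIMEOUTS_MS = {1000: 1, 2000: 8976, 3000: 63, 5000: 49, 10000: 41}


def worst_case_seconds(timeout_ms: int, retries: int) -> float:
    """Longest time one test can block if every attempt times out.

    Assumption: one initial attempt plus `retries` further attempts,
    each waiting for the full timeout.
    """
    return timeout_ms * (retries + 1) / 1000.0


def summarize(retries: int) -> dict:
    """Aggregate worst-case figures for the whole SNMP Get configuration."""
    total_tests = sum(SNMP_TIMEOUTS_MS.values())
    slowest = max(worst_case_seconds(t, retries) for t in SNMP_TIMEOUTS_MS)
    total = sum(
        worst_case_seconds(t, retries) * n for t, n in SNMP_TIMEOUTS_MS.items()
    )
    return {
        "tests": total_tests,
        "slowest_single_test_s": slowest,
        "sum_if_all_time_out_s": total,
    }


if __name__ == "__main__":
    for retries in (0, 1, 3, 10):
        print(retries, summarize(retries))
```

Under these assumptions a single test blocks a thread for at most 10 seconds with no retries, and the figure grows linearly with the Retries value, which is why a high Retries setting matters more than the timeouts themselves.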
apaitoperations



Joined: 24 Feb 2011
Posts: 40

PostPosted: Tue Nov 20, 2012 3:56 am    Post subject: Cant find the debuglog-feature Reply with quote

Hello !

I also installed version 9.32, but I can't find the debug log feature in RCC -> Auditing Tool.

What am I missing?

wbr
Georg Höllebauer
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

PostPosted: Tue Nov 20, 2012 9:24 am    Post subject: Reply with quote

Just open the Auditing Tool; the log file is created automatically.

Regards
Alex
apaitoperations



Joined: 24 Feb 2011
Posts: 40

PostPosted: Tue Nov 20, 2012 9:30 am    Post subject: debuglog location Reply with quote

And where will the debug log be created?

On the HM server, or on the machine where RCC is running and the Auditing Tool is opened?

wbr
Georg Höllebauer
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

PostPosted: Tue Nov 20, 2012 9:32 am    Post subject: Reply with quote

On the system where HostMonitor is running. Check the folder where the HostMonitor configuration files are located.

Regards
Alex
apaitoperations



Joined: 24 Feb 2011
Posts: 40

PostPosted: Tue Nov 20, 2012 9:35 am    Post subject: found it :-) Reply with quote

HM just stopped working about 30 minutes ago. Here is the debug log.

The last thing that gets updated is userlog.xml, so HM is receiving commands like refresh but actually does nothing.

----
2012-11-20 10:47:18
Timer1: 1 2012-11-20 10:47:17
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 128
TCnt1: 9725
TTThreads: 70
LIdx: 3242
ATCnt2: 32
----
2012-11-20 10:48:21
Timer1: 1 2012-11-20 10:48:20
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 128
TCnt1: 9725
TTThreads: 95
LIdx: 8862
ATCnt2: 32
----
2012-11-20 10:50:44
Timer1: 1 2012-11-20 10:50:44
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 128
TCnt1: 9725
TTThreads: 128
LIdx: 9725
ATCnt2: 32
----
2012-11-20 16:00:59
Timer1: 1 2012-11-20 16:00:58
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 128
TCnt1: 9734
TTThreads: 1003
LIdx: 3154
ATCnt2: 32
----
2012-11-20 16:19:52
Timer1: 1 2012-11-20 16:19:52
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 128
TCnt1: 9734
TTThreads: 1003
LIdx: 3154
ATCnt2: 32
----
2012-11-20 16:29:03
Timer1: 1 2012-11-20 16:29:02
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 128
TCnt1: 9734
TTThreads: 1003
LIdx: 3154
ATCnt2: 32
----
2012-11-20 16:33:18
Timer1: 1 2012-11-20 16:33:18
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 128
TCnt1: 9734
TTThreads: 1003
LIdx: 3154
ATCnt2: 32
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

PostPosted: Tue Nov 20, 2012 9:42 am    Post subject: Reply with quote

It looks similar: monitoring is active, but HostMonitor does not start new tests because there are a lot of started/unfinished test probes.
Do you see items with Unknown status and a "Timed out" reply value?
Do you see items that hold "Checking" status for a long time?

Regards
Alex
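
Editor's note: the pattern described here (monitoring active, but no new tests started) shows up in the posted debug log as consecutive snapshots in which TTThreads and LIdx no longer change. A minimal sketch of a parser that flags such snapshots follows. The log layout is taken from the entries posted in this thread. The meaning of the individual fields is not documented here, so the "stall" rule below is only an assumption, and `parse_debuglog` and `find_stalls` are hypothetical helper names.

```python
from datetime import datetime

# Three snapshots copied from the debug log posted in this thread.
SAMPLE = """\
----
2012-11-20 10:50:44
Timer1: 1 2012-11-20 10:50:44
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 128
TCnt1: 9725
TTThreads: 128
LIdx: 9725
ATCnt2: 32
----
2012-11-20 16:00:59
Timer1: 1 2012-11-20 16:00:58
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 128
TCnt1: 9734
TTThreads: 1003
LIdx: 3154
ATCnt2: 32
----
2012-11-20 16:19:52
Timer1: 1 2012-11-20 16:19:52
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 128
TCnt1: 9734
TTThreads: 1003
LIdx: 3154
ATCnt2: 32
"""


def parse_debuglog(text):
    """Split the log on '----' separators and parse each snapshot into a dict."""
    entries = []
    for block in text.split("----"):
        lines = [ln.strip() for ln in block.splitlines() if ln.strip()]
        if not lines:
            continue
        # The first line of a snapshot is its timestamp.
        entry = {"time": datetime.strptime(lines[0], "%Y-%m-%d %H:%M:%S")}
        for ln in lines[1:]:
            key, _, value = ln.partition(":")
            # Keep only the first token: "Timer1" carries a trailing timestamp.
            entry[key.strip()] = int(value.split()[0])
        entries.append(entry)
    return entries


def find_stalls(entries):
    """Return (start, end, TTThreads) for consecutive snapshots where
    neither LIdx nor TTThreads changed (assumed sign of a stall)."""
    stalls = []
    for prev, cur in zip(entries, entries[1:]):
        if cur["LIdx"] == prev["LIdx"] and cur["TTThreads"] == prev["TTThreads"]:
            stalls.append((prev["time"], cur["time"], cur["TTThreads"]))
    return stalls


if __name__ == "__main__":
    for start, end, threads in find_stalls(parse_debuglog(SAMPLE)):
        print(f"no progress from {start} to {end} with TTThreads={threads}")
```

Run against the full debuglog1.txt instead of `SAMPLE`, this would list every interval in which the counters stood still, which may help to pin down when a freeze began.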
apaitoperations



Joined: 24 Feb 2011
Posts: 40

PostPosted: Tue Nov 20, 2012 9:51 am    Post subject: Reply with quote

Same here:

It is hard to tell whether there are many items in "Checking" status. We only use views that filter for Unknown and Bad items.

In the view properties there is no way to filter for items in "Checking" status.
Is there any other way to see only items in "Checking" status, or a statistic showing which items held "Checking" status for a long time?

wbr
Georg Höllebauer
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

PostPosted: Tue Nov 20, 2012 10:15 am    Post subject: Reply with quote

Update: www.ks-soft.net/download/hm932c.zip
(rcc.exe update is optional)

When this version creates debuglog1.txt, it checks for test items that have had "Checking" status for 40 sec or more and records information about such items (status period, testID and test name).
But the update means you have to restart HostMonitor, so the current list of "checking" test items will be lost.

Right now you can create a View to check for "Unknown/Timed out" test items. Also, if there are such items, you should see records in the regular log with test results.

Regards
Alex
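
Editor's note: the check described for hm932c.zip (report items that have held "Checking" status for 40 sec or more, with status period, testID and test name) amounts to a simple filter. The sketch below only illustrates that rule. The `MonitoredItem` structure and the `long_checking` function are hypothetical and say nothing about how HostMonitor stores its test items internally.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class MonitoredItem:
    """Hypothetical stand-in for a test item; not HostMonitor's own type."""
    test_id: int
    name: str
    status: str
    status_since: datetime


def long_checking(items, now, threshold=timedelta(seconds=40)):
    """Return (status period in seconds, testID, test name) for every item
    that has held "Checking" status for at least `threshold`,
    longest period first."""
    rows = []
    for item in items:
        period = now - item.status_since
        if item.status == "Checking" and period >= threshold:
            rows.append((int(period.total_seconds()), item.test_id, item.name))
    return sorted(rows, reverse=True)


if __name__ == "__main__":
    now = datetime(2012, 11, 20, 10, 0, 0)
    items = [
        MonitoredItem(1, "snmp-a", "Checking", datetime(2012, 11, 20, 9, 59, 10)),
        MonitoredItem(2, "snmp-b", "Checking", datetime(2012, 11, 20, 9, 59, 30)),
        MonitoredItem(3, "url-c", "Unknown", datetime(2012, 11, 20, 9, 0, 0)),
    ]
    for period, test_id, name in long_checking(items, now):
        print(period, test_id, name)
```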
apaitoperations



Joined: 24 Feb 2011
Posts: 40

PostPosted: Tue Nov 20, 2012 10:22 am    Post subject: Update Reply with quote

Hi !

I will update HM in the next 60 minutes and post again with new information.

Thanks for the quick update-file !

wbr
Georg Höllebauer
apaitoperations



Joined: 24 Feb 2011
Posts: 40

PostPosted: Wed Nov 21, 2012 2:54 am    Post subject: new debuglog Reply with quote

Hi !

Here is the new debug-log with HM version 9.32c running :

----
2012-11-21 09:37:29
Timer1: 1 2012-11-21 09:37:28
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 64
TCnt1: 9726
TTThreads: 1001
LIdx: 4402
ATCnt2: 32
----
2012-11-21 09:38:32
Timer1: 1 2012-11-21 09:38:32
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 64
TCnt1: 9726
TTThreads: 1001
LIdx: 4402
ATCnt2: 32

The TTThreads value always stops at around 1001-1003 and never goes beyond that.

I don't see any tests in status "Timed out" or "Checking" when HM stops monitoring.

With my setup it stops every 3 to 3.5 hours.
I have 39 checks/sec.

wbr
Georg Höllebauer
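
Editor's note: the figures in this post allow a rough back-of-the-envelope estimate. If one assumes (purely as a hypothesis, nothing in the thread confirms it) that the roughly 1001 threads accumulate steadily between restarts because an occasional test never releases its thread, the posted rate of 39 checks/sec and the 3 to 3.5 hour interval give the number of checks per stuck thread. The function names are hypothetical.

```python
def checks_between_freezes(checks_per_sec: float, hours: float) -> int:
    """Total number of checks started during one run, at a constant rate."""
    return int(checks_per_sec * hours * 3600)


def checks_per_stuck_thread(checks_per_sec: float, hours: float, threads: int) -> float:
    """Average number of checks per accumulated thread, assuming the
    threads pile up evenly over the run (an assumption, see above)."""
    return checks_between_freezes(checks_per_sec, hours) / threads


if __name__ == "__main__":
    for hours in (3.0, 3.5):
        total = checks_between_freezes(39, hours)
        ratio = checks_per_stuck_thread(39, hours, 1001)
        print(f"{hours} h: {total} checks, about 1 stuck thread per {ratio:.0f} checks")
```

Under that assumption, somewhere around one check in 420 to 490 would have to hang to reach the observed thread count, which is a small enough fraction to go unnoticed in a view that filters only for Unknown and Bad items.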
apaitoperations



Joined: 24 Feb 2011
Posts: 40

PostPosted: Wed Nov 21, 2012 7:02 am    Post subject: another debuglog Reply with quote

HM froze again, and here is the current debug log:

2012-11-21 14:00:00
Timer1: 1 2012-11-21 13:59:59
Timer2: 1
PoolRecAvail: 4096
TTLimit1: 40
TCnt1: 9732
TTThreads: 1001
LIdx: 1596
ATCnt2: 32

Again 1001 TTThreads. Maybe that's a limitation somewhere in Windows Server 2008 R2 SP1?

wbr
Georg Höllebauer
KS-Soft



Joined: 03 Apr 2002
Posts: 12795
Location: USA

PostPosted: Wed Nov 21, 2012 8:32 am    Post subject: Reply with quote

1001 threads and no test items in "checking" status?? No test items with a "timed out" reply??
How many threads does Windows Task Manager display for the hostmon.exe process?

Regards
Alex
All times are GMT - 6 Hours