KS-Soft. Network Management Solutions

HM freezes after update from V8.86 to V9.06
rc



Joined: 01 Aug 2005
Posts: 101

Posted: Tue Dec 20, 2011 6:55 am    Post subject: HM freezes after update from V8.86 to V9.06

Hi Aleks,

yesterday I updated our HM installation from V8.86 to V9.06, and it ran for approximately 6 hours without problems. After that, HM froze and test execution stopped. There were no error messages, and the same issue occurred again after restarting the HM service, but the number of threads was approximately 1000! It's a pity, because the old version V8.86 is very stable.

We use Windows Server 2003 SP2 on an HP ProLiant DL380 G6 (8 cores, 2.27 GHz Intel Xeon, 4086 MB RAM). We run 29 tests/sec.

If you like, I can send you our configuration files.

kind regards
Enrico
KS-Soft



Joined: 03 Apr 2002
Posts: 11782
Location: USA

Posted: Tue Dec 20, 2011 7:09 am

There are a thousand changes in the code; that is why we uploaded a Beta version and waited 6 weeks for bug reports. There were over 14,000 downloads, but nobody sent bug reports for the Beta version (as usual).
Also, we spent many weeks testing the software on our servers; it works great here.

Do you have ODBC Query tests? ODBC logging?
Have you changed the ODBC driver recently?
Yes, please send the HML, LST and INI files to support@ks-soft.net

Regards
Alex
rc



Joined: 01 Aug 2005
Posts: 101

Posted: Tue Dec 20, 2011 7:39 am

...OK, I will send you the files.
We don't use ODBC logging, but we have 5 ODBC Query tests.
Also, I have not changed the ODBC driver recently.

Regards
Enrico
apaitoperations



Joined: 24 Feb 2011
Posts: 40

Posted: Sun Jul 15, 2012 4:32 am    Post subject: same here

I upgraded from 8.68 to 8.82 and then the problems started.

The HM service gets stuck every few hours on Server 2008 R2 64-bit.

see http://www.ks-soft.net/cgi-bin/phpBB/viewtopic.php?t=6377&start=0&postdays=0&postorder=asc
rc



Joined: 01 Aug 2005
Posts: 101

Posted: Mon Jul 16, 2012 1:09 am    Post subject: ...finally somebody with the same issue

... unfortunately, I am still seeing this phenomenon as well.
Every time, after approximately 16 million executed tests, no more checks are executed.
However, the application still responds. There are never any entries in the log or unusually high values in Task Manager. For months I have been restarting the service automatically via a second HostMonitor.

In the meantime, I have moved all ODBC tests to an RMA agent and updated the ODBC driver. I do not use ODBC logging. That is definitely not the cause.

The behaviour of HostMonitor (V9.18) is always the same. Everything works perfectly up to the limit of 16 million executed tests (3 million per day).

It would be nice if the bug could still be found.

Best regards
Enrico
apaitoperations



Joined: 24 Feb 2011
Posts: 40

Posted: Mon Jul 16, 2012 1:50 am    Post subject: now we are already three

Hello Enrico,

There are already three of us with the same issue:

You, me and Kris!

As far as I understand, we use different versions on different operating systems, so that cannot be the problem. It must be something that changed in the code after version 8.68 and possibly before version 8.82.

Maybe we should open a new topic for the three of us and post there?
What do you think?

wbr
Georg
apaitoperations



Joined: 24 Feb 2011
Posts: 40

Posted: Mon Jul 16, 2012 2:03 am    Post subject: how to count checks

Enrico,

How do you count the checks, so that you know about the 16 million limit?

My HM service freezes about every 5 hours.
I have 33 checks per second.

I also have no ODBC logging and no ODBC checks. I removed all I had because of permanent problems with ODBC drivers. I now use my own shell scripts to connect to databases.

But I use about 200 different text logs.

I also never see any entries in any log file when HM is stuck.

When HM is frozen, RCC stays functional; it even sends commands to the HM service, and the commands reach the HM service. I can see that in userlog.xml.

So the HM service just sits and waits and does not perform any more checks. I also see in Resource Monitor that there is always one thread that keeps the other threads waiting: use "Analyze Wait Chain" in Resource Monitor on hostmon.exe.

wbr
Georg
rc



Joined: 01 Aug 2005
Posts: 101

Posted: Mon Jul 16, 2012 2:05 am

Hello Georg,

If it helps to find a solution, with pleasure.

Enrico
rc



Joined: 01 Aug 2005
Posts: 101

Posted: Mon Jul 16, 2012 2:42 am

Hello Georg,

I have just seen that you added another post.

I have HostMonitor send me an SMS with status information every day. If the service is restarted because no more tests are being executed, I also get an SMS from the second HostMonitor. From these I can see or calculate how many tests were executed.

This number is always about the same. For me it works out to approximately 5 days until the next restart, at 37 tests per second.

Enrico
rc



Joined: 01 Aug 2005
Posts: 101

Posted: Mon Jul 23, 2012 7:01 am    Post subject: Yesterday it happened again

Hello Georg,

Yesterday it happened again.
After almost exactly 16 million executed checks in about 5 days, my second instance of HostMonitor again detected that the tests-per-second value of the main installation had dropped to 0, and it triggered an automatic restart of the service.

What is remarkable here is that HostMonitor is still reachable: the second HostMonitor can still read the values through the "Check Host Monitor" test method without any problem.

Without this check, I could not tell from the outside that no more checks are being performed. In the meantime I have moved all ODBC checks to an RMA, and CPU and memory usage stay at a low level the whole time. Nevertheless, I see the same phenomenon with the latest version, v9.22.
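
The decision logic of such a watchdog can be sketched in a few lines. This is only an illustration: how the second instance actually reads the tests-per-second value is not shown, and the function name `should_restart` and the `zero_limit` parameter are assumptions, not HostMonitor settings. The sketch waits for several consecutive zero readings, which is a design choice added here to avoid reacting to a single bad sample.

```python
def should_restart(samples, zero_limit=3):
    """Return True if the last `zero_limit` tests-per-second readings are all 0.

    `samples` is a list of readings, oldest first (hypothetical input).
    """
    if len(samples) < zero_limit:
        return False
    return all(rate == 0 for rate in samples[-zero_limit:])


# Rate drops to 0 and stays there: restart
print(should_restart([37, 36, 0, 0, 0]))

# A single zero reading followed by recovery: no restart
print(should_restart([37, 0, 36, 37, 36]))
```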

@ Alex: Can this event not be recorded in the system log?
There must be some way to get an error message when monitoring stops.

Constantly restarting the service is not a disaster, but it has the disadvantage that the dependent actions stored in the alert profile cannot be executed after the restart, for example resetting the interval to its original value. In that case I always have to intervene manually.

In the hope of a solution
Enrico
KS-Soft



Joined: 03 Apr 2002
Posts: 11782
Location: USA

Posted: Mon Jul 23, 2012 8:01 am

First we should understand what exactly happened.
Maybe HostMonitor works just fine but there is some mistake in the HM Monitor test method or in some statistics counter? Are you sure HostMonitor did not perform tests?

Could you remove the "restart service" action and, instead of restarting, try to check what is wrong?
1) Use the Auditing Tool to check for errors/warnings.
2) Check the HostMonitor system log (specified on the System Log page in the HostMonitor Options dialog) for errors.
3) Check whether HostMonitor can perform tests: try to refresh some simple Ping test that does not have any Master tests and is performed directly by HostMonitor. Check the "Recurrences" and "Last test time" fields.
4) Check test statuses. Do you see a lot of tests with Unknown or Checking status?
5) Check resource usage for each process started on the system. You may use the standard Windows Task Manager to check Handles, GDI and USER objects. What is the total resource usage on the system? How many handles/threads/GDI objects are used by the hostmon.exe process?
Write down some notes and counters, then restart HostMonitor.
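
To make the notes from step 5 comparable, the counters can be written down once while HostMonitor is healthy and once while it is stuck, and then subtracted. A minimal sketch, assuming the values are read manually from Task Manager; the function name `counter_growth` and all numbers below are hypothetical and are not measurements from this thread:

```python
def counter_growth(before: dict, after: dict) -> dict:
    """Return after - before for every counter present in both snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


# Hypothetical readings for hostmon.exe (illustrative values only)
healthy = {"handles": 1200, "threads": 60, "gdi_objects": 150}
stuck = {"handles": 1250, "threads": 980, "gdi_objects": 150}

for name, delta in counter_growth(healthy, stuck).items():
    print(f"{name}: {delta:+d}")
```

A counter that grows strongly between the two snapshots is the one worth reporting.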

Quote:
@ Alex: Can this event not be recorded in the system log?

What kind of event? HostMonitor records an event when monitoring is stopped. You may also set up HostMonitor to start actions when monitoring is stopped or paused.
But we do not have any idea what happened on your system. Was monitoring stopped? Paused? Or maybe HostMonitor delayed test execution because of some logging-related problem? Or maybe HostMonitor delayed test execution because a lot of tests cannot finish while waiting for an answer from target systems?
Have you checked the system log and the Auditing Tool? Any errors?

Regards
Alex
rc



Joined: 01 Aug 2005
Posts: 101

Posted: Mon Jul 23, 2012 8:33 am

OK Alex, thanks for the quick reply.

However, I can only keep writing the same thing:
I have never found any error messages.
When the error occurs, no further checks can be executed, not even manually. The "Disable / Enable Host Monitor" function then has no effect.
The error also does not occur at a particular time of day; for me it always occurs after almost exactly 16 million executed tests. Unfortunately, this often falls on a weekend or at night. Since my availability monitoring is important, the application is restarted immediately.

Incidentally, I use the "store historical data in the file" function for a total of 12,639 tests.

As described, the application otherwise behaves normally. Only its main task, the testing, is not carried out.

Because I simply do not have the time to narrow down the cause further, I have worked around the problem by automatically restarting the service, and as you can see, other people have the same problem.

If there is no option that produces an error message, I honestly see no chance of finding the cause of the problem.

I'm sorry.

Enrico
KS-Soft



Joined: 03 Apr 2002
Posts: 11782
Location: USA

Posted: Mon Jul 23, 2012 8:38 am

Quote:
The "Disable / Enable Host Monitor" function then has no effect.

You cannot disable a test??
Using RCC? Can you log in to the system directly and try to disable the test item using the HostMonitor GUI?

No errors in the log..
What about Unknown test statuses? Do you see a lot of such records in the log (not the system log but the regular log with test results)?

Regards
Alex
rc



Joined: 01 Aug 2005
Posts: 101

Posted: Mon Jul 23, 2012 8:49 am

Quote:

You cannot disable a test??
Using RCC? Can you log in to the system directly and try to disable the test item using the HostMonitor GUI?

No errors in the log..
What about Unknown test statuses? Do you see a lot of such records in the log (not the system log but the regular log with test results)?


No, I mean the stop / start monitoring feature.
Tests with "Unknown" status are not shown.
The log (I log every test result) ends exactly at the time when checking stops.
apaitoperations



Joined: 24 Feb 2011
Posts: 40

Posted: Mon Jul 23, 2012 8:57 am    Post subject: exactly the same here

Exactly the same happens here with my installation, but about every 5 hours. That is about 450,000 to 550,000 tests.

It just stops executing the checks. Before it happens, I think it gets slower and slower and the intervals are no longer met. I see that in the full-logging log files: a check that should be performed every minute is suddenly performed after 2 minutes, then after 5 minutes, and then it stops completely.

Here, too, there is no sign of any event message in any log file or event log.
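
A slowdown like this can be spotted by comparing the gaps between consecutive log entries of one test with its configured interval. A minimal sketch, assuming the timestamps have already been extracted from the log; the log format itself is not shown in this thread, and the function name `find_late_runs` and the `tolerance` factor are illustrative choices:

```python
from datetime import datetime, timedelta


def find_late_runs(timestamps, expected, tolerance=1.5):
    """Return (previous, current, gap) for each gap longer than expected * tolerance.

    `timestamps` must be sorted in ascending order.
    """
    late = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        gap = cur - prev
        if gap > expected * tolerance:
            late.append((prev, cur, gap))
    return late


# Hypothetical run times of a test configured to run every minute
start = datetime(2012, 7, 23, 8, 0)
runs = [start + timedelta(minutes=m) for m in (0, 1, 2, 4, 9)]

for prev, cur, gap in find_late_runs(runs, expected=timedelta(minutes=1)):
    print(f"{prev:%H:%M} -> {cur:%H:%M}: gap of {gap}")
```

With the sample data, the 2-minute and 5-minute gaps are reported and the regular 1-minute gaps are not.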

wbr
Georg
All times are GMT - 6 Hours
Page 1 of 8

 