KS-Soft. Network Management Solutions
Constant, random "unknown" statuses
Brian_Mckay



Joined: 19 Feb 2004
Posts: 4
Location: San Francisco, CA

Posted: Thu Feb 19, 2004 11:26 am    Post subject: Constant, random "unknown" statuses

Hi all,

We've been fine-tuning HostMonitor for a few weeks now, and we're currently monitoring about twenty servers. The problem we're having is that HostMonitor constantly returns "Unknown" statuses, which has my boss worried. It happens most often on our Exchange and SQL database servers, usually on tests that monitor certain services and the event logs. I currently have the action profile set up to repeat the test after it comes back "bad" or "unknown" a second time, followed by an email alert after the third failure and a pager alert after the fourth. In most cases this has kept us from getting deluged with notifications, but in a few instances, like on the Exchange server, the tests still don't come back okay even after four tries.

I guess what I'm trying to discern is: what causes an "Unknown" status? Is it connected to the test timing out due to network latency, or to too many resources already being in use on the target server? Is there any way we can reduce the number of "Unknown" statuses we get back (would changing the time interval between tests make a difference)? I've seen something mentioned on the board about running HostMonitor as a service - does that tend to produce better results?
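
The escalation described above (repeat on the second consecutive failure, email on the third, page on the fourth) can be sketched roughly as follows. This is only an illustration of the logic, not how HostMonitor implements action profiles; the action names `repeat_test`, `send_email`, and `send_page` are hypothetical.

```python
from typing import List


def actions_for_streak(consecutive_failures: int) -> List[str]:
    """Return the (hypothetical) actions fired at a given failure streak."""
    if consecutive_failures == 2:
        return ["repeat_test"]
    if consecutive_failures == 3:
        return ["send_email"]
    if consecutive_failures == 4:
        return ["send_page"]
    return []


def run(statuses: List[str]) -> List[str]:
    """Walk a sequence of test statuses and collect the actions that fire."""
    fired: List[str] = []
    streak = 0
    for status in statuses:
        if status in ("bad", "unknown"):
            streak += 1
            fired.extend(actions_for_streak(streak))
        else:
            streak = 0  # any good result resets the escalation
    return fired


print(run(["unknown", "unknown", "ok", "bad", "bad", "bad", "bad"]))
# → ['repeat_test', 'repeat_test', 'send_email', 'send_page']
```

Note that a single good result resets the streak, so intermittent "Unknown" results can keep triggering the early steps without ever reaching the pager.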
KS-Soft



Joined: 03 Apr 2002
Posts: 12791
Location: USA

Posted: Thu Feb 19, 2004 5:27 pm

I don't think HostMonitor will work differently in service mode. As I understand it, you have problems with two test methods: the Service test and the NT Event Log test?
The Service test displays "Unknown" status when the system cannot connect to the remote Service Control Manager. Likewise, the NT Event Log test displays "Unknown" status when it cannot establish a connection to the remote system.
If this error occurs irregularly, one possible reason is high network traffic that causes DNS requests to fail. In that case, using IP addresses instead of system names should help.

Regards
Alex
Brian_Mckay



Joined: 19 Feb 2004
Posts: 4
Location: San Francisco, CA

Posted: Fri Feb 20, 2004 9:23 am

Thanks Alex, I'll go back and modify all of these tests to use IP address and see how things go over the weekend.
Brian_Mckay



Joined: 19 Feb 2004
Posts: 4
Location: San Francisco, CA

Posted: Fri Feb 20, 2004 1:23 pm

I thought of one other thing to try - most of the service tests are scheduled for every ten minutes. Perhaps it was trying to poll ALL of them on a given server at exactly the same moment every ten minutes? I tried staggering the service tests 5-10 seconds apart on two of my worst offenders (the Exchange server and the database server), and so far they've been clean of unknown statuses for the past hour. I will continue to monitor this and try it on other servers.
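
As a rough sketch of the staggering idea, the function below assigns each test on a server a start offset a few seconds after the previous one, so that tests sharing a ten-minute interval do not all fire at the same instant. The function name, the test names, and the 5-second gap are illustrative assumptions; in HostMonitor the offsets are set by hand on each test.

```python
from typing import Dict, List


def stagger_offsets(test_names: List[str],
                    interval_seconds: int = 600,
                    gap_seconds: int = 5) -> Dict[str, int]:
    """Give each test a start offset so no two tests start together."""
    spread = gap_seconds * max(len(test_names) - 1, 0)
    if spread >= interval_seconds:
        raise ValueError("offsets would spill into the next polling cycle")
    return {name: index * gap_seconds for index, name in enumerate(test_names)}


print(stagger_offsets(["service_a", "service_b", "service_c"]))
# → {'service_a': 0, 'service_b': 5, 'service_c': 10}
```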
KS-Soft



Joined: 03 Apr 2002
Posts: 12791
Location: USA

Posted: Fri Feb 20, 2004 1:34 pm

Quote:
I thought of one other thing to try - Most of the services tests are scheduled for every ten minutes. Perhaps it was trying to poll ALL of them on a given server at exactly every ten minutes?


Maybe not all of them, but many of them. There is a "Don't start more than [N] tests per second" option on the Behavior page in the Options dialog. This parameter defines how many tests per second the program will start. The default value (32) is good for most networks, but it may not be good in your case.

Regards
Alex
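
To see what a "tests per second" cap does to a burst of tests that all come due together, here is a small arithmetic model. It assumes the simplest possible behavior (tests are started in order, at most N per second); how HostMonitor actually schedules them internally is not described in this thread, and the function name is made up for the example.

```python
from typing import List


def start_seconds(num_due_tests: int, max_per_second: int) -> List[int]:
    """Return the second in which each due test would start under the cap."""
    if max_per_second < 1:
        raise ValueError("max_per_second must be at least 1")
    return [index // max_per_second for index in range(num_due_tests)]


print(start_seconds(5, 2))  # → [0, 0, 1, 1, 2]

# With the default cap of 32, a burst of 40 due tests starts 32 in the
# first second and the remaining 8 in the next one.
burst = start_seconds(40, 32)
print(burst.count(0), burst.count(1))  # → 32 8
```

Under this model, lowering the cap spreads a burst over more seconds, which is the same effect as staggering tests by hand, applied globally.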
Brian_Mckay



Joined: 19 Feb 2004
Posts: 4
Location: San Francisco, CA

Posted: Fri Feb 20, 2004 1:53 pm

Sounds good - I'll give that a try as well and see how it performs over the weekend.
mpriess



Joined: 02 Jul 2002
Posts: 112
Location: Arizona, USA

Posted: Fri Apr 02, 2004 12:20 pm    Post subject: We still have issues with random unknown statuses...

Hi Alex,

We receive unknowns often for performance counter and event log checks for several servers. The other tests that are pointing to the same servers are fine (TCP, PING, service checks, etc).

Here is another strange thing: if the test that is "unknown" is currently using an IP address and we change it to a hostname, it goes back to a good status... OR VICE VERSA: if the test is currently using a hostname and we switch it to an IP address, this sometimes fixes it as well.
But a simple test refresh or disable\enable will not work.

There are one or two that don't come back at all, even after enabling\disabling and trying all the steps above.
No permissions have been changed on any of the servers.
We are using HostMonitor 4.30 on Windows 2000 Server with all the latest SPs and patches, running about 760 tests, with approximately 3,750,550 tests completed in the last two weeks. Server CPU utilization averages about 30%. The machine is a Compaq 1850R (PII 500). Available memory is fine.

Is this too many tests over that time period?

We do notice that HostMonitor "freezes" for about 30 seconds or so every few minutes and the interface becomes unusable... and after the 30 seconds everything goes back to normal and we can move around in the HostMonitor interface again. We do not know whether this is contributing to the unknown statuses, but wanted to mention it to see if anyone else is experiencing this.

HostMonitor is still doing a great job alerting and nothing else has been an issue. Just the freeze-up issue and the unknown status issue.

Thanks for any assistance you can provide. I'm sending a screen shot of the tests for one of the servers to your email address in case that will help.

Thx,
Mark
mpriess



Joined: 02 Jul 2002
Posts: 112
Location: Arizona, USA

Posted: Fri Apr 02, 2004 2:02 pm    Post subject: To add to the last post....

I have to stop and restart hostmonitor to get the tests to go back to a "good" status most of the time.

Thx,
Mark
KS-Soft



Joined: 03 Apr 2002
Posts: 12791
Location: USA

Posted: Fri Apr 02, 2004 2:26 pm

Eh!
We no longer believe that pdh.dll can work in a multithreaded environment. So we decided to implement a 4th method of working with this DLL: an external application. We will implement a simple utility that will be called by HostMonitor to perform the test. This way pdh.dll will be loaded for a single test only. Of course it will need more resources, but I hope it will work reliably.

Regards
Alex
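
The general pattern described here, running each check in its own short-lived process so that a library that misbehaves under multithreading is loaded for one test only, can be sketched as below. This is not KS-Soft's utility; the helper name `run_isolated` is an assumption, and a trivial Python child process stands in for the external checker.

```python
import subprocess
import sys
from typing import List, Optional


def run_isolated(command: List[str],
                 timeout_seconds: float = 30.0) -> Optional[str]:
    """Run one check in a separate process.

    Returns the child's stdout, or None (treat as "unknown") if the
    process cannot be started, times out, or exits with an error.
    """
    try:
        done = subprocess.run(command, capture_output=True, text=True,
                              timeout=timeout_seconds)
    except (OSError, subprocess.TimeoutExpired):
        return None
    if done.returncode != 0:
        return None
    return done.stdout.strip()


# Stand-in child process that "measures" a value and prints it.
print(run_isolated([sys.executable, "-c", "print(42)"]))  # → 42
```

The cost is one process start per test, which matches the remark above that this approach needs more resources.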
genasea



Joined: 25 Sep 2002
Posts: 27

Posted: Fri Apr 16, 2004 3:31 pm    Post subject: Same issue

We have spent dozens of hours attempting to find a pattern in these constant 'Unknown' statuses. We have more than 30 tests set up against the MS Exchange server, and when we cut over to a new Exchange server, most of the tests went into the 'Unknown' limbo state quite regularly. The old server running NT worked fine for tens of thousands of samples with no 'Unknowns'; the new server runs 2000 SP4 and has been nothing but problems. These tests are now all disabled because they are unreliable.

One of our SQL servers (which also runs 2000 SP4) is also having major problems (i.e. it returns 'Unknowns' a large amount of the time).

We cut over to a new HostMonitor server running XP, and couldn't get performance counter tests to work with more than 20 tests enabled (all would go into 'Unknown' if more tests were enabled). So we rebuilt the server with 2000 SP4 and had the same result. Next we rebuilt the box with 2000 SP3 (to match the OS of the old server), and now we are back to the regular number of 'Unknowns' when testing Microsoft servers.

99% of our 'Unknown' responses are coming from Performance Counter Tests.

I do hope that the new solution from Hostmonitor can deal effectively with this issue.

Thank you,

Scott
KS-Soft



Joined: 03 Apr 2002
Posts: 12791
Location: USA

Posted: Mon Apr 19, 2004 8:43 pm

There is a new module at www.ks-soft.net/download/hm445.zip
This version supports a new "External" mode for the Performance Counter test. It should fix the problems with the test. Use the Miscellaneous page in the Options dialog to set the new mode.

This version also supports the new SNMP Trap test method. If you want to try this method, I would recommend copying your existing Advanced Host Monitor into another directory (or onto another computer), unzipping the new modules, and setting up SNMP Trap tests for testing purposes only.

Regards
Alex
timn



Joined: 20 Nov 2003
Posts: 184
Location: United States

Posted: Tue Apr 20, 2004 8:31 am

Alex:

I'm not sure this is related but.....

We find that after an overnight reboot of the roughly 60 machines that are part of a web farm, several perfmon tests will get stuck in an 'unknown' state. Because all these tests are actually performed by RMA, we've found that restarting the RMA will fix this in many instances.

Here is what we think is going on. We believe that HM is requesting the perfmon data from the RMA too soon after reboot -- indeed, these tests currently depend only upon an IP Ping of the remote machine. Thus, if a test is performed AFTER the IP network becomes available but BEFORE the actual module that supports the specific perfmon test (for example, Web Publishing Services) has completed initialization, the RMA will return 'unknown', and then becomes 'stuck' at that value until the RMA is restarted.

We believe that we may be able to correct this by making such perfmon tests dependent upon some other kind of test (rather than a simple ping) that would ensure the relevant services/modules are loaded and initialized before the RMA is called to perform the perfmon test. We just haven't had time to think this through completely.
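
One way to picture a stronger dependency than ping is a gate that checks whether the service itself is answering before the perfmon test is attempted. The sketch below uses a TCP connect as that gate; this is an assumption for illustration (the post does not say which test would be used), and `port_is_open` and `run_if_ready` are hypothetical helpers, not HostMonitor or RMA functions.

```python
import socket
from typing import Callable


def port_is_open(host: str, port: int, timeout_seconds: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_seconds):
            return True
    except OSError:
        return False


def run_if_ready(host: str, port: int, perf_test: Callable[[], str]) -> str:
    """Run the dependent test only when the service port is answering."""
    if not port_is_open(host, port):
        return "wait"  # service not up yet; skip instead of reporting unknown
    return perf_test()
```

For a web farm, the gate might be port 80 on each machine: a host can answer ping well before its web service is listening, which is exactly the window described above.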
KS-Soft



Joined: 03 Apr 2002
Posts: 12791
Location: USA

Posted: Tue Apr 20, 2004 6:20 pm

Timn,
Yesterday we uploaded an update for HostMonitor.
Today we uploaded an update for RMA: www.ks-soft.net/download/rma119.zip It also includes the perfobj.exe module, an external performance counter retriever.
If you update RMA, copy perfobj.exe into RMA's directory, and add the line "PerfWorkMode=3" to the [Misc] section of the rma.ini file, it should fix the problem. Please note: you have to restart the agent if you make changes to the rma.ini file.

Regards
Alex
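
Editing rma.ini by hand is the simplest route. If the change has to be rolled out to many agents, something like the following could add the setting programmatically. It is a sketch that assumes rma.ini is a conventional INI file that Python's `configparser` can read; the `[Example]` section in the usage line is made-up sample content, not real RMA settings.

```python
import configparser
import io


def set_perf_work_mode(ini_text: str, mode: int = 3) -> str:
    """Return ini_text with PerfWorkMode=<mode> set in the [Misc] section."""
    parser = configparser.ConfigParser(interpolation=None)
    # Keep option names as written; the default would lowercase them.
    parser.optionxform = str
    parser.read_string(ini_text)
    if not parser.has_section("Misc"):
        parser.add_section("Misc")
    parser.set("Misc", "PerfWorkMode", str(mode))
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


print(set_perf_work_mode("[Example]\nKey=value\n"))
```

Be aware that `configparser` drops comments when it rewrites a file, so keep a backup of the original, and remember that the agent must be restarted afterwards.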
timn



Joined: 20 Nov 2003
Posts: 184
Location: United States

Posted: Fri Apr 23, 2004 9:32 am

Alex:

I tried this on one machine. I upgraded the agent to 1.19, added a "[Misc]" section to the RMA.INI file with a line reading "PerfWorkMode=3", and restarted the agent. But now all my perfmon tests for this machine fail with a message like this one:

RMA: 301 - Error: Invalid Result (C:\Program Files\RMA-Win>C:\Program Files\RMA-Win\perfobj.exe "\Process(inetinfo)\Thread Count" -n 1)

Any ideas? What did I do wrong?
KS-Soft



Joined: 03 Apr 2002
Posts: 12791
Location: USA

Posted: Fri Apr 23, 2004 9:37 pm

It's already fixed. The new modules are located at the same place:
www.ks-soft.net/download/rma119.zip
www.ks-soft.net/download/hm445.zip

Regards
Alex
All times are GMT - 6 Hours