6.82 not sending Alerts

All questions related to installations, configurations and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
Bulldog98
Posts: 23
Joined: Fri Sep 21, 2007 3:29 pm

Post by Bulldog98 »

KS-Soft wrote:It looks like system is out of resources.
Okay, I have now installed KS HostMonitor on its very own management box. The server it is running on is 4 x Dual Core Xeon 3.16GHz with 8GB of RAM, on a 1GB network.

This server is located within a management network which is in our datacentre at a remote location. I have KSHM installed as a service and we are connecting to KSHM using the RCI over our VPN from a remote location.

The KSHM process is using just 392MB of memory and on average around 1% of CPU. Network utilisation is around 2.5%. Approx. 3300 Handles, 0 GDI Objects and 0 User Objects.

Since installing KSHM on this new box we do not receive any errors in the SysLog but we are still having performance issues when using the RCI.

The symptoms are as follows when using the RCI:

1) Slow performance and delayed responses when interacting with the user interface or tests.

2) Memory tests seem to all fail and display "Unknown" status now and again whilst all other tests remain working.

3) Other tests fail and return "Unknown" status now and again but when you perform a refresh on the individual tests they return their true status again.

Again, the load on our server is 28 tests a second and KS Reports that it should be able to cope easily with this load. We currently have 3882 tests running on KSHM.

Performance Counter Tests: 2495
Service Tests: 458
Ping Tests: 30
Http Tests: 18
UNC Tests: 461
URL Tests: 7
NT Event Log Tests: 101
CPU Usage Tests: 1
TCP Tests: 1
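
As a quick sanity check, the per-type counts above can be totalled. This is only a minimal Python sketch; the dictionary and variable names are illustrative and not part of HostMonitor. The listed types add up to fewer tests than the 3882 quoted, so the list presumably leaves out some test types.

```python
# Per-type test counts exactly as listed in the post above.
test_counts = {
    "Performance Counter": 2495,
    "Service": 458,
    "Ping": 30,
    "Http": 18,
    "UNC": 461,
    "URL": 7,
    "NT Event Log": 101,
    "CPU Usage": 1,
    "TCP": 1,
}

listed_total = sum(test_counts.values())
reported_total = 3882  # overall figure quoted in the post

# Share of the listed tests that rely on Performance Counters.
perf_share = test_counts["Performance Counter"] / listed_total

print(f"Listed tests: {listed_total}")
print(f"Not itemised: {reported_total - listed_total}")
print(f"Performance Counter share of listed tests: {perf_share:.0%}")
```

Roughly 70% of the listed tests are Performance Counter tests, which is why the Performance Counter settings matter most for this installation.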

Most of the memory and CPU tests we have configured use the "Performance Counters" tests.

I have the threshold for tests allowed to be started per second set to 100, and the RCI Refresh Rate is set at 60 per second. We are monitoring just under 400 servers using KSHM.

I would appreciate any help or advice that you can give that will help us solve the issues we are experiencing.

Thanks
Bulldog98
Posts: 23
Joined: Fri Sep 21, 2007 3:29 pm

Post by Bulldog98 »

P.S. The alerts are now working again but we are still experiencing the performance issues.
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

Bulldog98 wrote:The KSHM Process is using just 392Mb of memory and on average around 1% of CPU. Network utilisation is around 2.5%. Approx. 3300 Handles, 0 GDI Objects and 0 User Objects.
0 GDI Objects and 0 User Objects means that you have specified the account for HostMonitor's service using the Windows "Services" applet. We recommend specifying a special user account on the "Service" page in HostMonitor's "Options" dialog instead. Quote from the manual:
http://www.ks-soft.net/hostmon.eng/mfra ... tm#Service
==============
Note #1: When HostMonitor starts as a service, it uses the system account (as all interactive services). But this account may not have all the necessary permissions, so some tests will not work correctly (UNC test, "disk free space" test for shared drives, "CPU Usage" test for remote machines, etc). If you need these tests, you will need to assign a special user account on the Service page in the Options dialog. In this case HostMonitor will impersonate the security context of the user. Do not change the account using the system utility "Services". If you do so, HostMonitor may be unable to interact with the desktop.
==============
Bulldog98 wrote:Since installing KSHM on this new box we do not receive any errors in the SysLog but we are still having performance issues when using the RCI.

The symptoms are as follows when using the RCI:

1) Slow performance and delayed responses when interacting with the user interface or tests.
Do you mean that it takes some time to modify a test, refresh tests, etc.?
Bulldog98 wrote:2) Memory tests seem to all fail and display "Unknown" status now and again whilst all other tests remain working.
As I understand it, you are using the "Performance Counters" test to check memory usage? What exact error do you see in the "Reply" field? Actually, "Performance Counters" is a heavy and not very reliable technology, so we recommend using the "WMI" test instead: http://www.ks-soft.net/hostmon.eng/mfra ... ts.htm#wmi
Bulldog98 wrote:3) Other tests fail and return "Unknown" status now and again but when you perform a refresh on the individual tests they return their true status again.
What exact error do you see in the "Reply" field?
Bulldog98 wrote:I have the threshold for tests allowed to be started per a second to 100 and RCI Refresh Rate is set at 60 per a second.
You probably mean 60 per minute? Or have you specified 3600 in the "Status refresh rate:" box? Actually, 30 per minute should be enough, but everything depends on your network.

Regards,
Max
Bulldog98
Posts: 23
Joined: Fri Sep 21, 2007 3:29 pm

Post by Bulldog98 »

KS-Soft Europe wrote: As I understand, you are using "Performance Counters" test to check memory usage? What exact error do you see in "Reply" field? Actually, "Performance Counters" is heavy and not pretty reliable technology, so we recommend to use "WMI" test instead: http://www.ks-soft.net/hostmon.eng/mfra ... ts.htm#wmi
Am I right in thinking that WMI would require the following ports to be open on our management network? TCP 135, 445, UDP 445 and UDP 5000 to 6000.

Thanks
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

Bulldog98 wrote:Am I right in thinking that WMI would require the following ports to be open on our management network? TCP 135, 445, UDP 445 and UDP 5000 to 6000.
In the general case, yes. However, if the "Performance Counters" tests are working properly, "WMI" tests should work fine as well. There might be security issues, described in the following articles:
http://www.microforge.net/kb/39
http://www.microforge.net/kb/62
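
Before switching tests to WMI, it may help to confirm that the fixed TCP ports mentioned above (135 and 445) are reachable from the HostMonitor machine. The following is a minimal Python sketch, not a HostMonitor feature; the function names are illustrative. Note that it only checks that a TCP connection can be opened. WMI runs over DCOM, which negotiates an additional dynamic port after the initial contact on port 135, so a successful check here does not prove that WMI queries will work.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


def check_ports(host: str, ports=(135, 445)) -> dict:
    """Map each TCP port to whether it accepted a connection."""
    return {port: tcp_port_open(host, port) for port in ports}
```

For example, `check_ports("server01")` (with `server01` replaced by one of your monitored hosts) returns a dictionary such as `{135: True, 445: False}`, depending on what the firewall allows.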

Regards,
Max
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

The symptoms are as follows when using the RCI:
1) Slow performance and delayed responses when interacting with the user interface or tests
Well, when you work with HostMonitor remotely, the interface does respond with some delay. RCC has to request data from HostMonitor, and the data has to be encrypted, transferred and decrypted.
You may try to increase the "Status refresh rate" option:
www.ks-soft.net/hostmon.eng/rcc/index.htm#start
The symptoms are as follows when using the RCI:
2) Memory tests seem to all fail and display "Unknown" status now and again whilst all other tests remain working.
3) Other tests fail and return "Unknown" status now and again but when you perform a refresh on the individual tests they return their true status again.
As I said, RCC does not perform tests and does not execute alerts, so this problem has nothing to do with RCC. RCC just shows what is going on on HostMonitor's system.

So, let's check that system. When HostMonitor cannot perform a Performance Counter test and shows Unknown status, it should display an error message in the Reply field of the test. What message do you see there?

Performance Counters often work unreliably. That's why we have implemented several options. Quote from the manual:
Performance Counter test related option

Test mode
Windows implementation of performance counters has bugs. E.g., Windows 2000 (Professional, Server, and Advanced Server editions) can produce memory leak in PDH.DLL when user (application) querying performance counter that does not exist. This bug fixed in SP2. Also PDH.DLL does not work correctly with multithread applications.
That's why in HostMonitor we have implemented several different methods to work with pdh.dll:

- MultiThread mode: HostMonitor works almost according to Microsoft documentation with some workaround to avoid most likely problems. HM loads pdh.dll at once and uses it all the time. This method fast because HM can start several tests simultaneously. If everything will work correctly on your system, use this method (by default HostMonitor uses this method).

- OneByOne mode: Using this method HM will start Performance Counter tests one by one. This method is slow (when you set up a Performance Counter test using the Test Properties dialog, the program can even hang for 1-2 min), but using this method you may avoid some problems due to a buggy pdh.dll.

- Smart mode: With this method HM will try to detect when pdh.dll has to be reloaded.

- External mode: HostMonitor uses external (perfobj.exe) utility to perform the tests. This is fast and most reliable method.

PerfObj utility is included into package. Remote Monitoring Agent (RMA) can use this utility as well (utility should be located in the same directory where RMA is installed).

If you want to change test method used by the agent, add line "PerfWorkMode=N" into [Misc] section of the rma.ini file and restart agent. N is a code of the mode:
0 - MultiThread mode;
1 - OneByOne mode;
2 - Smart mode;
3 - External mode
These options are located on the Misc page in the Options dialog. You have a lot of Performance Counter test items, so I would recommend trying External mode (OneByOne is not good in your case).
Another possible solution is to use the WMI test instead of Performance Counters.
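
For an agent, the manual quote above says the mode is set by adding "PerfWorkMode=N" to the [Misc] section of rma.ini, so External mode would be "PerfWorkMode=3". Editing the file by hand is the simplest route. If the change has to be made on many agents, something like the following Python sketch could be used. It is only an illustration: the function name and the sample file content are made up, Python's configparser drops comments when it rewrites a file, and it may reject files it considers malformed, so keep a backup of the original rma.ini and restart the agent afterwards.

```python
import configparser
import io


def set_perf_work_mode(ini_text: str, mode: int) -> str:
    """Return ini_text with PerfWorkMode=<mode> set in the [Misc] section."""
    if mode not in (0, 1, 2, 3):
        raise ValueError("mode must be 0, 1, 2 or 3")
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(ini_text)
    if not parser.has_section("Misc"):
        parser.add_section("Misc")
    parser.set("Misc", "PerfWorkMode", str(mode))
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()


# Hypothetical rma.ini content; a real file will contain other settings.
sample = "[Misc]\nSomeOtherOption=1\n"
print(set_perf_work_mode(sample, 3))
```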
Am I right in thinking that WMI would require the following ports to be open on our management network? TCP 135, 445, UDP 445 and UDP 5000 to 6000.
If you need to monitor hosts protected by a firewall, we recommend using Remote Monitoring Agent (RMA):
www.ks-soft.net\hostmon.eng\rma-win\index.htm

Regards
Alex
Bulldog98
Posts: 23
Joined: Fri Sep 21, 2007 3:29 pm

Post by Bulldog98 »

If you need to monitor hosts protected by firewall, we recommend to use Remote Monitoring Agent (RMA)
www.ks-soft.net\hostmon.eng\rma-win\index.htm

Regards
Alex
Due to the number of servers we have, I would prefer not to install individual RMA Clients on each server. I was thinking along the lines of installing RMA on 1 blade within each of our Blade Centres; this would give us a total of 21 RMA Agents plus however many I decide to configure for the rackmount servers.

My main concern is the performance impact. It is not an option to free up a blade in each chassis specifically for RMA, so I am hoping to get away with installing RMA on one of the lesser used production blades. All tests related to any of the blades within a chassis would then point to the RMA Client on that blade, which would perform the tests against all other blades in the chassis as well as itself.

Let's say we have a total of 473 tests per chassis, consisting of 385 performance monitor tests, 32 UNC tests and 56 service tests. The blade will have a quad-core 1.67GHz Xeon processor with 8GB RAM.

The blades that I am thinking of using currently average 5% CPU usage and have 5.4GB of free memory. Network utilisation is less than 1% and should not be an issue.

Just wondering whether you think that configuration would suffice and whether you believe the server could handle the RMA alongside its current workload.

Thanks
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Due to the number of servers we have I would prefer not to install individual RMA Clients on each server. I was thinking along the lines of installing RMA on 1 blade within each of our Blade Centres,
Sure, RMA was designed to monitor an entire network.
Lets say we have a total of 473 tests per a chassis these consist of 385 performance monitor tests, 32 UNC tests and 56 service tests. The blade will be a quad core 1.67ghz xeon processor with 8gb RAM
...
Just wondering whether you think that configuration would suffice and whether you believe the server could handle the RMA alongside its current workload
It depends on how often you need to check each test item. E.g. if you want to perform each test every 5 min, RMA will execute 473/60/5 = about 1.6 tests per second (average load). I don't see any problems.

Regards
Alex
Bulldog98
Posts: 23
Joined: Fri Sep 21, 2007 3:29 pm

Post by Bulldog98 »

KS-Soft wrote:
It depends on how often do you need to check each test item. E.g. if want to perform each test every 5 min, RMA will execute 473/60/5 = 1.5 tests per second (average load). Don't see any problems.

Regards
Alex
A third of the tests will be checked every 60 seconds, so 158/60/1 = 2.6 tests per second; a third will be checked every 5 minutes, so 158/60/5 = 0.5 tests per second; and the other third will be checked every 15 minutes, so 158/60/15 = 0.18 tests per second. Total load would be about 3.3 tests per second.

What is the limiting factor with RMA (CPU, RAM, TCP, network)? What hits should I look out for when I test this, to make sure that it is not having a major performance impact on my system?

Many Thanks
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Each test method works differently. UNC, Performance Counter and Service tests do not need a lot of memory; RMA for Windows will use some CPU and network resources, but not much.

Regards
Alex