Hostmonitor Sizing Guide?

mp1
Posts: 199
Joined: Tue Mar 07, 2006 3:24 am

Post by mp1 »

Hi,

We currently have about 4,000 checks configured and a "load" of 42 tests/sec. We have already optimized the checks and intervals as much as we can or want ;-).

We want to add more checks, although we don't want to increase the check interval.

We have already moved checks to different RMAs on other systems.
Does this help the HostMonitor system?

What are the recommendations for a larger setup?
Add a 2nd instance of HostMonitor?

Thanks in advance

Martin
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

It depends on exactly which tests, test settings, logging settings and system you use.
E.g. 42 ping tests/sec - I do not see any problems. If HostMonitor can perform the ping tests itself (e.g. local LAN tests), then system load will increase when you set up an agent to perform these tests.

Does the Auditing Tool show any errors/warnings that make you want to set up a 2nd HostMonitor?

Regards
Alex
mp1
Posts: 199
Joined: Tue Mar 07, 2006 3:24 am

Post by mp1 »

Does the Auditing Tool show any errors/warnings that make you want to set up a 2nd HostMonitor?
The Auditing Tool doesn't show any errors.
We have different checks: a lot of Shell Scripts (4/sec), URL checks (4/sec), CPU usage (5/sec), Service, TCP and so on.

What we can see: when we add more tests and the load increases to 45 or more tests/sec, HostMonitor starts having problems with false positives. In our case, I would say that a load of about 42 tests/sec is the maximum.

So moving checks to different agents on other systems doesn't reduce the load on the HostMonitor system? If you move the checks, then HostMonitor should only have to collect the results from the agents, or not?

Is it possible to add a 2nd instance that will use the "configuration" (alert profiles etc. ...) from the first instance?

Regards,

Martin
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

then HostMonitor starts having problems with false positives
What exactly do you mean by false positives?
Test method?
Status?
Reply?

Windows?
Service Pack?
Antivirus monitor?
Do you use ODBC logging or the ODBC test method? If yes, what ODBC driver do you use?

Could you check the resource usage of each process? You may use the standard Windows Task Manager to check Handles, GDI and USER objects. What is the total resource usage on the system? How many handles/threads/GDI objects are used by the hostmon.exe process?

So moving checks to different agents on other systems doesn't reduce the load on the HostMonitor system?
As I said, this depends on the test methods.
Ping tests, TCP tests - this will not help.
Shell Scripts - yes, this will help.
CPU Usage - will help
Is it possible to add a 2nd instance that will use the "configuration" (alert profiles etc. ...) from the first instance?
You may copy ALL *.LST and *.INI files to the 2nd system.
If you plan to copy (synchronize) profiles on a regular basis, then DO NOT modify settings on the 2nd system. E.g.
- maintain 2 different test lists on the 2 systems
- modify actions, schedules, reports, etc. only on the 1st system, then copy the files to the 2nd system (copy ALL LST files at once!).
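
The one-way copy described above could be scripted roughly as follows. This is only a sketch under assumptions: the function name `sync_profiles` and the example paths are hypothetical, it assumes all *.LST and *.INI files sit directly in one folder, and it says nothing about whether HostMonitor has to be stopped or restarted for the copied files to be picked up.

```python
import shutil
from pathlib import Path

# File types named in the post: profile lists (*.LST) and settings (*.INI).
PROFILE_SUFFIXES = {".lst", ".ini"}


def sync_profiles(src_dir, dst_dir):
    """Copy every *.LST and *.INI file from src_dir to dst_dir in one pass.

    The copy is strictly one-way (1st system -> 2nd system), so settings
    must only ever be edited on the source side.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)
    copied = []
    for f in sorted(src.iterdir()):
        # Compare suffixes case-insensitively so .LST and .lst both match.
        if f.is_file() and f.suffix.lower() in PROFILE_SUFFIXES:
            shutil.copy2(f, dst / f.name)  # copy2 keeps timestamps
            copied.append(f.name)
    return copied


if __name__ == "__main__":
    # Hypothetical paths; replace with the real install folders.
    print(sync_profiles(r"C:\HostMonitor", r"\\second-host\HostMonitor"))
```

Copying all files in one run, rather than picking single files, follows the advice to copy ALL LST files at once so that the profiles on the 2nd system stay consistent with each other.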

Regards
Alex
mp1
Posts: 199
Joined: Tue Mar 07, 2006 3:24 am

Post by mp1 »

What exactly do you mean by false positives?
Test method?
Status?
Reply?

Windows?
Service Pack?
Antivirus monitor?
Do you use ODBC logging or the ODBC test method? If yes, what ODBC driver do you use?
We get timeouts with URL requests, script execution errors, and frequent problems with performance counter checks. We don't see these errors when we decrease the tests per second.

Our System:

Dedicated physical system (IBM server, 24 GB RAM, Intel E5620 2.4 GHz)
Windows Server 2008 R2 (latest patches)
No AntiVirus installed
No ODBC Logging configured
No other software installed, only a backup client
How many handles/threads/GDI objects are used by the hostmon.exe process?
handles about 11000, threads 150, GDI objects 0(?)

Our estimated workload:

PING 20/sec
CPU usage 5/sec
Shell Script 4/sec
URL request 4/sec
TCP 2/sec
Service 2/sec
WMI 1/sec
Performance 59/min
....
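
As a rough sanity check, the listed rates can be added up and compared with the overall load. The sketch below uses only the numbers from this thread (about 4,000 checks, about 42 tests/sec, and the per-method rates above); all of them are estimates, so the result is approximate, and the variable names are just for illustration.

```python
# Per-method rates from the workload list above, converted to tests/sec.
listed_rates = {
    "PING": 20.0,
    "CPU usage": 5.0,
    "Shell Script": 4.0,
    "URL request": 4.0,
    "TCP": 2.0,
    "Service": 2.0,
    "WMI": 1.0,
    "Performance": 59 / 60,  # listed as 59/min
}

observed_total = 42.0     # overall load reported at the start of the thread
configured_checks = 4000  # approximate number of configured checks

listed_total = sum(listed_rates.values())
unlisted = observed_total - listed_total          # share hidden behind "...."
avg_interval = configured_checks / observed_total  # seconds between runs of one check

print(f"listed methods:        {listed_total:.1f} tests/sec")
print(f"other methods:         {unlisted:.1f} tests/sec")
print(f"avg interval per check: {avg_interval:.0f} sec")
```

The listed methods account for roughly 39 of the 42 tests/sec, so about 3 tests/sec come from the methods not listed here, and on average each check runs about every 95 seconds.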
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

We get timeouts with URL requests, script execution errors, and frequent problems with performance counter checks. We don't see these errors when we decrease the tests per second.
What EXACT status and reply do you see?
handles about 11000, threads 150, GDI objects 0(?)
11,000 handles for the Hostmon.exe process? That is too many. Something is wrong...
PING 20/sec
CPU usage 5/sec
Shell Script 4/sec
URL request 4/sec
TCP 2/sec
Service 2/sec
WMI 1/sec
Performance 59/min
....
These tests should not lead to leaks. However, Shell Script tests may cause high load (especially if your scripts need 4 or more seconds to execute).
"...." means there are some other test methods? Which ones exactly?
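
To illustrate why script duration matters at 4 Shell Script tests/sec: by Little's law, the average number of scripts running at the same time is the start rate multiplied by the run time. The sketch below is only a back-of-the-envelope estimate; it assumes a steady start rate and says nothing about how HostMonitor or RMA actually schedule scripts. The function name is made up for the example.

```python
def concurrent_scripts(rate_per_sec: float, duration_sec: float) -> float:
    """Average number of scripts in flight (Little's law: L = rate * time)."""
    return rate_per_sec * duration_sec


if __name__ == "__main__":
    shell_script_rate = 4  # tests/sec, from the workload list in this thread
    for duration in (0.5, 1, 4, 10):
        running = concurrent_scripts(shell_script_rate, duration)
        print(f"{duration:>4} sec per script -> about {running:g} scripts running at once")
```

With 4-second scripts that is about 16 script processes running at any moment, each with its own interpreter start-up cost, which is why moving these tests to agents on other systems takes load off the HostMonitor system.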

Regards
Alex
mp1
Posts: 199
Joined: Tue Mar 07, 2006 3:24 am

Post by mp1 »

What EXACT status and reply do you see?
It depends on the check type:

status: unknown
reply: Timed out (Shell Script - timeout already set to 0)
reply: RMA: 301 - Script execution error
status: no answer (URL checks)


It happens only from time to time, and the next check is OK again.
We have this problem with many different checks.
We especially don't understand the "no answer" reply for the URL checks. If we open the URL in the browser at the same time, all is OK, and after a refresh the check is OK too.
11,000 handles for the Hostmon.exe process? That is too many. Something is wrong...
hmm, the Dominant Process check shows 11,000 handles, but if I look in Task Manager I see only 10 handles?

Further test methods we use:

HTTP 59/min
UNC 58/min
DNS 44/min
LDAP 44/min
ODBC 28/min
SNMP 23/min
Count Files 18/min
Oracle 17/min
Folder/file 15/min
Dominant Process 8/min
Process 7/min
Text Log 6/min
NT Event Log 5/min
Folder/File size 2/min
Mail Relay 1.5/min
and some further tests

Thanks for the quick reply ...
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

reply: Timed out (Shell Script - timeout already set to 0)
This means the script did not finish within 15 min.
status: no answer (URL checks)
This means there was no answer within the specified timeout. Maybe the timeout is too short?
hmm, the Dominant Process check shows 11,000 handles, but if I look in Task Manager I see only 10 handles?
10 handles for the HostMon.exe process? Impossible.
Please send a screenshot to support@ks-soft.net
ODBC 28/min
Oracle 17/min
I asked whether you are using ODBC logging or ODBC tests and which ODBC driver you use.
Oracle makes terrible ODBC drivers with thousands of bugs, which may lead to various problems, so try to disable these tests or move them to an RMA installed on a different system.
Exactly which Oracle client and ODBC driver do you use? Version?

Regards
Alex