Windows 2003 RMA CPU Issue

All questions related to installations, configurations and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am
Contact:

Post by KS-Soft Europe »

scott.carroll@brulant.com wrote:Any ideas on a solution?
Could you increase the test interval up to 2 minutes? Just for testing?
scott.carroll@brulant.com wrote:If I change it to what you mention above (to not send an alert if the status=unknown) is there also a way to keep that 5 bad then send alert setting?
Sure. You may use an expression like the following:
('%SimpleStatus%'=='DOWN') and (%Recurrences%>4)
Please note: you have to disable the "Treat Unknown as Bad" option to use the foregoing expression. Quote from the manual:
http://www.ks-soft.net/hostmon.eng/mfra ... .htm#macro
============================
%SimpleStatus%
This macro may return one of the following text values:
* "UP" for good statuses (Host is Alive, Ok);
* "DOWN" for any bad status (No answer, Bad, Bad Contents); for Warning status when "Treat Warning as Bad" option is enabled; for Unknown status when "Treat Unknown as Bad" option is enabled
* "UNKNOWN" if status of the test is "Unknown" or "Unknown host" and "Treat Unknown as Bad" option is disabled; variable also returns "UNKNOWN" for statuses like WaitForMaster, OutOfSchedule, Paused
* "WARNING" when status of the test is "Warning" and "Treat Warning as Bad" option is disabled
============================
%Recurrences%
Similar to %CurrentStatusIteration% but this counter is reset by HostMonitor when test status changes from "bad" to "good", from "good" to unknown or warning and vice versa. While changes between "Ok", "Host is alive" and "Normal" statuses do not reset the counter, changes between "Bad" and "No answer" do not reset the counter either. "Treat Unknown status as Bad" and "Treat Warning status as Bad" options determine behaviour of the counter for Unknown and Warning statuses.
============================
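
To illustrate how the two quoted macros combine in that expression, here is a minimal Python sketch. It is only a model of the manual text above, not HostMonitor's actual code. The function names are hypothetical, and two points are assumptions: that the counter starts at 1 on the first check of a new status (so the fifth consecutive bad result gives 5, matching "5 bad then send alert"), and that any change of the simplified status resets it.

```python
# Sketch only: models the quoted macro descriptions, not HostMonitor's real code.
GOOD = {"Host is Alive", "Ok", "Normal"}
BAD = {"No answer", "Bad", "Bad Contents"}


def simple_status(status, treat_unknown_as_bad=False, treat_warning_as_bad=False):
    """Return the text that %SimpleStatus% would give for a detailed status."""
    if status in GOOD:
        return "UP"
    if status in BAD:
        return "DOWN"
    if status in ("Unknown", "Unknown host"):
        return "DOWN" if treat_unknown_as_bad else "UNKNOWN"
    if status == "Warning":
        return "DOWN" if treat_warning_as_bad else "WARNING"
    # WaitForMaster, OutOfSchedule, Paused and similar statuses
    return "UNKNOWN"


def alert_flags(statuses, **options):
    """Evaluate ('%SimpleStatus%'=='DOWN') and (%Recurrences%>4) per check."""
    flags = []
    previous = None
    recurrences = 0
    for status in statuses:
        simple = simple_status(status, **options)
        # Assumption: the counter restarts at 1 whenever the simplified
        # status changes; "Bad" -> "No answer" stays DOWN, so no reset.
        recurrences = recurrences + 1 if simple == previous else 1
        previous = simple
        flags.append(simple == "DOWN" and recurrences > 4)
    return flags


if __name__ == "__main__":
    print(alert_flags(["Bad", "No answer", "Bad", "Bad", "Bad"]))
    print(alert_flags(["Unknown"] * 6))
```

With "Treat Unknown as Bad" disabled (the default in this sketch), a run of Unknown results never produces an alert, while five bad results in a row do.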

Regards,
Max
scott.carroll@brulant.com
Posts: 29
Joined: Fri Dec 29, 2006 10:17 am

Post by scott.carroll@brulant.com »

I actually increased the test interval to 5 minutes yesterday and fewer alerts are coming in, but I still get them once in a while. I'm assuming that the monitor just isn't catching them now since I upped the interval from 1 minute to 5 minutes.

Where can I find the "Treat Unknown as Bad" option? This option would actually solve all of my problems and I wouldn't even have to use the expression you mentioned because all I want to do is stop the alerts from monitors with the status of "Unknown." I thought it might be in the individual monitor itself, but I don't see it. Is it in the options somewhere?
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

scott.carroll@brulant.com wrote:Yes, I am only testing the local machine that the RMA agent is installed on and yes, I use <local computer> instead of a hostname or IP.
That's strange. When you check a remote system, there are many different reasons that may cause the test to fail: permission issues, firewall, network problems, non-running services (Remote Registry Service), etc. However, none of these should cause any problems when you check <local system>.
Something is wrong with your systems. Sorry, I cannot say what exactly is wrong. Maybe this issue is related to 3rd party software... usually antivirus real-time monitors and personal firewalls lead to mysterious problems. Do you have any installed?
scott.carroll@brulant.com wrote:Where can I find the "Treat Unknown as Bad" option?
Test Properties dialog, "Optional status processing" section at the bottom of the dialog. If you do not see the "Treat Unknown as Bad" option below the "Reverse alert" option, click the [+] sign to expand this section.

Regards
Alex
scott.carroll@brulant.com
Posts: 29
Joined: Fri Dec 29, 2006 10:17 am

Post by scott.carroll@brulant.com »

I'm not sure how it could be the server since the same exact issue is happening on 6 different boxes. Maybe there is a bug in the RMA for Windows Server 2003 Enterprise SP1 and SP2?

Anyways, I'm going to try to disable the "Treat Unknown as Bad" option and that should do the trick. Then all of the monitors will still properly work and we just won't see the unnecessary alerts for when something is "unknown."

I'll let you know if I have any more problems and thanks so much for your help!
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

RMA uses an API that is common to all Windows versions. The CPU Usage test for the local system uses the RegQueryValueEx function
http://msdn2.microsoft.com/en-us/library/ms724911.aspx
It depends on several Performance Counter DLLs; however, I am sure these DLLs are enabled on your systems, otherwise RMA would return an error every time, not just sometimes.

So, do you have any antivirus monitors or personal firewalls installed on these systems? Any software that is common to these 6 systems and installed only there?
BTW What about resource usage on the box? What is the total number of handles used by all processes? How many handles are used by RMA?

Regards
Alex
scott.carroll@brulant.com
Posts: 29
Joined: Fri Dec 29, 2006 10:17 am

Post by scott.carroll@brulant.com »

We do have McAfee Enterprise installed on all of these servers, but we were actually having problems prior to the installation of it on them. Other than that, we are using typical Windows 2003 Server apps as well as Commerce and DB programs.

The resource usage on these boxes is actually pretty low at the moment (they are not live yet), so that shouldn't be a factor and rma.exe's handle count is at 7,312 and the overall box itself isn't too bad (well under 15k).

Thanks again for your help! I disabled the unknown=bad option and hopefully that takes care of our problem.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

Hmm... normally RMA should use fewer resources. Do you use this RMA to perform ODBC Query tests? If yes, what ODBC driver do you use?

Regards
Alex
scott.carroll@brulant.com
Posts: 29
Joined: Fri Dec 29, 2006 10:17 am

Post by scott.carroll@brulant.com »

No ODBC requests. Here are the requests that are on each of these servers:

Disk space for each drive
CPU
Several service and process checks
Memory
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

Maybe the unreliable CPU test results and the high number of handles are 2 effects of the same problem :roll:
Is the number of handles this high on each of these 6 systems? What about other systems where agents are installed?
Could you try to disable the CPU test on 1 system, restart the agent on that system, restart the agent on another system without disabling CPU tests, and watch the handles used by the agents for a while on both systems?
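
One way to compare the two agents is to note the handle count of each rma.exe at a few points after the restart and work out the growth rate. The Python sketch below assumes the samples are written down by hand (for example from the Handles column in Task Manager); the function name and the numbers are made up for illustration and are not measurements from this thread.

```python
def handles_per_hour(samples):
    """Average handle growth per hour.

    samples: list of (minutes_since_restart, handle_count) pairs,
    ordered by time.
    """
    first_minute, first_count = samples[0]
    last_minute, last_count = samples[-1]
    elapsed_hours = (last_minute - first_minute) / 60
    if elapsed_hours <= 0:
        raise ValueError("need at least two samples taken at different times")
    return (last_count - first_count) / elapsed_hours


if __name__ == "__main__":
    # Hypothetical readings, not real data from the servers discussed here.
    agent_with_cpu_test = [(0, 100), (60, 700), (120, 1300)]
    agent_without_cpu_test = [(0, 100), (60, 110), (120, 105)]
    print(handles_per_hour(agent_with_cpu_test))
    print(handles_per_hour(agent_without_cpu_test))
```

If the agent with the CPU test keeps gaining handles while the other one stays flat, the CPU test is the likely source; if both grow at a similar rate, some other test method is involved.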

Regards
Alex
scott.carroll@brulant.com
Posts: 29
Joined: Fri Dec 29, 2006 10:17 am

Post by scott.carroll@brulant.com »

Yes, the handle count is that high on all of the servers and when I restart the RMA service the handles dip down really low (about 100), but slowly creep back up. I guess I could schedule daily/weekly service restarts, but that seems unnecessary at this point because the other fix eliminated the unnecessary alerts and we are good to go.

Thanks for all your help!
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

Leaking handles will lead to other problems. It's better to investigate which test method causes this problem...

Regards
Alex
scott.carroll@brulant.com
Posts: 29
Joined: Fri Dec 29, 2006 10:17 am

Post by scott.carroll@brulant.com »

I looked into it a little more and only 2 of the servers that I use RMA on have the high handle counts - all of the other servers had handle counts of 250 or less.

The two servers that do have high handle counts seem to stay in the mid 7000s, but I'm not really sure what would be causing it on these servers only. They are both Windows Server 2003 Enterprise, but one is R2 SP2 and the other is just SP2. They both have McAfee installed and running on them, but so do all the other servers that aren't having any problems. One is running SQL 2007 and the other is running IIS and Microsoft Commerce 2007.

We haven't had any monitor issues since I made the changes we discussed previously, but any ideas for the high handles on these two servers? If you think it might be a problem then I'd like to try and troubleshoot.

Thanks so much!
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA
Contact:

Post by KS-Soft »

Some ODBC drivers lead to resource leakage, but you do not use the ODBC Query test method.
Antiviruses (Symantec, McAfee) often lead to similar problems. An antivirus installs modules that work in the application's address space and use the application's resources. So resource leakage in an antivirus DLL looks like resource leakage in the application.
However, such problems usually appear when the application (HostMonitor, RMA or some other software) uses SMTP or POP3 protocols (antiviruses just love to check your mail), while Disk Free Space, CPU Usage and Performance Counter test methods should not get the antivirus's attention.
So probably this problem is caused by something else :roll: If you can disable some tests for a while, this may help to find the problem. E.g. you may restart both RMAs that use a lot of handles and disable some tests for just one of them so we can compare resource usage for the 2 agents. E.g. you may disable the Memory test. I assume you are using the Performance Counter test method to check memory usage?

Regards
Alex
scott.carroll@brulant.com
Posts: 29
Joined: Fri Dec 29, 2006 10:17 am

Post by scott.carroll@brulant.com »

I've been watching the handles a little more and they appear to fluctuate rather than rise and stay there. Do the handles rise sharply when a test is done? Each RMA runs between 8 and 10 tests every 5 minutes, so maybe the handles increase when the tests are done and then drop again when the tests are over? It makes sense and if that's the case then we're all set. :-) If it's not, then maybe we can proceed with disabling certain monitors to see if that helps (these servers are live, however, so they can not be disabled for long).
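
A rough way to tell "rises during tests and drops again" apart from a steady leak is to look at the lowest handle count in each stretch of time: if the count really falls back between test runs, the lows stay about the same, and if handles are leaking, the lows keep climbing. The Python sketch below assumes a list of handle counts sampled at regular intervals; the function names, window size, tolerance and sample values are all arbitrary choices for illustration.

```python
def window_minimums(counts, window):
    """Lowest handle count within each consecutive group of `window` samples."""
    return [min(counts[i:i + window]) for i in range(0, len(counts), window)]


def looks_like_leak(counts, window=6, tolerance=50):
    """True if the baseline (per-window minimum) rose by more than `tolerance`."""
    minimums = window_minimums(counts, window)
    return len(minimums) >= 2 and minimums[-1] - minimums[0] > tolerance


if __name__ == "__main__":
    # Hypothetical samples: spikes during tests, same baseline each time.
    fluctuating = [7200, 7400, 7300, 7250, 7450, 7210] * 3
    # Hypothetical samples: count creeps up after a restart and never drops.
    creeping = list(range(100, 1900, 100))
    print(looks_like_leak(fluctuating))
    print(looks_like_leak(creeping))
```

A constant baseline in the mid 7000s would still be unusually high compared with the other servers, but it would point away from a leak that grows without limit.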
Dubolomov
Posts: 214
Joined: Thu Jun 01, 2006 10:27 am
Location: Russia

Post by Dubolomov »

Hi!
I had the same problem with high CPU load caused by RMA. The agent checks security logs on this Win'2003 machine. Yesterday I changed the security properties to log all event types from all event sources. Maximum log size is set to 13120 KB. After that, the rma.exe and services.exe processes had 100% CPU usage when HM was trying to check the NT Event Log on this server.
The AD controller where RMA is installed is Win'2003 Enterprise SP2, dual 2.4GHz processors, 1 GB of RAM.
So now I must stop checking the security log. Is it possible to minimise CPU usage when testing NT Event logs?