WMI Checks - 'flapping' between Good and Unknown

jivetolkein
Posts: 96
Joined: Thu Jul 19, 2007 4:35 am

Post by jivetolkein »

Strange, strange problem - seen it a few times, but now it's almost consistent:

Running a couple of scripts (VB, WMI queries) and a normal WMI method test via an RMA agent. The checks flap between an OK state and an Unknown state every minute, despite having worked OK in the past. The Unknown states give the error 'Error: Script returns no results', and the WMI test returns 'Not enough storage to perform this operation'.
The tests will then run OK shortly afterward.

Was coincidentally getting popup errors on the server running the RMA with the following error:

Application popup: cmd.exe - Application Error : The application failed to initialize properly (0xc0000142). Click on OK to terminate the application.

.. giving an infinite loop of OK pressing unless cscript.exe processes were all killed - upgrading the RMA to the version in 7.10 APPEARS to have solved this bit of the problem though.

It's a bit of a show stopper - I have a view for our service desk of just problems, but these checks pop in and out of the view as they go Unknown and then cure themselves. Any ideas? I can forward a screenshot of the quick log of a test if it'd help.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Maybe there is a resource leak caused by your script or by some WMI class.
E.g. there is a memory leak caused by the Win32_NetworkAdapter class on Windows XP: http://support.microsoft.com/kb/824262

Could you please provide more information?
- Which version of Windows do you use?
- Which Service Pack?
- Exactly which WMI objects do you check?
- Which objects are used by your scripts?
- Could you check resource usage for each process? You can use the standard Windows Task Manager to check Handles, GDI and USER objects. What is the total resource usage? Which process uses the most resources?

Regards
Alex
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

In addition to Alex's requests, could you also answer my questions:
1. Exactly which test method do you use to execute the scripts? "Shell Script"? "Active Script"?

2. What value is specified in the "Don't start more than [N] tests per second" box on the "Behavior" page of the Options dialog?

3. Could you try enabling the "Non-simultaneously test execution" folder-level option for the folder where most of the scripts are located?
http://www.ks-soft.net/hostmon.eng/mfra ... FolderTree

Regards,
Max
jivetolkein
Posts: 96
Joined: Thu Jul 19, 2007 4:35 am

Post by jivetolkein »

Server is W2k3 SP1, as is server running RMA - clients are a mix of W2K SP4 upwards. No XP, some Linux but obviously they don't count here.

Error occurs not just with my scripts (could well believe they were the problem else) but also with the WMI method test - here's the edited test:

Method = WMI

RMAgent = ******
Title = servername - Free Memory (WMI)
RelatedURL =
ScheduleMode= Regular
Schedule =
Interval = 900
Alerts = Default - Mail to ********
ReverseAlert= No
UnknownIsBad= Yes
WarningIsBad= Yes
UseCommonLog= Yes
PrivLogMode = Default
CommLogMode = Default
SyncCounters= Yes
SyncAlerts = No
DependsOn = list
MasterTest-Alive = ???????????
;--- Test specific properties ---
Host = BG0014.eu.schering.net
NameSpace = root\cimv2
Query = select FreePhysicalMemory from Win32_OperatingSystem
SumMode = any ...
CompareMode = LessThan
CompareVal = 32000
NoResStatus = Unknown


The scripts run as 'Shell Script' method, as I pass the servername as a parameter (they are adaptations of old scripts).

32 tests/second max, running as a VM on ESX 3.01, quad core Xeon with 8 GB RAM (not heavily loaded) - could lower this, but we're registering 2/second as the load.

I'll try enabling non-simultaneous execution next time.


Obviously, as soon as I posted it went away.. had been trying service restarts on what I could, but can't pin down what made it feel better.

Only other clue is it looks like the HM server had recently been patched.. with what I'd have to look into....

PS - Sorry Alex, but where should I look? HM server or RMA server running checks?
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

According to Microsoft, to work around this problem, follow these steps:
1. At a command prompt, type REGEDT32.EXE to start Registry Editor.
2. In Registry Editor, locate the following registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems
3. In the right pane of Registry Editor, click Windows.
4. On the Edit menu, click Modify.
5. In the Edit String dialog box, locate the SharedSection parameter string in the Value data box, and then specify a larger value for the SharedSection parameter.

Note The SharedSection parameter specifies the system and desktop by using the following format, where <xxxx> defines the maximum size of the system-wide heap (in kilobytes), <yyyy> defines the size of the per desktop heap, and <zzzz> is the size of the desktop heap for each desktop that is associated with a non-interactive Windows station:
SharedSection=<xxxx>,<yyyy>,<zzzz>
6. Click OK.
If you increase the non-interactive desktop heap (third parameter) by 512 KB or by 1024 KB, it typically provides sufficient memory to resolve the problem.

Warning: If you use Registry Editor incorrectly, you may cause serious problems that may require you to reinstall your operating system. We cannot guarantee that you can solve problems that result from using Registry Editor incorrectly. Use Registry Editor at your own risk.

http://support.microsoft.com/kb/126962
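
To make the arithmetic on the third value concrete, here is a rough Python sketch. It only edits a string and does not touch the registry. The function name bump_shared_section and the shortened sample value are made up for illustration; the real Windows value on a server is longer and should be copied from Registry Editor.

```python
def bump_shared_section(value, extra_kb=512):
    """Return `value` with the third SharedSection number raised by `extra_kb`."""
    tokens = value.split(" ")
    for i, token in enumerate(tokens):
        if token.startswith("SharedSection="):
            sizes = token[len("SharedSection="):].split(",")
            if len(sizes) < 3:
                raise ValueError("no non-interactive desktop heap size present")
            # Third number = desktop heap for non-interactive window stations (KB).
            sizes[2] = str(int(sizes[2]) + extra_kb)
            tokens[i] = "SharedSection=" + ",".join(sizes)
            return " ".join(tokens)
    raise ValueError("SharedSection parameter not found")


# Hypothetical, shortened value; not copied from a real server.
sample = "%SystemRoot%\\system32\\csrss.exe SharedSection=1024,3072,512 Windows=On"
print(bump_shared_section(sample))
```

With the sample above, the third number goes from 512 to 1024 and the rest of the string is left alone.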


Regards,
Max
jivetolkein
Posts: 96
Joined: Thu Jul 19, 2007 4:35 am

Post by jivetolkein »

Interesting - I've upped it from 512 to 1024 - can't see it doing any harm, there's 3 GB on that box so there should be a bit spare.

I'll go with that for a week and see how it fares.
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

I think it is also worth increasing the size of the per-desktop heap (second value) of the SharedSection parameter.

Regards,
Max
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

jivetolkein wrote:Error occurs not just with my scripts (could well believe they were the problem else) but also with the WMI method test - here's the edited test:
I understand this. However, if some test or process takes too many system resources, this may cause problems for other applications as well.
Could you check resource usage for each process?
...
jivetolkein wrote:PS - Sorry Alex, but where should I look? HM server or RMA server running checks?
If tests are performed by RMA, then we should look for the problem on the system where RMA is running or on the target system. If RMA checks several remote systems and all (or several) of them return errors, then we should look for the problem on the system where RMA is running.

Regards
Alex
jivetolkein
Posts: 96
Joined: Thu Jul 19, 2007 4:35 am

Post by jivetolkein »

OK, it has remained stable overnight :)
KS-Soft Europe wrote:I think, it worth to increase the size of the per desktop heap (second value) of the SharedSection parameter also.

Regards,
Max
Done - pushed up to 6144 on the RMA server.

Checking taskmgr on this server, I have lots (34 at the moment) of cscript.exe images running in the context of my RMA's account.

Each uses 5.6 MB of RAM, 2 USER objects, 5 GDI objects and approx. 120 handles (varies up and down by 1 or 2).

Does this show my scripts not closing/ending properly, and eating all the resources on the RMA server? I'm using wscript.quit to end the script (when the result is output), and they are called by shell script method like this:

cmd /c cscript /E:VBScript %Script% %Params%

.. so I can pass the target as a parameter (I duplicate tests by fqdn in replicator).
jivetolkein
Posts: 96
Joined: Thu Jul 19, 2007 4:35 am

Post by jivetolkein »

PS - sorry for the daft questions regarding the server - I was sure it was the RMA box all the way through just from plain common sense, but I'm a total hacker (NOT as in l33t ;-)) at VBScript/WMI, without a real understanding of how it works, so I want to be sure I'm not pointing out irrelevances to you guys.
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

jivetolkein wrote:OK, it has remained stable overnight :)
Good news indeed. :-)
jivetolkein wrote:Checking taskmgr on this server, I have lots (34 at the moment) of cscript.exe images running in the context of my RMA's account.
Is the number of cscript.exe instances still growing, or does it stabilize?
jivetolkein wrote:I'm using wscript.quit to end the script (when the result is output), and they are called by shell script method like this:
Actually, wscript.quit is used to return the specified exit code. If the script does not return a specific exit code, you do not need the wscript.quit statement. However, I do not think the problem is related to the script. It looks like Windows does not close the cscript.exe process after the script has finished. Hm..
What timeout is specified in the "Test Properties" window of the "Shell Script" test method? Perhaps you need to increase it so that the script can complete within the timeout?
jivetolkein wrote:.. so I can pass the target as a parameter (I duplicate tests by fqdn in replicator).
I would suggest trying the "Active Script" test method. It should help. As you know, the "Active Script" test method supports macro variables, so you can put the parameters you want to pass to the script into the "Comment" field and use the %CommentLine1%, %CommentLine2%, etc. variables within the script body. You can find useful examples in the Examples\Scripts subfolder of HostMonitor's folder.
http://www.ks-soft.net/hostmon.eng/mfra ... htm#script

Regards,
Max
jivetolkein
Posts: 96
Joined: Thu Jul 19, 2007 4:35 am

Post by jivetolkein »

I've killed all cscripts, and they are now coming back slowly, 4 there now - I'll leave it as long as I can, probably till Monday as I've a day off tomorrow to see how high it gets.

I'm allowing 20 seconds for the scripts to run; normally they return a good result in 2 seconds.. but maybe it's not getting a response that causes the orphaning and the increase in cscript instances?

I'll try the Active Script method - had problems making it work before, but I have a small site I could run as a test case instead of shell script, just drop some UNC checks and critical service checks in to cover myself.
jivetolkein
Posts: 96
Joined: Thu Jul 19, 2007 4:35 am

Post by jivetolkein »

OK, I've seen the order of events now - it's related to a Shell Script method check not getting a response.

Check is started by HM, sent to the RMA.
On the RMA, in Task Manager, I see a cmd.exe and a cscript.exe spawned.
Eventually, the check fails (hits the timeout) and goes to an Unknown state rather than Bad.
The cmd.exe despawns, the cscript.exe remains behind.

I've tried adding wscript.timeout=x into the scripts and making it less than the Host Monitor timeout, but the cscript.exe image is still left orphaned.

Must be something scriptwise - any sure-fire ways of killing the scripts after a certain time?
jivetolkein
Posts: 96
Joined: Thu Jul 19, 2007 4:35 am

Post by jivetolkein »

Problem is definitely script related, can reproduce it without HM, RMA or anything else. Back to the drawing board :(

CSCRIPT.EXE isn't closing when a server is there but not responding properly.

Launching the script with '/T:x' also doesn't kill the processes off after the specified timeout, and wscript.timeout doesn't kill it either. The WMI get seems to drop it into a black hole, so I think even adding a start time / check time type of test to the script won't work.

Looking for a global cscript timeout regkey if anyone knows of one? Or any ideas why cscript isn't taking any notice of the /T option - it does the same for me launching the script from a cmd prompt on my workstation.
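
One untested idea, since the in-script timeouts are being ignored: put the deadline outside the script and have a wrapper kill the process. Below is a minimal Python sketch of that idea, not something HostMonitor or RMA provides. The name run_with_deadline is made up, and a sleeping child interpreter stands in for the hung cscript.exe so the sketch runs anywhere.

```python
import subprocess
import sys


def run_with_deadline(command, timeout_s):
    """Run `command`; return (finished, stdout). The child is killed on timeout."""
    try:
        done = subprocess.run(command, capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run() kills the child and waits for it before re-raising,
        # so nothing is left running once we get here.
        return False, ""
    return True, done.stdout


# Stand-in for a script stuck on a semi-alive server: sleeps far past the deadline.
hung = [sys.executable, "-c", "import time; time.sleep(30)"]
print(run_with_deadline(hung, 1))
```

If something like this were used for the real checks, it would presumably be better to start cscript directly rather than through cmd /c, because the wrapper only kills the process it started, and the sequence seen in Task Manager suggests cscript.exe outlives its cmd.exe parent.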

In the meantime I've got HM to check the RMA hosts for excessive cscripts... it's only a problem with a server that is only 'semi alive'.
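
For anyone wanting a similar stopgap, the counting part might look roughly like this in Python. It is only a sketch: count_images is a made-up name, and the sample text is invented in the shape that `tasklist /FO CSV /NH` prints (image name, PID, session name, session number, memory usage).

```python
import csv
import io


def count_images(tasklist_csv, image_name="cscript.exe"):
    """Count rows of `tasklist /FO CSV /NH` output whose image name matches."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    return sum(1 for row in rows if row and row[0].lower() == image_name.lower())


# Made-up sample output; PIDs and memory figures are illustrative only.
sample = (
    '"cmd.exe","1200","Console","0","1,540 K"\n'
    '"cscript.exe","2312","Console","0","5,732 K"\n'
    '"cscript.exe","2480","Console","0","5,728 K"\n'
)
print(count_images(sample))  # 2
```

On the RMA host itself the text would come from running tasklist with those switches, and the count could then be compared against whatever threshold counts as "excessive".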