WMI Checks - 'flapping' between Good and Unknown
-
- Posts: 96
- Joined: Thu Jul 19, 2007 4:35 am
Strange strange problem - seen it a few times but now it's almost consistent:
Running a couple of scripts (VB, WMI queries) and a normal WMI method test via an RMA Agent. The checks flap between an OK state and an Unknown state every minute, despite having worked OK in the past. The Unknown states give errors of 'Error: Script returns no results' and the WMI test returns 'Not enough storage to perform this operation'.
The tests will then run OK shortly afterward.
Was coincidentally getting popup errors on the server running the RMA with the following error:
Application popup: cmd.exe - Application Error : The application failed to initialize properly (0xc0000142). Click on OK to terminate the application.
.. giving an infinite loop of OK pressing unless cscript.exe processes were all killed - upgrading the RMA to the version in 7.10 APPEARS to have solved this bit of the problem though.
It's a bit of a show stopper - I have a view for our service desk of just problems, but these checks pop in and out of the view as they go Unknown and then cure themselves. Any ideas? I can forward a screenshot of the quick log of a test if it'd help.
Maybe there is a resource leak caused by your script or some WMI class.
E.g. there is a memory leak caused by the Win32_NetworkAdapter class on Windows XP: http://support.microsoft.com/kb/824262
Could you please provide more information?
- What Windows version do you use?
- Service Pack?
- What exactly WMI objects do you check?
- What objects are used by your scripts?
- Could you check resource usage for each process? You may use the standard Windows Task Manager to check Handles, GDI and USER objects. What is the total resource usage? Which process uses the most resources?
Regards
Alex
-
- Posts: 2832
- Joined: Tue May 16, 2006 4:41 am
- Contact:
In addition to Alex's requests, could you also answer my questions:
1. What exact test method do you use to execute scripts? "Shell Script"? "Active Script"?
2. What exact value is specified in "Don't start more than [N] tests per second" box in "Behavior" page of the Options dialog?
3. Could you try to enable the "Non-simultaneously test execution" folder-level option for the folder where most of the scripts are located?
http://www.ks-soft.net/hostmon.eng/mfra ... FolderTree
Regards,
Max
-
- Posts: 96
- Joined: Thu Jul 19, 2007 4:35 am
Server is W2k3 SP1, as is server running RMA - clients are a mix of W2K SP4 upwards. No XP, some Linux but obviously they don't count here.
Error occurs not just with my scripts (could well believe they were the problem else) but also with the WMI method test - here's the edited test:
Method = WMI
RMAgent = ******
Title = servername - Free Memory (WMI)
RelatedURL =
ScheduleMode= Regular
Schedule =
Interval = 900
Alerts = Default - Mail to ********
ReverseAlert= No
UnknownIsBad= Yes
WarningIsBad= Yes
UseCommonLog= Yes
PrivLogMode = Default
CommLogMode = Default
SyncCounters= Yes
SyncAlerts = No
DependsOn = list
MasterTest-Alive = ???????????
;--- Test specific properties ---
Host = BG0014.eu.schering.net
NameSpace = root\cimv2
Query = select FreePhysicalMemory from Win32_OperatingSystem
SumMode = any ...
CompareMode = LessThan
CompareVal = 32000
NoResStatus = Unknown
The scripts run as 'Shell Script' method as I pass the servername as a parameter (they are adaptations of old scripts)
32 tests/second max, running as a VM on ESX 3.01, quad core Xeon with 8 GB RAM (not heavily loaded) - could lower this, but we're registering 2/second as the load.
I'll try enabling non-simultaneous execution next time.
Obviously, as soon as I posted it went away.. had been trying service restarts on what I could, but can't pin down what made it feel better.
Only other clue is it looks like the HM server had recently been patched.. with what I'd have to look into....
PS - Sorry Alex, but where should I look? HM server or RMA server running checks?
-
- Posts: 2832
- Joined: Tue May 16, 2006 4:41 am
- Contact:
According to Microsoft, to work around this problem, follow these steps:
1. At a command prompt, type REGEDT32.EXE to start Registry Editor.
2. In Registry Editor, locate the following registry key:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems
3. In the right pane of Registry Editor, click Windows.
4. On the Edit menu, click Modify.
5. In the Edit String dialog box, locate the SharedSection parameter string in the Value data box, and then specify a larger value for the SharedSection parameter.
Note The SharedSection parameter specifies the system and desktop by using the following format, where <xxxx> defines the maximum size of the system-wide heap (in kilobytes), <yyyy> defines the size of the per desktop heap, and <zzzz> is the size of the desktop heap for each desktop that is associated with a non-interactive Windows station:
SharedSection=<xxxx>,<yyyy>,<zzzz>
6. Click OK.
If you increase the non-interactive desktop heap (third parameter) by 512 KB or by 1024 KB, it typically provides sufficient memory to resolve the problem.
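As a rough illustration of the SharedSection format described above, here is a minimal Python sketch that only edits the value string (it does not touch the registry). The function name `bump_shared_section` and the shortened sample value are assumptions made for the example; a real value data string contains more parameters, and the numbers here are placeholders rather than a recommendation.

```python
def bump_shared_section(value: str, index: int = 2, extra_kb: int = 512) -> str:
    """Return `value` with one SharedSection field increased by `extra_kb`.

    index 0 = system-wide heap, 1 = per-desktop heap,
    2 = non-interactive desktop heap (all sizes in KB).
    """
    key = "SharedSection="
    start = value.find(key)
    if start < 0:
        raise ValueError("no SharedSection parameter found")
    start += len(key)
    # The comma-separated field list ends at the next space (or end of string).
    end = value.find(" ", start)
    if end < 0:
        end = len(value)
    fields = value[start:end].split(",")
    if index >= len(fields):
        raise ValueError("SharedSection has no field at index %d" % index)
    fields[index] = str(int(fields[index]) + extra_kb)
    return value[:start] + ",".join(fields) + value[end:]


# Shortened, illustrative value string (not a complete real-world value).
sample = "SharedSection=1024,3072,512 Windows=On SubSystemType=Windows"
print(bump_shared_section(sample))
```

The sketch leaves every other parameter in the string untouched, which matters because the same value also carries unrelated settings.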
Warning: If you use Registry Editor incorrectly, you may cause serious problems that may require you to reinstall your operating system. We cannot guarantee that you can solve problems that result from using Registry Editor incorrectly. Use Registry Editor at your own risk.
http://support.microsoft.com/kb/126962
Regards,
Max
-
- Posts: 96
- Joined: Thu Jul 19, 2007 4:35 am
-
- Posts: 2832
- Joined: Tue May 16, 2006 4:41 am
- Contact:
jivetolkein wrote: Error occurs not just with my scripts (could well believe they were the problem else) but also with the WMI method test
I understand this. However, if some test/process takes too many system resources, this may cause problems for other applications as well.
jivetolkein wrote: PS - Sorry Alex, but where should I look? HM server or RMA server running checks?
If tests are performed by RMA, then we should look for the problem on the system where RMA is running or on the target system. If RMA checks several remote systems and all (several) systems return an error, then we should look for the problem on the system where RMA is running.
Regards
Alex
-
- Posts: 96
- Joined: Thu Jul 19, 2007 4:35 am
OK, it has remained stable overnight
Checking taskmgr on this server, I have lots (34 at the moment) of cscript.exe images running in the context of my RMA's account.
Each uses 5.6 MB of RAM, 2 USER objects, 5 GDI objects and approx. 120 handles (varies up and down by 1 or 2).
Does this show my scripts not closing/ending properly, and eating all the resources on the RMA server? I'm using wscript.quit to end the script (when the result is output), and they are called by shell script method like this:
cmd /c cscript /E:VBScript %Script% %Params%
.. so I can pass the target as a parameter (I duplicate tests by fqdn in replicator).

KS-Soft Europe wrote: I think it is worth increasing the size of the per-desktop heap (second value) of the SharedSection parameter as well.
Regards,
Max
Done - pushed up to 6144 on the RMA server.
-
- Posts: 96
- Joined: Thu Jul 19, 2007 4:35 am
PS - sorry for the daft questions regarding the server - I was sure it was the RMA box all the way through just from plain common sense, but I'm a total hacker (NOT as in l33t) at VBScript/WMI, without a real understanding of how it works, so I want to be sure I'm not pointing out irrelevances to you guys.

-
- Posts: 2832
- Joined: Tue May 16, 2006 4:41 am
- Contact:
jivetolkein wrote: OK, it has remained stable overnight
Good news indeed.
jivetolkein wrote: Checking taskmgr on this server, I have lots (34 at the moment) of cscript.exe images running in the context of my RMA's account.
Is the number of cscript.exe instances still growing, or does it become stable?
jivetolkein wrote: I'm using wscript.quit to end the script (when the result is output), and they are called by shell script method like this:
Actually, wscript.quit is used to return the specified exit code. If the script does not return a specific exit code, you do not need to use the wscript.quit statement. However, I do not think the problem is related to the script. It looks like Windows does not want to close the cscript.exe process after the script has finished. Hm..
What exact timeout is specified in the "Test Properties" window of the "Shell Script" test method? You may need to increase this timeout to ensure the script completes within it.
jivetolkein wrote: .. so I can pass the target as a parameter (I duplicate tests by fqdn in replicator).
I would suggest you try the "Active Script" test method. It should help. As you know, the "Active Script" test method supports macro variables, so you may put the parameters you want to pass to the script into the "Comment" field and use the %CommentLine1%, %CommentLine2%, etc. variables within the script body. You may find useful examples in the Examples\Scripts subfolder of HostMonitor's folder.
http://www.ks-soft.net/hostmon.eng/mfra ... htm#script
Regards,
Max
-
- Posts: 96
- Joined: Thu Jul 19, 2007 4:35 am
I've killed all cscripts, and they are now coming back slowly, 4 there now - I'll leave it as long as I can, probably till Monday as I've a day off tomorrow to see how high it gets.
I'm allowing 20 seconds for the scripts to run, normally they return a good result in 2 seconds.. but maybe it's not getting a response that causes the orphaning and the increase in cscript instances?
I'll try the Active Script method - had problems making it work before, but I have a small site I could run as a test case instead of shell script, just drop some UNC checks and critical service checks in to cover myself.
-
- Posts: 96
- Joined: Thu Jul 19, 2007 4:35 am
OK, I've seen the order of events now - it's related to a Shell Script method check not getting a response.
Check is started by HM, sent to the RMA.
On the RMA, in Task manager, I see a cmd.exe and a cscript.exe spawned
Eventually, the check fails (hits time out) (Unknown state rather than bad)
The cmd.exe despawns, the cscript.exe remains behind.
I've tried adding wscript.timeout=x into the scripts and making them less than the Host Monitor timeout, but the cscript.exe image is left orphaned
Must be something scriptwise - any sure fire ways of killing the scripts after a certain time??
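One generic way to enforce a hard deadline from outside the script is a small watchdog that starts the process directly and kills it if it overruns. The following is a minimal Python sketch of that idea, not something taken from HostMonitor or RMA; `run_with_deadline` is a hypothetical helper name. It assumes the script host is launched directly rather than through `cmd /c`, because killing only the intermediate cmd.exe would leave the child behind, which is the pattern described above.

```python
import subprocess
import sys


def run_with_deadline(command, timeout_s):
    """Run `command`; kill it if it has not finished after `timeout_s` seconds.

    Returns (exit_code, stdout); exit_code is None when the deadline was hit.
    """
    proc = subprocess.Popen(command, stdout=subprocess.PIPE, text=True)
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return proc.returncode, out
    except subprocess.TimeoutExpired:
        proc.kill()         # terminates the process we started directly
        proc.communicate()  # reap it so nothing is left behind
        return None, ""


if __name__ == "__main__":
    # Stand-in for a hung check: a child that sleeps longer than the deadline.
    # A real call might pass something like
    # ["cscript", "/E:VBScript", "check.vbs", "servername"] (hypothetical names).
    hung = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_deadline(hung, timeout_s=1))
```

Whether this frees every resource held by a script stuck inside a WMI call is untested here; it only guarantees that the launched process itself is terminated.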
-
- Posts: 96
- Joined: Thu Jul 19, 2007 4:35 am
Problem is definitely script related, can reproduce it without HM, RMA or anything else. Back to the drawing board :(
CSCRIPT.EXE isn't closing when a server is there but not responding properly.
Launching the script with '/T:x' also doesn't kill the processes off after the specified timeout. wscript.timeout doesn't kill it either. The WMI get seems to drop it into a black hole, so I think even adding a start time / check time comparison to the script won't work.
Looking for a global cscript timeout regkey if anyone knows of one? Or any ideas why cscript isn't taking any notice of the T option - it does the same for me launching the script from a cmd prompt on my workstation.
In the meantime I've got HM to check the RMA hosts for excessive cscripts... it's only a problem with a server that is only 'semi alive'.
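For anyone wanting a similar stopgap, the counting part can be sketched in Python by parsing process-list text in the shape that `tasklist /FO CSV /NH` produces. This is only an illustration, not the actual HM test used here: the helper name `count_image`, the sample rows and the threshold of 10 are all made up for the example.

```python
import csv
import io


def count_image(tasklist_csv: str, image_name: str) -> int:
    """Count rows whose first CSV column equals `image_name` (case-insensitive)."""
    rows = csv.reader(io.StringIO(tasklist_csv))
    wanted = image_name.lower()
    return sum(1 for row in rows if row and row[0].lower() == wanted)


# Made-up sample in the shape of `tasklist /FO CSV /NH` output.
sample = '''"cmd.exe","2100","Console","0","2,540 K"
"cscript.exe","3312","Console","0","5,730 K"
"cscript.exe","3408","Console","0","5,728 K"
'''

THRESHOLD = 10  # hypothetical alert level
n = count_image(sample, "cscript.exe")
print(n, "BAD" if n > THRESHOLD else "OK")
```

Using the csv module rather than splitting on commas matters because the memory column itself contains a comma inside its quotes.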