When some tests fail, others hang with "checking"

All questions related to installation, configuration and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
art
Posts: 4
Joined: Mon Jan 30, 2006 1:53 am

Post by art »

Hi! I have HM 5.38 on a Win2003 server (P4 2.8GHz, 1Gb RAM). There are 266 tests and the estimated load is 1.2 tests/sec. I've separated the tests by the type of service they check and placed them into different folders in the HM folder tree.
In the "e-mail" folder I've gathered the tests of our e-mail system. There are several kinds of tests, and they use different methods:
- tests of our e-mail domains, antispam, antivirus - shell script (vbs)
- test of SMTP availability - SMTP test
- spam list check - HTTP test
- services check - Service (through RMA)
- queues check - Count Files (through RMA)
Most of the time everything works fine, but sometimes it doesn't.
For example, when one mail server goes down, its tests show "fail", which is correct. But some other tests (service checks on another server, HTTP checks on another one, etc.) that have no relation to this server show "checking..." for a long time. They don't show "unknown" or "timeout". But once the failed server starts working again, these tests show their normal reply time, such as 17 ms (even though they had been showing "checking..." for an hour).
I can't understand how these tests depend on the failed server's tests. Maybe it is because of the shell script (vbs) tests, which send an e-mail and then check for the answer for up to 10 minutes. So they run for 15 sec in the normal case and for 10 minutes when the server fails.
Do you have any suggestions? Thanks
Yoorix
Posts: 177
Joined: Wed Dec 14, 2005 8:28 am

Post by Yoorix »

Probably the cause is that your tests are trying to send e-mails via the "downed" mail server. I think you should adjust your VB scripts to avoid this problem.

Regards,
Yoorix
art

Post by art »

It's possible that the script has some bugs. But it's an external program, and I think it runs as a separate process. Why would possible problems in the scripts affect other HM tests, such as a service check on a server that hasn't failed?
Yoorix

Post by Yoorix »

Yes, you are right, each test runs in a separate thread.
And your VB scripts are working fine.

Probably the scripts have to send e-mails, but the mail server is down, so they cannot retrieve an answer from the mail server until the timeout has expired. And you think that the test has hung, right?
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Yoorix wrote: Probably the cause is that your tests are trying to send e-mails via the "downed" mail server. I think you should adjust your VB scripts to avoid this problem.
This should not be a problem.
Tests are started in separate threads, so a problem in one test should not affect other tests unless the problem is located in some system (Windows) module.
To tell the truth, we have never experienced such problems on our systems.
art wrote: For example, when one mail server goes down, its tests show "fail", which is correct. But some other tests (service checks on another server, HTTP checks on another one, etc.) that have no relation to this server show "checking..." for a long time.
Is your mail server just a mail server? Or does it also work as a Domain Controller, DHCP, DNS or some other server?

Strange that the HTTP test hangs in "checking" status. Unlike the Service test method, which uses various Windows subsystems (e.g. RPC calls), the HTTP test uses the winsock API only. Maybe you have installed some non-standard socket applications? Antivirus monitoring software? A personal firewall? Content monitoring software? Network analyzer software (a sniffer)?

Regards
Alex
art

Post by art »

Yoorix wrote: Probably the scripts have to send e-mails, but the mail server is down, so they cannot retrieve an answer from the mail server until the timeout has expired. And you think that the test has hung, right?
No. If the mail server is down, the script just can't send mail, and it can't get the response mail. It will wait for some time and return "Bad" as the result.

Here is the main idea of the mail testing script:

Code: Select all

const TotalSecondsToWait = 540
const StepSecondsToWait = 15

set wss = CreateObject("WScript.Shell")

...

		wss.Run "cscript sendmail.vbs //B //T:60 " & ... <parameters>, 0, True
...

WaitStartTime=timer
res = false
do while (not res) and (timer-WaitStartTime<TotalSecondsToWait)
	WScript.Sleep StepSecondsToWait*1000
	res = CheckResponse(mbox(i,2), TestID, UserPwd, mbox(i,3))
loop

If res Then
	WScript.StdOut.Write statusOk
Else
	WScript.StdOut.Write statusBad
End If


'------------------------------------------------------------------------
'get the letter from the mailbox, put it on disk and check its accuracy
Function CheckResponse(pop, login, pwd, file)
	wss.Run "mail2files.exe " & ... <parameters>, 0, True
...
End Function
mail2files.exe gets the letter from the mailbox and puts it on disk.
Here is the main part of my sendmail.vbs:

Code: Select all

...
Set iMsg  = CreateObject("CDO.Message")
Set iConf = CreateObject("CDO.Configuration")
Set Flds  = iConf.Fields

Flds.Item("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2
Flds.Item("http://schemas.microsoft.com/cdo/configuration/smtpserver") = SMTPSrv
Flds.Item("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 25
Flds.Item("http://schemas.microsoft.com/cdo/configuration/smtpconnectiontimeout") = 20
Flds.Item("http://schemas.microsoft.com/cdo/configuration/smtpusessl") = False
Flds.Item("http://schemas.microsoft.com/cdo/configuration/smtpauthenticate") = 0
Flds.Update

With iMsg
  WScript.Echo mfrom, mto, msubj, mbody
  Set .Configuration = iConf
  .From     = MFrom
  .To       = MTo
  .Subject  = MSubj
  .Textbody = MBody
  If Attach Then .AddAttachment MAttachmentF
  .Send
End With
...
KS-Soft wrote: Is your mail server just a mail server? Or does it also work as a Domain Controller, DHCP, DNS or some other server?
Our mail server is for mail only. The problem isn't with it. If it fails, I just can't send and receive mail, so the script will run for about 9 minutes and return "Bad". That will mean that the mail server has failed :)
KS-Soft wrote: Maybe you have installed some non-standard socket applications? Antivirus monitoring software? A personal firewall? Content monitoring software? Network analyzer software (a sniffer)?
My HM is running on a server without such software.

It looks like the tests in one HM folder depend on each other. When some tests start taking longer than usual (up to 10 minutes), some others start showing "checking..." for that time and then show 17 seconds, for example. How could they complete in 17 seconds if they were showing "checking..." for a much longer period? It's very strange. Maybe it's a problem with refreshing the tests' state.
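
The "about 9 minutes" figure follows from the two constants in the script above (TotalSecondsToWait = 540, StepSecondsToWait = 15). Below is a minimal Python sketch of the same polling loop, not the original script. The names wait_for_response, FakeClock and simulate_dead_server are invented for illustration. The check is assumed to take no time, which the real mail2files.exe call would not, and a fake clock replaces real sleeping so the example finishes instantly.

```python
import time


def wait_for_response(check, total_seconds=540, step_seconds=15,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll check() every step_seconds until it returns True or total_seconds pass."""
    start = clock()
    found = False
    while not found and clock() - start < total_seconds:
        sleep(step_seconds)
        found = check()
    return found


class FakeClock:
    """Hypothetical stand-in for real time so the example finishes instantly."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now

    def sleep(self, seconds):
        self.now += seconds


def simulate_dead_server():
    """Return (result, number_of_polls, minutes_waited) when no reply ever arrives."""
    clock = FakeClock()
    polls = []

    def reply_never_arrives():
        polls.append(clock.now)  # record when each mailbox check happened
        return False

    found = wait_for_response(reply_never_arrives, sleep=clock.sleep, clock=clock)
    return found, len(polls), clock.now / 60


if __name__ == "__main__":
    print(simulate_dead_server())  # (False, 36, 9.0)
```

Under these assumptions the loop makes 36 checks and gives up after 9 minutes. The sending step runs under cscript's //T:60 limit and can add up to one more minute, which matches the "up to 10 minutes" mentioned earlier. The sketch uses a monotonic clock. VBScript's Timer returns seconds since midnight, so the original timer-WaitStartTime comparison may misbehave if a test run crosses midnight.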
KS-Soft

Post by KS-Soft »

Nope, it's not a problem with refreshing or test dependencies. HostMonitor sets the "Checking..." status after all internal checks (test period, schedules, dependencies, etc.) have been completed. If you see the "Checking..." status, it means HostMonitor has sent a request to the Windows API and is waiting for a response.
So, there is some system problem.

Could you try to disable your script test and create an SMTP test instead? If the mail server stops, will you see the same "checking..." problem?
If not, it means the CDO object has some side effects on the system...

Regards
Alex
art

Post by art »

Oh, I've found it!!! It's because "Non-simultaneously test execution" was checked in the folder's properties.
If it's checked, then when I do "refresh this folder" I get the results of some tests only after the others have finished, even during normal execution.
If I clear the checkbox and do "refresh this folder", everything works fine and I get the test results in the order the tests finish.
So I unchecked it, and now HostMonitor is ok.
KS-Soft

Post by KS-Soft »

Ha, of course. With this option enabled HostMonitor does not start new threads.

Regards
Alex
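
The effect described in this thread can be reproduced with a toy model. The Python sketch below is not HostMonitor's implementation. The names run_test, refresh_folder and TESTS are hypothetical, and the durations are scaled down from minutes to fractions of a second. It only assumes that a folder with the option enabled runs its tests one after another, while a folder with the option cleared gives each test its own thread.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def run_test(name, duration, started_at):
    """Pretend to run a test; report how long after the refresh its result appeared."""
    time.sleep(duration)  # stands in for the real check
    return name, time.monotonic() - started_at


def refresh_folder(tests, simultaneous):
    """Run every (name, duration) test and return {name: seconds until result}."""
    started_at = time.monotonic()
    if simultaneous:
        # One worker thread per test: a slow test cannot delay the others.
        with ThreadPoolExecutor(max_workers=len(tests)) as pool:
            futures = [pool.submit(run_test, n, d, started_at) for n, d in tests]
            return dict(f.result() for f in futures)
    # No new threads: each test waits for the previous one to finish.
    return dict(run_test(n, d, started_at) for n, d in tests)


# Hypothetical folder: a slow mail script queued ahead of a fast HTTP check.
TESTS = [("mail_script", 0.3), ("http_check", 0.01)]

if __name__ == "__main__":
    for mode in (False, True):
        results = refresh_folder(TESTS, simultaneous=mode)
        print("simultaneous" if mode else "sequential",
              {name: round(seconds, 2) for name, seconds in results.items()})
```

In the sequential run the fast check's result appears only after the slow script has finished, although the check itself takes about 10 ms. That is one plausible reading of why unrelated tests sat in "checking..." and then reported a short reply time.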