Hi! I have HM 5.38 on a Win2003 server (P4 2.8GHz, 1 GB). There are 266 tests and the estimated load is 1.2 tests/sec. I've separated the tests by the type of service tested and placed them into different folders in the HM folder tree.
In the "e-mail" folder I've gathered the tests of our e-mail system. There are several kinds of tests, and they use different methods:
- tests of our e-mail domains, antispam, antivirus - shell script (vbs)
- test of smtp availability - SMTP test
- spamlists check - HTTP test
- services check - Service (through RMA)
- queues check - Count Files (through RMA)
Usually everything works fine, but sometimes it doesn't.
For example, when one mail server goes down, its tests show "fail", which is correct. But some other tests (service checks on another server, HTTP checks on another one, etc.) that have no relation to this server show "checking..." for a long time. They don't show "unknown" or "timeout". But once the failed server begins to work again, these tests show their normal reply time, such as 17 ms (even though they had been showing "checking..." for an hour).
I can't understand how these tests depend on the failed server's tests. Maybe it is because of the shell script (vbs) tests, which send an e-mail and then check for the answer for up to 10 minutes. So they run for 15 sec in the normal case and for 10 minutes when the server fails.
Do you have any suggestions? Thanx
When some tests fail, others hang with "checking"
This should not be a problem. Probably the cause is that your tests are trying to send e-mails via the "downed" mail server. I think you should adjust your VB scripts to avoid this problem.
Tests are started in separate threads, so a problem in one test should not affect other tests unless the problem is located in some system (Windows) module.
To tell the truth, we have never experienced such problems on our systems.
Is your mail server just a mail server? Or does it work as a Domain Controller, DHCP, DNS or some other server as well?
Quote: For example, when one mail server goes down, its tests show "fail" - it's true. But some other tests (services check on the other server, HTTP checks on the other, etc), that have no relations to this server, show "checking..." for a long time.
It is strange that the HTTP test hangs in the "checking" status. Unlike the Service test method, which uses various Windows subsystems (e.g. RPC calls), the HTTP test uses the winsock API only. Maybe you have installed some non-standard socket applications? Antivirus monitoring software? A personal firewall? Content monitoring software? Network analyzer software (a sniffer)?
Regards
Alex
Yoorix wrote: Probably those scripts have to send e-mails, but the mail server is down, and they cannot retrieve an answer from the mail server until the timeout has expired. And you think that the test has hung, right?
No. If the mail server is down, the script just can't send mail, and it can't get the response mail. It will wait for some time and return "Bad" as a result.
Here is the main idea of the mail testing script:
Code:
const TotalSecondsToWait = 540
const StepSecondsToWait = 15
set wss = CreateObject("WScript.Shell")
...
wss.Run "cscript sendmail.vbs //B //T:60 " & ... <parameters>, 0, True
...
WaitStartTime = timer
res = false
' poll the mailbox every StepSecondsToWait seconds until the reply arrives or the total wait time expires
do while (not res) and (timer - WaitStartTime < TotalSecondsToWait)
    WScript.Sleep StepSecondsToWait * 1000
    res = CheckResponse(mbox(i,2), TestID, UserPwd, mbox(i,3))
loop
If res Then
    WScript.StdOut.Write statusOk
Else
    WScript.StdOut.Write statusBad
End If
'------------------------------------------------------------------------
'get the letter from the mailbox, put it on disk and check its accuracy
Function CheckResponse(pop, login, pwd, file)
    wss.Run "mail2files.exe " & ... <parameters>, 0, True
    ...
End Function
Here is the main part of my sendmail.vbs:
Code:
...
Set iMsg = CreateObject("CDO.Message")
Set iConf = CreateObject("CDO.Configuration")
Set Flds = iConf.Fields
Flds.Item("http://schemas.microsoft.com/cdo/configuration/sendusing") = 2
Flds.Item("http://schemas.microsoft.com/cdo/configuration/smtpserver") = SMTPSrv
Flds.Item("http://schemas.microsoft.com/cdo/configuration/smtpserverport") = 25
Flds.Item("http://schemas.microsoft.com/cdo/configuration/smtpconnectiontimeout") = 20
Flds.Item("http://schemas.microsoft.com/cdo/configuration/smtpusessl") = False
Flds.Item("http://schemas.microsoft.com/cdo/configuration/smtpauthenticate") = 0
Flds.Update
With iMsg
    WScript.Echo MFrom, MTo, MSubj, MBody
    Set .Configuration = iConf
    .From = MFrom
    .To = MTo
    .Subject = MSubj
    .TextBody = MBody
    If Attach Then .AddAttachment MAttachmentF
    .Send
End With
...
KS-Soft wrote: Your mail server is just mail server? Or it works as Domain Controller, DHCP, DNS or some other server as well?
Our mail server is for mail only. The problem isn't with it. If it fails, I just can't send and receive mail, so the script will run for about 9 minutes and return "Bad", which means that the mail server has failed.

KS-Soft wrote: May be you have installed some non-standard socket applications? Antivirus monitoring software? Personal firewall? Content monitoring software? Network Analyzer software (Sniffer)?
My HM is running on a server without such software.
It looks like the tests in one HM folder depend on each other. When some tests begin to run longer than usual (up to 10 minutes), some others show "checking..." for that whole time and then report 17 seconds, for example. How did they finish in 17 seconds if they were showing "checking..." for a much longer period? It's very strange. Maybe it's a problem with refreshing the tests' state.
Nope, it's not a problem with refreshing or test dependency. HostMonitor sets the "Checking..." status after all internal checks (test period, schedules, dependency, etc.) have been completed. If you see the "Checking..." status, it means HostMonitor has sent a request to the Windows API and is waiting for the response.
So, there is some system problem.
Could you try to disable your script test and create an SMTP test instead? If the mail server stops, will you see the same "checking..." problem?
If not, it means the CDO object has some side effects on the system...
Regards
Alex
Oh, I've found it!!! It's because "Non-simultaneously test execution" was checked in the folder's properties.
If it's checked, when I do "refresh this folder", I get the results of some tests only after the others have finished, even in the case of normal execution.
If I clear the checkbox and do "refresh this folder", everything works fine and I get the test results in the order the tests finish.
So I unchecked it, and now HostMonitor is ok.
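For anyone who finds this thread later, here is a toy model of what I think was happening. It is only a sketch in Python, not HostMonitor's actual code: the folder lock, the run_folder function and the test names are all made up for illustration, and the sleeps stand in for real checks. With serialization on, the quick test sits behind the slow one, yet its own measured time stays small, which would match "checking..." for a long time followed by a normal reply time.

```python
import threading
import time


def run_folder(tests, serialize):
    """Run each (name, seconds) test in its own thread.

    Returns {name: (waited, ran)}: seconds spent waiting to start after
    the refresh, and seconds spent in the check itself.
    """
    folder_lock = threading.Lock()  # hypothetical per-folder lock
    results = {}
    refresh_started = time.monotonic()

    def worker(name, seconds):
        if serialize:
            folder_lock.acquire()  # only one test of the folder at a time
        try:
            check_started = time.monotonic()
            time.sleep(seconds)  # stand-in for the real check
            check_finished = time.monotonic()
        finally:
            if serialize:
                folder_lock.release()
        results[name] = (check_started - refresh_started,
                         check_finished - check_started)

    threads = [threading.Thread(target=worker, args=t) for t in tests]
    for t in threads:
        t.start()
        time.sleep(0.01)  # stagger starts so the first test gets the lock first
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    # Made-up durations: a slow mail script and a quick HTTP check.
    folder = [("mail_script", 0.3), ("http_check", 0.02)]
    for mode in (True, False):
        print("serialize =", mode)
        for name, (waited, ran) in run_folder(folder, mode).items():
            print(f"  {name}: waited {waited:.2f}s, ran {ran:.2f}s")
```

In the serialized run the quick check waits about as long as the slow script takes, while in the parallel run it starts almost at once. In both cases the time it reports for itself is the same.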