Author |
Message |
Marcus
Joined: 18 Nov 2002 Posts: 367
|
Posted: Wed Apr 06, 2005 1:43 am Post subject: Win32 error #1722 |
|
|
I recently switched to version 5.12 and am now experiencing the following:
Every now and then I lose my network connection to a remote server and get a Win32 error #1722 message back in the reply field, with an Unknown status. That in itself is no problem, since it reflects a real network problem.
The real problem is that this test is then 'kicked out'. It is never tested again until I manually refresh the test. From that point on it is 'pulled back' into its normal schedule. Have I missed a new option, or is this really a bug?
This only occurs with a service test. A ping test comes back into schedule without any problem. |
|
|
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
|
Posted: Wed Apr 06, 2005 7:31 pm Post subject: |
|
|
Cannot reproduce this problem.
Maybe you are using the "Change test interval" action in the alert profile assigned to this test?
Regards
Alex |
|
|
|
Marcus
Joined: 18 Nov 2002 Posts: 367
|
Posted: Thu Apr 07, 2005 1:27 am Post subject: |
|
|
Quote: | May be you are using "Change test interval" action in the alert profile assigned to this test?
| No, we do not. And even if we did, it would not explain why the test is no longer performed until it is manually started. It looks like the scheduler is skipping the tests (Recurrences stays at 1 and never goes higher). |
|
|
|
Marcus
Joined: 18 Nov 2002 Posts: 367
|
Posted: Thu Apr 07, 2005 2:58 am Post subject: |
|
|
I really do have a problem with this situation (it occurs every day, and some other out-of-schedule tests also do not return to their schedule). So my question now is: is it possible to downgrade to my 4.x version? |
|
|
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
|
Posted: Thu Apr 07, 2005 11:13 am Post subject: |
|
|
Could you please send the HML file with your tests to support@ks-soft.net?
Yes, you can downgrade.
Regards
Alex |
|
|
|
Marcus
Joined: 18 Nov 2002 Posts: 367
|
Posted: Fri Apr 08, 2005 1:55 am Post subject: |
|
|
I have changed the number of tests that are allowed to start each second to 128 (it was 64), and this morning everything behaved correctly for the first time. So I prefer to wait and see what happens over the weekend before we downgrade. |
|
|
|
Marcus
Joined: 18 Nov 2002 Posts: 367
|
Posted: Mon Apr 11, 2005 1:34 am Post subject: |
|
|
The problem seems to be solved. No errors were found after the weekend. So allowing more tests per second seems to be the solution. The problem, however, is that I never know when more tests will need to be performed.
So it is still a problem, but with an acceptable workaround. |
|
|
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
|
Posted: Mon Apr 11, 2005 12:05 pm Post subject: |
|
|
Quote: | Alex>May be you are using "Change test interval" action in the alert profile assigned to this test?
Marcus>No we do not. |
You are using this action. And it looks like the problem is caused by this action, which you are using for all (or most) of your service tests.
You have several thousand service tests. In case of a network failure the tests change status to Unknown and the actions are triggered. As a result you have thousands of tests with a short test interval (you are using a 30 sec test interval). HostMonitor cannot perform all tests at the same time, so many tests will be performed much later...
Possible solutions:
- use a longer interval for the "Set new test interval" action, e.g. 90 sec instead of 30 sec
- use "advanced" mode for this action: change the interval when the test changes status to "bad", and ignore "unknown" status
- change the "Consider status of the master test obsolete after N seconds" option (Behavior page in the Options dialog). Use a shorter interval, e.g. 5-10 sec. In this case the "master" ping tests will be performed more often, and the "service" tests will not be performed while there is a network problem. You can even try setting this option to 2 sec
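To put rough numbers on the load argument above, here is a back-of-the-envelope sketch in Python. The count of 3000 tests is only an assumption (the thread says "several thousand"), the function names are made up for illustration, and the model is plain averaging, not HostMonitor's real scheduling. It compares the start rate that a shortened interval demands with the 64 and 128 tests-per-second limits mentioned in this thread.

```python
def required_rate(num_tests: int, interval_sec: float) -> float:
    """Average tests per second needed to run every test once per interval."""
    return num_tests / interval_sec


def backlog_after(num_tests: int, interval_sec: float,
                  max_per_sec: float, seconds: float) -> float:
    """Tests still waiting after `seconds` when demand exceeds the start limit."""
    surplus = required_rate(num_tests, interval_sec) - max_per_sec
    return max(0.0, surplus * seconds)


NUM_TESTS = 3000  # hypothetical; the thread only says "several thousand"

for interval in (30, 60, 90):       # candidate "new test interval" values
    for limit in (64, 128):         # tests allowed to start per second
        rate = required_rate(NUM_TESTS, interval)
        status = "keeps up" if rate <= limit else "falls behind"
        print(f"interval={interval}s limit={limit}/s need={rate:.1f}/s -> {status}")
```

Under these assumed numbers, only the combination of a 30 sec interval and a limit of 64 starts per second falls behind. Either a longer interval or a higher limit removes the shortfall.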
Regards
Alex |
|
|
|
Marcus
Joined: 18 Nov 2002 Posts: 367
|
Posted: Tue Apr 12, 2005 2:38 am Post subject: |
|
|
Quote: | You are using this action |
But this still does not explain why the tests are never executed. I don't mind the tests being executed at a later time (I know HostMonitor will catch up once the situation becomes normal again). But I do mind tests never being performed again until I manually refresh them.
Quote: | use longer interval for "Set new test interval" action. E.g. 90 sec instead of 30 sec | I made more room for the tests by increasing the number of tests that can be started every second. This seems to work.
Quote: | use "advanced" mode for this action - change interval when test changes status to "bad", ignore "unknown" status | I do not want to ignore Unknown status, since it can mean more than a network problem. It alerts me to those situations.
Quote: | - change "Consider status of the master test obsolete after N seconds" option (Behavior page in the Options dialog). Use shorter interval, e.g. 5-10 sec | I hadn't thought about that one. I have now set it to 10 secs. |
|
|
|
Marcus
Joined: 18 Nov 2002 Posts: 367
|
Posted: Tue Apr 12, 2005 3:53 am Post subject: |
|
|
Quote: | I made more room for the tests by increasing the number of tests to be fired every second. This seems to work. | We still ran into problems, so I increased the new test interval to 60 seconds.
So if I'm correct, I will never get an Unknown status when I have network problems: the new test interval will fire the test after 60 seconds, which will fire a ping test (the master test status is obsolete after 10 seconds), and the test will get the "Wait for Master" status.
The only case in which I can still get an Unknown status due to network problems is this:
The network is available before the ping test but fails before the service test is performed. This window is at most 10 seconds, and this behaviour (ok at ping, failing before the service test) must happen twice, since the new test interval of 60 seconds will fire the same sequence again.
Am I correct, or am I missing something? |
|
|
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
|
Posted: Tue Apr 12, 2005 10:44 am Post subject: |
|
|
Quote: | The only time I can get an Unknown status due to network problems will occur only when:
The Network is available before the ping test, but fails before service test is performed. |
Right.
Quote: | and this behaviour (ok when ping, fail before service test) must happen twice since there is a new test interval of 60 seconds which will fire the same sequence again. |
Not exactly. You will see "unknown" status after the 1st probe (the network is available before the ping test, but fails before the service test). Then, after the 2nd probe, the test will change status to "Wait for Master".
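The two-probe sequence can be sketched as a toy model in Python. This is only an illustration of the behaviour described in this thread, not HostMonitor's implementation: the `Master` class, `probe_service`, and the `network_up_at` callback are hypothetical names, and the 10 sec and 60 sec values are the settings Marcus reported.

```python
from dataclasses import dataclass

MASTER_OBSOLETE_AFTER = 10  # seconds, the value Marcus configured
NEW_TEST_INTERVAL = 60      # seconds, the "set new test interval" value


@dataclass
class Master:
    status: str = "unknown"            # "alive" or "dead" once checked
    checked_at: float = float("-inf")  # time of the last ping


def probe_service(now, master, network_up_at):
    """Return the service test status for one probe at time `now`.

    `network_up_at(t)` is a hypothetical callback: True if the network works at t.
    """
    # Re-run the master ping if its last result is too old to trust.
    if now - master.checked_at > MASTER_OBSOLETE_AFTER:
        master.status = "alive" if network_up_at(now) else "dead"
        master.checked_at = now
    if master.status != "alive":
        return "Wait for Master"
    # The master still counts as alive, so the service test itself is attempted.
    return "Ok" if network_up_at(now) else "Unknown"


network = lambda t: t < 5        # assumed outage starting 5 s after the ping
master = Master("alive", 0.0)    # the ping succeeded at t=0
first = probe_service(8, master, network)
second = probe_service(8 + NEW_TEST_INTERVAL, master, network)
print(first, "->", second)       # prints "Unknown -> Wait for Master"
```

The first probe trusts the 8-second-old ping result and fails on its own. By the second probe the ping result is obsolete, so the master is re-checked first and the service test is held back.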
Quote: | But this still does not explain why the tests are never executed. |
"Never" is not a precise term. Have you waited until the network problem was fixed and the other test items changed status back to "alive"?
Let's do an experiment:
- select a single(!) "service" test and change the name of the target system to an invalid one (or you may create a new item for testing purposes)
- then press Space on this item; the test will be "refreshed" and change status to "unknown".
What happens next? I assume it will be tested again within 60 sec (you are using this interval for the "set new test interval" action, right?)
Regards
Alex |
|
|
|
Marcus
Joined: 18 Nov 2002 Posts: 367
|
Posted: Wed Apr 13, 2005 2:22 am Post subject: |
|
|
Quote: | "never" is not precise term. Have you wait till network problem fixed and other test items change status back to "alive"? |
"Never" in this case means until I manually refresh the specific test. The master ping test had already been running ok for several hours.
I now have service tests that still say "out of schedule", while their master ping test has already been running for more than 1.5 hours. Stopping and starting HostMonitor solves this problem. (I sent some screenshots to support@ks-soft.net)
Quote: | Lets make experiment: |
I did the experiment: the test goes to unknown, is tested a second time after 60 seconds (the new interval), and the iteration goes from 1 to 2. So we now know it works for a single test. |
|
|
|
Marcus
Joined: 18 Nov 2002 Posts: 367
|
Posted: Wed Apr 13, 2005 9:25 am Post subject: |
|
|
Quote: | Never in this case is until I manually refresh the specific test |
It looks like the tests that stay unknown were all created with version 5.x of HostMonitor, while all tests created with version 4.x are tested ok once the connection is restored. I can't confirm this yet, but it looks like it.
I will check the creation dates of the tests after the next network outage (won't be long). |
|
|
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
|
Posted: Wed Apr 13, 2005 11:36 am Post subject: |
|
|
Quote: | Never in this case is until I manually refresh the specific test. The master ping test was already running ok for several hours. |
This problem is not about the "master" test; it's about test items with a short test interval. Theoretically, an error in the "Set new test interval: restore original value" action could lead to such a problem... But I do not see any mistakes there.
Could you check what the "Estimate load" dialog shows when HostMonitor does not perform some tests?
Quote: | It looks like that the tests that stay unknown are all created with version 5.x of HostMonitor and all version 4.x are tested ok when connections is restored. I can't confirm this yet, but it looks like it. |
That is simple to explain: tests created by version 5 were created after the tests from version 4, so the new tests have higher indexes.
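Why a higher index could matter can be illustrated with a toy scheduler in Python. This is purely an assumption for illustration: the thread does not say how HostMonitor orders its tests internally, and `run_scheduler` is a made-up function. It scans tests in index order with a per-tick start limit, and when low-index tests with short intervals keep coming due, the highest-index tests may never get a turn.

```python
from typing import List


def run_scheduler(num_tests: int, interval: int,
                  max_per_tick: int, ticks: int) -> List[int]:
    """Return how many times each test ran; tests are scanned in index order."""
    next_due = [0] * num_tests
    runs = [0] * num_tests
    for now in range(ticks):
        started = 0
        for i in range(num_tests):       # lowest index first
            if started == max_per_tick:
                break                    # per-tick start limit reached
            if next_due[i] <= now:
                runs[i] += 1
                next_due[i] = now + interval
                started += 1
    return runs


# 10 tests with a 2-tick interval need 5 starts per tick on average.
overloaded = run_scheduler(num_tests=10, interval=2, max_per_tick=3, ticks=20)
relaxed = run_scheduler(num_tests=10, interval=2, max_per_tick=5, ticks=20)
print("limit 3:", overloaded)
print("limit 5:", relaxed)
```

In this model the tests at indexes 6 to 9 never run while the limit is 3, and every test runs once the limit matches the demand. That would be consistent with raising the limit from 64 to 128 helping, but it remains a guess about the mechanism.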
Anyway, this update (www.ks-soft.net/download/hm523.zip) should eliminate the problem (install version 5.22 before applying this update).
Regards
Alex |
|
|
|
Marcus
Joined: 18 Nov 2002 Posts: 367
|
Posted: Thu Apr 14, 2005 2:01 am Post subject: |
|
|
Quote: | This problem is not about "master" test | I know; it just shows that the schedule is filled in correctly.
Quote: | Could you check what "Estimate load" dialog shows when HostMonitor does not perform some tests |
14 tests/sec, and our current setting is 128.
I installed 5.23 and did not see any unknowns this morning. But then again, there has been no network outage.
What I do see is that two 'customers' are still out of schedule while the ping test is already running (the same problem as yesterday, but a different customer / folder in HostMonitor).
I noticed that all service tests of a single schedule are out of schedule (in this case, two schedules), which was not the case yesterday (version 5.12). I don't know if it means anything, but just in case. |
|
|
|
|