Win32 error #1722

All questions related to installation, configuration and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
Marcus
Posts: 367
Joined: Mon Nov 18, 2002 6:00 pm

Post by Marcus »

I recently switched to version 5.12 and am now experiencing the following:

Every now and then I lose my network connection to a remote server and get a Win32 error #1722 message back in the reply field, with status Unknown. That in itself is not a problem, since we really do have a network problem.

The real problem is that this test is now 'kicked out' 8) . It is never tested again until I manually refresh it. From that point on it is 'pulled back' into its normal schedule. Have I missed a new option, or is this really a bug :-?

This only occurs with service tests. A ping test comes back into schedule without any problem.
KS-Soft
Posts: 12821
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

I cannot reproduce this problem.
Maybe you are using the "Change test interval" action in the alert profile assigned to this test?

Regards
Alex
Marcus
Posts: 367
Joined: Mon Nov 18, 2002 6:00 pm

Post by Marcus »

Alex>Maybe you are using the "Change test interval" action in the alert profile assigned to this test?
No, we do not. And even if we did, it would not explain why the test is no longer performed until it is started manually. It looks like the scheduler is skipping the tests (the recurrences counter stays at 1 and never goes any higher).
Marcus
Posts: 367
Joined: Mon Nov 18, 2002 6:00 pm

Post by Marcus »

I really do have a problem with this situation (it occurs every day, and some out-of-schedule tests also do not return to the schedule). So my question now is: is it possible to downgrade to my 4.x version?
KS-Soft
Posts: 12821
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Could you please send the HML file with the tests to support@ks-soft.net?
Yes, you can downgrade.

Regards
Alex
Marcus
Posts: 367
Joined: Mon Nov 18, 2002 6:00 pm

Post by Marcus »

I have changed the number of tests that are allowed to be started each second to 128 (it was 64), and this morning, for the first time, everything behaved correctly. So I prefer to wait and see what happens over the weekend before we downgrade. 8)
Marcus
Posts: 367
Joined: Mon Nov 18, 2002 6:00 pm

Post by Marcus »

The problem seems to be solved: no errors were found after the weekend. So allowing more tests to be started per second seems to be the solution. The problem, however, is that I never know in advance when a higher limit will be needed.

So it is still a problem, but with an acceptable workaround :wink:
KS-Soft
Posts: 12821
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Alex>Maybe you are using the "Change test interval" action in the alert profile assigned to this test?
Marcus>No, we do not.
You are using this action. And it looks like the problem is caused by this action, which you are using for all (or most) of the service tests.
You have several thousand service tests. In case of a network failure the tests change status to Unknown and the actions are triggered; as a result you have thousands of tests with a short test interval (you are using a 30 sec test interval). HostMonitor cannot perform all tests at the same time, so many tests will be performed much later...

Possible solutions:
- use a longer interval for the "Set new test interval" action, e.g. 90 sec instead of 30 sec

- use "advanced" mode for this action: change the interval when the test changes status to "bad", and ignore "unknown" status

- change the "Consider status of the master test obsolete after N seconds" option (Behavior page in the Options dialog). Use a shorter interval, e.g. 5-10 sec. In this case the "master" ping tests will be performed more often and the "service" tests will not be performed when there is a network problem. You can even try setting this option to 2 sec
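
To make the load argument above concrete, here is a rough back-of-the-envelope sketch in Python. The figure of 5000 tests is only a hypothetical stand-in for "several thousand"; 64 and 128 are the "tests started per second" limits mentioned earlier in this thread. The sketch ignores how long each test takes and is not how HostMonitor itself estimates load.

```python
def required_starts_per_sec(num_tests: int, interval_sec: float) -> float:
    """Average number of test starts per second needed to keep every test on schedule."""
    return num_tests / interval_sec


NUM_TESTS = 5000  # hypothetical stand-in for "several thousand" service tests

for interval in (30, 60, 90):          # candidate "new test interval" values, in seconds
    rate = required_starts_per_sec(NUM_TESTS, interval)
    for limit in (64, 128):            # start limits mentioned in the thread
        verdict = "keeps up" if rate <= limit else "falls behind"
        print(f"interval {interval:>2} s, limit {limit:>3}/s: need {rate:6.1f}/s -> {verdict}")
```

Under these assumed numbers a 30 sec interval overloads both limits, while 90 sec fits under either one, which is the reasoning behind the first suggestion.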

Regards
Alex
Marcus
Posts: 367
Joined: Mon Nov 18, 2002 6:00 pm

Post by Marcus »

Alex>You are using this action
:oops: :oops:

But this still does not explain why the tests are never executed. I don't mind the tests being executed at a later time (I know HostMonitor will catch up once the situation becomes normal again). But I do mind tests never being performed again until I manually refresh them.

Alex>use longer interval for "Set new test interval" action. E.g. 90 sec instead of 30 sec
I made more room for the tests by increasing the number of tests to be fired every second. This seems to work.

Alex>use "advanced" mode for this action - change interval when test changes status to "bad", ignore "unknown" status
I do not want to ignore the unknown status, since it can mean more than a network problem. It alerts me to those situations.

Alex>change "Consider status of the master test obsolete after N seconds" option (Behavior page in the Options dialog). Use shorter interval, e.g. 5-10 sec
I hadn't thought of that one. I have now set it to 10 sec.
Marcus
Posts: 367
Joined: Mon Nov 18, 2002 6:00 pm

Post by Marcus »

Marcus>I made more room for the tests by increasing the number of tests to be fired every second. This seems to work.
We still ran into problems, so I increased the new test interval to 60 seconds.

So if I'm correct, I will never get an unknown status when I have network problems: the new test interval fires the test after 60 seconds, which first fires a ping test (the master test result is obsolete after 10 seconds), and the test then gets the "Wait for Master" status.

The only time I can get an Unknown status due to network problems is when:
The network is available before the ping test, but fails before the service test is performed. This window is at most 10 seconds, and this behaviour (OK at ping, failure before service test) must happen twice, since the new test interval of 60 seconds will fire the same sequence again.

Am I correct or am I missing something?
KS-Soft
Posts: 12821
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Marcus>The only time I can get an Unknown status due to network problems will occur only when:
Marcus>The Network is available before the ping test, but fails before service test is performed.
Right.

Marcus>and this behaviour (ok when ping, fail before service test) must happen twice since there is a new test interval of 60 seconds which will fire the same sequence again.
Not exactly. You will see "unknown" status after the 1st probe (the network is available before the ping test, but fails before the service test). Then, after the 2nd probe, the test will change status to "Wait for Master".
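
The two-probe sequence just described can be sketched as a small Python model. This is only an illustration of the behaviour discussed in this thread, not HostMonitor code: the `Master` class, the `probe` function and the timestamps are all hypothetical, and the 10 sec value is the "obsolete after N seconds" setting mentioned above.

```python
from dataclasses import dataclass

MASTER_OBSOLETE_AFTER = 10  # seconds; the value configured in this thread


@dataclass
class Master:
    """Last known result of the (hypothetical) master ping test."""
    last_checked: float = float("-inf")
    alive: bool = False


def probe(now: float, master: Master, network_up: bool) -> str:
    """Return the status of a dependent service test probed at time `now`."""
    # Re-run the master ping only when its last result is too old to trust.
    if now - master.last_checked > MASTER_OBSOLETE_AFTER:
        master.last_checked = now
        master.alive = network_up
    if not master.alive:
        return "Wait for Master"
    # The master result still counts as valid, so the service test really runs.
    return "Alive" if network_up else "Unknown"


master = Master()
# t=0: the master ping runs while the network is still up.
master.last_checked, master.alive = 0, True
# t=5: the network has just failed, but the ping result is only 5 sec old.
first = probe(5, master, network_up=False)
# t=65: 60 sec later the ping result is obsolete, so the master is re-tested first.
second = probe(65, master, network_up=False)
print(first, "->", second)  # prints: Unknown -> Wait for Master
```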
Marcus>But this still does not explain why the tests are never executed.
"Never" is not a precise term. Have you waited until the network problem was fixed and the other test items changed status back to "alive"?

Let's do an experiment:
- select a single(!) "service" test and change the name of the target system to an invalid one (or you may create a new item for testing purposes)
- then press Space on this item; the test will be "refreshed" and change status to "unknown".
What happens next? I assume it will be tested again within 60 sec (you are using this interval for the "Set new test interval" action, right?)

Regards
Alex
Marcus
Posts: 367
Joined: Mon Nov 18, 2002 6:00 pm

Post by Marcus »

Alex>"never" is not precise term. Have you wait till network problem fixed and other test items change status back to "alive"?
"Never" in this case means until I manually refresh the specific test. The master ping test had already been running OK for several hours.

I now have service tests that still say "out of schedule", while their master ping test has already been running for more than 1.5 hours. Stopping and starting HostMonitor solves this problem. (I sent some screenshots to support@ks-soft.net.)

Alex>Lets make experiment:
I did the experiment: the test goes to unknown, is tested a second time after 60 seconds (the new interval), and the iteration count goes from 1 to 2. So we now know it works for a single test.
Marcus
Posts: 367
Joined: Mon Nov 18, 2002 6:00 pm

Post by Marcus »

Marcus>Never in this case is until I manually refresh the specific test
It looks like the tests that stay unknown were all created with version 5.x of HostMonitor, and all tests created with version 4.x are tested OK once the connection is restored. I can't confirm this yet, but it looks like it.

I will check the creation dates of the tests after the next network outage (won't be long :wink: )
KS-Soft
Posts: 12821
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Marcus>Never in this case is until I manually refresh the specific test. The master ping test was already running ok for several hours.
This problem is not about the "master" test, it's about test items with a short test interval. Theoretically, an error in the "Set new test interval: restore original value" action could lead to such a problem... But I do not see any mistakes there :roll:
Could you check what the "Estimate load" dialog shows when HostMonitor does not perform some tests?

Marcus>It looks like that the tests that stay unknown are all created with version 5.x of HostMonitor and all version 4.x are tested ok when connections is restored. I can't confirm this yet, but it looks like it.
That is simple to explain: tests that were created by version 5 were created after the tests from version 4, so the new tests have higher indexes.
Anyway, this update (www.ks-soft.net/download/hm523.zip) should eliminate the problem (install version 5.22 before applying this update).
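
Why higher indexes could matter can be illustrated with a toy scheduler in Python. This is purely a hypothetical model: the thread does not describe how HostMonitor actually scans its test list. The sketch assumes a scheduler that, once per second, walks the tests in index order and starts at most a fixed number of due tests; the function name and all parameters are made up for the illustration.

```python
def simulate(num_tests: int, interval: int, max_starts_per_sec: int, seconds: int) -> list:
    """Return how many times each test ran under a naive index-ordered scheduler."""
    next_due = [0] * num_tests   # every test is due at t=0
    runs = [0] * num_tests
    for now in range(seconds):
        started = 0
        for idx in range(num_tests):            # always scan from the lowest index
            if started == max_starts_per_sec:   # per-second start limit reached
                break
            if next_due[idx] <= now:
                runs[idx] += 1
                next_due[idx] = now + interval  # reschedule after the short interval
                started += 1
    return runs


# 10 tests, 3 sec interval, at most 2 starts per second: only 6 tests fit per cycle.
runs = simulate(num_tests=10, interval=3, max_starts_per_sec=2, seconds=60)
starved = [idx for idx, count in enumerate(runs) if count == 0]
print("runs per test:", runs)
print("never started:", starved)  # prints: never started: [6, 7, 8, 9]
```

In this model the low-index tests become due again before the scan ever reaches the high-index ones, so the newest tests are the ones that never run. Whether that is what happened here is an assumption; it only shows that such a design would match the reported symptom.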

Regards
Alex
Marcus
Posts: 367
Joined: Mon Nov 18, 2002 6:00 pm

Post by Marcus »

Alex>This problem is not about "master" test
I know, it just shows the schedule is filled in correctly :wink:

Alex>Could you check what "Estimate load" dialog shows when HostMonitor does not perform some tests
14 tests/sec, and our current setting is 128 :-?

I installed 5.23 and did not see any unknowns this morning. But then again, there has been no network outage. 8)
What I do see is that two 'customers' are still out of schedule while the ping test is already running (same problem as yesterday, but for a different customer / folder in HostMonitor).

I noticed that, within a single schedule, all service tests are out of schedule (in this case this affects two schedules), which was not the case yesterday (version 5.12). I don't know if it means anything, but just in case.