rodionov.ka
Joined: 08 Apr 2013 Posts: 8
Posted: Fri Apr 26, 2013 3:00 am Post subject: "Unknown" status in ping test |
Hello.
What does the "Unknown" status mean in a ping test?
Lately I have been receiving many false negatives with this status.
The first mail reports a service FAIL:
Code: |
Test : DC71 Ping
Method: ping (timeout - 2000 ms)
Status : Unknown
StatusChangedTime: 26.04.2013 1:22:07
Reply : Timed out
Suggested Reply: 0 ms
Agent (host, who performed test): gw71.*domain*.ru
Last status: Host is alive
LastReply: 0 ms
PreviousStatusDuration: 6 days 18:06:15
Folder: DC71
Interval of test: 00:01:00
TaskComment: Ping DC71
Test Object Info: Ping DC71 (timeout: 2000 ms)
MasterTests (depend on):
Recurrences : 1
Total tests: 1368772
Alive ratio : 95,60 %
Dead ratio: 0,18 %
|
The second mail reports HOST ALIVE:
Code: |
Test : DC71 Ping
Method: ping (timeout - 2000 ms)
Status : Host is alive
StatusChangedTime: 26.04.2013 1:23:08
Reply : 0 ms
Suggested Reply: 0 ms
Agent (host, who performed test): gw71.*domain*.ru
Last status: Unknown
LastReply: Timed out
PreviousStatusDuration: 00:01:00
Folder: DC71
Interval of test: 00:01:00
TaskComment: Ping DC71
Test Object Info: Ping DC71 (timeout: 2000 ms)
MasterTests (depend on):
Recurrences : 1
Total tests: 1368773
Alive ratio : 95,60 %
Dead ratio: 0,18 %
|
When a host is really in the FAIL state, I receive a message like this:
Code: |
Test : Ping Lipetsk VPN-GW48
Method: ping (timeout - 2000 ms)
Status : No answer
StatusChangedTime: 26.04.2013 9:36:18
Reply : 100 %
Suggested Reply: 100 %
Agent (host, who performed test): HostMonitor
Last status: Host is alive
LastReply: 0 %
PreviousStatusDuration: 2 days 10:50:40
Folder: VPN's
Interval of test: 00:05:00
TaskComment: Ping 192.168.48.254
Test Object Info: Ping 192.168.48.254 (timeout: 2000 ms)
MasterTests (depend on): GW5 Firewall
Ping Lipetsk-EXT
Recurrences : 1
Total tests: 291302
Alive ratio : 93,05 %
Dead ratio: 0,59 %
|
and many other tests show the same status.
I use two Active RMA agents in that network: a primary one and a second one in backup-only mode.
How can I troubleshoot these messages to rule out false negatives? |
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Fri Apr 26, 2013 7:07 am Post subject: |
Quote: | Status : Unknown
StatusChangedTime: 26.04.2013 1:22:07
Reply : Timed out |
This means HostMonitor could not perform the test within 15 min.
Quote: | How can I troubleshoot these messages to rule out false negatives? |
Well, it's not "false". Unknown status means the test cannot be executed correctly. There is some problem...
You can easily tell HostMonitor not to start "bad" actions on Unknown status, but I think it's better to find the reason for this problem.
Could you please provide more information?
- HostMonitor version?
- Test performed by Active RMA? RMA version?
- What Windows do you use?
- Service pack?
- Do you use ODBC logging or ODBC test method? If yes, what ODBC driver do you use?
- Do you have any antivirus monitors, personal firewalls, or content monitoring software installed? Any non-standard Winsock components?
Regards
Alex |
rodionov.ka
Joined: 08 Apr 2013 Posts: 8
Posted: Mon Apr 29, 2013 4:23 am Post subject: |
Quote: | Could you please provide more information?
- HostMonitor version? |
v.9.32
Quote: | - Test performed by Active RMA? RMA version? |
Active RMA 4.52 and 4.53 (both behave the same)
Quote: | - What Windows do you use?
- Service pack? |
HostMonitor server: Win2003 R2 SP2 Eng (Xeon 3 GHz, 4 GB RAM, SAS disks)
Agents in the "problem" remote office (linked by VPN over medium-quality Internet channels):
Main Agent - Windows NT 5.2 Build 3790 Service Pack 2
Backup agent - Windows NT 6.1 Build 7601 Service Pack 1
Quote: | - Do you use ODBC logging or ODBC test method? If yes, what ODBC driver do you use? |
Yes, ODBC backup logging to a SQL server on the LAN for the HM server, not for the remote agents. No other ODBC-related tests.
SQL Native Client - 2005.90.2047.00
Quote: | - Do you have any antivirus monitors, personal firewalls, or content monitoring software installed? Any non-standard Winsock components? |
I don't think so.
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Mon Apr 29, 2013 1:31 pm Post subject: |
Looks like the connection was dropped after the test probe started...
Do you see errors in the RMA log file (by default it is the loga_bad.txt file, unless you changed the settings)?
Any errors in the HostMonitor system log file (specified on the System Log page in the HostMonitor Options dialog)?
Any warnings displayed by the Auditing Tool (menu View)?
Regards
Alex |
rodionov.ka
Joined: 08 Apr 2013 Posts: 8
Posted: Tue Apr 30, 2013 7:01 am Post subject: |
KS-Soft wrote: | Looks like the connection was dropped after the test probe started...
Do you see errors in the RMA log file (by default it is the loga_bad.txt file, unless you changed the settings)? |
GW72, single agent:
Code: | [26.04.2013 0:59] gw72.*domain*.ru Connection error
[26.04.2013 0:59] gw72.*domain*.ru Connection error
[26.04.2013 1:07] gw72.*domain*.ru Decode error: Cannot read data. An existing connection was forcibly closed by the remote host.
[26.04.2013 1:07] gw72.*domain*.ru Connection error |
GW71 Main agent:
Code: | [26.04.2013 0:59] gw71.*domain*.ru Connection error
[26.04.2013 1:07] gw71.*domain*.ru Connection error |
DC71, backup agent for GW71:
Code: | [26.04.2013 0:59] dc71.*domain*.ru Connection error
[26.04.2013 0:59] dc71.*domain*.ru Connection error
[26.04.2013 1:07] dc71.*domain*.ru Connection error
[26.04.2013 1:07] dc71.*domain*.ru Connection error
[26.04.2013 2:05] dc71.*domain*.ru Decode error: Cannot read data. An existing connection was forcibly closed by the remote host.
[26.04.2013 2:06] dc71.*domain*.ru Agent "dc71.*domain*.ru" already connected! |
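A side note for readers comparing the three agent logs: the "Connection error" entries fall on the same minutes for every agent, and that can be checked mechanically. Below is a minimal Python sketch, not part of HostMonitor or RMA. It assumes the line shape seen in the excerpts above ([date time] agent message); the function name agents_by_minute and the sample list are made up for illustration.

```python
import re
from collections import defaultdict

# Line shape assumed from the excerpts in this thread:
# [DD.MM.YYYY H:MM] <agent name> <message>
LINE_RE = re.compile(r"^\[(\d{2}\.\d{2}\.\d{4}) (\d{1,2}:\d{2})\]\s+(\S+)\s+(.*)$")


def agents_by_minute(lines):
    """Map each (date, time) stamp to the set of agents that logged 'Connection error'."""
    hits = defaultdict(set)
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and m.group(4).startswith("Connection error"):
            hits[(m.group(1), m.group(2))].add(m.group(3))
    return dict(hits)


# Sample lines copied from the three logs posted above (duplicates dropped).
sample = [
    "[26.04.2013 0:59] gw72.*domain*.ru Connection error",
    "[26.04.2013 1:07] gw72.*domain*.ru Decode error: Cannot read data.",
    "[26.04.2013 1:07] gw72.*domain*.ru Connection error",
    "[26.04.2013 0:59] gw71.*domain*.ru Connection error",
    "[26.04.2013 1:07] gw71.*domain*.ru Connection error",
    "[26.04.2013 0:59] dc71.*domain*.ru Connection error",
    "[26.04.2013 1:07] dc71.*domain*.ru Connection error",
    "[26.04.2013 2:05] dc71.*domain*.ru Decode error: Cannot read data.",
]

# Keep only the minutes in which more than one agent reported an error.
shared = {stamp: agents for stamp, agents in agents_by_minute(sample).items()
          if len(agents) > 1}
for stamp, agents in sorted(shared.items()):
    print(stamp, sorted(agents))
```

When agents on different hosts report errors in the same minute, the shared link or the HostMonitor side may be a more likely suspect than any single agent, though the logs alone do not prove that.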
KS-Soft wrote: | Any errors in the HostMonitor system log file (specified on the System Log page in the HostMonitor Options dialog)? |
Code: | 26.04.2013 0:58:47 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:00:58 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:07:13 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:08:13 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:13:30 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:13:35 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:13:39 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:13:43 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:13:49 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:13:56 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:13:56 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:14:05 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:14:06 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:14:13 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:14:19 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:14:26 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:14:30 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:14:40 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:14:56 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:16:07 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:22:08 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 1:23:08 E-mail to ServersMonitoring@*domain*.ru has been sent (via mail.*domain*.ru)
26.04.2013 2:06:05 192.168.71.2: Agent "dc71.*domain*.ru" already connected! |
KS-Soft wrote: | Any warnings displayed by the Auditing Tool (menu View)? |
Only warnings about incorrect .bat or .vbs files in disabled tests and wrong sound file names.
Another question:
Why are there so many "already connected" messages, every 30 seconds, for some of the agents? |
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Tue Apr 30, 2013 9:58 am Post subject: |
Quote: | Why are there so many "already connected" messages, every 30 seconds, for some of the agents? |
2 possible reasons:
- you have installed 2 agents using the same name (could you please check this?)
- there is some mistake in our code
Regards
Alex |
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Tue Apr 30, 2013 11:42 am Post subject: |
PS
We checked the code: this may happen when the TCP connection is dropped (not closed in the normal way). HostMonitor may keep waiting for packets from the remote system for some time before it recognizes the problem and closes the socket.
Regards
Alex |
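To illustrate the point about dropped (not cleanly closed) TCP connections: a peer that vanishes without sending FIN or RST leaves the other side waiting until some timeout fires. One general technique for shortening that wait is TCP keepalive. The Python sketch below only illustrates that technique. It is an assumption that something similar would help in this setup, and it says nothing about how HostMonitor or RMA actually manage their sockets. The helper name enable_keepalive and the timing values are hypothetical.

```python
import socket


def enable_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Turn on TCP keepalive so a silently dropped peer is noticed sooner.

    idle_s, interval_s and probes are illustrative values, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained options are platform specific, so set only the ones
    # this Python build exposes and ignore any the OS refuses.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # seconds of silence before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the connection is dropped
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is None:
            continue
        try:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
        except OSError:
            pass


sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_keepalive(sock)
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
print("keepalive enabled:", keepalive_on)
```

With the operating system defaults, keepalive probing typically starts only after a long idle period, which is why an application that relies on defaults can sit on a dead socket for a while.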
rodionov.ka
Joined: 08 Apr 2013 Posts: 8
Posted: Mon May 06, 2013 1:26 am Post subject: |
KS-Soft wrote: | Quote: | Why are there so many "already connected" messages, every 30 seconds, for some of the agents? |
2 possible reasons:
- you have installed 2 agents using the same name (could you please check this?)
- there is some mistake in our code
Regards
Alex |
Hi!
About "already connected":
I've double-checked: no configs on the remote systems use the same agent name.
Right now some agents produce this message constantly while RMA Manager is active. Restarting the agent (after waiting about 30 seconds) solves this "problem". Not starting RMA Manager solves it too.
But what about the first problem, with the "Unknown" statuses?
What do I need to do to solve it? |
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Tue May 07, 2013 5:16 am Post subject: |
If you cannot use a more reliable channel, there is nothing you can do.
We can add some code in the next version to handle such problems better (faster reconnect, etc.).
But if the connection is bad and the test cannot be performed, you will still see Unknown status sometimes. You may set up action profiles (test settings) to ignore Unknown status, to not start actions, or to start different actions on Unknown status.
Regards
Alex |
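The idea of treating Unknown differently from a real failure can be sketched as plain decision logic. This is only an illustration in Python: it is not HostMonitor's action profile format or API, and the names pick_action, run, and unknown_limit are hypothetical. The three status strings are the ones that appear in the notification mails earlier in this thread.

```python
from enum import Enum


class Status(Enum):
    ALIVE = "Host is alive"
    NO_ANSWER = "No answer"
    UNKNOWN = "Unknown"


def pick_action(status, unknown_streak, unknown_limit=3):
    """Return 'alert', 'note' or 'none' for one test result (hypothetical policy)."""
    if status is Status.NO_ANSWER:
        return "alert"  # a real failure: start the normal "bad" action
    if status is Status.UNKNOWN:
        # Ignore short runs of Unknown; start a different, milder action
        # only when the test stays Unknown for several checks in a row.
        return "note" if unknown_streak >= unknown_limit else "none"
    return "none"


def run(history, unknown_limit=3):
    """Apply pick_action over a sequence of statuses, tracking the Unknown streak."""
    streak = 0
    actions = []
    for status in history:
        streak = streak + 1 if status is Status.UNKNOWN else 0
        actions.append(pick_action(status, streak, unknown_limit))
    return actions


history = [Status.ALIVE, Status.UNKNOWN, Status.ALIVE, Status.NO_ANSWER,
           Status.UNKNOWN, Status.UNKNOWN, Status.UNKNOWN]
print(run(history))
```

Under this assumed policy, a single one-minute Unknown like the DC71 case above would trigger nothing, while a confirmed "No answer" would still alert immediately.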
rodionov.ka
Joined: 08 Apr 2013 Posts: 8
Posted: Tue May 07, 2013 5:53 am Post subject: |
KS-Soft wrote: | If you cannot use a more reliable channel, there is nothing you can do. |
What is the difference between Active and Passive RMA in that case?
I know that the connection did not go down for longer than 2-3 minutes (this is confirmed by more than 2 tests of that connection from the main HM server: pings of the external IP and pings through the VPN session). If the time before the Unknown status is assigned is 15 minutes, why is the Unknown status set? Can you imagine a 15-minute outage of an office Internet connection?
Could this be an error in RMA session management? |
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Tue May 07, 2013 7:30 am Post subject: |
Passive RMA waits for a connection from HostMonitor.
HostMonitor may set Unknown status even if you use Passive RMA (when HostMonitor cannot connect to the agent and cannot connect to the backup RMA).
As I said, we will add some code...
Regards
Alex |
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Mon May 27, 2013 12:59 pm Post subject: |
We changed some code in the new version of HostMonitor, RMA Manager and Active RMA.
Now it should work better over unreliable connections. E.g.
- when the connection from the primary RMA is dropped and there is an active (connected) backup agent, HostMonitor may switch to the backup RMA right away, even if the test probe has already started;
- if no backup RMA is provided, HostMonitor may update results for started test probes right after the agent successfully reconnects;
and so on.
If you have installed version 9.50, we can provide updated hostmon.exe, rma_active.exe, rma_mgr.exe modules.
Regards
Alex |
rodionov.ka
Joined: 08 Apr 2013 Posts: 8
Posted: Tue May 28, 2013 6:03 am Post subject: |
KS-Soft wrote: | If you have installed version 9.50, we can provide updated hostmon.exe, rma_active.exe, rma_mgr.exe modules. |
Hello
We have version 9.32 installed. Do you have an updated version for us? |
KS-Soft
Joined: 03 Apr 2002 Posts: 12795 Location: USA
Posted: Tue May 28, 2013 10:17 am Post subject: |
Why do you want to use an old version? Does your license no longer allow updates?
Please send a request to support@ks-soft.net and provide your registration name and/or order number.
Regards
Alex |
rodionov.ka
Joined: 08 Apr 2013 Posts: 8
Posted: Mon Jun 03, 2013 4:07 am Post subject: |
KS-Soft wrote: |
If you have installed version 9.50, we can provide updated hostmon.exe, rma_active.exe, rma_mgr.exe modules. |
Hello!
Can you provide the updated modules? |