Author |
Message |
andrep
Joined: 11 Dec 2008 Posts: 2
|
Posted: Thu Dec 11, 2008 3:04 am Post subject: RMA Client in Unknown status causing false alerts |
|
|
I have the latest version and have set up about 18 sites with 1 RMA at each site and 17 tests per site. Every now and then the RMA at a site does not respond, which causes Unknown status. When I force the test, it's alive. It seems unstable. What could be causing this? |
|
|
|
KS-Soft Europe
Joined: 16 May 2006 Posts: 2832
|
Posted: Thu Dec 11, 2008 8:16 am Post subject: |
|
|
Could you provide more information, please?
- Do you use Active or Passive RMA?
- What exact error message do you see in the "Reply" field of the test? "Connection error"? Something else?
Regards,
Max |
|
|
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12817 Location: USA
|
Posted: Thu Dec 11, 2008 11:24 am Post subject: |
|
|
Also, do you have any antivirus monitors, personal firewalls, or content monitoring software installed? Any non-standard Winsock components?
Exactly which test method(s) do you use?
Which Windows version is installed on the local and remote systems? Which Service Pack?
Regards
Alex |
|
|
|
ldean
Joined: 14 Nov 2008 Posts: 17
|
Posted: Tue Dec 16, 2008 10:14 am Post subject: |
|
|
I'm experiencing the same issue occasionally... What is strange is that we will sometimes have several tests performed by one agent, and one test will time out and show Unknown while the other shows up as OK. I'm kind of lost on it; the only thing I can think of is to play around with the settings for the number of tests that are initiated at once. |
|
|
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12817 Location: USA
|
Posted: Tue Dec 16, 2008 11:47 am Post subject: |
|
|
Could you please answer our questions?
Regards
Alex |
|
|
|
ldean
Joined: 14 Nov 2008 Posts: 17
|
Posted: Tue Dec 16, 2008 2:43 pm Post subject: |
|
|
I don't want to hijack this thread, but in the interest of getting a resolution for both of us, here are our details.
The issue has occurred on several different environments. All use active RMA.
Here is one example of what happens when the issue occurs:
Code: |
Test: server.domain.local C Drive
Method: Drive space
12/15/2008 6:34:41 PM Unknown Timed out
12/15/2008 6:44:42 PM Ok 23 Gb
12/15/2008 6:56:37 PM Unknown RMA not connected
12/15/2008 6:58:56 PM Ok 23 Gb
12/16/2008 12:00:18 AM Ok 23 Gb |
That particular test was on a Server 2008 machine; however, the problem occurred at the same time on several servers, the rest of which are all 2003, e.g.:
Code: |
Test: WMI Service
Method: check service
12/14/2008 12:00:04 AM Ok 0 ms
12/15/2008 12:00:07 AM Ok 0 ms
12/15/2008 6:56:37 PM Unknown RMA not connected
12/15/2008 6:59:11 PM Ok 0 ms
12/16/2008 12:00:17 AM Ok 0 ms |
What would be a good setting for the max # of tests initiated at once? Could lowering it or raising it have an effect? I have set it to 100, and the problem hasn't reoccurred yet... |
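Before tuning that setting, it may help to check whether the Unknown results on different tests fall on exactly the same timestamps, which would point at the agent connection rather than at individual tests. The following is a minimal sketch, assuming the statistics lines are copied out as plain text in the format shown above; the function names and test labels are made up for illustration and are not part of HostMonitor.

```python
import re
from collections import defaultdict
from datetime import datetime

# Matches one statistics line, e.g. "12/15/2008 6:56:37 PM Unknown RMA not connected".
LINE_RE = re.compile(
    r"^(\d{1,2}/\d{1,2}/\d{4} \d{1,2}:\d{2}:\d{2} [AP]M)\s+"
    r"(Unknown|Ok|Host is alive)\s+(.*)$"
)

def parse_history(test_name, text):
    """Yield (timestamp, test_name, status, reply) for every recognised line."""
    for raw in text.splitlines():
        m = LINE_RE.match(raw.strip())
        if m:
            ts = datetime.strptime(m.group(1), "%m/%d/%Y %I:%M:%S %p")
            yield ts, test_name, m.group(2), m.group(3).strip()

def simultaneous_unknowns(histories):
    """Return {timestamp: [test names]} where 2 or more tests were Unknown."""
    by_time = defaultdict(list)
    for name, text in histories.items():
        for ts, test, status, _reply in parse_history(name, text):
            if status == "Unknown":
                by_time[ts].append(test)
    return {ts: names for ts, names in by_time.items() if len(names) > 1}

# The two histories posted above (labels are hypothetical short names).
DRIVE_HISTORY = """\
12/15/2008 6:34:41 PM Unknown Timed out
12/15/2008 6:44:42 PM Ok 23 Gb
12/15/2008 6:56:37 PM Unknown RMA not connected
12/15/2008 6:58:56 PM Ok 23 Gb
12/16/2008 12:00:18 AM Ok 23 Gb
"""

WMI_HISTORY = """\
12/14/2008 12:00:04 AM Ok 0 ms
12/15/2008 12:00:07 AM Ok 0 ms
12/15/2008 6:56:37 PM Unknown RMA not connected
12/15/2008 6:59:11 PM Ok 0 ms
12/16/2008 12:00:17 AM Ok 0 ms
"""

if __name__ == "__main__":
    found = simultaneous_unknowns(
        {"C Drive": DRIVE_HISTORY, "WMI Service": WMI_HISTORY}
    )
    for ts, names in sorted(found.items()):
        print(ts, names)
```

With the two histories above, only the 6:56:37 PM entry is shared by both tests; the 6:34:41 PM timeout appears on the drive space test alone.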
|
|
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12817 Location: USA
|
Posted: Tue Dec 16, 2008 2:59 pm Post subject: |
|
|
It looks like the connection was dropped and the agent could not reconnect for several minutes.
Could you check the RMA logs? By default these text logs are stored in the folder where the agent is installed (unless you changed the location).
Quote: | What would be a good setting for the max # of tests initiated at once? could lowering it or raising it have an effect? I have set it to 100, and the problem hasn't reoccurred, yet... |
I think there is some external problem: a network error, a firewall, or an antivirus monitor...
Do you have an antivirus monitor installed on the systems?
What version of HostMonitor and RMA do you use?
Regards
Alex |
|
|
|
ldean
Joined: 14 Nov 2008 Posts: 17
|
Posted: Tue Dec 16, 2008 3:19 pm Post subject: |
|
|
But I don't understand how, out of 2 tests from one single RMA, one will come back OK and one will come back Unknown when the tests run at the same time?
Edit: all RMAs and HM are on the latest available versions.
Also: today we added a simple connectivity test for each of our remote clients, which just pings 127.0.0.1 and returns the result to HM. I was watching it, and when it came time for the test to run, many of the agents showed "checking" for a long time and eventually went to Unknown. I am connected to those agents through Remote Desktop, though, so I know it can't be a network connection issue. |
|
|
|
ldean
Joined: 14 Nov 2008 Posts: 17
|
Posted: Tue Dec 16, 2008 3:25 pm Post subject: |
|
|
Here are a couple of the test statistics for the connectivity tests, from 2 separate RMAs:
Code: | 12/16/2008 1:43:30 PM Host is alive 16 ms
12/16/2008 1:43:30 PM Host is alive 0 ms
12/16/2008 2:01:33 PM Unknown Timed out
12/16/2008 2:04:35 PM Host is alive 0 ms
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:15:17 PM Host is alive 0 ms
12/16/2008 4:15:20 PM Host is alive 0 ms |
Code: | 12/16/2008 1:43:30 PM Host is alive 16 ms
12/16/2008 1:43:30 PM Host is alive 16 ms
12/16/2008 1:43:30 PM Host is alive 0 ms
12/16/2008 2:01:33 PM Unknown Timed out
12/16/2008 2:04:35 PM Host is alive 0 ms
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:14:55 PM Unknown Timed out
12/16/2008 4:15:17 PM Host is alive 0 ms
12/16/2008 4:15:20 PM Host is alive 0 ms |
As I was watching, and they said checking, I right clicked on one and told it to refresh selected test, and it returned to OK.
EDIT: I'm not sure if the statistics are accurate... the tests have the same test name, perhaps that is throwing it off and giving me the stats for the same test? |
|
|
|
ldean
Joined: 14 Nov 2008 Posts: 17
|
Posted: Tue Dec 16, 2008 3:44 pm Post subject: |
|
|
The issue is occurring for me at the moment. I have an agent whose tests came back as Unknown in HM with "RMA not connected"; however, RMA Manager shows it as connected. Any idea what could cause that? Also, this test is supposed to run every couple of minutes, but if I don't do anything, it just stays on Unknown. If I manually refresh it, it comes back as OK. (BTW, I have access to the remote network via RDP, and that is still connected as well.)
Any help is appreciated. |
|
|
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12817 Location: USA
|
Posted: Tue Dec 16, 2008 9:45 pm Post subject: |
|
|
Quote: | But I don't understand how, out of 2 tests from one single RMA, one will come back OK and one will come back Unknown when the tests run at the same time? |
According to your previous post, 2 different test items failed at the same time:
================
Test: server.domain.local C Drive Method: Drive space
12/15/2008 6:56:37 PM Unknown RMA not connected
...
Test: WMI Service Method: check service
12/15/2008 6:56:37 PM Unknown RMA not connected
================
Could you please check RMA logs?
Quote: | Also: today we added a simple connectivity test for each of our remote clients, which just pings 127.0.0.1 and returns the result to HM. I was watching it, and when it came time for the test to run, many of the agents showed "checking" for a long time and eventually went to Unknown. I am connected to those agents through Remote Desktop, though, so I know it can't be a network connection issue. |
Well, "Timed out" and "RMA not connected" are 2 different problems.
1) "RMA not connected" means the agent has been unable to connect to HostMonitor for several minutes. This error may also appear right after a connection drop if you manually force the test to be "refreshed".
If you check the RMA logs, you will probably find more information about why RMA cannot reconnect.
2) "Timed out" means the agent did not return a test result within 15 min. That looks strange...
Could you try to set up Passive RMA instead of Active? Just for testing...
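When sorting through many Unknown results, the two replies can be separated mechanically so that each group gets the right follow-up. This is a minimal sketch that only encodes the two meanings described above; the function name and the category labels are hypothetical and not part of HostMonitor.

```python
def diagnose(reply):
    """Map a test's Reply text to (category, suggested next step).

    Only the two replies discussed in this thread are recognised;
    anything else is returned as "other".
    """
    text = reply.strip().lower()
    if text == "rma not connected":
        # The agent could not connect to HostMonitor for several minutes,
        # or the test was forced right after a connection drop.
        return "connection", "Check the RMA log for why the agent cannot reconnect."
    if text == "timed out":
        # The agent did not return a test result within 15 minutes.
        return "timeout", "Try Passive RMA for comparison and look for external interference."
    return "other", "Not one of the two replies discussed here."

if __name__ == "__main__":
    for reply in ("RMA not connected", "Timed out", "23 Gb"):
        print(reply, "->", diagnose(reply))
```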
Quote: | EDIT: I'm not sure if the statistics are accurate... the tests have the same test name, perhaps that is throwing it off and giving me the stats for the same test? |
It's better to use unique test names. Patterns can help you do this:
http://www.ks-soft.net/hostmon.eng/mframe.htm#patterns.htm
You may use Quick Log to check the latest test results for a specific item.
http://www.ks-soft.net/hostmon.eng/mframe.htm#testlist.htm#QuickLogPane
Quote: | The issue is occurring for me at the moment. I have an agent whose tests came back as Unknown in HM with "RMA not connected"; however, RMA Manager shows it as connected. Any idea what could cause that? |
That's possible. The 2 applications use different sockets (TCP ports) and/or IP addresses.
As I said several times, let's check the logs.
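Because the two applications are reached on separate endpoints, one can be reachable while the other is not. A plain TCP connect from the agent's machine to each endpoint is one way to see that. The sketch below uses only Python's standard `socket` module; the host name and the port numbers in the usage note are placeholders (assumptions), so substitute the values configured in your own HostMonitor and RMA Manager options.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False

def check_endpoints(host, ports, timeout=5.0):
    """Return {label: reachable} for a mapping of label -> TCP port."""
    return {
        label: can_connect(host, port, timeout)
        for label, port in ports.items()
    }
```

For example, `check_endpoints("hostmonitor.example.com", {"HostMonitor": 5056, "RMA Manager": 5057})` would report each endpoint separately. The name and both port numbers here are illustrative assumptions only. A successful connect shows only that something accepted the connection on that port, not that it was the expected application.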
Quote: | Also, this test is supposed to run every couple of minutes, but if I don't do anything, it just stays on Unknown. If I manually refresh it, it comes back as OK. (BTW, I have access to the remote network via RDP, and that is still connected as well.) |
Could you please start Auditing? Menu View -> Auditing tool.
Any warnings?
Regards
Alex |
|
|
|
ldean
Joined: 14 Nov 2008 Posts: 17
|
Posted: Wed Dec 17, 2008 8:57 am Post subject: |
|
|
Auditing shows no issues except that a couple of alert WAV files could not be found.
I checked the RMA log on one server that we have specifically been having issues with. This is, I believe, the only one that has had the "not connected" issue; the others are just timing out. Here is a segment of that log; maybe you can tell me what it means.
Code: | [12/16/2008 9:43 PM] SERVER2.flnet.local Decode error: Cannot read data (RMA Manager)
[12/16/2008 10:14 PM] SERVER2.flnet.local Decode error: Cannot read data
[12/16/2008 10:14 PM] SERVER2.flnet.local Decode error: Cannot read data (RMA Manager)
[12/16/2008 10:15 PM] SERVER2.domain.local Connection error
[12/16/2008 11:58 PM] SERVER2.domain.local Connection error
[12/17/2008 2:31 AM] SERVER2.domain.local Decode error: Cannot read data
[12/17/2008 2:31 AM] SERVER2.domain.local Decode error: Cannot read data (RMA Manager)
[12/17/2008 2:31 AM] SERVER2.domain.local Connection error
[12/17/2008 5:59 AM] SERVER2.domain.local Decode error: Cannot read data
[12/17/2008 9:29 AM] SERVER2.domain.local Decode error: Cannot read data (RMA Manager) |
However, I do see similar errors on some other RMA logs:
Code: | [11/16/2008 11:44 AM] server.domain.com Decode error: Cannot read data. An existing connection was forcibly closed by the remote host.
[11/16/2008 11:45 AM] server.domain.com Connection error
[11/16/2008 11:45 AM] server.domain.com Connection error |
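To compare agents, it can help to tally these log entries by error type and by whether the line carries the "(RMA Manager)" suffix. The following is a minimal sketch, assuming the log lines have the `[date time] host message` shape shown above; the function name and the "unlabelled" category are made up for illustration, and what an unsuffixed line refers to is not stated in the log itself.

```python
import re
from collections import Counter

# One log entry: "[12/16/2008 9:43 PM] host message"
ENTRY_RE = re.compile(r"^\[(?P<when>[^\]]+)\]\s+(?P<host>\S+)\s+(?P<message>.+)$")
SUFFIX = "(RMA Manager)"

def summarize(log_text):
    """Count entries by (error kind, target) across all recognised lines."""
    counts = Counter()
    for raw in log_text.splitlines():
        m = ENTRY_RE.match(raw.strip())
        if not m:
            continue  # skip blank or unrecognised lines
        message = m.group("message").strip()
        target = "RMA Manager" if message.endswith(SUFFIX) else "unlabelled"
        # The kind is the text before the first colon, e.g. "Decode error".
        kind = message.split(":", 1)[0].replace(SUFFIX, "").strip()
        counts[(kind, target)] += 1
    return counts

# A few of the entries posted above.
SAMPLE = """\
[12/17/2008 2:31 AM] SERVER2.domain.local Decode error: Cannot read data
[12/17/2008 2:31 AM] SERVER2.domain.local Decode error: Cannot read data (RMA Manager)
[12/17/2008 2:31 AM] SERVER2.domain.local Connection error
[12/17/2008 5:59 AM] SERVER2.domain.local Decode error: Cannot read data
"""

if __name__ == "__main__":
    for (kind, target), n in sorted(summarize(SAMPLE).items()):
        print(f"{kind:18} {target:12} {n}")
```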
I also changed all the test names so there are no repeats.
I am out of my office atm, but I will try and set someone up for passive. We were hoping to avoid this, in order to avoid having to mess with firewall rules on all of our remote clients.
Thanks for your help |
|
|
|
KS-Soft
Joined: 03 Apr 2002 Posts: 12817 Location: USA
|
Posted: Wed Dec 17, 2008 11:46 am Post subject: |
|
|
Quote: | [12/17/2008 2:31 AM] SERVER2.domain.local Decode error: Cannot read data
[12/17/2008 2:31 AM] SERVER2.domain.local Decode error: Cannot read data (RMA Manager) |
It looks like some other application (not HostMonitor and not RMA Manager) accepted the connection request from the agent.
Not sure how this is possible...
I still think there is some external problem. Do you have any antivirus monitors, personal firewalls, or content monitoring software installed? Any non-standard Winsock components?
Regards
Alex |
|
|
|
doodleman99
Joined: 02 Sep 2008 Posts: 38
|
Posted: Thu Apr 16, 2009 4:30 am Post subject: work around |
|
|
i had the same problem which was a REAL PAIN !!!!
hated waking up in the morning with 400 emails on my blackberry.
aaaaaanyway... my workaround is to untick the "treat unknown reply as bad" option in the properties of the tests.
it's not perfect, but it works for me |
|
|
|
|