KS-Soft. Network Management Solutions

RMA Client in Unknown status causing false alerts

 
andrep



Joined: 11 Dec 2008
Posts: 2

Posted: Thu Dec 11, 2008 3:04 am    Post subject: RMA Client in Unknown status causing false alerts

I have the latest version installed at about 18 sites, with 1 RMA at each site and 17 tests per site. Every now and then the RMA at a site does not respond, which causes Unknown status. When I force the test, it is alive. It seems unstable. What could be causing this?
KS-Soft Europe



Joined: 16 May 2006
Posts: 2832

Posted: Thu Dec 11, 2008 8:16 am

Could you provide more information, please?
- Do you use Active or Passive RMA?
- What exact error message do you see in the "Reply" field of the test? "Connection error"? Something else?

Regards,
Max
KS-Soft



Joined: 03 Apr 2002
Posts: 12412
Location: USA

Posted: Thu Dec 11, 2008 11:24 am

Also, do you have any antivirus monitors, personal firewalls, or content monitoring software installed? Any non-standard Winsock components?
Exactly which test method(s) do you use?
Which Windows version is installed on the local and remote systems? Which Service Pack?

Regards
Alex
ldean



Joined: 14 Nov 2008
Posts: 17

Posted: Tue Dec 16, 2008 10:14 am

I'm experiencing the same issue occasionally. What is strange is that we will sometimes have several tests performed by one agent, and one test will time out and show Unknown while the other shows up as OK. I'm kind of lost on it; the only thing I can think of is to play around with the settings for the number of tests that are initiated at once.
KS-Soft



Joined: 03 Apr 2002
Posts: 12412
Location: USA

Posted: Tue Dec 16, 2008 11:47 am

Could you please answer our questions?

Regards
Alex
ldean



Joined: 14 Nov 2008
Posts: 17

Posted: Tue Dec 16, 2008 2:43 pm

I don't want to hijack this thread, but in the interest of getting a resolution for both of us, here are our details.

The issue has occurred on several different environments. All use active RMA.
Here is one example of what happens when the issue occurs:
Code:

Test: server.domain.local C Drive
Method: Drive space

12/15/2008 6:34:41 PM   Unknown   Timed out
12/15/2008 6:44:42 PM   Ok   23 Gb
12/15/2008 6:56:37 PM   Unknown   RMA not connected
12/15/2008 6:58:56 PM   Ok   23 Gb
12/16/2008 12:00:18 AM   Ok   23 Gb


That particular test was on a Server 2008 machine; however, it occurred at the same time on several servers, the rest of which all run Server 2003. For example:
Code:

Test: WMI Service
Method: check service

12/14/2008 12:00:04 AM   Ok   0 ms
12/15/2008 12:00:07 AM   Ok   0 ms
12/15/2008 6:56:37 PM   Unknown   RMA not connected
12/15/2008 6:59:11 PM   Ok   0 ms
12/16/2008 12:00:17 AM   Ok   0 ms


What would be a good setting for the max # of tests initiated at once? Could lowering it or raising it have an effect? I have set it to 100, and the problem hasn't reoccurred yet...
KS-Soft



Joined: 03 Apr 2002
Posts: 12412
Location: USA

Posted: Tue Dec 16, 2008 2:59 pm

Quote:
RMA not connected

It looks like the connection was dropped and the agent could not reconnect for several minutes.
Could you check the RMA logs? By default these text logs are stored in the folder where the agent is installed (unless you changed the location).

Quote:
What would be a good setting for the max # of tests initiated at once? Could lowering it or raising it have an effect? I have set it to 100, and the problem hasn't reoccurred yet...

I think there is some external problem: a network error, a firewall, or an antivirus monitor...
Do you have an antivirus monitor installed on the systems?
Which versions of HostMonitor and RMA do you use?

Regards
Alex
ldean



Joined: 14 Nov 2008
Posts: 17

Posted: Tue Dec 16, 2008 3:19 pm

But I don't understand how, out of 2 tests from one single RMA, one will come back OK and one will come back Unknown when the tests run at the same time?

Edit: all RMAs and HM are on the latest available versions.

Also: we added a simple connectivity test for each of our remote clients today, which just pings 127.0.0.1 and returns the result to HM. I was watching it, and when it came time for the test to run, many of the agents showed "checking" for a long time and eventually went to Unknown. I am connected to those agents through Remote Desktop, though, so I know it can't be a network connection issue.
ldean



Joined: 14 Nov 2008
Posts: 17

Posted: Tue Dec 16, 2008 3:25 pm

Here are a couple of the test statistics for the connectivity tests, from 2 separate RMAs:
Code:
12/16/2008 1:43:30 PM   Host is alive   16 ms
12/16/2008 1:43:30 PM   Host is alive   0 ms
12/16/2008 2:01:33 PM   Unknown   Timed out
12/16/2008 2:04:35 PM   Host is alive   0 ms
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:15:17 PM   Host is alive   0 ms
12/16/2008 4:15:20 PM   Host is alive   0 ms



Code:
12/16/2008 1:43:30 PM   Host is alive   16 ms
12/16/2008 1:43:30 PM   Host is alive   16 ms
12/16/2008 1:43:30 PM   Host is alive   0 ms
12/16/2008 2:01:33 PM   Unknown   Timed out
12/16/2008 2:04:35 PM   Host is alive   0 ms
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:14:55 PM   Unknown   Timed out
12/16/2008 4:15:17 PM   Host is alive   0 ms
12/16/2008 4:15:20 PM   Host is alive   0 ms


While I was watching and they showed "checking", I right-clicked one and chose to refresh the selected test, and it returned OK.


EDIT: I'm not sure if the statistics are accurate... the tests have the same test name, perhaps that is throwing it off and giving me the stats for the same test?
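One rough way to check that suspicion is to count how many results share the same timestamp. A single test that runs every few minutes should not record seven results in one second, so repeated timestamps hint that the statistics of several same-named tests were merged. This is only a sketch under that assumption: the function name `repeated_timestamps` is made up for illustration, and the row layout (timestamp, status, reply, separated by a tab or by two or more spaces) is taken from the excerpts above, not from any documented format.

```python
import re
from collections import Counter

# Assumption: columns are separated by a tab or by 2+ spaces,
# as in the statistics pasted in this thread.
COLUMN_SEP = re.compile(r"\s{2,}|\t")

def repeated_timestamps(lines):
    """Return {timestamp: count} for timestamps that occur more than once."""
    stamps = Counter()
    for line in lines:
        parts = COLUMN_SEP.split(line.strip(), maxsplit=2)
        if len(parts) == 3:  # timestamp, status, reply
            stamps[parts[0]] += 1
    return {ts: n for ts, n in stamps.items() if n > 1}

rows = [
    "12/16/2008 2:04:35 PM   Host is alive   0 ms",
    "12/16/2008 4:14:55 PM   Unknown   Timed out",
    "12/16/2008 4:14:55 PM   Unknown   Timed out",
    "12/16/2008 4:14:55 PM   Unknown   Timed out",
    "12/16/2008 4:15:17 PM   Host is alive   0 ms",
]
print(repeated_timestamps(rows))
```

If the dictionary comes back empty once every test has a unique name, the earlier duplicates were most likely a naming artifact and not real simultaneous failures.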
ldean



Joined: 14 Nov 2008
Posts: 17

Posted: Tue Dec 16, 2008 3:44 pm

The issue is occurring at the moment for me. I have an agent whose tests came back as Unknown in HM with the reply "RMA not connected"; however, RMA Manager shows it as connected. Any idea what could cause that? Also, this test is supposed to run every couple of minutes, but if I don't do anything, it just stays on Unknown. If I manually refresh it, it comes back as OK. (BTW, I have access to the remote network via RDP, and that is still connected as well.)

Any help is appreciated.
KS-Soft



Joined: 03 Apr 2002
Posts: 12412
Location: USA

Posted: Tue Dec 16, 2008 9:45 pm

Quote:
But I don't understand how, out of 2 tests from one single RMA, one will come back OK and one will come back Unknown when the tests run at the same time?

According to your previous post, 2 different test items failed at the same time:
================
Test: server.domain.local C Drive Method: Drive space
12/15/2008 6:56:37 PM Unknown RMA not connected
...
Test: WMI Service Method: check service
12/15/2008 6:56:37 PM Unknown RMA not connected
================

Could you please check RMA logs?

Quote:
Also: we added a simple connectivity test for each of our remote clients today, which just pings 127.0.0.1 and returns the result to HM. I was watching it, and when it came time for the test to run, many of the agents showed "checking" for a long time and eventually went to Unknown. I am connected to those agents through Remote Desktop, though, so I know it can't be a network connection issue.

Well, "Timed out" and "RMA not connected" are 2 different problems.

1) "RMA not connected" means the agent has been unable to connect to HostMonitor for several minutes. This error may also appear right after a connection drop if you manually force the test to be refreshed.
If you check the RMA logs, you will probably find more information about why RMA cannot reconnect.

2) "Timed out" means the agent did not return a test result within 15 min. That looks strange...
Could you try to set up Passive RMA instead of Active? Just for testing...
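To see which of the two problems dominates for a given test, the history can be tallied by reply text. The following is a minimal Python sketch, not a HostMonitor feature. It assumes the three-column layout (timestamp, status, reply) of the excerpts posted earlier in this thread, with columns separated by a tab or by two or more spaces; the function name `count_unknown_replies` is made up for illustration.

```python
import re
from collections import Counter

# Assumption: columns are separated by a tab or by 2+ spaces, as in the
# excerpts pasted in this thread. This is not an official log format.
COLUMN_SEP = re.compile(r"\s{2,}|\t")

def count_unknown_replies(lines):
    """Count results with status "Unknown", grouped by reply text."""
    counts = Counter()
    for line in lines:
        parts = COLUMN_SEP.split(line.strip(), maxsplit=2)
        if len(parts) != 3:
            continue  # skip headers, blank lines, anything unexpected
        _timestamp, status, reply = parts
        if status.lower() == "unknown":
            counts[reply.strip()] += 1
    return dict(counts)

history = [
    "12/15/2008 6:34:41 PM   Unknown   Timed out",
    "12/15/2008 6:44:42 PM   Ok   23 Gb",
    "12/15/2008 6:56:37 PM   Unknown   RMA not connected",
    "12/15/2008 6:58:56 PM   Ok   23 Gb",
]
print(count_unknown_replies(history))
```

A history dominated by "RMA not connected" points at dropped connections, while one dominated by "Timed out" points at the agent not answering in time, so the two counts suggest where to look first.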

Quote:
EDIT: I'm not sure if the statistics are accurate... the tests have the same test name, perhaps that is throwing it off and giving me the stats for the same test?

It's better to use unique test names. Patterns can help you do this:
http://www.ks-soft.net/hostmon.eng/mframe.htm#patterns.htm

You can use Quick Log to check the latest test results for a specific item:
http://www.ks-soft.net/hostmon.eng/mframe.htm#testlist.htm#QuickLogPane

Quote:
The issue is occurring at the moment for me. I have an agent whose tests came back as Unknown in HM with the reply "RMA not connected"; however, RMA Manager shows it as connected. Any idea what could cause that?

That's possible. The 2 applications use different sockets (TCP ports) and/or IP addresses.
As I said several times, let's check the logs.
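Because the two connections are independent, each endpoint can be probed separately from the agent's system. This is a minimal sketch, assuming plain TCP reachability is a useful first check. The host name and port numbers in the usage comment are placeholders, not values confirmed in this thread; use the ports configured for Active RMA in your own HostMonitor and RMA Manager. The helper names `can_connect` and `check_endpoints` are made up for illustration.

```python
import socket

def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False

def check_endpoints(host, ports, timeout=5.0):
    """Probe each named port on host; return {name: reachable}."""
    return {name: can_connect(host, port, timeout) for name, port in ports.items()}

# Usage (placeholder host and ports, replace with your own settings):
# print(check_endpoints("hostmonitor.example.local",
#                       {"HostMonitor": 5056, "RMA Manager": 5057}))
```

A successful connect only shows that something accepted the connection on that port. It does not prove that the listener is HostMonitor or RMA Manager, so the RMA logs are still the better evidence.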

Quote:
Also, this test is supposed to run every couple of minutes, but if I don't do anything, it just stays on Unknown. If I manually refresh it, it comes back as OK. (BTW, I have access to the remote network via RDP, and that is still connected as well.)

Could you please start Auditing? Menu View -> Auditing tool.
Any warnings?

Regards
Alex
ldean



Joined: 14 Nov 2008
Posts: 17

Posted: Wed Dec 17, 2008 8:57 am

Auditing shows no issues except that a couple of alert WAV files could not be found.

I checked the RMA log on one server that we have specifically been having issues with. This is, I believe, the only one that has had the "not connected" issue; the others are just timing out. Here is a segment of that log; maybe you can tell me what it means.
Code:
[12/16/2008 9:43 PM]   SERVER2.flnet.local   Decode error: Cannot read data (RMA Manager)
[12/16/2008 10:14 PM]   SERVER2.flnet.local   Decode error: Cannot read data
[12/16/2008 10:14 PM]   SERVER2.flnet.local   Decode error: Cannot read data (RMA Manager)
[12/16/2008 10:15 PM]   SERVER2.domain.local   Connection error
[12/16/2008 11:58 PM]   SERVER2.domain.local   Connection error
[12/17/2008 2:31 AM]   SERVER2.domain.local   Decode error: Cannot read data
[12/17/2008 2:31 AM]   SERVER2.domain.local   Decode error: Cannot read data (RMA Manager)
[12/17/2008 2:31 AM]   SERVER2.domain.local   Connection error
[12/17/2008 5:59 AM]   SERVER2.domain.local   Decode error: Cannot read data
[12/17/2008 9:29 AM]   SERVER2.domain.local   Decode error: Cannot read data (RMA Manager)


However, I do see similar errors on some other RMA logs:
Code:
[11/16/2008 11:44 AM]   server.domain.com   Decode error: Cannot read data. An existing connection was forcibly closed by the remote host.
[11/16/2008 11:45 AM]   server.domain.com   Connection error
[11/16/2008 11:45 AM]   server.domain.com   Connection error


I also changed all the test names so there are no repeats.
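The RMA log excerpts above can be tallied to see how often each kind of error occurs and whether it carries the "(RMA Manager)" tag. This is a rough sketch that assumes the line layout shown in the excerpts (bracketed timestamp, host name, message); the function name `tally_rma_log` is made up, and treating the text before the first colon as the error kind is my own simplification.

```python
import re
from collections import Counter

# Assumption: each line looks like "[date time]   host   message",
# as in the excerpts pasted in this thread.
LOG_LINE = re.compile(r"^\[(?P<when>[^\]]+)\]\s+(?P<host>\S+)\s+(?P<msg>.+)$")
TAG = "(RMA Manager)"

def tally_rma_log(lines):
    """Count log entries by (error kind, tag)."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # ignore lines that do not fit the assumed layout
        msg = match.group("msg").strip()
        tagged = msg.endswith(TAG)
        if tagged:
            msg = msg[: -len(TAG)].strip()
        kind = msg.split(":", 1)[0].strip()  # e.g. "Decode error"
        counts[(kind, "RMA Manager" if tagged else "untagged")] += 1
    return dict(counts)

log = [
    "[12/17/2008 2:31 AM]   SERVER2.domain.local   Decode error: Cannot read data",
    "[12/17/2008 2:31 AM]   SERVER2.domain.local   Decode error: Cannot read data (RMA Manager)",
    "[12/17/2008 2:31 AM]   SERVER2.domain.local   Connection error",
]
print(tally_rma_log(log))
```

Running it over a full log file (for example `tally_rma_log(open(path))`, with `path` pointing at your own log) gives a quick picture of whether decode errors or connection errors dominate.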

I am out of my office at the moment, but I will try to set someone up with Passive RMA. We were hoping to avoid this so that we would not have to mess with firewall rules on all of our remote clients.

Thanks for your help
KS-Soft



Joined: 03 Apr 2002
Posts: 12412
Location: USA

Posted: Wed Dec 17, 2008 11:46 am

Quote:
[12/17/2008 2:31 AM] SERVER2.domain.local Decode error: Cannot read data
[12/17/2008 2:31 AM] SERVER2.domain.local Decode error: Cannot read data (RMA Manager)

It looks like some other application (not HostMonitor and not RMA Manager) accepted the connection request from the agent.
I'm not sure how this is possible...
I still think there is some external problem. Do you have any antivirus monitors, personal firewalls, or content monitoring software installed? Any non-standard Winsock components?

Regards
Alex
doodleman99



Joined: 02 Sep 2008
Posts: 38

Posted: Thu Apr 16, 2009 4:30 am    Post subject: work around

I had the same problem, which was a REAL PAIN!
I hated waking up in the morning with 400 emails on my BlackBerry.

Anyway, my workaround is to untick "treat unknown reply as bad" in the properties of the tests.
It's not perfect, but it works for me.
All times are GMT - 6 Hours

 