Hi!
After setting up LDAP tests to some 50 machines, I've been seeing some failures, but in groups of 3 to 5 at a time. If I refresh, the response is OK, but it looks as if there is no 'retry limit' and somehow I'm losing answers.
I've even seen some 'Unknown host' as a status, even though the host is in 'ip address' format, no DNS involved.
HM 5.92 is running on a W2003 server alone (so there should be no problems with that).
Is there a way to set a retry value for the LDAP test? Am I having some 'port issue' on the HM machine?
Help would be greatly appreciated!
Regards
LDAP test failures
-
- Posts: 2832
- Joined: Tue May 16, 2006 4:41 am
- Contact:
Re: LDAP test failures
Ismael wrote: After setting up LDAP tests to some 50 machines, I've been seeing some failures, but in groups of 3 to 5 at a time. If I refresh, response is ok, but it looks as if there is no 'retry limit' and somehow I'm losing answers. I've even seen some 'Unknown host' as a status, even though the host is in 'ip address' format, no DNS involved. HM 5.92 is running on a W2003 server alone (so there should be no problems with that).
Have you installed SP1 on the 2003 server?
Do you use the "Perform search operation" option?
Ismael wrote: Is there a way to set a retry value for LDAP test?
In that case you may assign the "Repeat test" action to the profile. http://www.ks-soft.net/hostmon.eng/mfra ... #actRepeat
E.g. Your profile should contain two actions:
1. Repeat test
2. Send Email with "Start when N consecutive Bad/Good results occur" option specified as 2 or 3.
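The effect of the two actions above can be sketched in a few lines of Python. This is only an illustration of the "alert after N consecutive bad results" idea, not HostMonitor's actual implementation; the function name `alert_points` and the "good"/"bad" status strings are made up for the example.

```python
from typing import Iterable, List


def alert_points(results: Iterable[str], threshold: int = 2) -> List[int]:
    """Return the indexes at which an alert would fire.

    An alert fires once, at the moment `threshold` consecutive "bad"
    results have been seen; any "good" result resets the counter.
    """
    alerts = []
    streak = 0
    for i, status in enumerate(results):
        if status == "bad":
            streak += 1
            if streak == threshold:
                alerts.append(i)
        else:
            streak = 0
    return alerts


# A single isolated failure (index 1) is ignored; the run starting at
# index 3 triggers one alert when its second bad result arrives.
history = ["good", "bad", "good", "bad", "bad", "bad", "good"]
print(alert_points(history, threshold=2))  # [4]
```

With a threshold of 2 or 3, a failure that clears on the repeated test never reaches the e-mail action.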
Ismael wrote: Am I having some 'port issue' in the HM machine
Have you installed any firewall, antivirus or content management software on the HM machine? It might lead to some problems.
Regards,
Max
Hi again, and thanks for your prompt answer!
Let's see: 2003 SP1 is installed, yes, and I perform a search operation in all of them
(as you may see, I've already put a 'repeat test 3 times' action).
Code:
; ------- Test #01 -------
Method = LDAP
;--- Common properties ---
Title = SSCC_NDS_LDAP
Comment = SSCC_NDS^MLDAP
RelatedURL =
ScheduleMode= Regular
Schedule =
Interval = 300
Alerts = Repeat test 3 times
ReverseAlert= No
UnknownIsBad= Yes
UseCommonLog= Yes
PrivLogMode = Default
CommLogMode = Default
;--- Test specific properties ---
Server = 192.168.254.77
Port = 389
Timeout = 60
Password =
search = Yes
baseobject = ou=sscc,o=ajbarna
searchfilter= (cn=SSCC_NDS)
resultslimit= 1
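
For anyone who wants to compare many exported test definitions like the one above (for example, to confirm that all 50 tests share the same Port and Timeout), here is a minimal Python sketch of a parser. It assumes only what the excerpt shows: `Key = Value` lines and comment lines starting with `;`. The function name `parse_test_export` is hypothetical and not part of HostMonitor.

```python
def parse_test_export(text: str) -> dict:
    """Parse 'Key = Value' lines into a dict, skipping ';' comment lines.

    Keys are lower-cased; only the first '=' splits key from value, so
    values such as '(cn=SSCC_NDS)' are kept intact.
    """
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            continue  # not a key/value line
        props[key.strip().lower()] = value.strip()
    return props


sample = """\
; ------- Test #01 -------
Method = LDAP
Server = 192.168.254.77
Port = 389
Timeout = 60
search = Yes
baseobject = ou=sscc,o=ajbarna
searchfilter= (cn=SSCC_NDS)
"""

props = parse_test_export(sample)
print(props["server"], int(props["port"]), int(props["timeout"]))
```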

There is no specific firewall, and only McAfee VirusScan 8.0 (standard equipment).
I will post the results of the action profile (right now I don't have any 'failures' going on, so I can't tell).
Thanks again
Ismael
Ismael wrote: as you may see, I've already put a 'repeat test 3 times' action
Yes, I see. Test params are OK. Nothing special.
Ismael wrote: There is no specific firewall, and only Mcafee Viruscan 8.0 (standard equipment)
McAfee uses a Winsock filter that could lead to such a problem. Could you stop McAfee and try to perform the LDAP test again? In any case, you should adjust the McAfee Winsock filter to pass all of HostMonitor's requests through, if possible.
Regards,
Max
... Better statistics-wise, but still the same error.
Now the error message goes away in a couple of seconds, but still, the log gets full of 'LDAP errors'.
As you can see, there are different networks involved and different servers (in groups, they seem to fail at the same time!)
Code:
[06/06/2006 15:42:33] COPIA2_LDAP No answer LDAP test (172.16.2.50)
[06/06/2006 15:42:33] NWIDM_LDAP No answer LDAP test (192.168.254.26)
[06/06/2006 15:42:33] APPS9_LDAP No answer LDAP test (192.168.254.179)
[06/06/2006 15:42:33] APPS_LDAP No answer LDAP test (192.168.254.150)
[06/06/2006 15:42:33] ESCAR_LDAP No answer LDAP test (172.16.2.202)
[06/06/2006 15:42:33] GLORIAS3A_LDAP No answer LDAP test (172.16.2.71)
[06/06/2006 15:42:33] GLORIAS_DSV_LDAP No answer LDAP test (172.16.16.53)
[06/06/2006 15:42:35] APPS_LDAP Host is alive 1 LDAP test (192.168.254.150)
[06/06/2006 15:42:35] COPIA2_LDAP Host is alive 1 LDAP test (172.16.2.50)
[06/06/2006 15:42:35] ESCAR_LDAP Host is alive 1 LDAP test (172.16.2.202)
[06/06/2006 15:42:35] GLORIAS3A_LDAP Host is alive 1 LDAP test (172.16.2.71)
[06/06/2006 15:42:35] APPS9_LDAP Host is alive 1 LDAP test (192.168.254.179)
[06/06/2006 15:42:35] GLORIAS_DSV_LDAP Host is alive 1 LDAP test (172.16.16.53)
[06/06/2006 15:42:35] NWIDM_LDAP Host is alive 1 LDAP test (192.168.254.26)
[06/06/2006 15:47:33] HORTA-GUINARDO_LDAP No answer LDAP test (10.7.16.1)
[06/06/2006 15:47:33] GUB_NW411_LDAP No answer LDAP test (10.13.16.1)
[06/06/2006 15:47:33] SANTS_LDAP No answer LDAP test (10.3.16.1)
[06/06/2006 15:47:33] TEST_NT_LDAP No answer LDAP test (172.16.2.56)
[06/06/2006 15:47:33] GLORIAS1A_LDAP No answer LDAP test (172.16.2.63)
[06/06/2006 15:47:33] SSPP1_LDAP No answer LDAP test (10.20.16.1)
[06/06/2006 15:47:34] GLORIAS_NDS_LDAP No answer LDAP test (172.16.5.26)
[06/06/2006 15:47:35] SANTS_LDAP Host is alive 1 LDAP test (10.3.16.1)
[06/06/2006 15:47:35] GLORIAS1A_LDAP Host is alive 1 LDAP test (172.16.2.63)
[06/06/2006 15:47:35] TEST_NT_LDAP Host is alive 1 LDAP test (172.16.2.56)
[06/06/2006 15:47:35] GUB_NW411_LDAP Host is alive 1 LDAP test (10.13.16.1)
[06/06/2006 15:47:35] SSPP1_LDAP Host is alive 1 LDAP test (10.20.16.1)
[06/06/2006 15:47:36] HORTA-GUINARDO_LDAP Host is alive 1 LDAP test (10.7.16.1)
[06/06/2006 15:47:36] GLORIAS_NDS_LDAP Host is alive 1 LDAP test (172.16.5.26)
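
To make the "fail in groups at the same time" pattern easier to see, the log can be grouped by timestamp. The following Python sketch assumes the line layout shown in the excerpt above (`[date time] test-name status LDAP test (ip)`); the regular expression and the function name `failures_by_second` are illustrative only, and the sample lines are copied from the log.

```python
import re
from collections import defaultdict

# Assumed layout: [timestamp] TESTNAME <status text> LDAP test (a.b.c.d)
LINE_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<test>\S+)\s+(?P<status>.+?)"
    r"\s+LDAP test \((?P<ip>[\d.]+)\)$"
)


def failures_by_second(lines):
    """Map each timestamp to the list of server IPs that failed then."""
    bursts = defaultdict(list)
    for line in lines:
        m = LINE_RE.match(line.strip())
        # Anything that is not "Host is alive" counts as a failure here.
        if m and not m["status"].startswith("Host is alive"):
            bursts[m["ts"]].append(m["ip"])
    return dict(bursts)


sample = [
    "[06/06/2006 15:42:33] COPIA2_LDAP No answer LDAP test (172.16.2.50)",
    "[06/06/2006 15:42:33] NWIDM_LDAP No answer LDAP test (192.168.254.26)",
    "[06/06/2006 15:42:35] APPS_LDAP Host is alive 1 LDAP test (192.168.254.150)",
    "[06/06/2006 15:47:34] GLORIAS_NDS_LDAP No answer LDAP test (172.16.5.26)",
]

for ts, ips in failures_by_second(sample).items():
    print(ts, len(ips), ips)
```

Failures to servers on unrelated subnets landing in the same second would point at the monitoring machine or its network path rather than at the individual LDAP servers.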
I have McAfee disabled completely.
Regards
Ismael
Ismael wrote: As you can see, there are different networks involved and different servers (in groups, they seem to fail at the same time!)
Did you try to increase the timeout?
What other applications are running on that system? Could one of them take 100% CPU time from time to time?
Could you check the Security Event Log on one of the remote servers for logon/logoff records from HostMonitor at the time when a test failed?
Regards,
Max
Hi again.
This is a monitoring server with only HostMonitor and a SQL Server 2005 (and an ODBC 'logging' to this server locally). It is a 2*3GHz CPU with 2GB of RAM.
It rarely goes above 2% CPU.
The timeout is set to the standard value (60 seconds?) and I haven't played with it, but, anyway, it looks like a minute is quite a lot already.
The requests are not being logged on the LDAP servers (in fact those are NetWare servers); it looks like they don't get there. As I posted in the first message, sometimes the answer is not a 'no response'; instead, I receive an 'unknown host':
Code:
[06/06/2006 17:12:54] NWPARIS01_LDAP Unknown host LDAP test (192.168.254.154)
[06/06/2006 17:12:54] NWPARIS02_LDAP Unknown host LDAP test (192.168.254.160)
[06/06/2006 17:12:55] NWPARIS02_LDAP Host is alive 1 LDAP test (192.168.254.160)
[06/06/2006 17:12:55] NWPARIS01_LDAP Host is alive 1 LDAP test (192.168.254.154)
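
The point that no DNS is involved can be checked independently of HostMonitor. This Python sketch shows that an address such as 192.168.254.154 is a numeric literal and can be turned into a socket address with DNS lookups explicitly disabled (`AI_NUMERICHOST`). The helper names and the hostname `nwparis01.example` are made up for the example; how HostMonitor itself resolves hosts is not known from this thread.

```python
import ipaddress
import socket


def is_ip_literal(host: str) -> bool:
    """True if `host` is a numeric IPv4/IPv6 address, not a name."""
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def resolve_without_dns(host: str, port: int) -> str:
    """Translate a numeric host to a socket address with no name lookup.

    AI_NUMERICHOST forbids DNS; a non-numeric host raises socket.gaierror.
    """
    infos = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM, flags=socket.AI_NUMERICHOST
    )
    return infos[0][4][0]


print(is_ip_literal("192.168.254.154"))             # True
print(is_ip_literal("nwparis01.example"))           # False
print(resolve_without_dns("192.168.254.154", 389))  # 192.168.254.154
```

If the target is a numeric literal, an 'Unknown host' status cannot come from a failed DNS query, which suggests the error text is reporting some other socket-level failure.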
Regards
Ismael
Ismael wrote: The timeout is set standard (60 seconds?) and I haven't played with it, but, anyway, it looks like a minute is quite a lot already.
Yes, you are right. A minute is quite enough.
We cannot reproduce this problem on our test systems. We set up 20 LDAP tests with a 10 sec test interval and they have been working fine for 10 hours. Of course, sometimes tests with "search operation" return "bad" status, but in such circumstances that is normal behaviour.
Anyway, we have accepted your request and we will try to figure it out.
Regards,
Max