SMTP test up and down

All questions related to installation, configuration and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
Robert_in_MTL
Posts: 229
Joined: Tue Jun 20, 2006 1:20 pm
Location: Montreal, Quebec

Post by Robert_in_MTL »

Hi,

I am testing an SMTP server on a remote system (SMTP test) via a VPN with an agent. Everything seems in order:
  • The interval is set to 2 minutes.
  • The timeout is 60 sec
  • And the test has a master "ping" test.
  • And the "ping" test has a master test on the VPN connection (ping on the receiving interface on the remote router)
The agent is on a server on the same switch and rack as the target, and it has a backup agent on another server, also on the same switch and rack.

My problem is that the test regularly becomes unknown: the maximum Alive duration is around 30 to 45 minutes (sometimes better, sometimes worse), followed by a few bad results, during which HM displays "RMA: cannot read data". As a result I only get an alive ratio of 87%. The ping test is usually very good (<50 ms) during that time.

I have changed the email alert to trigger after 5 bad results (it was 3 before) to reduce the false alert mails. I also removed "Treat Unknown status as Bad" because the test seems to lose credibility, as you will understand.

Could it be that HM has problems reading the Agent? If not, is my interval too tight? Or is the Agent overloaded? Do I have to set the test differently?

Would it be possible for the agents to keep, let's say, 1 hour of results when they do not get any requests from HM? A reboot of the HM server would then be practically seamless for the continuity of the tests, and the same goes for a loss of the RMA connection.

I could send you logs if it helps (and if you have time to read these!)

While waiting for your answer, we will update and reboot the server during the maintenance window tonight after midnight, maybe it will help.

Again, thank you for your time,
KS-Soft Europe
Posts: 2832
Joined: Tue May 16, 2006 4:41 am

Post by KS-Soft Europe »

Most likely this is a timeout issue. Please increase the timeout specified for the agent. Please note that two timeouts are used: one is set with the rma_cfg utility (on the system where the agent is running); the other is set on HostMonitor's side in the "Agent Connection Parameters" dialog.
http://www.ks-soft.net/hostmon.eng/mfra ... #agentlist

Regards,
Max
Robert_in_MTL

Post by Robert_in_MTL »

Thank you for your reply Max,

I increased "Agent Connection Parameters" to 10 seconds instead of 5.

What is the use of the rma_cfg timeout on the remote machine?

Also, which one should I modify? Did I change the right one?

Thanks,
KS-Soft Europe

Post by KS-Soft Europe »

Robert_in_MTL wrote:I increased "Agent Connection Parameters" to 10 seconds instead of 5.
Actually, I think 10 seconds is too short a timeout. 30 or 60 seconds would probably be enough. Quote from the manual:
http://www.ks-soft.net/hostmon.eng/mframe.htm#gentlist
================================
Timeout:
communication timeout in seconds. A maximum amount of time that HostMonitor will wait for an answer from the agent. Please note: this timeout should be big enough to allow an agent to perform a test before sending an answer to HostMonitor. When, for example, you are launching an external test and an external program needs 15 seconds to perform this test then the timeout should be set to 15 seconds plus an amount of time that is necessary for data exchange between HostMonitor and an agent.
================================
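The rule quoted above can be written out as simple arithmetic. The following is only a minimal sketch: the function names and the 5-second allowance for data exchange are assumptions made for illustration, not values from the HostMonitor manual.

```python
def recommended_timeout(test_duration_s: float, exchange_overhead_s: float = 5.0) -> float:
    """Timeout = time the agent needs for the test + time for data exchange.

    exchange_overhead_s is an assumed allowance, not a documented value.
    """
    if test_duration_s < 0 or exchange_overhead_s < 0:
        raise ValueError("durations must be non-negative")
    return test_duration_s + exchange_overhead_s


def is_timeout_sufficient(agent_timeout_s: float, test_duration_s: float,
                          exchange_overhead_s: float = 5.0) -> bool:
    """True when the HostMonitor-side timeout covers the test plus the exchange."""
    return agent_timeout_s >= recommended_timeout(test_duration_s, exchange_overhead_s)


if __name__ == "__main__":
    # The manual's example: an external program that needs 15 seconds.
    print(recommended_timeout(15))        # 15 s test + assumed 5 s overhead
    print(is_timeout_sufficient(10, 15))  # a 10 s timeout does not cover it
```

With these assumed numbers, a test that can take 15 seconds needs a timeout of about 20 seconds, so a 5 or 10 second setting would produce timeouts even when the agent itself is healthy.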
Robert_in_MTL wrote:What is the use of the rma_cfg timeout on the remote machine?
Quote from the manual:
http://www.ks-soft.net/hostmon.eng/rma- ... m#Settings
================================
Timeout
the maximum amount of time (in seconds) that agent will keep waiting for the complete request packet from HostMonitor (after initial TCP connection established) before dropping the connection.
================================
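To illustrate what this agent-side timeout means, here is a rough sketch of a server that waits for a complete request and drops the connection when the deadline passes. It is not RMA's actual code: the function name and the fixed-length request are assumptions chosen only to keep the example small.

```python
import socket
import time
from typing import Optional


def read_complete_request(conn: socket.socket, expected_len: int,
                          timeout_s: float) -> Optional[bytes]:
    """Read exactly expected_len bytes; on timeout, close conn and return None.

    expected_len is a hypothetical fixed packet size used for illustration.
    """
    deadline = time.monotonic() + timeout_s
    chunks = []
    received = 0
    while received < expected_len:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            conn.close()          # deadline passed: drop the connection
            return None
        conn.settimeout(remaining)
        try:
            chunk = conn.recv(expected_len - received)
        except socket.timeout:
            conn.close()          # nothing arrived in time: drop the connection
            return None
        if not chunk:             # peer closed before sending the full request
            conn.close()
            return None
        chunks.append(chunk)
        received += len(chunk)
    return b"".join(chunks)
```

The point of the sketch is that this timeout only covers receiving the request after the TCP connection is established. It does not limit how long the test itself may run, which is why the HostMonitor-side timeout is the one that matters for slow tests.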
Robert_in_MTL wrote:Also, which one should I modify? Did I change the right one?
Yes, you did. You changed the right timeout.

Regards,
Max
Robert_in_MTL

Post by Robert_in_MTL »

Thank you,

That was a very helpful answer. I increased the timeout for all agents to 30 seconds.

Even at 20 seconds, the SMTP test stopped timing out.

By the way (it's not a big deal), http://www.ks-soft.net/hostmon.eng/mframe.htm#gentlist gets me a 404:

Not Found
The requested URL /hostmon.eng/gentlist was not found on this server.
---
The tests seem to run more smoothly now.

Thank you for your help Max!
KS-Soft Europe

Post by KS-Soft Europe »

Robert_in_MTL wrote:By the way (it's not a big deal), http://www.ks-soft.net/hostmon.eng/mframe.htm#gentlist gets me a 404:
Sorry, I gave you the wrong link. Here is the correct one: http://www.ks-soft.net/hostmon.eng/mfra ... #agentlist
Robert_in_MTL wrote:Thank you for your help Max!
You are welcome!

Regards,
Max