High Availability configuration

All questions related to installations, configurations and maintenance of Advanced Host Monitor (including additional tools such as RMA for Windows, RMA Manager, Web Service, RCC).
BorisoE
Posts: 1
Joined: Thu Nov 19, 2009 9:00 pm

Post by BorisoE »

Hi There,

Does Advanced Host Monitor support “High Availability” (HA)?

HA example:
Topology:
- Operations center (production): AHM service (AHMS-M) ;
- Operations center (DR): AHM service (AHMS-S);
- Perimeter: passive/active RMAs;
- NOC office: RCC.

HA requirements:
- “Test” for HA availability monitoring;
- Automated replication of “tests” databases between production and DR AHM instances, i.e. any change made at one AHM instance should be replicated to the other if it is available, or logged if it is not (for replication once availability is restored);
- Tools for “tests” databases synchronization and consistency checking;
- Replication of “events” for logging between AHM instances;
- Alarming and reporting from “active” AHM instances only to avoid duplications;
- Ability of RMAs to be connected to both production and DR AHM instances at the same time (!!!);
- Sharing of RCC licenses between production and DR AHM instances.


Is it possible to implement any of the requirements listed above?

Thank you.
Stoltze
Posts: 174
Joined: Tue Feb 03, 2004 1:58 am
Location: Denmark

Post by Stoltze »

Looks just like my needs as well... :)
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

Sorry, HostMonitor is not a clustering monitoring solution yet. Maybe in version 9...
Some tasks can be easily implemented right now; some are pretty difficult.
E.g.
>>Ability of RMAs to be connected to both production and DR AHM instances at the same time
If you are talking about Passive RMA, no problem: the agent can receive connections from several HostMonitors. If you are talking about Active RMA, then it's not possible.

Regards
Alex
jivetolkein
Posts: 96
Joined: Thu Jul 19, 2007 4:35 am

Post by jivetolkein »

If you just need DR rather than HA (i.e. you can live with a few minutes of downtime), then using a shared file system for your HML files, or a simple robocopy every few minutes (with a bit of rotation to avoid a corrupt file at the far end), might do the trick. Luckily the HM server isn't very complex, so it's relatively easy to protect. We are running our production instance on an ESX cluster, which offers a fair degree of redundancy in itself, and we also copy the files off. I'm confident we could get it up and running manually on another server quicker than I could restore the BESR image we also take nightly.
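
For anyone who would rather script the "copy with a bit of rotation" idea than rely on robocopy alone, here is a minimal Python sketch. It is not a HostMonitor feature: the function names (`sync_hml`, `rotate`), the `.tmp` staging suffix and the numbered `.1`, `.2`, ... generations are all made up for illustration. It assumes the test lists are `*.hml` files in a single folder and that `keep` is at least 1. Each file is copied under a temporary name, its checksum is compared with the source, and only then is it moved into place, with older copies shifted down so a bad copy never overwrites the last good one.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def rotate(target: Path, keep: int) -> None:
    """Shift target -> target.1 -> target.2 ..., dropping the oldest copy."""
    oldest = target.with_name(f"{target.name}.{keep}")
    if oldest.exists():
        oldest.unlink()
    for index in range(keep - 1, 0, -1):
        older = target.with_name(f"{target.name}.{index}")
        if older.exists():
            older.replace(target.with_name(f"{target.name}.{index + 1}"))
    if target.exists():
        target.replace(target.with_name(f"{target.name}.1"))


def sync_hml(source_dir: Path, dest_dir: Path, keep: int = 3) -> list[str]:
    """Copy changed *.hml files to dest_dir, keeping `keep` older generations.

    Returns the names of the files that were actually copied.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for source in sorted(source_dir.glob("*.hml")):
        target = dest_dir / source.name
        source_hash = sha256_of(source)
        if target.exists() and sha256_of(target) == source_hash:
            continue  # unchanged since the last run
        staging = dest_dir / (source.name + ".tmp")
        shutil.copy2(source, staging)
        if sha256_of(staging) != source_hash:
            staging.unlink()  # source changed mid-copy or the copy is damaged
            continue
        rotate(target, keep)      # keep the previous good copies
        staging.replace(target)   # move the verified copy into place
        copied.append(source.name)
    return copied
```

Something like `sync_hml(Path(r"C:\HostMon\Lists"), Path(r"\\dr-server\hm\Lists"))` run from a scheduled task every few minutes would be the idea; both paths are placeholders, so substitute your own. The checksum only shows that the copy matches what was read from the source. It cannot tell whether HostMonitor was halfway through saving the file at that moment, which is why the older generations are kept.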

The IP address change of the server is likely to be the biggest issue if your subnets aren't spanned to the DR site.

Otherwise, for true HA (99.99%) you might like to look at LifeKeeper. We use it mainly for file share clustering, but it seems to be capable of protecting any service (though it might need some work on the failover scripting). I've not tried it specifically for HM, but it's somewhere to look first maybe.