High Availability configuration

BorisoE · Post by **BorisoE** » Thu Nov 19, 2009 9:41 pm

Hi There,

Does Advanced Host Monitor support “High Availability” (HA)?

HA example:
Topology:
- Operations center (production): AHM service (AHMS-M) ;
- Operations center (DR): AHM service (AHMS-S);
- Perimeter: passive/active RMAs;
- NOC office: RCC.

HA requirements:
- “Test” for HA availability monitoring;
- Automated replication of “tests” databases between the production and DR AHM instances (i.e. any change made at one AHM instance should be replicated to the other if it is available, or logged if not, for replication once availability is restored);
- Tools for “tests” databases synchronization and consistency checking;
- Replication of “events” for logging between AHM instances;
- Alarming and reporting from “active” AHM instances only to avoid duplications;
- Ability of RMAs to be connected to both production and DR AHM instances at the same time (!!!);
- Sharing of RCC licenses between the production and DR AHM instances.

Is it possible to implement any of the above?

Thank you –

Stoltze · Post by **Stoltze** » Fri Nov 20, 2009 12:31 am

Looks just like my needs as well...

KS-Soft · Post by **KS-Soft** » Fri Nov 20, 2009 1:35 pm

Sorry, HostMonitor is not a clustering monitoring solution yet. Maybe in version 9...
Some tasks can be easily implemented right now, some are pretty difficult.
E.g.
>>Ability of RMAs to be connected to both production and DR AHM instances at the same time
If you are talking about Passive RMA - no problem, the agent can receive connections from several HostMonitors. If you are talking about Active RMA, then it's not possible.

Regards
Alex

jivetolkein · Post by **jivetolkein** » Tue Nov 24, 2009 7:41 am

If you just need DR rather than HA (i.e. you can live with a few minutes of downtime) then using a shared file system for your HML files, or a simple robocopy every few minutes (with a bit of rotation to avoid a corrupt file at the far end), might do the trick. Luckily the HM server isn't that complex, so it's relatively easy to protect. We are running our production instance on an ESX cluster, which offers a fair degree of redundancy in itself, and we copy the files off. I'm confident we could get it up and running manually on another server quicker than I could restore the BESR image we also take nightly.
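For anyone who would rather script the copy-with-rotation idea than rely on robocopy switches, here is a minimal sketch in Python. It is only an illustration under assumptions: the function names (`rotate_and_copy`, `sync_hml`), the `*.hml` file pattern and the `keep` parameter are made up for this example and are not part of HostMonitor. Each file is first copied to a temporary name, older copies are then shifted to `.1`, `.2`, and so on, and only then does the new copy take the live name, so a transfer that dies halfway should never overwrite the last good file.

```python
import shutil
from pathlib import Path


def rotate_and_copy(src: Path, dest_dir: Path, keep: int = 3) -> Path:
    """Copy src into dest_dir, keeping up to `keep` older generations."""
    dest_dir.mkdir(parents=True, exist_ok=True)
    target = dest_dir / src.name

    # Copy to a temporary name first, so a half-written file never
    # replaces the last good copy at the far end.
    tmp = dest_dir / (src.name + ".tmp")
    shutil.copy2(src, tmp)

    # Shift older generations up by one: name.2 -> name.3, name.1 -> name.2.
    # The oldest generation (name.<keep>) is overwritten and so dropped.
    for n in range(keep - 1, 0, -1):
        older = dest_dir / f"{src.name}.{n}"
        if older.exists():
            older.replace(dest_dir / f"{src.name}.{n + 1}")

    # The previous live copy becomes generation 1 (skipped if keep < 1).
    if keep >= 1 and target.exists():
        target.replace(dest_dir / f"{src.name}.1")

    # Finally move the fresh copy into place under the live name.
    tmp.replace(target)
    return target


def sync_hml(src_dir: Path, dest_dir: Path, keep: int = 3) -> list[Path]:
    """Copy every *.hml file (assumed pattern) from src_dir to dest_dir."""
    return [
        rotate_and_copy(path, dest_dir, keep)
        for path in sorted(src_dir.glob("*.hml"))
    ]
```

You would call something like `sync_hml(Path(source_folder), Path(dr_folder))` from a scheduled task every few minutes, with both folders being whatever applies to your setup. Note that this does not check whether HostMonitor is in the middle of writing the source file, so keeping a few generations is what gives you something to fall back on.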

The IP address change of the server is likely to be the biggest issue if your subnets aren't spanned to the DR site.

Else for true HA (99.99%) you might like to look at Lifekeeper - we use it for file share clustering mainly, but they seem to be capable of protecting any service (though it might need some work on the fail over scripting). Not tried it specifically for HM but it's somewhere to look first maybe.