KS-Soft. Network Management Solutions

Overall SLA based on dependent tests

 
peterjwest



Joined: 28 Jul 2008
Posts: 17

Posted: Thu Mar 21, 2013 7:33 am    Post subject: Overall SLA based on dependent tests

Hi,

I'm trying to achieve something specific with HostMonitor, but I have no idea whether it's possible. Hopefully some kind soul on here can give me a clue.

We have a complex Citrix environment for which we would like to produce an 'overall' SLA report.

My original idea was to create tests for each element of the system that could affect its overall uptime. I could then build a dependency hierarchy using the dependency section of each HostMonitor test, which would ultimately mean we could report on just the top-level test to get an overview of the uptime.

Unfortunately, what actually seems to happen is that if a dependent test goes 'bad', the parent test simply doesn't run until it goes 'good' again. This means the very top-level test thinks we have 100% uptime when in fact we have had periods of downtime lower down the structure; they just don't get picked up because the top-level test doesn't run.
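
To illustrate with toy numbers (just a sketch of the arithmetic, not actual HostMonitor output):

Code:
# Toy illustration: the top-level test pauses while a child test is bad,
# so its own statistics never record the outage.
total_cycles = 100
child_bad_cycles = 10

# The child's availability reflects the outage:
child_uptime = 100 * (total_cycles - child_bad_cycles) / total_cycles
print(f"child uptime:  {child_uptime:.1f}%")   # 90.0%

# The top-level test only counts the cycles it actually ran, and it
# passed every one of them, so it reports a perfect record:
parent_runs = total_cycles - child_bad_cycles
parent_uptime = 100 * parent_runs / parent_runs
print(f"parent uptime: {parent_uptime:.1f}%")  # 100.0%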

I'm not sure I've explained this very well, but hopefully you get the idea. If not, please let me know and I'll try to give a better explanation.

Thanks

Pete
KS-Soft



Joined: 03 Apr 2002
Posts: 12790
Location: USA

Posted: Thu Mar 21, 2013 7:51 am

Quote:
Unfortunately, what actually seems to happen is that if a dependent test goes 'bad', the parent test simply doesn't run until it goes 'good' again.

Actually, it works the other way around.
E.g. you set up a Ping test as the Master test for CPU Usage, Service, and Process tests. If the Ping test returns a "No answer" status, the dependent tests will not be performed.
If you mark the "Synchronize counters" option for the CPU Usage, Service, and Process tests, then HostMonitor will increment the "Bad" counters for the dependent tests as well.
If you want an SLA report with "uptime" related to dependent test items, I do not see any problem.

Quote from the manual
Quote:
Synchronize counters
This option only applies to tests that have one or more master tests. When the option is turned off and a test is not launched because its launch condition has not been met, HostMonitor simply marks such a test with the "Wait for Master" status and does not change any counters. If, however, the option is turned on, HostMonitor will update the statistics information according to the Master test's status. Thus, if a router on which other tests depend has been tested with a "No answer" status, HostMonitor will increment the respective counters (like "Dead time", "Failed tests", etc.) for the router and for all dependent tests with the "Synchronize counters" option on.
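
A rough sketch (plain Python, not HostMonitor code) of the behaviour described in that paragraph, assuming one master test with several dependent tests:

Code:
# Sketch of the "Synchronize counters" behaviour: when the master test
# fails, dependent tests are not performed, but their "Failed tests"
# counters can still be incremented.

def run_cycle(master_ok, failed_counters, synchronize_counters):
    """One polling cycle: failed_counters maps test name -> failed count."""
    if master_ok:
        # Master is alive: dependent tests actually run (all pass here).
        return failed_counters
    # Master failed: dependents get the "Wait for Master" status.
    if synchronize_counters:
        # Option on: "Bad" counters are incremented anyway.
        return {name: n + 1 for name, n in failed_counters.items()}
    # Option off: counters are untouched, so the downtime stays invisible.
    return failed_counters

counters = {"CPU Usage": 0, "Service": 0, "Process": 0}
counters = run_cycle(master_ok=False, failed_counters=counters,
                     synchronize_counters=True)
print(counters)  # {'CPU Usage': 1, 'Service': 1, 'Process': 1}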


Regards
Alex
peterjwest



Joined: 28 Jul 2008
Posts: 17

Posted: Fri Apr 05, 2013 4:52 am

Hi Alex,

Thanks for taking the time to reply - given your response I think I understand the setup a bit better now.

But the issue I now face is that I can only have 20 dependent tests - and it looks like building a hierarchy to work around the limit won't work.

I want to aggregate the results from tests across 4 hosts, and on each host I have 7 or 8 different tests. This gives me a total of 30+.

My original idea was to make the primary test on each host a 'ping' test and have it depend on the other tests for that host. It seems to work: I can see that if the disk space test (which is a 'child' of the ping) fails, the failure count also updates on the Ping test.

But the problem is that the top-level test doesn't also update. So although hosts 1, 2, 3 and 4 may have failure counts of 5, 5, 10 and 5 respectively, we only ever have a failure count of zero at the top level (in fact, the count at the top level won't match any of these values, because it should only increment by 1 irrespective of how many child tests have failed).

So to generate our SLA report for Citrix, it would seem we need to limit the number of tests for the SLA to no more than 20.

I don't know if I'm missing something, so if you have any more ideas they would be most welcome.

Thanks

Pete
KS-Soft



Joined: 03 Apr 2002
Posts: 12790
Location: USA

Posted: Fri Apr 05, 2013 7:24 am

Sorry, I don't understand you.
There is no limit on dependent tests; 100,000 test items can depend on a single master test or on several master tests. Maybe you are using the word "dependent" when you mean "master"?

Also, I do not understand what the master-dependent relation has to do with your report. If you want to create a report for some group of tests, put those tests into a folder and use the folder-level options (or the "Generate reports" action) to create a report for them.

Regards
Alex
peterjwest



Joined: 28 Jul 2008
Posts: 17

Posted: Fri Apr 05, 2013 9:41 am

I totally understand your confusion - it's proving hard for me to explain.

The issue is that if you run an SLA report on a number of tests that relate to a single system, you don't really get a true picture of the system's availability.

If, for example, you monitor a number of services and also perform a Ping test, then all of those will show downtime if the server isn't online.

What I'm trying to achieve is a single 'parent' test that gives the overall availability of the system based on a number of dependent tests.

The concept is that if one service is offline, the parent would reflect that. If two services were down, the parent would still show the same result, because it doesn't matter whether one or two services are offline: the failure of either one would ultimately mean downtime for the system.

I'm basically trying to build a structure which gives a very simple SLA report for a complex system. The people seeing these reports don't care which component of the system was down; they just want to know what percentage uptime we had on our Citrix environment for the month of January, February or whatever.

The synchronisation of counter data appears to always be passed 'up' the chain of tests, so the only way for me to do what I'm attempting is to make sure that my top-level test is dependent on the tests below it. And this is where the 20-item limit kicks in.
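
To put the roll-up I'm after in concrete terms (a toy sketch, not anything HostMonitor actually runs):

Code:
# The environment counts as "up" for a polling cycle only when every
# component check passed in that cycle; one failure or several count
# the same, as a single bad cycle.
cycles = [
    {"ping": True, "disk": True,  "service": True},
    {"ping": True, "disk": False, "service": True},   # disk failed
    {"ping": True, "disk": False, "service": False},  # two failures, still one bad cycle
    {"ping": True, "disk": True,  "service": True},
]

up_cycles = sum(1 for c in cycles if all(c.values()))
print(f"overall uptime: {100 * up_cycles / len(cycles):.1f}%")  # 50.0%
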
KS-Soft



Joined: 03 Apr 2002
Posts: 12790
Location: USA

Posted: Fri Apr 05, 2013 9:47 am

Quote:
The issue is that if you run an SLA report on a number of tests that relate to a single system, you don't really get a true picture of the system's availability.
If, for example, you monitor a number of services and also perform a Ping test, then all of those will show downtime if the server isn't online.

IMHO it's a true picture - if the server does not respond, then it's not available.

Quote:
The concept is that if one service is offline, the parent would reflect that. If two services were down, the parent would still show the same result, because it doesn't matter whether one or two services are offline: the failure of either one would ultimately mean downtime for the system.

Do you mean you need an item that will be "bad" when ANY of the checks for some specific server is "bad"?
And this test should be "good" if ALL checks for that server are "good"?

Regards
Alex
peterjwest



Joined: 28 Jul 2008
Posts: 17

Posted: Fri Apr 05, 2013 10:15 am

KS-Soft wrote:
Do you mean you need an item that will be "bad" when ANY of the checks for some specific server is "bad"?
And this test should be "good" if ALL checks for that server are "good"?

That's pretty much it.

We don't care how many checks fail - if any of them fail then the environment is not available.
KS-Soft



Joined: 03 Apr 2002
Posts: 12790
Location: USA

Posted: Fri Apr 05, 2013 10:33 am

Then you may create one additional test with a predefined "good" result (e.g. Ping localhost) and use the following options:
- This test depends on expression: (%FolderCurrent_BadTests% + %FolderCurrent_UnknownTests% == 0) or ((%FolderCurrent_BadTests%==1) and ("%Status%"=="Bad"))
- Otherwise status: Bad
- Synchronize counters: On
- Synchronize status and alerts: On

This test will have "Bad" status when ANY other test within folder (folder where test located) has Bad or Unknown status.
This test will have "Host is alive" status when ALL other tests within folder have "Host is alive", "Ok", "Normal", "Disabled"... statuses

Regards
Alex