NORMAL Status and %Recurrences%

JuergenF
Posts: 331
Joined: Sun Jan 26, 2003 6:00 pm
Location: Germany, North Rhine-Westphalia

Post by JuergenF »

Hi Alex,

many thanks for your explanations.

I see the problem with example b)
Maybe Thomas can give us an idea of what exactly he wanted to test.

When do you think you will release the next beta with the new variables?

Best regards

Juergen
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

"When do you think you will release the next beta with the new variables?"
We need a good name for the SSSBR variable. Do you have any ideas?
I propose %FailureNo% or %FailureIteration%

Regards
Alex
JuergenF
Posts: 331
Joined: Sun Jan 26, 2003 6:00 pm
Location: Germany, North Rhine-Westphalia

Post by JuergenF »

Dear Alex,

Please
- describe in words what exactly you want to count;
- say when the variable is reset, and to which value;
- say when it is incremented, relative to the evaluation of the Normal and Warning expressions.

Best regards

Juergen
thomasschmeidl
Posts: 166
Joined: Sat Apr 15, 2006 2:14 pm
Location: Germany, Bavaria

Post by thomasschmeidl »

Hi Alex, Hi Juergen,

I can follow many of your arguments, too.

I agree that it's worth trying to use udv-variables as a substitute for complex expressions (I must admit that I didn't think of that possibility).

To support all the users who didn't participate in this thread, I think it would be very helpful to provide detailed examples in the manual!

Now let's look at the examples:

Example a) can be solved by introducing a new variable. OK
Example c) can be solved by a quite simple expression (although one threshold is redundant to the threshold in the test properties). OK

Example b), however, is unsolved, although it is quite a common setup: You ping a server. You have a "warning" threshold for slow replies and a "bad" threshold for very slow or missing replies. And you consider the first two irregular replies as "normal" to avoid too many false alarms.

At first sight a solution could be:
Test threshold 1000ms
- Use Normal if (%SSSBR%<3)
- Use Warning if (%Reply% >200) and (%Reply%<=1000)
But this would cause the following behaviour:
Reply:  100       300       300       300       1200      1200      1200
Status: OK        WARNING   WARNING   WARNING   NORMAL    NORMAL    BAD
which is not what we expect.

What is the problem?
IMHO the problem is that ping (and several other tests) do not distinguish between timeout and test threshold. That's why the warning range cannot be above the test threshold (and %SuggestedSimpleStatus% will be OK for results within the Warning range).

If it were possible to e.g. set the test threshold to 200ms but have a timeout of 1000ms it would work:
Test timeout 1000ms
Test threshold 200ms
- Use Normal if (%SSSBR%<3)
- Use Warning if (%Reply% >200) and (%Reply%<=1000)
This would cause the expected behaviour:
Reply:  100       300       300       300       1200      1200      1200
Status: OK        NORMAL    NORMAL    WARNING   BAD       BAD       BAD


@Alex:
Would it be possible to extend the test properties with a "bad" test threshold that is lower than the timeout (if no "bad" test threshold is given, the test will be bad when the timeout is reached)?

@Alex:
%FailureIteration% or %FailureRecurrences% would be a good name

@Juergen:
%FailureIteration% should behave as follows:
It is reset to 0 after performing a test with %SuggestedSimpleStatus%==OK
It is incremented after performing a test with %SuggestedSimpleStatus%==BAD
It does not change after performing a test with %SuggestedSimpleStatus%==UNKNOWN
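
The three rules above can be read as a small state update. Here is a rough Python sketch of that reading, not HostMonitor code: the function name and the plain status strings are made up for illustration, and %FailureIteration% is only a proposed name at this point.

```python
def update_failure_iteration(counter: int, suggested_simple_status: str) -> int:
    """Return the new counter value after one test has been performed.

    Hypothetical helper: `counter` stands in for the proposed
    %FailureIteration% variable, `suggested_simple_status` for
    %SuggestedSimpleStatus% written as a plain string.
    """
    if suggested_simple_status == "OK":
        return 0            # reset after a good result
    if suggested_simple_status == "BAD":
        return counter + 1  # one more consecutive failure
    return counter          # UNKNOWN (or anything else): leave unchanged


# Usage: feed the statuses of consecutive tests through the helper.
counter = 0
history = []
for status in ["BAD", "BAD", "UNKNOWN", "BAD", "OK", "BAD"]:
    counter = update_failure_iteration(counter, status)
    history.append(counter)
print(history)
```

With this reading an UNKNOWN result neither breaks nor extends a run of failures.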

Regards

Thomas
Last edited by thomasschmeidl on Tue Dec 26, 2006 2:20 pm, edited 1 time in total.
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

But this would cause the following behaviour:
Reply: 100 - - - - 300 - - - - 300 - - - - 300 - - - - 1200 - - - - 1200 - - - 1200
Status: OK - NORMAL - - NORMAL - -WARNING - - BAD - - - BAD- - - BAD
What status should be used when the response times are 1100, 300, 300? Bad? Warning? Ok? What about 300, 1100, 300?

Regards
Alex
JuergenF
Posts: 331
Joined: Sun Jan 26, 2003 6:00 pm
Location: Germany, North Rhine-Westphalia

Post by JuergenF »

Hi Thomas,

it's not clear to me what you expect as the result.
Please fill in:
Reply:  100       300       300       300       300       1200      1200      300       1200
Status: OK        WARNING   WARNING   ???????   ???????   ???????   ???????   ???????   ???????

Reply:  100       1200      1200      1200      300       300       300       1200
Status: OK        ???????   ???????   ???????   ???????   ???????   ???????   ???????


Do you want to treat the first 2 recurrences > 200 as Normal, even if they are > 1000?

Or only the first 2 recurrences that are > 200 and < 1000?
That is, Bad if (> 1000) OR (more than 2 times between 200 and 1000)?

[EDIT: This was posted before reading Alex's response]
thomasschmeidl
Posts: 166
Joined: Sat Apr 15, 2006 2:14 pm
Location: Germany, Bavaria

Post by thomasschmeidl »

@Alex and Juergen,

I want my test to behave as written above:
You have a "warning" threshold in case of slow replies (>200) and a "bad" threshold in case of very slow/no replies (>1000). And you consider the first two irregular (=warning or bad) replies as "normal".
This means:

@Alex
Reply:  1100      300       300
Status: NORMAL    NORMAL    WARNING

Reply:  300       1100      300
Status: NORMAL    NORMAL    WARNING


@Juergen
Reply:  100       300       300       300       300       1200      1200      300       1200
Status: OK        NORMAL    NORMAL    WARNING   WARNING   BAD       BAD       WARNING   BAD

Reply:  100       1200      1200      1200      300       300       300       1200
Status: OK        NORMAL    NORMAL    BAD       WARNING   WARNING   WARNING   BAD

Regards

Thomas
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

This does not comply with the rules from your 1st post
Ping test, threshold 1000ms
- NORMAL if <=2 bad recurrences;
- WARNING if (1000ms >= Reply time >200ms) and >2 recurrences;
- BAD if Reply time > 1000ms and >2 recurrences
Regards
Alex
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

I am afraid such rules will lead to one problem - nobody will understand how the program works. Including me :(
I would like to keep it simple.

Regards
Alex
thomasschmeidl
Posts: 166
Joined: Sat Apr 15, 2006 2:14 pm
Location: Germany, Bavaria

Post by thomasschmeidl »

@Alex,

I see, my 1st post was not expressed clearly (I am sorry, but I am not a native speaker).
That's why I described it again in words:
You have a "warning" threshold in case of slow replies (>200) and a "bad" threshold in case of very slow/no replies (>1000). And you consider the first two irregular (=warning or bad) replies as "normal"
IMHO this is not a very strange setting.

What I wanted to point out is:
In this setting it would be necessary that the warning produces a SuggestedSimpleStatus = Bad.
This cannot be obtained by ticking the "treat Warning status as bad"-box.
This can only be obtained with a test threshold=200 and a test timeout=1000 or more.

Regards

Thomas
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

You have a "warning" threshold in case of slow replies (>200) and a "bad" threshold in case of very slow/no replies (>1000). And you consider the first two irregular (=warning or bad) replies as "normal"
...
Would it be possible to enhance test properties by a bad test threshold which is less than the timeout (if no bad test threshold is given, test will be bad when timeout is reached)?
This would do the trick. Maybe we will implement this...

Regards
Alex
thomasschmeidl
Posts: 166
Joined: Sat Apr 15, 2006 2:14 pm
Location: Germany, Bavaria

Post by thomasschmeidl »

This would do the trick. May be we implement this...
Thank you for understanding.

If you need help providing examples for the manual, I will support you ;-)

Cheers

Thomas
JuergenF
Posts: 331
Joined: Sun Jan 26, 2003 6:00 pm
Location: Germany, North Rhine-Westphalia

Post by JuergenF »

Dear all,

Here again is my suggestion from 12 Dec 2006 22:21 in this thread (changed a bit: the initial value is 0 now - can't remember why I suggested 1 before :roll: )
[EDIT: I remember: starting at 1 makes it a bit easier for humans to count the recurrences, because the increment is done AFTER the expression is evaluated. If my suggestion becomes an option, we can discuss again how to set the initial value]

- %NormalStatusRecurrencies% is always set to 0 when Status = Ok
- %NormalStatusRecurrencies% is incremented by +1 after Status is set to Normal (after evaluating the expressions)
- %WarningStatusRecurrencies% is always set to 0 when Status = Ok
- %WarningStatusRecurrencies% is incremented by +1 after Status is set to Warning (after evaluating the expressions)

PING-Test threshold 1000ms
[x] Use "Normal" status if: (%Reply% > 200) and (%NormalStatusRecurrencies% < 2)
[x] Use "Warning" status if: (%Reply% > 200) and (%Reply% < 1000) and (%NormalStatusRecurrencies% >= 2)

Let's prove that:

Example 1:
Reply:     1100      300       300
Status:    NORMAL    NORMAL    WARNING
Counters:  N0,W0     N1,W0     N2,W0

Example 2:
Reply:     300       1100      300
Status:    NORMAL    NORMAL    WARNING
Counters:  N0,W0     N1,W0     N2,W0

Example 3:
Reply:     100       300       300       300       300       1200      1200      300       1200
Status:    OK        NORMAL    NORMAL    WARNING   WARNING   BAD       BAD       WARNING   BAD
Counters:  N0,W0     N0,W0     N1,W0     N2,W0     N2,W1     N2,W2     N2,W2     N2,W2     N2,W3

Example 4:
Reply:     100       1200      1200      1200      300       300       300       1200
Status:    OK        NORMAL    NORMAL    BAD       WARNING   WARNING   WARNING   BAD
Counters:  N0,W0     N0,W0     N1,W0     N2,W0     N2,W0     N2,W1     N2,W2     N2,W3

I think that will work.
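
For anyone who wants to replay the four examples, here is a small Python sketch of the proposal. It is only a model built on assumptions, not HostMonitor's actual logic: it assumes the "Normal" expression is checked first, then the "Warning" expression, and that otherwise the plain threshold decides (BAD above 1000 ms, else OK). All names in it (`Counters`, `next_status`, `simulate`) are made up for illustration.

```python
from dataclasses import dataclass

SLOW_MS = 200        # "warning" level used in the expressions
THRESHOLD_MS = 1000  # PING test threshold


@dataclass
class Counters:
    normal: int = 0   # stands in for %NormalStatusRecurrencies%
    warning: int = 0  # stands in for %WarningStatusRecurrencies%


def next_status(reply_ms: int, c: Counters) -> str:
    """Evaluate one test result and then update the counters."""
    # The expressions see the counter values from BEFORE this test.
    if reply_ms > SLOW_MS and c.normal < 2:
        status = "NORMAL"
    elif SLOW_MS < reply_ms < THRESHOLD_MS and c.normal >= 2:
        status = "WARNING"
    elif reply_ms > THRESHOLD_MS:
        status = "BAD"   # assumed fallback: plain threshold check
    else:
        status = "OK"    # assumed fallback (includes exactly 1000 ms)
    # Counters change only after the status has been decided.
    if status == "OK":
        c.normal = 0
        c.warning = 0
    elif status == "NORMAL":
        c.normal += 1
    elif status == "WARNING":
        c.warning += 1
    return status


def simulate(replies):
    """Return the list of statuses for a sequence of reply times."""
    c = Counters()
    return [next_status(r, c) for r in replies]


# Example 3 from the post
print(simulate([100, 300, 300, 300, 300, 1200, 1200, 300, 1200]))
```

Under these assumptions the sketch gives the same status rows as Examples 1 to 4. Neither counter is reset on BAD in this model, which is exactly the point that still needs to be decided.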

In this case %WarningStatusRecurrencies% is not needed for our expressions.
But you could consider setting BAD when you have more than 4 recurrences with (%Reply% > 200) and (%Reply% < 1000):
[x] Use "Warning" status if: (%Reply% > 200) and (%Reply% < 1000) and (%NormalStatusRecurrencies% >= 2) and (%WarningStatusRecurrencies% < 2)

What do you think?
Last edited by JuergenF on Wed Dec 27, 2006 1:27 am, edited 1 time in total.
thomasschmeidl
Posts: 166
Joined: Sat Apr 15, 2006 2:14 pm
Location: Germany, Bavaria

Post by thomasschmeidl »

Hi Juergen,
changed a bit. Initial Value is 0 now - can't remember why I suggested 1 before
I think this was the key for your solution.
PING-Test threshold 1000ms
[x] Use "Normal" status if: (%Reply% > 200) and (%NormalStatusRecurrencies% < 2)
[x] Use "Warning" status if: (%Reply% > 200) and (%Reply% < 1000) and (%NormalStatusRecurrencies% >= 2)
As far as I can estimate, this will work.
But you can think about setting BAD when you have more than 4 Recurrences with (%Reply% > 200) and (%Reply% < 1000).
[x] Use "Warning" status if: (%Reply% > 200) and (%Reply% < 1000) and (%NormalStatusRecurrencies% >= 2) and (%WarningStatusRecurrencies% < 2)
IMHO this will not work unless the test timeout is uncoupled from the "bad"-threshold ("bad" threshold must be 200, test timeout must be 1000 or more - see my posts above).

Cheers and good night

Thomas

PS: We are close to being the thread with the highest number of posts :D
KS-Soft
Posts: 13012
Joined: Wed Apr 03, 2002 6:00 pm
Location: USA

Post by KS-Soft »

- %NormalStatusRecurrencies% is always set to 0 when Status = Ok
- %NormalStatusRecurrencies% is incremented by +1 after Status is set to Normal (after evaluating the expressions)
1) If HostMonitor does not reset %NormalStatusRecurrencies% on other status changes (e.g. Normal -> Bad), this will lead to problems in some cases
2) If HostMonitor resets this variable on any status change, "(%Reply% > 200) and (%Reply% < 1000) and (%NormalStatusRecurrencies% >= 2)" will not work well
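
One possible reading of point 2 can be sketched in Python. This is a hypothetical model, not HostMonitor's implementation: it assumes the counter is cleared whenever the status differs from the previous one, and that the plain threshold decides when neither expression matches. The function name `simulate_reset_on_change` is invented for the illustration.

```python
def simulate_reset_on_change(replies, slow_ms=200, threshold_ms=1000):
    """Replay the proposed expressions with a reset on every status change."""
    normal_count = 0   # stands in for %NormalStatusRecurrencies%
    previous = None
    statuses = []
    for reply in replies:
        if reply > slow_ms and normal_count < 2:
            status = "NORMAL"
        elif slow_ms < reply < threshold_ms and normal_count >= 2:
            status = "WARNING"
        elif reply > threshold_ms:
            status = "BAD"   # assumed fallback: plain threshold check
        else:
            status = "OK"
        if status != previous:
            normal_count = 0   # hypothetical rule: reset on any status change
        if status == "NORMAL":
            normal_count += 1
        statuses.append(status)
        previous = status
    return statuses


# A server that stays slow (300 ms) after one good reply
print(simulate_reset_on_change([100, 300, 300, 300, 300, 300]))
```

In this model the switch to WARNING clears the counter, so the next slow reply is treated as NORMAL again and the status never stays at WARNING.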

Does anybody else need such complicated behavior?
Perhaps we should release HostMonitor as it is?

Regards
Alex