KS-Soft. Network Management Solutions

Busy CPU Test - One possibility/example

 
timn



Joined: 20 Nov 2003
Posts: 184
Location: United States

Posted: Tue Dec 30, 2003 7:40 am    Post subject: Busy CPU Test - One possibility/example

I'm just beginning to play with alert profiles and have been impressed with how flexible AHM is. I wanted to share a recent success involving testing for extremely busy CPU utilization.

Many of you have already gone far beyond this -- you should skip the remainder of this message if you long ago passed AHM 101.

Here's the scenario:


  1. Normally, I want to test a host's CPU utilization rate about once per minute.
  2. It is common on some of our hosts for CPU utilization to go to 100% for brief periods (typically 10 seconds or less). So I don't want to be alerted when this happens, i.e. I don't want to panic the 1st time I see 100% utilization.
  3. But if I see 100% utilization 2 times in a row, I'd like to focus in on that test and run it more frequently, say once every 5 seconds, until utilization stabilizes at something under 100%.
  4. If the CPU stays at 100% for more than a minute (12 checks at 5 second intervals), I'd like to get an email notification.
  5. I only want to go back to my original test interval if I get 4 'good' test results in a row -- indicating CPU utilization has dropped below 100% and maintained that value for 20 seconds.

This is surprisingly easy to achieve in AHM by building an Alert Profile.

First, create a CPU utilization test that runs once per minute and considers reply values greater than 99 to be bad.

Next, in the Test Properties dialog, under Alerts, click on Configure.

In the Action Profiles dialog, click on New and name your profile anything you want (I used "Busy CPU Watch")

Next, we are going to add 2 "Bad" status actions and 1 "Good" status action.

Under "Bad" status actions, click on "Add" and select "Change Test Interval".

In the Action Properties dialog, under "Condition to start action", set
"Start when" 2 "consecutive bad results occurs". Also, under Action Parameters, check the "Set to" line and enter 00:00:05. Then click 'OK'.

This action says that when 2 'bad' results occur (at the original interval of once per minute), change the test interval to once every 5 seconds.

Now we want to state that if 12 total 'bad' results are received, send an email notification. (Note: this is 2 'bad' results at the original interval and 10 more at the increased interval.)
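To get a feel for how long that is in wall-clock time, here is a rough Python sketch of the arithmetic. It assumes the bad-result counter keeps counting across the interval change and that every result in the run is bad. The function name and parameters are made up for illustration; nothing here comes from AHM itself.

```python
def seconds_until_email(normal_interval=60, fast_interval=5,
                        bad_to_speed_up=2, bad_to_email=12):
    """Seconds from the first bad result to the bad result that triggers the email.

    Assumes an unbroken run of bad results and that the interval switches
    right after the result that reaches bad_to_speed_up.
    """
    elapsed = 0
    for n in range(2, bad_to_email + 1):  # n = ordinal of the next bad result
        # Results up to and including bad_to_speed_up arrive at the normal interval.
        elapsed += normal_interval if n <= bad_to_speed_up else fast_interval
    return elapsed


print(seconds_until_email())
```

With the values from this post that works out to 60 + 10 x 5 = 110 seconds after the first bad result. If you would rather have a full minute of 5-second checks before the email (as in item 4 of the scenario), the count would need to be 14 rather than 12 under the same assumptions.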

Under "Bad" status actions, click on "Add" and select "Send E-MAil (SMTP)". In the Action Properties dialog, under "Condition to start action", set "Start when" to 12 consecutive bad results, then configure all the parameters required for an email message. (I chose to make my message body template a bit generic using macro variables so that I could re-use the template for other hosts when their CPUs are in distress.)

Finally, we need to tell AHM that when 4 consecutive 'good' results are seen, reset the test interval to its original value.

Under "Good" status actions, click on "Add" and select "Change Test Interval".

In the Action Properties dialog, under "Condition to start action", set
"Start when" 4 "consecutive Good results occurs". Also, under Action Parameters, check the "Restore original value" box. Then click 'OK'.

That's all. You can easily 'test' your alert by checking the "Reverse alert" check box in the Test Properties dialog. (Remember to uncheck it when you are done.)
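If you want to sanity-check the logic before clicking through the dialogs, the profile behaves roughly like the small Python simulation below. This is only a sketch of the three actions, not AHM's actual implementation: the class and method names are invented, and it assumes each action fires once, at the result where its consecutive count is reached.

```python
class BusyCpuWatch:
    """Toy model of the alert profile. Not AHM code; all names are hypothetical."""

    def __init__(self, normal_interval=60, fast_interval=5):
        self.normal_interval = normal_interval
        self.fast_interval = fast_interval
        self.interval = normal_interval
        self.bad_run = 0    # consecutive bad results so far
        self.good_run = 0   # consecutive good results so far
        self.emails = 0     # number of notifications "sent"

    def record(self, cpu_percent):
        """Feed one test result; return the interval (seconds) until the next test."""
        if cpu_percent > 99:            # test rule: reply values above 99 are bad
            self.bad_run += 1
            self.good_run = 0
            if self.bad_run == 2:       # "Bad" action 1: change test interval
                self.interval = self.fast_interval
            if self.bad_run == 12:      # "Bad" action 2: send email
                self.emails += 1
        else:
            self.good_run += 1
            self.bad_run = 0
            if self.good_run == 4:      # "Good" action: restore original value
                self.interval = self.normal_interval
        return self.interval


watch = BusyCpuWatch()
results = [100] * 12 + [40] * 4         # a long busy spell, then recovery
intervals = [watch.record(r) for r in results]
print(intervals)
print(watch.emails)
```

In this model a single 100% reading followed by a good one never changes the interval, which is the point of item 2 in the scenario.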

Side notes:

I am running RMA on the remote host.

I decided to treat unknown statuses as 'bad'. This CPU test is dependent upon a Master ping test of the host. If the host cannot be pinged, the CPU test will not run.

But if the host can be pinged, it might be so busy that the RMA never gets a chance to respond, so treating unknown statuses as "bad" is probably the right thing to do here. Your mileage may vary.
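The unknown-status reasoning can be sketched the same way. This is hypothetical Python with made-up names; it only illustrates the decision described here (the master ping gates the test, and no reply from the agent counts as bad), not how AHM evaluates dependencies internally.

```python
def classify(ping_ok, cpu_percent, unknown_is_bad=True):
    """Return 'skipped', 'bad', 'good' or 'unknown' for one CPU check.

    cpu_percent is None when the agent did not answer.
    """
    if not ping_ok:
        return "skipped"          # master test failed, so the dependent test does not run
    if cpu_percent is None:       # host answers ping but the agent gave no reply
        return "bad" if unknown_is_bad else "unknown"
    return "bad" if cpu_percent > 99 else "good"


print(classify(True, None))
```

Flipping `unknown_is_bad` to `False` shows the alternative: a host too busy to answer would then never count toward the consecutive bad results.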
FLynch



Joined: 18 Jun 2002
Posts: 75
Location: London UK

Posted: Tue Dec 30, 2003 5:27 pm    Post subject: Interesting

Hi,

Looks a good methodology to stop alert blizzards.

Also a good idea to post this sort of usage on the Forum.....encourages more innovative use of AHM.

Cheers
All times are GMT - 6 Hours