correlate entries in (log-)files. Find pairs of messages

JuergenF
Posts: 331
Joined: Sun Jan 26, 2003 6:00 pm
Location: Germany, North Rhine-Westphalia


Post by JuergenF

Dear all,

Max encouraged me to post my script here.
I've received so much support and so many ideas from this forum; maybe I can give a bit back.

The situation:
I have a syslog file on a Linux machine that collects a lot of messages from different devices.
HM (HostMonitor) should raise an alert when a "psecure-violation" (port-security violation) has put an interface into the err-disable state (i.e. shut it down) and no "interface up" message has followed.

In other words:
- If there is only a message saying that port Fa0/3 on switch 192.168.167.190 is "in err-disable state", raise a "Bad" condition.
- If a later message "Interface FastEthernet0/3, changed state to up" for 192.168.167.190 also appears, all is OK (the interface has recovered).

Please note: there are multiple switches and interface ports, so each switch/port combination has to be tracked separately.
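Stripped of all log parsing, the rule above comes down to "the last message per switch/port wins": whatever is still marked Bad at the end of the file has not recovered. A minimal sketch of just that bookkeeping, assuming the log lines have already been reduced to hypothetical `key event` pairs such as `192.168.167.190_Fa0/3 down` (the function name `unrecovered` is made up as well):

```shell
# Hypothetical helper: reads "key event" lines, where event is "down"
# (err-disable) or "up" (link up), and prints keys whose last event was "down".
unrecovered() {
  awk '
    $2 == "down" { state[$1] = "Bad" }   # err-disable seen: mark the port Bad
    $2 == "up"   { state[$1] = "Ok" }    # later link-up: the port has recovered
    END {
      for (k in state)                   # note: for-in order is unspecified
        if (state[k] == "Bad") print k
    }'
}

printf '%s\n' \
  '192.168.167.190_Fa0/3 down' \
  'dcdw0015_Gi0/15 down' \
  '192.168.167.190_Fa0/3 up' | unrecovered
```

This prints only `dcdw0015_Gi0/15`: Fa0/3 on 192.168.167.190 is first marked Bad and then overwritten by its later "up" event. The approach assumes the file is read in chronological order.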
Here is my script:

Code:

#! /bin/awk -f
# Script returns Bad status when an "err-disable state" message has no corresponding "changed state to up" message
# Script examines a syslog file with messages from Cisco switches. Lines look like this:
# Aug 30 15:44:11 192.168.167.190 2471: Aug 30 15:42:08: %PM-4-ERR_DISABLE: psecure-violation error detected on Fa0/3, putting Fa0/3 in err-disable state
# Aug 30 15:46:02 192.168.167.190 2475: Aug 30 15:43:58: %LINK-3-UPDOWN: Interface FastEthernet0/3, changed state to up
#
# Aug 31 15:44:11 dcdw0015.wetter.dematic.de 3471: Aug 31 15:42:08.354: %PM-4-ERR_DISABLE: psecure-violation error detected on Gi0/15, putting Gi0/15 in err-disable state
# Aug 31 15:46:02 dcdw0015.wetter.dematic.de 3475: Aug 31 15:43:58.543: %LINK-3-UPDOWN: Interface GigabitEthernet0/15, changed state to up
#
# Possible output (assuming the "state to up" lines are missing):
# ScriptRes:Bad:192.168.167.190_Fa0/3 dcdw0015_Gi0/15
#
# Usage: ./HostMon.awk /var/log/warn
#
BEGIN { # BEGIN rule is executed once only, before the first input record is read
       # nothing to do
      }
      { # This is done for each Input-Line
        # $0 represents the whole input record, $1 = 1st field, $2 = 2nd field ...
        if (length ($0) > 0)
          {
           if ( (index($0, "192.168.16") > 0) || (index($0, "dcdw") > 0) )   # Messages from my switches (index() matches literally; in match() each "." is a regex wildcard)
             {
              if ( (match($0, "state to up")) || (match($0, "ERR_DISABLE")) )  # Interesting Interface changes
                {
                  gsub("  ", " ", $0)                           # eliminate double-spaces (harmless; awk's default FS already treats runs of blanks as one separator)
                 if ($9 == "%PM-4-ERR_DISABLE:")               # error disable message
                   {
                    $14 = substr ($14,1,length ($14) -1)       # Extract Interface, get rid of ","
                     if (index ($4, ".") > 4)                   # If fully qualified domain name (dcdw0015.wetter.dematic.de); an IPv4 address has its first "." at position 4 or earlier
                       $4 = substr ($4,1,index ($4, ".") -1)   # Shorten to hostname (dcdw0015)
                    ind = ($4 "_" $14)                         # Create unique Index (dcdw0015_Gi0/15)
                    dev_arr[ind] = "Bad"                       # Set current Interface state "Bad"
#                   print ind, dev_arr[ind]                    # For debugging: Display Values
                   }
                 if ($9 == "%LINK-3-UPDOWN:")                  # This is a "Link state changed to up" message
                   {
                    gsub("FastEthernet", "Fa", $11)            # Shorten Interface name to Fa0/3
                    gsub("GigabitEthernet", "Gi", $11)         # ... or to Gi0/15 
                    $11 = substr ($11,1,length ($11) -1)       # Extract Interface, get rid of ","
                     if (index ($4, ".") > 4)                   # If fully qualified domain name (dcdw0015.wetter.dematic.de); an IPv4 address has its first "." at position 4 or earlier
                       $4 = substr ($4,1,index ($4, ".") -1)   # Shorten to hostname (dcdw0015)
                    ind = ($4 "_" $11)                         # Create unique Index (dcdw0015_Gi0/15)
                    dev_arr[ind] = "Ok"                        # Set current Interface state "Ok"
#                   print ind, dev_arr[ind]                    # For debugging: Display Values
                   }
                }
             }
          }
      }
END   { # an END rule is executed once only, after all the input is read
        StatusString = "Ok"                     # Prepare for ScriptRes
        Reply = ""                              # Prepare for ScriptRes
        for (ind in dev_arr)                    # For all Elements of Array dev_arr
           if (dev_arr[ind] == "Bad")            # If the last message for this interface was ERR_DISABLE
            {
             StatusString = "Bad"               # Script has to report Bad Status
              Reply = ((Reply == "") ? ind : (Reply " " ind))   # Add Bad interface to Reply string, space-separated, no trailing blank
#             print ind, dev_arr[ind]           # For debugging: Display Values
            }
                                # ScriptRes:Bad:192.168.167.190_Fa0/3 dcdw0015_Gi0/15
                                # ScriptRes:Ok:
        print "ScriptRes:" StatusString ":" Reply 
      }
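The script above locates the interface by fixed field positions ($9, $11, $14). If a switch logs with a slightly different layout (an additional or missing field before the message text), those positions shift. A variant that searches for the interface name with awk's `match()`, `RSTART` and `RLENGTH` instead could look like the sketch below. It is wrapped in a hypothetical shell function `check_syslog`, assumes the reporting host is still the 4th field and that interface names look like those in the examples, and leaves out the `192.168.16`/`dcdw` host filter, which you would add back if your syslog also collects messages from other devices.

```shell
# Hypothetical variant of the script above: same pairing logic, but the
# interface is located with match() instead of fixed field numbers.
# Assumes the reporting host is the 4th field.
# Usage: check_syslog /var/log/warn   (or pipe lines into it)
check_syslog() {
  awk '
    function short_host(h) {               # dcdw0015.wetter.dematic.de -> dcdw0015
      if (h ~ /^[0-9.]+$/) return h        # keep IPv4 addresses unchanged
      sub(/\..*/, "", h)                   # drop everything from the first dot
      return h
    }
    /%PM-4-ERR_DISABLE:/ {
      # interface name follows the 8-character text "putting "
      if (match($0, "putting [A-Za-z]+[0-9/]+"))
        state[short_host($4) "_" substr($0, RSTART + 8, RLENGTH - 8)] = "Bad"
    }
    /%LINK-3-UPDOWN:/ && /changed state to up/ {
      # interface name follows the 10-character text "Interface "
      if (match($0, "Interface [A-Za-z]+[0-9/]+")) {
        ifname = substr($0, RSTART + 10, RLENGTH - 10)
        sub(/^FastEthernet/, "Fa", ifname)     # FastEthernet0/3 -> Fa0/3
        sub(/^GigabitEthernet/, "Gi", ifname)  # GigabitEthernet0/15 -> Gi0/15
        state[short_host($4) "_" ifname] = "Ok"
      }
    }
    END {
      status = "Ok"; reply = ""
      for (k in state)
        if (state[k] == "Bad") {
          status = "Bad"
          reply = ((reply == "") ? k : (reply " " k))
        }
      print "ScriptRes:" status ":" reply
    }' "$@"
}
```

Because later lines overwrite earlier ones, the result reflects the last relevant message per switch/port, as in the original script. The order of the interfaces in a Bad reply is unspecified, since awk's `for (k in state)` loop has no defined order.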
In addition, I use HostMonitor's "Use warning status" option for the first 3 Bad recurrences to reduce the number of false positives.

Feel free to use or modify the script to fit your requirements.

Best regards

Juergen