Skip to main content

Watch-dog

IPDR-Watchdog

Real time monitoring of trisul-hub and trisul-probe and alert if they down.

Consists of 3-scripts

  1. Detect the status of the trisul-system by analyzing the logs
  2. Send alert to syslog of the status of the trisul-system is down
  3. When the system is up after the certain interval then it send alert
  4. Accept max of 1 context at a time
tip

Configure mail in trisul to receive alert 🔔

Two Modes

  • FLOW MODE ( IPDR Customer )
  • COUNTER GROUP MODE ( Analytics Customer )
    • In this mode , the GUUID key need to be provied

How it works ?

  • Preliminary check

    • Check the config file is availabe for the given context
    • Extract log file location from given context config file
    • Checks the log file is accessbile by the script owner
    • If above condition is not met then it stops

  • Gather information from log

    • Access the latest log file & get the timestamp of last log
    • For Flow Mode - latest log with END-IPFLOW-FLUSH is fetched
    • For GUUID Mode - latest log with user given GUUID-KEY and END-CG-FLUSH
    • From the above logs you can get flushed value,timestamp,no.of.dropped packets.
  • Processing the gathering information

    • Check the fetched log file is empty or not.
    • Check the flushed value not zero.
    • Check the difference between the timestamp of log and system should not be more than fixed-sec
    • If the above condition is not met the alert is generated

Requirements

  • Run this script as root
  • Check cronjob is installed
  • Configure email in trisul ( If you need email alert )

Options

OptionDefault valueDescriptionUsage
-c CONTEXT-NAMEcontext0Context name-c all [prints the flusher for all context] or -c rack245'
-e CONTEXT-NAME-Ignore context for checking-e netflow -e headoffice
-s HOST-NAMEhostnameHost name-s Trisul-Server
-i INTERVAL-Check flushed item count for last n minutes-i 2
-g COUNTER_GROUP-Check this counter group in log-g {2314BB8E-2BCC-4B86-8AA2-677E5554C0FE}
-f CHECK_FLOW_COUNT-Check flushed flow count-f
-v VERBOSE-Print inforamtion like fluhsed flows-v
-m MATCH_ANY_ENIGINE-Check if any 1 engine is flushed-m
-r RESTART-Restart context if down-r

How to run this script

/usr/local/share/trisul-hub/run_hub_watchdog.sh -c context0 -f -i

Example

ipdr_watchdog

When you see the example the above script is executed with argument -i

  • This is the verbose option where it prints the values in terminal
  • When you run without -i then it only print the values if the context is not running

Example - How alert generated when the conditions not met

   /usr/local/share/trisul-hub/run_hub_watchdog.sh -c rack265

error_log

  • In above example you can see that the flush value is 0. So that it generated the alert to syslog.
  • For both flow mode & GUUID mode the alert will be same