Monitoring Alfresco in Nagios via OOTB support tools addon

Alfresco CE direct monitoring for Nagios via curl command

JMX information has been available since Alfresco Enterprise 3.2, which gives external monitoring tools like Nagios/Icinga many possibilities for checking Alfresco variables. A well-known example is the Nagios/Icinga plugin for monitoring Alfresco.

The most interesting information from this plugin is specific to Enterprise Edition (EE), although general (non-JMX) monitoring commands can be used for Community Edition (CE) as well. For example, in an Alfresco CE installation we use Nagios plugins such as:

  • check_http for direct monitoring of the HTTP(S) service (e.g. ports 80 or 443)
  • check_tcp for checking Tomcat and Alfresco ports (e.g. 8009, 8080, 8443 or 50500)
  • check_snmp for checking CPU, RAM, load and swap (via the standard SNMP protocol)
  • check_esxi for checking similar metrics through the VMware API (if your instance is virtualized)
  • check_tomcat for monitoring threads and the JVM
  • check_mysql for monitoring your database pool connections
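
The check_tcp checks above only confirm that a port accepts TCP connections. As a rough illustration of that behavior (not a replacement for the real plugin), here is a minimal bash sketch that probes a port using bash's /dev/tcp pseudo-device and the coreutils timeout command. The function name probe_port and the 3-second timeout are our own assumptions.

```shell
#!/bin/bash
# Sketch only: a dependency-free stand-in for check_tcp, NOT the real plugin.
# Relies on bash's /dev/tcp redirection and GNU coreutils 'timeout'.

# probe_port HOST PORT: prints a Nagios-style status line and returns
# 0 (OK) if the port accepts a TCP connection, 2 (CRITICAL) otherwise.
probe_port() {
  local host=$1 port=$2
  # The child bash opens fd 3 on the socket; refusal or timeout gives non-zero.
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "OK: ${host}:${port} accepts TCP connections"
    return 0
  fi
  echo "CRITICAL: ${host}:${port} is not reachable"
  return 2
}

# Usage: check the Tomcat/Alfresco ports listed above on one host.
#   for p in 8009 8080 8443 50500; do probe_port alfie.zylk.net "$p"; done
```

In production, check_tcp remains the better choice, since it also supports response-time thresholds and is what Nagios expects.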

Recently, we tested the OOTB Support Tools addon for Alfresco Community Edition. This addon provides some of this useful information (JVM, threads or logged-in users) through Alfresco webscripts, which can be queried with curl to generate alerts and graphs. We can use the JSON output of these webscripts (thanks to Axel Faust for showing me this) in shell scripts like the following:

check_ootb_active_sessions.sh

#!/bin/bash
#
#  Author: Cesar Capillas
#
#  https://github.com/CesarCapillas
#

SERVER=$1
PORT=$2
USERNAME=$3
PASSWORD=$4
VAR=$5
WARNING=${6:-100}
CRITICAL=${7:-200}

# Print usage and exit UNKNOWN (3) if mandatory arguments are missing
if [[ -z "$SERVER" || -z "$VAR" ]]; then
  echo "USAGE:"
  echo "  check_ootb_active_sessions.sh <SERVER> <PORT> <USERNAME> <PASSWORD> <VAR> [<WARNING>] [<CRITICAL>]"
  echo
  echo "    where VAR=[NumActive|MaxActive|NumIdle|UserCountNonExpired|TicketCountNonExpired]"
  echo
  exit 3
fi

if [ "$PORT" = "443" ]; then
   PROTOCOL="https"
else
   PROTOCOL="http"
fi

# Endpoint for Alfresco CE with OOTB Support Tools
ENDPOINT="$PROTOCOL://${SERVER}:${PORT}/alfresco/service/ootbee/admin/active-sessions?format=json"
# Endpoint for Alfresco EE with Support Tools addon
#ENDPOINT="$PROTOCOL://${SERVER}:${PORT}/alfresco/service/enterprise/admin/admin-activesessions?format=json"

# Fetch the webscript JSON (quoted, so special characters in the password are safe)
CURL=$(curl --silent -u "${USERNAME}:${PASSWORD}" "${ENDPOINT}")
CHCK=$(echo "$CURL" | grep "\"$VAR\"")

if [[ -z "$CHCK" ]]; then
   # No response, or the requested variable is missing from the JSON
   echo "CRITICAL: ${SERVER} did not return ${VAR}"
   exit 2
fi

# jshon prints string values with quotes, so strip them
ACTIVE_SESSION_VAR=$(echo "$CURL" | jshon -e "$VAR" | sed 's/"//g')

if (( ACTIVE_SESSION_VAR > CRITICAL )); then
   echo "CRITICAL: $VAR = $ACTIVE_SESSION_VAR (>$CRITICAL)"
   exit 2
fi
if (( ACTIVE_SESSION_VAR > WARNING )); then
   echo "WARNING: $VAR = $ACTIVE_SESSION_VAR (>$WARNING)"
   exit 1
fi

echo "OK: Sessions ($VAR) = $ACTIVE_SESSION_VAR"
exit 0
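
Both scripts rely on jshon to pull a single value out of the webscript's JSON. If jshon is not installed on the Nagios host, a crude fallback for flat JSON objects can be written with tr and sed. This is a sketch under strong assumptions: the function name json_value is hypothetical, it only handles top-level keys with numeric or simple quoted values, and the sample document below is made up for illustration rather than captured from a real Alfresco server.

```shell
#!/bin/bash
# Sketch only: extract a top-level scalar from flat JSON without jshon.
# Not a real JSON parser: nested objects and string values containing
# commas, quotes or spaces are not handled.

# json_value KEY JSON: prints the value of KEY, or nothing if KEY is absent.
json_value() {
  local key=$1 json=$2
  # Drop quotes and newlines so "Key": "12" and "Key": 12 look the same,
  # then capture everything after "Key:" up to the next comma, brace or space.
  printf '%s' "$json" | tr -d '"\n' \
    | sed -n "s/.*[{,][[:space:]]*${key}[[:space:]]*:[[:space:]]*\([^,}[:space:]]*\).*/\1/p"
}

# Made-up sample response, for illustration only.
sample='{ "NumActive": "3", "MaxActive": 100, "UserCountNonExpired": "12" }'
json_value NumActive "$sample"   # prints 3
json_value MaxActive "$sample"   # prints 100
```

In the scripts, the jshon line could then become ACTIVE_SESSION_VAR=$(json_value "$VAR" "$CURL"); an empty result should be treated like the existing "variable not found" case.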

check_ootb_performance_stats.sh

#!/bin/bash
#
#  Author: Cesar Capillas
#
#  https://github.com/CesarCapillas
#
#  License: see accompanying LICENSE file
#

SERVER=$1
PORT=$2
USERNAME=$3
PASSWORD=$4
VAR=$5
WARNING=${6:-10000}
CRITICAL=${7:-10000}

# Most useful variables are UsedMemory (JVM) and ThreadCount
#   Memory is in MB (e.g. 4096M)
#   Load is a percentage

# Print usage and exit UNKNOWN (3) if mandatory arguments are missing
if [[ -z "$SERVER" || -z "$VAR" ]]; then
  echo "USAGE:"
  echo "  check_ootb_performance_stats.sh <SERVER> <PORT> <USERNAME> <PASSWORD> <VAR> [<WARNING>] [<CRITICAL>]"
  echo
  echo "    where VAR=[MaxMemory|TotalMemory|UsedMemory|FreeMemory|ProcessLoad|SystemLoad|ThreadCount|PeakThreadCount]"
  echo
  exit 3
fi

if [ "$PORT" = "443" ]; then
   PROTOCOL="https"
else
   PROTOCOL="http"
fi

# Endpoint for Alfresco CE with OOTB Support Tools
ENDPOINT="$PROTOCOL://${SERVER}:${PORT}/alfresco/service/ootbee/admin/admin-performance?format=json"
# Endpoint for Alfresco EE with Support Tools addon
#ENDPOINT="$PROTOCOL://${SERVER}:${PORT}/alfresco/service/enterprise/admin/admin-performance?format=json"

# Fetch the webscript JSON (quoted, so special characters in the password are safe)
CURL=$(curl --silent -u "${USERNAME}:${PASSWORD}" "${ENDPOINT}")
CHCK=$(echo "$CURL" | grep "\"$VAR\"")

if [[ -z "$CHCK" ]]; then
   # No response, or the requested variable is missing from the JSON
   echo "CRITICAL: ${SERVER} did not return ${VAR}"
   exit 2
fi

PERFORMANCE_VAR=$(echo "$CURL" | jshon -e "$VAR" | sed 's/"//g')

# Compare with awk so that decimal values (e.g. load percentages) also work
if awk -v v="$PERFORMANCE_VAR" -v t="$CRITICAL" 'BEGIN { exit !(v > t) }'; then
   echo "CRITICAL: $VAR = $PERFORMANCE_VAR (>$CRITICAL)"
   exit 2
fi
if awk -v v="$PERFORMANCE_VAR" -v t="$WARNING" 'BEGIN { exit !(v > t) }'; then
   echo "WARNING: $VAR = $PERFORMANCE_VAR (>$WARNING)"
   exit 1
fi

echo "OK: $VAR = $PERFORMANCE_VAR"
exit 0
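
The scripts are meant for graphs as well as alerts. Nagios graphing add-ons (such as PNP4Nagios) read the performance data that a plugin prints after a "|" in its output, in the form 'label'=value;warn;crit. The following sketch factors the threshold logic shared by both scripts into one function that also emits performance data. The name nagios_result is our own, and the awk-based comparison is an assumption chosen so that decimal values compare correctly.

```shell
#!/bin/bash
# Sketch only: shared threshold logic plus Nagios performance data.
# Exit codes follow the Nagios plugin convention:
#   0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN

# nagios_result LABEL VALUE WARNING CRITICAL
# Prints "STATUS: label = value | 'label'=value;warn;crit"
# and returns the matching exit code.
nagios_result() {
  local label=$1 value=$2 warn=$3 crit=$4 status code
  # Reject empty or non-numeric values instead of comparing garbage.
  if ! [[ $value =~ ^-?[0-9]+([.][0-9]+)?$ ]]; then
    echo "UNKNOWN: ${label} returned a non-numeric value '${value}'"
    return 3
  fi
  # awk compares numerically, so 87.5 > 85 behaves as expected.
  if awk -v v="$value" -v c="$crit" 'BEGIN { exit !(v > c) }'; then
    status=CRITICAL code=2
  elif awk -v v="$value" -v w="$warn" 'BEGIN { exit !(v > w) }'; then
    status=WARNING code=1
  else
    status=OK code=0
  fi
  echo "${status}: ${label} = ${value} | '${label}'=${value};${warn};${crit}"
  return "$code"
}

# Usage inside a check script (replaces the threshold if-blocks and exits):
#   nagios_result "$VAR" "$PERFORMANCE_VAR" "$WARNING" "$CRITICAL"; exit $?
```

Keeping the status text before the "|" means the alert message Nagios shows is unchanged, while graphing tools pick up the numbers after it.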

Both scripts above depend on the curl and jshon commands. The corresponding Nagios command definitions look like this:

ootb-commands.cfg

define command {
        command_name    check_performance_stats 
        command_line    /usr/lib/nagios/plugins/check_ootb_performance_stats.sh '$ARG1$' '$ARG2$' '$ARG3$' '$ARG4$' '$ARG5$' '$ARG6$' '$ARG7$' 
}

define command {
        command_name    check_active_sessions
        command_line    /usr/lib/nagios/plugins/check_ootb_active_sessions.sh '$ARG1$' '$ARG2$' '$ARG3$' '$ARG4$' '$ARG5$' '$ARG6$' '$ARG7$'
}

Finally, we define the services for an Alfresco host (alf5) in the following file:

services_ootb.cfg

define service {
        use                             generic-service
        host_name                       alf5
        service_description             [OOTB] Number of active database connections
        max_check_attempts              3
        normal_check_interval           10
        retry_check_interval            3
        check_command                   check_active_sessions!alfie.zylk.net!443!monitor!secret!NumActive!15!20
}

define service {
        use                             generic-service
        host_name                       alf5
        service_description             [OOTB] Number of logged users
        max_check_attempts              3
        normal_check_interval           10
        retry_check_interval            3
        check_command                   check_active_sessions!alfie.zylk.net!443!monitor!secret!UserCountNonExpired!15!20
}

define service {
        use                             generic-service
        host_name                       alf5
        service_description             [OOTB] Number of tickets
        max_check_attempts              3
        normal_check_interval           10
        retry_check_interval            3
        check_command                   check_active_sessions!alfie.zylk.net!443!monitor!secret!TicketCountNonExpired!15!20
}

define service {
        use                             generic-service
        host_name                       alf5
        service_description             [OOTB] JVM Used Memory
        max_check_attempts              3
        normal_check_interval           10
        retry_check_interval            3
        check_command                   check_performance_stats!alfie.zylk.net!443!monitor!secret!UsedMemory!3500!4000
}

define service {
        use                             generic-service
        host_name                       alf5
        service_description             [OOTB] Number of Threads
        max_check_attempts              3
        normal_check_interval           10
        retry_check_interval            3
        check_command                   check_performance_stats!alfie.zylk.net!443!monitor!secret!ThreadCount!225!250
}

define service {
        use                             generic-service
        host_name                       alf5
        service_description             [OOTB] Process Load
        max_check_attempts              3
        normal_check_interval           10
        retry_check_interval            3
        check_command                   check_performance_stats!alfie.zylk.net!443!monitor!secret!ProcessLoad!75!85
}

define service {
        use                             generic-service
        host_name                       alf5
        service_description             [OOTB] System Load
        max_check_attempts              3
        normal_check_interval           10
        retry_check_interval            3
        check_command                   check_performance_stats!alfie.zylk.net!443!monitor!secret!SystemLoad!85!95
}
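
Before reloading Nagios, it can be handy to run a service's check_command by hand and inspect the plugin output and exit code. Nagios splits check_command at each "!" and substitutes the pieces into $ARG1$, $ARG2$ and so on. The sketch below mimics that split in bash; the helper names split_check_command and run_check_command, and the mapping from command names to script paths, are hypothetical and based on the ootb-commands.cfg definitions above.

```shell
#!/bin/bash
# Sketch only: mimic how Nagios turns "name!arg1!arg2..." into $ARGn$ values.
# Escaped "\!" sequences, which Nagios supports, are not handled here.

# split_check_command STRING: prints the command name, then one $ARGn$ per line.
split_check_command() {
  local IFS='!'
  local -a parts
  read -r -a parts <<< "$1"
  printf '%s\n' "${parts[@]}"
}

# run_check_command STRING: runs the mapped plugin with the split arguments,
# prints its exit code and returns it.
run_check_command() {
  local -a parts
  local name script rc
  mapfile -t parts < <(split_check_command "$1")
  name=${parts[0]}
  case "$name" in
    check_active_sessions)   script=/usr/lib/nagios/plugins/check_ootb_active_sessions.sh ;;
    check_performance_stats) script=/usr/lib/nagios/plugins/check_ootb_performance_stats.sh ;;
    *) echo "UNKNOWN: no script mapped for ${name}"; return 3 ;;
  esac
  "$script" "${parts[@]:1}"
  rc=$?
  echo "exit code: ${rc}"
  return "$rc"
}
```

In an interactive shell, pass the string in single quotes so that bash history expansion does not touch the "!" characters, e.g. run_check_command 'check_performance_stats!alfie.zylk.net!443!monitor!secret!ThreadCount!225!250'.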

Similar webscripts exist in Antonio Soler's original Support Tools work, so it is quite simple to switch the corresponding webscript endpoints in the shell scripts. A safer alternative to direct monitoring may be to run these scripts locally on the Alfresco server and expose the metrics via custom NRPE or SNMP commands.
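
For the NRPE variant, the scripts (plus curl and jshon) would be installed on the Alfresco server and run against the local Tomcat port, while Nagios only calls them through check_nrpe. As a sketch, the bash snippet below prints command[...] entries in the nrpe.cfg format. The command names, the local port 8080, the monitor/secret credentials and the thresholds are illustrative values borrowed from the services above, not required settings.

```shell
#!/bin/bash
# Sketch only: generate nrpe.cfg entries that run the OOTB checks locally.
# Port 8080 (plain HTTP on localhost), credentials and thresholds are
# illustrative values; adapt them to your installation.

PLUGINS=/usr/lib/nagios/plugins

# nrpe_entries: prints one "command[name]=command line" line per metric.
nrpe_entries() {
  local metric warn crit
  # Performance metrics: NAME WARNING CRITICAL
  while read -r metric warn crit; do
    echo "command[ootb_${metric}]=${PLUGINS}/check_ootb_performance_stats.sh localhost 8080 monitor secret ${metric} ${warn} ${crit}"
  done <<'EOF'
UsedMemory 3500 4000
ThreadCount 225 250
ProcessLoad 75 85
SystemLoad 85 95
EOF
  # Session metrics: NAME WARNING CRITICAL
  while read -r metric warn crit; do
    echo "command[ootb_${metric}]=${PLUGINS}/check_ootb_active_sessions.sh localhost 8080 monitor secret ${metric} ${warn} ${crit}"
  done <<'EOF'
NumActive 15 20
UserCountNonExpired 15 20
TicketCountNonExpired 15 20
EOF
}

# Example: append the entries to an NRPE include file (path varies by distribution)
#   nrpe_entries >> /etc/nagios/nrpe.d/ootb.cfg
```

Because all arguments are fixed on the Alfresco server, NRPE does not need dont_blame_nrpe argument passing, and the credentials never travel in Nagios service definitions. The Nagios side would then call something like check_nrpe -H <alfresco-host> -c ootb_ThreadCount.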

And this is the result:
