CloudWatch alarms

One of the methods to be alerted on an issue with one of your AWS resources is to use a CloudWatch alert that will send an email (or page) to a subscriber. One issue with this approach is that β€œAn alarm invokes actions only when the alarm changes state.”. This is documented at

Using Amazon CloudWatch alarms

In other words, you will receive one email or page when the event occurs. As an example, if you have an alert that is based on the CPU utilization of the database exceeding a certain threshold, the alert will be sent out once when the database first breaches the threshold. If you have a requirement to be paged every (for example) five minutes while the issue continues, you will need to develop some other methodology. One of the suggestions is to use a combination of services such as Eventbridge and Lambda to keep checking the alarm state.

As an option for DBAs used to handling alerts via scripting, I created a small script (see below) that can be executed via cron on an EC2 server that will monitor the CloudWatch alarm status and keep paging while the alarm is occurring.

#!/bin/bash

#
## alarms that are to be checked
#
alarm_names=("mydb-CPUUtilizationAlarm" "mydb-FreeLocalStorageAlarm")

#
## write out a file header
#
echo 'Report on CloudWatch alarms' > alarm_status.txt
echo 'Executed at ' `date` >> alarm_status.txt
echo ' ' >> alarm_status.txt
printf "%-50s %-10s\n" "Alarm Name" "Status" >> alarm_status.txt
pager_flag='n'

#
## get the status of each alarm
#
for cur_value in "${alarm_names[@]}"
do
    alarm_status=`aws cloudwatch describe-alarms --alarm-names "$cur_value" --output="json" | grep -iE '"StateValue"' | sed 's/StateValue\|"\|\':'\|','//g'`
      alarm_status=$(echo "$alarm_status" | tr -d ' ')
      if  [ "$alarm_status" == "ALARM" ]; then
          pager_flag='y'
      fi
      printf "%-50s %-5s\n" "$cur_value" "$alarm_status" >> alarm_status.txt
done

#
## mail to the appropriate person/team
# 
mail -s "CloudWatch alarm status" dba-group@company.com < alarm_status.txt

#
## check if a page out is required
#
if  [ $pager_flag == 'y' ]; then
    mail -s "CloudWatch alarm status - Alarm detected" dba-pager@company.com < alarm_status.txt
fi