Harnessing Amazon Bedrock: A Simple Image Generation Demo with Stability AI

In today’s rapidly evolving AI landscape, Amazon Bedrock is a fast, powerful, and easy-to-use platform for working with foundation models. This blog post will guide you through creating a simple web solution to demonstrate Stability AI’s image generation capabilities using Amazon Bedrock.

The goal is to build a simple website that accepts a user prompt and generates a corresponding image. The demo shows the potential of AI-driven image creation and provides hands-on experience with actual code. There is no reliance on PowerPoint decks 🙂 .

Infrastructure Setup
For this demonstration, I will be running the web server on an Amazon EC2 instance running Amazon Linux 2. The instance will be equipped with Python 3, Flask for the web framework, and boto3 for AWS SDK interactions.

A second Windows EC2 instance located in the same VPC and subnet will be used to access the web page. The security group of the Linux server should allow connections from the Windows server on port 5000.

Accessing Stability AI Model
Before executing the code, you must request access to the Stability AI model via the Amazon Bedrock console. Without this step, the account cannot invoke the Stability AI model, regardless of the IAM permissions attached to the EC2 instance.

IAM Policy Configuration
To allow our EC2 instance to access the Stability AI model, we need to attach an IAM policy containing the below statement:

        {
            "Effect": "Allow",
            "Action": [
                "bedrock:InvokeModel",
                "bedrock:InvokeModelWithResponseStream"
            ],
            "Resource": [
                "arn:aws:bedrock:us-east-1::foundation-model/stability.stable-diffusion-xl-v1"
            ]
        }

This can be added as an inline policy to the IAM role used by your EC2 instance.

Upload the Python code (see the download link at the bottom of the page) to a file named my_ai_web.py and start the web server with the command python ./my_ai_web.py, as shown in the image below.

Sign on to the Windows server and navigate to the second IP address displayed in the above image. The following web page should be displayed:

Enter a prompt and click on the “Generate Image” button as shown below:

At this point the code makes a call to Bedrock to process the prompt and generate the image:

After the image is generated, it is displayed as shown below:

Python/Flask code download link
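
The heart of the downloadable application is the call to Bedrock itself. Below is a minimal sketch of that call wrapped in a small Flask app; it is not the exact downloadable code, the HTML and parameter values (cfg_scale, steps) are illustrative, and the request/response fields follow the Stability AI SDXL format used by Bedrock.

import base64
import json

import boto3
from flask import Flask, request

app = Flask(__name__)
bedrock = boto3.client("bedrock-runtime", region_name="us-east-1")

def generate_image(prompt):
    # Invoke the Stability AI SDXL model with the user's prompt
    body = json.dumps({
        "text_prompts": [{"text": prompt}],
        "cfg_scale": 10,
        "steps": 30,
    })
    response = bedrock.invoke_model(
        modelId="stability.stable-diffusion-xl-v1",
        body=body,
        contentType="application/json",
        accept="application/json",
    )
    payload = json.loads(response["body"].read())
    # The generated image is returned base64 encoded in the first artifact
    return payload["artifacts"][0]["base64"]

@app.route("/", methods=["GET", "POST"])
def home():
    if request.method == "POST":
        encoded_image = generate_image(request.form["prompt"])
        return f'<img src="data:image/png;base64,{encoded_image}"/>'
    return ('<form method="post">'
            '<input name="prompt" size="80"/>'
            '<button type="submit">Generate Image</button>'
            '</form>')

if __name__ == "__main__":
    # Listen on all interfaces so the Windows instance can reach port 5000
    app.run(host="0.0.0.0", port=5000)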

Using IAM Authentication with Redshift

Managing user access to your Amazon Redshift database can be a headache, especially when it comes to maintaining individual user accounts and passwords. However, there’s a better way – you can leverage your existing AWS Identity and Access Management (IAM) permissions to generate temporary database credentials for your users.

By configuring your system to authenticate users based on their IAM credentials, you can eliminate the need to create and manage individual database user accounts. Users can simply log in to your Redshift database using their IAM credentials, and your system will generate the necessary database credentials on-the-fly.

This approach offers several benefits. It streamlines user management, as you can leverage your existing IAM permissions structure to control access to the database. It also enhances security, as you don’t have to worry about users sharing or mishandling database passwords.

The official AWS documentation is at Using IAM authentication to generate database user credentials.

In my particular use case, I had to connect to a Redshift database with temporary IAM authentication and there is one small wrinkle that cost me a couple of hours. I am documenting this together with the code in the hope that I might save someone else some time in the future.

Below is my Python code:

import boto3
from botocore.exceptions import ClientError
import psycopg
# Work around Redshift reporting its server encoding as "UNICODE",
# which psycopg 3 does not recognize by default
psycopg._encodings._py_codecs["UNICODE"] = "utf-8"
psycopg._encodings.py_codecs.update(
    (k.encode(), v) for k, v in psycopg._encodings._py_codecs.items()
)
from psycopg import OperationalError

def connect_to_the_database(db_user, cluster_identifier, db_name, db_host, db_port):

    print('Original DB user     : ', db_user)

    redshift = boto3.client('redshift', 'us-west-1')
    try:
        credentials = redshift.get_cluster_credentials(
            DbUser = db_user,
            DbName = db_name,
            ClusterIdentifier = cluster_identifier,
            DurationSeconds = 900,
            AutoCreate = False,
        )
    except ClientError as e:
        print('Error getting cluster credentials: ', e)
        raise

    db_password = credentials.get("DbPassword")
    db_user     = credentials.get("DbUser")

    print('Modified IAM DB user : ', db_user)
    print('IAM DB password      : ', db_password)

    connection_args = 'host={} port={} dbname={} user={} password={}'.format(
        db_host,
        db_port,
        db_name,
        db_user,
        db_password
    )
    try:
        conn = psycopg.connect(connection_args)
    except psycopg.Error as e:
        print('Error connecting to the database: ', e)
        raise

    return conn

#
## main processing
#
if __name__ == '__main__':
    db_user='my_db_user'
    cluster_identifier='my-redshift-cluster'
    db_name='test'
    db_host='my-redshift-cluster.xxx.us-west-1.redshift.amazonaws.com'
    db_port=5439
    conn = connect_to_the_database(db_user, cluster_identifier, db_name, db_host, db_port)

Note that in the main processing the user name is “my_db_user” and this is the name passed to the function “connect_to_the_database”. This is confirmed by the print statement at the start of the function:

print('Original DB user     : ', db_user)

My expectation was that the call to boto3’s get_cluster_credentials would return a password that I would use to connect to the database. However, you also have to extract the modified user name from the returned “credentials” dictionary. This modified user name, together with the temporary password, is what you use to connect to the database.

The output of the program’s print statements is below:

Original DB user     :  my_db_user
Modified IAM DB user :  IAM:my_db_user
IAM DB password      :  JmRCSmqxEyw2Seo89mWUg2H1PG9pONROZJBXA==

Note the difference between the original DB user and the modified IAM DB user. The user name returned by get_cluster_credentials is prefixed with the text “IAM:”, and this prefixed name, IAM:my_db_user, is the user name you must connect with and the one that will appear in the stv_sessions, stv_inflight, and similar system views.

Create and deploy a lambda written in Python accessing PostgreSQL or Redshift

In this blog I will demonstrate the creation of a Lambda function written in Python that accesses a Postgres or Redshift database. As Redshift is based on Postgres, the same process works for both engines.

In order to access Postgres from Lambda we need to create and deploy a Lambda layer containing Psycopg, the most popular PostgreSQL database adapter for the Python programming language. I will be deploying version 3 of Psycopg.

Prerequisites
The first step is to create the zip file containing the software required by the layer. A typical approach is to perform these steps on an EC2 instance, but that requires deploying an instance with the required permissions (IAM role). Instead, I will be performing these commands in an AWS CloudShell environment. If you are connected to the AWS Management Console, either search for “CloudShell” or click on the icon usually located in the top right corner of the page.

Important note : This blog is using Python version 3.12 in all the commands. If you need to use a different version, change “3.12” to the version you need.

Install Psycopg
Make a directory to house the software. It is important that this directory structure be exactly as defined below as we will be zipping this directory to create the layer.

mkdir -p dc-layer/python

Then change into this directory

cd dc-layer

Install the software with the command

pip3 install "psycopg[binary]" \
		--python-version 3.12 \
		--only-binary=:all: \
		--platform manylinux2014_x86_64 \
		-t python

The following directory structure and files should exist

/home/cloudshell-user $ pwd
/home/cloudshell-user/dc-layer/python
/home/cloudshell-user $ ls -l
total 160
drwxr-xr-x. 6 cloudshell-user cloudshell-user   4096 Jul  8 20:37 psycopg
drwxr-xr-x. 2 cloudshell-user cloudshell-user   4096 Jul  8 20:37 psycopg-3.2.1.dist-info
drwxr-xr-x. 3 cloudshell-user cloudshell-user   4096 Jul  8 20:37 psycopg_binary
drwxr-xr-x. 2 cloudshell-user cloudshell-user   4096 Jul  8 20:37 psycopg_binary-3.2.1.dist-info
drwxr-xr-x. 2 cloudshell-user cloudshell-user   4096 Jul  8 20:37 psycopg_binary.libs
drwxr-xr-x. 2 cloudshell-user cloudshell-user   4096 Jul  8 20:37 __pycache__
drwxr-xr-x. 2 cloudshell-user cloudshell-user   4096 Jul  8 20:37 typing_extensions-4.12.2.dist-info
-rw-r--r--. 1 cloudshell-user cloudshell-user 134451 Jul  8 20:37 typing_extensions.py

Zip the software for the lambda layer
Go up one level to the directory that contains the “python” directory and zip the software installed in the previous step with

zip -r layer.zip python

The directory should now contain a zip file

/home/cloudshell-user $ pwd; ls -l
/home/cloudshell-user/dc-layer
total 4668
-rw-r--r--. 1 cloudshell-user cloudshell-user 4772949 Jul  8 20:43 layer.zip
drwxr-xr-x. 9 cloudshell-user cloudshell-user    4096 Jul  8 20:37 python

Publish the layer
Publish the layer using AWS CLI with the below command

aws lambda publish-layer-version --layer-name dc-psycopg3-layer \
    --zip-file fileb://layer.zip \
    --compatible-runtimes python3.12 \
    --compatible-architectures "x86_64"

Confirm that the layer was published
From the AWS Management Console, navigate to the Lambda Layers page and confirm that the above layer is present.

Create a Lambda function
Create a new lambda function using Python 3.12

Add the layer previously uploaded to the function’s definition
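
This can be done on the function’s page in the console (Layers, Add a layer, Custom layers). If you prefer to script it, a minimal boto3 sketch is shown below; the function name is a placeholder and the layer version ARN is the LayerVersionArn returned by the publish-layer-version command above.

import boto3

lambda_client = boto3.client("lambda")

# LayerVersionArn returned by publish-layer-version (placeholder account and version)
layer_version_arn = "arn:aws:lambda:us-east-1:111111111111:layer:dc-psycopg3-layer:1"

# Attach the layer; note that Layers replaces the function's entire layer list
lambda_client.update_function_configuration(
    FunctionName="my-redshift-lambda",
    Layers=[layer_version_arn],
)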

Lambda function – sample code
Below is sample code that connects to a Redshift cluster and performs a select on a table

import logging
import psycopg
from psycopg import OperationalError
# Work around Redshift reporting its server encoding as "UNICODE",
# which psycopg 3 does not recognize by default
psycopg._encodings._py_codecs["UNICODE"] = "utf-8"
psycopg._encodings.py_codecs.update(
    (k.encode(), v) for k, v in psycopg._encodings._py_codecs.items()
)

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def lambda_handler(event, context):

    print('**** Connecting to Redshift')
    my_hostname = 'my-cluster.us-east-1.redshift.amazonaws.com'
    my_database = 'my-database'
    my_user     = 'my-database-user'
    my_password = 'my-db-user-password'
    my_port     =  5439

    connection_args = 'host={} port={} dbname={} user={} password={}'.format(
        my_hostname,
        my_port,
        my_database,
        my_user,
        my_password
    )

    try:
        logger.info("Establishing connection to Redshift")
        conn = psycopg.connect(connection_args)
        logger.info("SUCCESS: Connection to Redshift succeeded!")
    except psycopg.Error as e:
        logger.error("ERROR: Unexpected error: Could not connect to Redshift.")
        logger.error(e)
        raise


    sql_text = "select col_01, col_02 from my_schema.my_table where col_01 = 'dc-test'"
    print(sql_text)

    try:
        cur = conn.cursor()
        cur.execute(sql_text)
        for row in cur.fetchall():
            col_01 = row[0]
            col_02 = row[1]
            print(col_01, col_02)
        cur.close()
    except psycopg.Error as e:
        print('SQL failed. Failing text is ', sql_text)
        print(f"Error from database",e)

Examples of Redshift’s Unload, Copy, and Super Data Type

Recently, my focus has been on using Amazon Redshift for a data warehousing solution, and I wanted to share some insights gained through working with unload, copy, and super data types. Unloading data allows for efficient extraction from Redshift to various storage options, facilitating downstream analytics and archival processes. Similarly, the COPY command serves as a linchpin for ingesting data into Redshift, enabling seamless integration from diverse sources. Super data types are a flexible option for accommodating complex data structures without the need for continuous schema changes.

Unload and rename a subset of columns
If you need to unload a subset of columns from a source table for load into a target table with different column names, this can be accomplished as follows:

unload (' select   old_col_01     as new_col_01
                  ,old_col_23     as new_col_02
                  ,old_col_42     as new_col_03
           from    my_schema.my_table_name
          where    old_col_24     is not null
'
)
to 's3://my-s3-bucket/table_01_data_'
iam_role  'arn:aws:iam::93073150:role/my-redshift-role'
JSON
;

The column name old_col_01 from the source table is being renamed to new_col_01 for the target table.

The target S3 bucket should not contain any files with the prefix used in the unload command. A quick way to clean up these files, without having to use the AWS Console to filter and select the files and type “permanently delete”, is to use AWS CloudShell and the command:

aws s3 rm s3://my-s3-bucket/ --recursive --exclude "*" --include "table_01_data_*"

Please make sure that you have populated the exclude and include options correctly. There is no warning or confirmation. All files matching the exclude and include filters are deleted with no way to recover them.
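
To double-check what will be removed before running the delete, one option is to list the matching objects first; below is a minimal boto3 sketch, using the same bucket name and key prefix as the unload example.

import boto3

s3 = boto3.client("s3")
paginator = s3.get_paginator("list_objects_v2")

# Print every key that starts with the unload prefix so it can be reviewed
for page in paginator.paginate(Bucket="my-s3-bucket", Prefix="table_01_data_"):
    for obj in page.get("Contents", []):
        print(obj["Key"])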

Unload a subset of rows from a table
The Redshift unload command does not allow a limit clause directly in the select. The workaround is to use it in a subquery, as shown below:

unload (' select * 
          from   my_schema.my_table_name
          where  col_01 in 
                 (    
                   select col_01 
                   from   my_schema.my_table_name 
                   limit 400
                 ) 
'
)
to 's3://my-s3-bucket/table_01_data_'
iam_role  'arn:aws:iam::93073150:role/my-redshift-role'
JSON
;

The subquery refers to the same table and applies the limit clause.

Copy certain column(s) from a file that contains other columns
If you have a file in S3 that contains an unload from a table with a number of columns, and you only want to populate one or more of those columns in a different table, you can do it with the below command:

copy       my_schema.my_table_name
           (col_01, col_02)
from      's3://my-s3-bucket/table_01_data_'
iam_role  'arn:aws:iam::93073150:role/my-redshift-role'
region    'us-east-1'
FORMAT     JSON 'auto ignorecase'
;

Columns named col_01 and col_02 are copied into the table my_schema.my_table_name while the other columns in the S3 file are ignored.

Copying data and Redshift super data type
If you have a file containing JSON data and you need to load it into a Redshift table containing a super data type, you can use the following command:

copy       my_schema.my_table_name
from      's3://my-s3-bucket/table_01_data_'
iam_role  'arn:aws:iam::93073150:role/my-redshift-role'
region    'us-east-1'
FORMAT     JSON 'noshred'
;

All of the data in the S3 file will be loaded into the super data type column in JSON format.

The data in the super data type column containing JSON data can be queried as follows. Assume that the table name is my_schema.my_super_data_table containing a super data type column named my_json_data. This JSON data contains elements named create_time, account_id etc. You can refer to the individual elements as follows:

select   my_json_data."create_time"::timestamp::date as create_date
        ,my_json_data."account_id" as account_id
        ,my_json_data."cust_name"  as cust_name
from     my_schema.my_super_data_table
where    my_json_data."zip_code" = 12345
;

Python Script for Deleting Old Files from S3 Bucket

I had a requirement to delete about 680,000 files from multiple folders in an S3 bucket. This Python script automates the process of deleting old files from an Amazon S3 bucket based on pattern matching of the folder and file names. It connects to the S3 bucket, identifies files older than a specified timeframe, and deletes them while keeping a detailed audit trail. Here’s a breakdown of the script:

1. Setting Up:

  • The script imports the necessary modules: datetime for date manipulation, boto3 for interacting with S3, and timedelta for time calculations.
  • It defines variables for bucket name, prefix, file paths for storing S3 file names and files to be deleted, and the target file pattern for identification.

2. Gathering Files from S3:

  • A connection is established to S3 using boto3.
  • The list_objects_v2 paginator retrieves all files under the specified bucket and prefix. If you do not use this logic, only the first 1000 files will be listed.
  • The script iterates over each page and extracts the file names, storing them in a text file (files_in_s3).
  • A timestamp is recorded to indicate the completion of this stage.

3. Identifying Files for Deletion:

  • The script calculates the date two months ago using timedelta and datetime.
  • It iterates through the list of files from S3 and checks if they:
    • Start with the specified pattern (my-file-name-pattern).
    • Contain the two-month-ago date (yy_months_ago) in their name.
  • If both conditions are met, the file name is written to another text file (files_to_delete) for deletion.
  • A timestamp and a count of files marked for deletion are printed.

4. Deleting Identified Files:

  • The script iterates through the list of files to be deleted.
  • For each file, it extracts the folder and region information.
  • It checks if the current folder or region is different from the previous one. If yes, it prints a timestamp indicating the start of deletion for that specific folder/region.
  • The script then uses the delete_object function to remove the file from the S3 bucket.

5. Completion and Audit Trail:

  • A final timestamp marks the completion of file deletion.
  • The script prints “End of program” as a closing message.

Benefits:

  • Automates deletion of old files, reducing storage costs and improving data management.
  • Maintains an audit trail of files identified for deletion and their removal timestamps.
  • Customizable to different bucket configurations and deletion criteria.

Note:

  • This script assumes the necessary AWS credentials are configured for accessing S3 resources.
  • Modify the script parameters like bucket name, prefix, pattern, and file paths as needed for your specific scenario.

This script provides a comprehensive and efficient way to manage and delete old files in your S3 bucket, ensuring optimal storage utilization and data governance.

Code:

from datetime import datetime, timedelta
import boto3

now = datetime.now()
print(f"Starting at : {now}")
print(' ')

#
## Bucket details
#
bucket_name = 'my-bucket'
bucket_prefix = 'my-prefix/'
files_in_s3 = 'C:/dean/python/s3_list.txt'
files_to_delete = 'C:/dean/python/s3_delete.txt'

#
## Connect to S3 and get the file names
#
s3 = boto3.client('s3')
paginator = s3.get_paginator('list_objects_v2')
page_iterator= paginator.paginate(Bucket=bucket_name, Prefix=bucket_prefix)
with open(files_in_s3, 'w') as f:
    for page in page_iterator:
        contents = page.get('Contents', [])
        for item in contents:
            f.write(item['Key'] + '\n')  
now = datetime.now()
print(f"Collected files from S3 at {now}")
print(' ')

#
## find the n-2 month
#
n_months_ago = datetime.now() - timedelta(days=60)
yy_months_ago = n_months_ago.strftime('%Y/%m')
print(f"Deleting files for {yy_months_ago}")
print(' ')

#
## Write the files to be deleted to an audit trail
#
file_ctr = 0
file_out= open(files_to_delete, 'w')
with open(files_in_s3, 'r') as f:
    for line in f:
        file_name = line.strip()
        if  file_name.startswith('my-file-name-pattern'):
            if  yy_months_ago in file_name:
                file_out.write(file_name + '\n')
                file_ctr = file_ctr + 1
now = datetime.now()
print(f"Identified files to delete at {now}")
temp = 'Number of files to delete ' + str(file_ctr)                
print(temp)
print(' ')
file_out.close()

#
## Delete the files
#
prev_folder = ''
prev_region = ''
with open(files_to_delete, 'r') as f:
    for line in f:
        cur_folder = line.split('/')[3]
        cur_region = line.split('/')[4]
        if cur_folder != prev_folder or cur_region != prev_region:
            now = datetime.now()
            print(f"Deleting files from {cur_folder}/{cur_region} at {now}")
            prev_folder = cur_folder
            prev_region = cur_region
        file_name = line.strip()
        s3.delete_object(Bucket=bucket_name, Key=file_name)
print(' ')
now = datetime.now()
print(f"Completed file deletion at {now}")
print(' ')
print('End of program')


Amazon CodeWhisperer demo

I have recently begun using Amazon CodeWhisperer and am continually impressed by its ability to save me time by automating typing, reducing syntax lookups, and reducing errors. The tool is available for multiple IDEs, including Visual Studio Code and PyCharm, and supports multiple languages, such as Python, Java, and C#. I have been using it for Python in Visual Studio Code.

The first step is to add the AWS Toolkit extension to VS Code:

The next step is to sign up for CodeWhisperer, using a personal email address. The product is free for individual use with unlimited code suggestions. The extension exchanges a code with an Amazon website to enable the product to run. After this brief sign-up process, you are ready to go.

For example, to create a Python program to select data from a Postgres table, I provided the following comment line:

#connect to a postgres database and cust_id, cust_name, create_ts from public.dc_customer ordered by cust_name

and hit the Enter key at the end of the comment line. CodeWhisperer starts suggesting the required lines of code, such as “import psycopg2”, “try:”, etc.

Each suggestion shows up in grey text; if you want to accept it, press the Tab key and the suggestion will be converted to code with the proper highlighting and indentation.

I continued to accept the suggestions in the next few screenshots:

After the “finally” block, all of the logic requested in the first comment line was in place. I made two changes to the code created by CodeWhisperer: I filled in the dbname, host, password, and other connection values, and I added “FETCH FIRST 10 ROWS ONLY”.
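
The screenshots are omitted here, but the finished program looked roughly like the sketch below. The connection values are placeholders for the ones I filled in by hand, and the exact code CodeWhisperer suggests can vary from run to run.

import psycopg2

conn = None
try:
    # Connection values filled in by hand (placeholders here)
    conn = psycopg2.connect(
        dbname="mydb",
        host="my-postgres-host",
        port=5432,
        user="myuser",
        password="mypassword",
    )
    cur = conn.cursor()
    cur.execute(
        "SELECT cust_id, cust_name, create_ts "
        "FROM public.dc_customer "
        "ORDER BY cust_name "
        "FETCH FIRST 10 ROWS ONLY"
    )
    for row in cur.fetchall():
        print(row)
    cur.close()
except psycopg2.Error as e:
    print("Database error:", e)
finally:
    if conn is not None:
        conn.close()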

The code executed without any errors:

Complete details on Amazon CodeWhisperer can be found on this page.


Sharing an AWS customer managed KMS key between accounts

My client had a requirement to clone an Aurora database from the production account to a test account. In adherence to standard security practices, the production Aurora instance was configured with encryption utilizing a customer-managed Key Management Service (KMS) key. To enable the successful cloning of the database into the test account, a prerequisite step is to share the KMS key from the production account with the test account.

If the key was created via the console, we can navigate to the KMS page and filter for the key as shown below:


If you click on the Alias and then on the Key Policy tab and scroll down,


there is an option to add other AWS accounts.


However, keys created via a CloudFormation template such as the one below:

Resources:
  #
  ## Create a key
  #
  rCreateDBKMSCMK:
    Type: AWS::KMS::Key
    DeletionPolicy: Retain
    Properties:
      KeyPolicy:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            AWS: 'arn:aws:iam::111111111121:root'
          Action: 'kms:*'
          Resource: '*'
      Tags:
      - Key: Name
        Value: dc-test-key-03
  #
  ## Create an alias for the key
  #
  rCreateDBKMSCMKAlias:
    Type: 'AWS::KMS::Alias'
    DeletionPolicy: Retain
    Properties:
      AliasName: 'alias/dc-test-key-03-alias'
      TargetKeyId: !Ref rCreateDBKMSCMK

lack the option to add other AWS accounts:


In order to allow sharing, the below needs to be added to the key’s policy.

  • In this example, account 101010101010 is the key owner and is sharing the key with account 707070707070.
  • Typically the key policy will already contain a statement similar to the first statement below (the key owner’s full access). The second and third statements are the additions needed to enable the share.
  • In this example, I am sharing with the account root principal. This can be changed as per your security requirements.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::101010101010:root"
            },
            "Action": "kms:*",
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::707070707070:root"
            },
            "Action": [
                        "kms:Encrypt",
                        "kms:Decrypt",
                        "kms:ReEncrypt*",
                        "kms:GenerateDataKey*",
                        "kms:DescribeKey"
                      ],
            "Resource": "*"
        },
        {
            "Sid": "Allow attachment of persistent resources",
            "Effect": "Allow",
            "Principal": {
                "AWS": "arn:aws:iam::707070707070:root"
            },
            "Action": [
                "kms:CreateGrant",
                "kms:ListGrants",
                "kms:RevokeGrant"
            ],
            "Resource": "*",
            "Condition": {
                "Bool": {
                    "kms:GrantIsForAWSResource": "true"
                }
            }
        }
    ]
}
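
The policy can be edited directly in the console (Key Policy tab, Edit). If you prefer to apply the change from code, put_key_policy can be used; below is a minimal boto3 sketch, assuming the updated policy document above has been saved to a file named key-policy.json (a placeholder name) and using the key ARN from this example.

import boto3

kms = boto3.client("kms", region_name="us-east-1")

# Read the updated key policy prepared above (placeholder file name)
with open("key-policy.json") as f:
    policy_document = f.read()

# "default" is the policy name KMS uses for key policies
kms.put_key_policy(
    KeyId="arn:aws:kms:us-east-1:101010101010:key/6897",
    PolicyName="default",
    Policy=policy_document,
)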

After the above policy change has been made, the key will be shared with the other account. This can be verified by signing on to the 707070707070 account and issuing the below command to describe the key:

aws kms describe-key --key-id=arn:aws:kms:us-east-1:101010101010:key/6897


Load testing with pgbench

Introduction

PostgreSQL is renowned for its robustness and reliability, making it a go-to choice for a wide range of applications. To ensure your PostgreSQL database performs at its best, benchmarking is a crucial step in the process. In this blog post, I will delve into the powerful capabilities of pgbench, a versatile tool for running benchmark tests on PostgreSQL.

Out of the box, pgbench uses a scenario loosely based on TPC-B, running a sequence of five SELECT, UPDATE, and INSERT commands per transaction. While this default scenario provides valuable insights into your system’s performance, there are cases where more specialized testing is necessary.

In this blog, I will explore the art of crafting customized benchmarks using specific tables and SQL statements. This approach enables you to tailor your benchmark to the unique demands of your application, ensuring your PostgreSQL database is optimized for real-world workloads.

Create an EC2

Create an Amazon EC2 instance running Amazon Linux. These instructions use an instance type of t2.2xlarge with the below configuration:

cat /etc/image-id
image_name="al2023-ami"
image_version="2023"
image_arch="x86_64"
image_file="al2023-ami-2023.2.20231016.0-kernel-6.1-x86_64.xfs.gpt"
image_stamp="f733-4bf8"
image_date="20231016220519"
recipe_name="al2023 ami"
recipe_id="ed84f07e-e06c-a3cd-759b-d254-59e2-3d69-b61eb10b"

Please note that amazon-linux-extras is not available for Amazon Linux 2023.

Install postgresql15

Install postgresql15 with the following command

sudo yum install postgresql15

The output will be similar to:

sudo yum install postgresql15
Last metadata expiration check: 0:11:01 ago on Mon Oct 23 16:46:45 2023.
Dependencies resolved.
Package                                                  Architecture                          Version                                             Repository                                  Size
.
.
Lines omitted for brevity
.
.
  Verifying        : postgresql15-private-libs-15.0-1.amzn2023.0.4.x86_64                                                                                                                        2/2

Installed:
  postgresql15-15.0-1.amzn2023.0.4.x86_64                                                    postgresql15-private-libs-15.0-1.amzn2023.0.4.x86_64

Complete!

Install postgresql-contrib package

This package contains various extension modules that are included in the PostgreSQL distribution including pgbench. Install the package with the following command:

sudo yum install postgresql15-contrib

The output will be similar to:

sudo yum install postgresql15-contrib
Last metadata expiration check: 0:32:28 ago on Mon Oct 23 16:46:45 2023.
Dependencies resolved.
.
.
Lines omitted for brevity
.
.
  Verifying        : uuid-1.6.2-50.amzn2023.0.2.x86_64                                                                                                                                           3/3

Installed:
  libxslt-1.1.34-5.amzn2023.0.2.x86_64                          postgresql15-contrib-15.0-1.amzn2023.0.4.x86_64                          uuid-1.6.2-50.amzn2023.0.2.x86_64

Complete!

Verify that pgbench has been installed:

which pgbench

Output:

/usr/bin/pgbench

Create a working directory/folder

Create a working directory for pgbench related artifacts

mkdir pgbench; cd pgbench

Create a database for pgbench

For the purpose of this testing, I will be creating a new database named dc-pgbench. Log on as the RDS administrator and run the below commands:

CREATE DATABASE "dc-pgbench"
    WITH
    OWNER = dbadmin
    ENCODING = 'UTF8'
    LC_COLLATE = 'en_US.UTF-8'
    LC_CTYPE = 'en_US.UTF-8'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1
    IS_TEMPLATE = False;

Connect to this new database and create a user for pgbench with the following commands:

create user dcpgbench with encrypted password 'mypassword';
grant all privileges on database "dc-pgbench" to dcpgbench;

Confirm connectivity to the Postgres RDS instance

In the pgbench directory, create the below file to facilitate logging on to the database without having to type a long command:

Edit a file

vi psql.ksh

Enter the below commands after modifying for your instance:

export PGB_HOST=red-primary-writer.xxxxx.us-east-1.rds.amazonaws.com
export PGB_PORT=5432
export PGB_DB=dc-pgbench
export PGB_USER=dcpgbench
PGPASSWORD=mypassword psql --host=$PGB_HOST --port=$PGB_PORT --username=$PGB_USER --dbname=$PGB_DB

Change permissions

chmod u+x psql.ksh

Test connectivity

./psql.ksh

If successful, the connection will look like:

psql (15.0, server 14.6)
SSL connection (protocol: TLSv1.2, cipher: AES128-SHA256, compression: off)
Type "help" for help.

dc-pgbench=> SELECT current_database();
 current_database
------------------
 dc-pgbench
(1 row)

dc-pgbench=>

I also issued the command

SELECT current_database();

to confirm that I am connected to the correct database.

Create the data structures that will be used to stress test the database instance

I will be using the below structures to create a load on the database. The workload will be executed by user dcpgbench.

Sequences for keys

The below sequences will be used to generate customer, order, and item data respectively:

CREATE SEQUENCE if not exists dc_cust_id
INCREMENT 1
START 1
MINVALUE 1
MAXVALUE 9223372036854775807
CACHE 1;

CREATE SEQUENCE if not exists dc_order_id
INCREMENT 1
START 1
MINVALUE 1
MAXVALUE 9223372036854775807
CACHE 1;

CREATE SEQUENCE if not exists dc_item_id
INCREMENT 1
START 1
MINVALUE 1
MAXVALUE 9223372036854775807
CACHE 1;

GRANT USAGE ON SEQUENCE dc_cust_id TO dcpgbench;
GRANT USAGE ON SEQUENCE dc_order_id TO dcpgbench;
GRANT USAGE ON SEQUENCE dc_item_id TO dcpgbench;

Tables

The below tables will store customer, item, and order data respectively:

CREATE TABLE dc_customer
(
  cust_id      BIGINT                         NOT NULL,
  cust_name    VARCHAR(60)                    NOT NULL,
  create_ts    timestamp(0) without TIME zone NOT NULL,
  CONSTRAINT   dc_cust_pk PRIMARY KEY (cust_id)
)
;

CREATE TABLE dc_item
(
  item_id      BIGINT                         NOT NULL,
  item_name    VARCHAR(60)                    NOT NULL,
  create_ts    timestamp(0) without TIME zone NOT NULL,
  CONSTRAINT   dc_item_pk PRIMARY KEY (item_id)
)
;

CREATE TABLE dc_order
(
  order_id     BIGINT NOT NULL,
  cust_id      BIGINT NOT NULL references dc_customer (cust_id),
  item_id      BIGINT NOT NULL references dc_item (item_id),
  quantity     BIGINT NOT NULL,
  create_ts    timestamp(0) without TIME zone NOT NULL,
  CONSTRAINT   dc_order_pk PRIMARY KEY (order_id)
)
;

GRANT SELECT, INSERT, UPDATE, DELETE 
ON dc_customer, dc_item, dc_order
TO dcpgbench
;

Load test 01 – Write intensive

I will be using the below script to perform a load test. At a high level, this test will create data in our three tables, simulating write activity. This data will then be used by a later script to generate read-intensive activity.

Command to execute the script
Execute the script with the below command

nohup ./load-test-01.ksh > load-test-01.out 2>&1 &

Contents of the script
Below are the contents of the script with comments on each line of code

#!/bin/bash

#
## database endpoint
#
export PGB_HOST=red-primary-writer.xxxxx.us-east-1.rds.amazonaws.com
#
## database port
#
export PGB_PORT=5432
#
## bench mark database
#
export PGB_DB=dc-pgbench
#
## The user name to connect as
#
export PGB_USER=dcpgbench
#
## Number of clients simulated, that is, number of concurrent database sessions
#
export PGB_CLIENTS=50
#
## Number of worker threads within pgbench
#
export PGB_THREADS=20
#
## filename containing the SQLs to be executed
#
export PGB_SQL_FILE=load_test_01.sql
#
## Run the test for this many seconds
#
export PGB_RUN_TIME=300
#
## Set the filename prefix for the log files created by --log
#
export PGB_LOG_PREFIX=dc-pgb
#
## Sampling rate, used when writing data into the log, to reduce the
## amount of log generated. 1.0 means all transactions will be logged,
## 0.05 means only 5% of the transactions will be logged
#
export PGB_SAMPLE_RATE=0.05
#
## make sure we are in the correct directory
#
cd /home/ec2-user/pgbench
#
## run the test
#
PGPASSWORD=mypassword pgbench --client=$PGB_CLIENTS --jobs=$PGB_THREADS \
    --time=$PGB_RUN_TIME --username=$PGB_USER -d $PGB_DB --host=$PGB_HOST \
    --port=$PGB_PORT --file=$PGB_SQL_FILE --log --log-prefix=$PGB_LOG_PREFIX \
    --sampling-rate=$PGB_SAMPLE_RATE --no-vacuum

Contents of the SQL to generate write activity
The following SQL commands will be executed to generate some load on the database. These SQL statements are saved in a file named load_test_01.sql, which is referenced by the script load-test-01.ksh.

--
--get the next sequence values for the customer id and
--item id
--
SELECT nextval('dc_cust_id') \gset cust_id_
SELECT nextval('dc_item_id') \gset item_id_
--
--Create a pseudo-random value from the MMSS of the current
--timestamp to be used as an order quantity
--
SELECT cast(substring(TO_CHAR(CURRENT_TIMESTAMP, 'YYYY-MM-DD HH12:MMSS') from 15 for 4) as int) as order_qty \gset
--
--Insert a row into the customer table
--
INSERT
INTO   dc_customer
(
  cust_id,
  cust_name,
  create_ts
)
values
(
    :cust_id_nextval,
     concat('cust-',CURRENT_TIMESTAMP),
     CURRENT_TIMESTAMP
)
;
--
--Insert a row into the item table
--
INSERT
INTO   dc_item
(
  item_id,
  item_name,
  create_ts
)
values
(
    :item_id_nextval,
     concat('item-',CURRENT_TIMESTAMP),
     CURRENT_TIMESTAMP
)
;
--
--Insert a row into the order table
--
INSERT
INTO   dc_order
(
  order_id,
  cust_id,
  item_id,
  quantity,
  create_ts
)
values
(
     nextval('dc_order_id'),
    :cust_id_nextval,
    :item_id_nextval,
    :order_qty,
     CURRENT_TIMESTAMP
)
;

Load test 02 – Read intensive

I will be using the below script to perform a read load test. At a high level, this test will perform multiple reads on our three tables. The previous test has written approximately 150,000 rows to our three test tables.

Command to execute the script
Execute the script with the below command

nohup ./load-test-02.ksh > load-test-02.out 2>&1 &

Contents of the script
Below are the contents of the script with comments on each line of code

#!/bin/bash

#
## database endpoint
#
export PGB_HOST=red-primary-writer.xxxxx.us-east-1.rds.amazonaws.com
#
## database port
#
export PGB_PORT=5432
#
## bench mark database
#
export PGB_DB=dc-pgbench
#
## The user name to connect as
#
export PGB_USER=dcpgbench
#
## Number of clients simulated, that is, number of concurrent database sessions
#
export PGB_CLIENTS=200
#
## Number of worker threads within pgbench
#
export PGB_THREADS=20
#
## filename containing the SQLs to be executed
#
export PGB_SQL_FILE=load_test_02.sql
#
## Run the test for this many seconds
#
export PGB_RUN_TIME=300
#
## make sure we are in the correct directory
#
cd /home/ec2-user/pgbench
#
## run the test
#
PGPASSWORD=mypassword pgbench --client=$PGB_CLIENTS --jobs=$PGB_THREADS --time=$PGB_RUN_TIME --username=$PGB_USER -d $PGB_DB --host=$PGB_HOST --port=$PGB_PORT --file=$PGB_SQL_FILE --no-vacuum

Contents of the SQL to generate read activity
The plan for this script is to select random data from our three tables. These SQL statements are saved in a file named load_test_02.sql, which is referenced by the script load-test-02.ksh.

First, I establish the current value of the sequences with the below SQLs:

SELECT last_value FROM dc_cust_id;
SELECT last_value FROM dc_item_id;
SELECT last_value FROM dc_order_id;

Each of the sequences is current at a value of approximately 151,000 and hence I will use 150,000 as the upper range in the below random function.

SELECT floor(random()*(150000)) as dc_cust_id \gset
--
--select the orders
--
select 
            cus.cust_id 
           ,cus.cust_name 
           ,itm.item_id 
           ,itm.item_name 
           ,ord.order_id 
           ,ord.quantity 
           ,ord.create_ts 
from        dc_order ord
inner join  dc_customer cus
        on  cus.cust_id = ord.cust_id 
inner join  dc_item itm
        on  itm.item_id = ord.item_id 
where       cus.cust_id = :dc_cust_id
order by 
            cus.cust_id 
           ,ord.create_ts 
;

Monitoring the activity – Performance Insights

The impact of the test can be measured with Performance Insights. I ran a test with PGB_CLIENTS set to 500 users for a period of 5 minutes. The average active sessions (AAS) graph in Performance Insights displayed this spike as a result of the load placed on the database:

As expected the top user was dcpgbench, the user running the workload:

The top SQLs reported by Performance Insights also list the SQL statements that I was executing

Create a Lambda alerting process

Introduction

The goal is to create a Lambda function that runs once a day and sends an alert listing all EC2 instances that are currently configured in all regions for a given account. The solution consists of:

  1. A role to provide permissions
  2. An SNS topic that users who wish to be notified can subscribe to
  3. A Lambda function written in Python to identify the EC2 instances
  4. A scheduling process consisting of an EventBridge rule and an EventBridge trigger

Create the role

Navigate to the IAM Dashboard and click on “Roles” in the left panel

  1. Click on the orange “Create role” button
  2. Select “AWS service” under the “Trusted entity type”
  3. Select “Lambda” under the “Use case”
  4. Under the “Permissions policies” search for “AWSLambdaBasicExecutionRole” and select it
  5. Click on the orange “Next” button
  6. Provide a “Role name” and meaningful “Description”

Click on the orange “Create role” button. We will be modifying the role later to add more permissions.

Return to the IAM Roles dashboard and search for the role as we have to add two more permissions

  1. Click on the “Role name” and then on “Add permissions”, “Attach policies” on the next page
  2. On the next page, add the “AmazonEC2ReadOnlyAccess” policy
  3. Repeat to add the “AmazonSNSFullAccess” policy

The role creation is now complete.

Create the SNS topic

To demonstrate the AWS Command Line Interface (CLI), we will create the topic via a CLI command rather than the console. The AWS CLI command can be executed either from an EC2 instance with the required permissions or from CloudShell. I will be using CloudShell as it does not require any setup. The command is as follows:

aws sns create-topic --name dc-running-assets-sns-topic

The output will display the ARN of the SNS topic. Save the ARN as it will be needed later.

Navigate to the “Amazon SNS” “Topics” dashboard and search for the SNS topic with the name from the above create command. Click on the “Name” and then on the orange “Create subscription” button. On the next page, populate the “Protocol” as “Email” and the “Endpoint” with your email address, and click on the orange “Create subscription” button.

You will receive an email requesting you to confirm the subscription. After you click on the “Confirm subscription” link, you will be taken to the subscription confirmation web page. The subscription can also be verified by returning to the SNS dashboard and checking the subscriptions.
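
The subscription can also be created from code instead of the console; below is a minimal boto3 sketch, with the topic ARN and email address as placeholders (the recipient still has to confirm the subscription).

import boto3

sns = boto3.client("sns", region_name="us-east-2")

# Subscribe an email address to the topic created earlier
sns.subscribe(
    TopicArn="arn:aws:sns:us-east-2:111111111111:dc-running-assets-sns-topic",
    Protocol="email",
    Endpoint="user@example.com",
)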

Create the lambda function in python

Navigate to the Lambda functions page in the console and click on the orange “Create function” button.

  1. On the “Create function” web page, select the “Author from scratch” option
  2. Populate the “Function name”. I will use dc-running-assets-lambda
  3. Select Python 3.9 under the “Runtime” drop down
  4. Select x86_64 under “Architecture”
  5. Under “Change default execution role”, select “Use an existing role”
  6. Populate the role created above in the “Existing role” drop down
Finally click on the orange “Create function” button

On the next page, click on the “Code” tab if not already selected and replace the prepopulated code with the code below after making the following modifications

  1. Replace the sns_topic_arn variable with the ARN of the SNS topic created earlier
  2. Comment or uncomment the lines with comments “Running instances only” or “All instances” depending on your use case
  3. The “import os” is in place in the event you need to debug with the use of print statements

import boto3
import os

def lambda_handler(event, context):
    
    sns_topic_arn = 'arn:aws:sns:us-east-2:xxxxx:dc-running-assets-sns-topic'
    
    ec2_regions = [region['RegionName'] for region in boto3.client('ec2').describe_regions()['Regions']]
    all_instances = []
    
    for region in ec2_regions:
        all_instances.append(' ')
        all_instances.append(f"**** Region: {region} ***")
        
        ec2 = boto3.client('ec2', region_name=region)

        # Running instances only
        #response = ec2.describe_instances(Filters=[{'Name': 'instance-state-name', 'Values': ['running']}])
        
        # All instances
        response = ec2.describe_instances()
        
        for reservation in response['Reservations']:
            for instance in reservation['Instances']:
                instance_id = instance['InstanceId']
                instance_state = instance['State']['Name']
                instance_type = instance['InstanceType']
                # Terminated instances may not report a private IP address
                private_ip = instance.get('PrivateIpAddress', 'n/a')
                all_instances.append(f"Region: {region}, Inst. ID: {instance_id}, State: {instance_state}, Type: {instance_type}, Prvt. IP: {private_ip}")

    if all_instances:
        sns = boto3.client('sns')
        message = "List of EC2 Instances:\n" + '\n'.join(all_instances)
        sns.publish(TopicArn=sns_topic_arn, Subject="List of EC2 Instances", Message=message)
    
    return {
        'statusCode': 200,
        'body': 'Email sent successfully'
    }

After pasting the code, click on the “Deploy” button and the “Changes not deployed” message will be removed.

Configuring timeouts

Lambda functions are created with a default timeout of 3 seconds. This particular Lambda function needs approximately 45 seconds to execute, as it loops through all the regions and all the EC2 instances in each region, so we need to increase the default timeout. This is accomplished in the console as follows (a scripted alternative is shown after these steps):

  1. Select the “Configuration” tab to the right of the “Code” tab and click on “General configuration”
  2. Click on the “Edit” button
  3. On the “Edit basic settings” page, I added a description in the “Description – optional” box
  4. Change the “Timeout” to 45 seconds
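
Alternatively, the timeout can be changed from code; below is a minimal boto3 sketch, using the function name chosen above.

import boto3

lambda_client = boto3.client("lambda")

# Raise the timeout from the 3-second default to 45 seconds
lambda_client.update_function_configuration(
    FunctionName="dc-running-assets-lambda",
    Timeout=45,
)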

Create an AWS event to trigger the lambda on a set schedule

Create the scheduler as follows:

  1. On the lambda page, click on the “Add trigger” button in the “Function overview” section at the top of the page
  2. On the Add trigger page, type “Schedule” into the “Select a source” box and select “EventBridge (CloudWatch events)”
  3. On the “Trigger configuration” page, select “Create a new rule” and populate
    • “Rule name” with the name of the rule
    • “rule description” with a meaningful description
  4. Under “Rule type”
    • Select “Schedule expression”
    • Enter the schedule in the “Schedule expression” box. For example, “cron(0 20 * * ? *)” indicates that the schedule is every day at 20:00 hours

Click on the orange “Add” button to create the rule

Conclusion

The lambda function will now execute as per the defined schedule and email the list of servers from the account.

Selectively recovering data in an Aurora PostgreSQL database

Using automated backups and snapshots, we can restore an AWS RDS database
– to the most current point in time
– to a prior point in time
and then either use the restored database going forward or use it to recover particular tables.

In summary, we have multiple options depending on the nature and extent of the data lost.

In this example, I will demonstrate a method to restore certain rows in one table in the original database.

Create a test table

CREATE TABLE dc_schema.test_table (
                object_id numeric NOT NULL,
                "owner" varchar(128) NULL,
                object_name varchar(128) NULL,
                CONSTRAINT test_table_pk PRIMARY KEY (object_id)
);

I then inserted some data into the table. For demonstration purposes, let us assume that the rows with object_id between 30,000 and 40,000 are critical to the application. Further, assume that someone accidentally deleted this data with the following SQL:

delete
from   dc_schema.test_table
where  object_id between 30000 and 40000
;

We now need this data restored as it is critical to the operation of the application.

On the AWS RDS Console page, select the DB cluster, click on “Actions” and then on “Restore to point in time”. On the next page, populate the custom date and time to a point before the data was deleted. Most of the options will be populated except for a few such as the DB instance identifier.

Click on the orange button at the bottom right of the page to begin the recovery. This will take some time to complete.

Some points to keep in mind with respect to the restored database:

– You can use the restored DB instance as soon as its status is available. The DB instance continues to load data in the background. This is known as lazy loading.

– If you access data that hasn’t been loaded yet, the DB instance immediately downloads the requested data from Amazon S3, and then continues loading the rest of the data in the background

– To help mitigate the effects of lazy loading on tables to which you require quick access, you can perform operations that involve full-table scans, such as SELECT *. This allows Amazon RDS to download all of the backed-up table data from S3.
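
For example, a full scan of the affected table on the restored instance can be triggered with a short script; below is a sketch using psycopg, with placeholder connection details for the restored cluster.

import psycopg

# Placeholder connection details for the restored (recovery) cluster
conn = psycopg.connect(
    "host=red-recovery-cluster.xxxxx.us-east-1.rds.amazonaws.com "
    "port=5432 dbname=clientdb user=dbadmin password=password_here"
)

with conn.cursor() as cur:
    # SELECT * forces a full-table scan, pulling the remaining backup data down from S3
    cur.execute("SELECT * FROM dc_schema.test_table")
    row_count = sum(1 for _ in cur)
    print(f"Scanned {row_count} rows")

conn.close()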

The restore process will create a new instance. While the restore is occurring, you can continue to use the original database. To simulate this, I ran a Python program to add another 100,000 rows to the table from which the data was accidentally deleted. The possibility of new keys being created in the range of the deleted data has to be handled by the application team. In this particular example, I am creating keys higher than the deleted range to ensure that there is no primary key collision.

After the restore was completed, I connected to the restored database to check that my data was present. Once this was confirmed, I switched back to the original database from which the critical data was deleted and created a link to the recovered database with the following commands:

Create an extension:

CREATE EXTENSION dblink;

Create a dblink:

SELECT dblink_connect
(
  'link_to_recovery',
  'dbname=clientdb 
   port=5432  
   host= red-recovery-cluster…….rds.amazonaws.com 
   user=dbadmin
   password=password_here'
);

Execute the following SQL to select the data from the recovered database and insert it into the original database:

insert
into    dc_schema.test_table
SELECT  * 
from    dblink
(
'link_to_recovery', 
'SELECT object_id, owner, object_name 
 FROM   dc_schema.test_table 
 where  object_id between 30000 and 40000'
)
AS 
p(object_id numeric, owner varchar(128), object_name varchar(128))
;

Assuming that you have recovered all the data that was accidentally deleted, you can drop the recovered database. I will do this via AWS CLI commands running in CloudShell.

First, we delete the instance created under the cluster with the command:

aws rds delete-db-instance \
  --db-instance-identifier red-recovery-cluster \
  --skip-final-snapshot \
  --delete-automated-backups

After the instance was deleted, I deleted the cluster with the following command:

aws rds delete-db-cluster \
  --db-cluster-identifier red-recovery-cluster \
  --skip-final-snapshot

Please note that the above commands delete the database along with automated backups and without a final snapshot.
