Introduction
PostgreSQL is renowned for its robustness and reliability, making it a go-to choice for a wide range of applications. To ensure your PostgreSQL database performs at its best, benchmarking is a crucial step. In this blog post, I will delve into the capabilities of pgbench, a versatile tool for running benchmark tests on PostgreSQL.
Out of the box, pgbench uses a scenario loosely based on TPC-B, running a sequence of five SELECT, UPDATE, and INSERT commands per transaction. While this default scenario provides valuable insights into your system’s performance, there are cases where more specialized testing is necessary.
In this blog, I will explore the art of crafting customized benchmarks using specific tables and SQL statements. This approach enables you to tailor your benchmark to the unique demands of your application, ensuring your PostgreSQL database is optimized for real-world workloads.
Create an EC2
Create an Amazon EC2 instance running Amazon Linux. These instructions use an instance type of t2.2xlarge with the following configuration:
cat /etc/image-id
image_name="al2023-ami"
image_version="2023"
image_arch="x86_64"
image_file="al2023-ami-2023.2.20231016.0-kernel-6.1-x86_64.xfs.gpt"
image_stamp="f733-4bf8"
image_date="20231016220519"
recipe_name="al2023 ami"
recipe_id="ed84f07e-e06c-a3cd-759b-d254-59e2-3d69-b61eb10b"
Please note that amazon-linux-extras is not available for Amazon Linux 2023.
Install postgresql15
Install postgresql15 with the following command:
sudo yum install postgresql15
The output will be similar to:
sudo yum install postgresql15
Last metadata expiration check: 0:11:01 ago on Mon Oct 23 16:46:45 2023.
Dependencies resolved.
Package                         Architecture    Version    Repository    Size
.
. Lines omitted for brevity
.
Verifying : postgresql15-private-libs-15.0-1.amzn2023.0.4.x86_64   2/2
Installed:
  postgresql15-15.0-1.amzn2023.0.4.x86_64
  postgresql15-private-libs-15.0-1.amzn2023.0.4.x86_64
Complete!
Install postgresql-contrib package
This package contains various extension modules that are included in the PostgreSQL distribution, including pgbench. Install the package with the following command:
sudo yum install postgresql15-contrib
The output will be similar to:
sudo yum install postgresql15-contrib
Last metadata expiration check: 0:32:28 ago on Mon Oct 23 16:46:45 2023.
Dependencies resolved.
.
. Lines omitted for brevity
.
Verifying : uuid-1.6.2-50.amzn2023.0.2.x86_64   3/3
Installed:
  libxslt-1.1.34-5.amzn2023.0.2.x86_64
  postgresql15-contrib-15.0-1.amzn2023.0.4.x86_64
  uuid-1.6.2-50.amzn2023.0.2.x86_64
Complete!
Verify that pgbench has been installed:
which pgbench
Output:
/usr/bin/pgbench
Create a working directory/folder
Create a working directory for pgbench-related artifacts:
mkdir pgbench; cd pgbench
Create a database for pgbench
For the purpose of this testing, I will create a new database named dc-pgbench. Log on as the RDS administrator and run the below command:
CREATE DATABASE "dc-pgbench"
    WITH
    OWNER = dbadmin
    ENCODING = 'UTF8'
    LC_COLLATE = 'en_US.UTF-8'
    LC_CTYPE = 'en_US.UTF-8'
    TABLESPACE = pg_default
    CONNECTION LIMIT = -1
    IS_TEMPLATE = False;
Connect to this new database and create a user for pgbench with the following commands:
create user dcpgbench with encrypted password 'mypassword';
grant all privileges on database "dc-pgbench" to dcpgbench;
Confirm connectivity to the Postgres RDS instance
In the pgbench directory, create the below file to facilitate logging on to the database without having to type a long command:
Edit a file
vi psql.ksh
Enter the below commands after modifying for your instance:
export PGB_HOST=red-primary-writer.xxxxx.us-east-1.rds.amazonaws.com
export PGB_PORT=5432
export PGB_DB=dc-pgbench
export PGB_USER=dcpgbench
PGPASSWORD=mypassword psql --host=$PGB_HOST --port=$PGB_PORT --username=$PGB_USER --dbname=$PGB_DB
Change permissions
chmod u+x psql.ksh
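As an aside, psql.ksh passes the password inline via the PGPASSWORD environment variable. An alternative that keeps the password out of scripts is a ~/.pgpass file, which psql reads automatically; the file is still plaintext, so protect it with owner-only permissions. A sketch using the placeholder endpoint and credentials from this setup:

```shell
# ~/.pgpass format: hostname:port:database:username:password (one line per connection)
echo 'red-primary-writer.xxxxx.us-east-1.rds.amazonaws.com:5432:dc-pgbench:dcpgbench:mypassword' >> ~/.pgpass
# psql ignores the file unless it is readable only by its owner
chmod 600 ~/.pgpass
```

With this in place, the PGPASSWORD= prefix can be dropped from psql.ksh and the load-test scripts.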
Test connectivity
./psql.ksh
If successful, the connection will look like:
psql (15.0, server 14.6)
SSL connection (protocol: TLSv1.2, cipher: AES128-SHA256, compression: off)
Type "help" for help.

dc-pgbench=> SELECT current_database();
 current_database
------------------
 dc-pgbench
(1 row)

dc-pgbench=>
I also issued the command
SELECT current_database();
to confirm that I am connected to the correct database.
Create the data structures that will be used to stress test the database instance
I will be using the below structures to create a load on the database. The workload will be executed by user dcpgbench.
Sequences for keys
The below sequences will be used to generate customer, order, and item keys respectively:
CREATE SEQUENCE IF NOT EXISTS dc_cust_id
    INCREMENT 1
    START 1
    MINVALUE 1
    MAXVALUE 9223372036854775807
    CACHE 1;

CREATE SEQUENCE IF NOT EXISTS dc_order_id
    INCREMENT 1
    START 1
    MINVALUE 1
    MAXVALUE 9223372036854775807
    CACHE 1;

CREATE SEQUENCE IF NOT EXISTS dc_item_id
    INCREMENT 1
    START 1
    MINVALUE 1
    MAXVALUE 9223372036854775807
    CACHE 1;

GRANT USAGE ON SEQUENCE dc_cust_id TO dcpgbench;
GRANT USAGE ON SEQUENCE dc_order_id TO dcpgbench;
GRANT USAGE ON SEQUENCE dc_item_id TO dcpgbench;
Tables
The below tables will store customer, item, and order data respectively:
CREATE TABLE dc_customer
(
    cust_id BIGINT NOT NULL,
    cust_name VARCHAR(60) NOT NULL,
    create_ts timestamp(0) without TIME zone NOT NULL,
    CONSTRAINT dc_cust_pk PRIMARY KEY (cust_id)
);

CREATE TABLE dc_item
(
    item_id BIGINT NOT NULL,
    item_name VARCHAR(60) NOT NULL,
    create_ts timestamp(0) without TIME zone NOT NULL,
    CONSTRAINT dc_item_pk PRIMARY KEY (item_id)
);

CREATE TABLE dc_order
(
    order_id BIGINT NOT NULL,
    cust_id BIGINT NOT NULL references dc_customer (cust_id),
    item_id BIGINT NOT NULL references dc_item (item_id),
    quantity BIGINT NOT NULL,
    create_ts timestamp(0) without TIME zone NOT NULL,
    CONSTRAINT dc_order_pk PRIMARY KEY (order_id)
);

GRANT SELECT, INSERT, UPDATE, DELETE ON dc_customer, dc_item, dc_order TO dcpgbench;
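Before launching a full benchmark, it can be worth smoke-testing the schema, sequence grants, and foreign keys as the dcpgbench user with a single throwaway transaction. A minimal sketch (the literal names are arbitrary; note that sequence values are consumed even though the rows are rolled back):

```sql
BEGIN;
INSERT INTO dc_customer (cust_id, cust_name, create_ts)
VALUES (nextval('dc_cust_id'), 'smoke-test-cust', CURRENT_TIMESTAMP);
INSERT INTO dc_item (item_id, item_name, create_ts)
VALUES (nextval('dc_item_id'), 'smoke-test-item', CURRENT_TIMESTAMP);
-- currval() returns the value nextval() just produced in this session,
-- so the order row satisfies both foreign keys
INSERT INTO dc_order (order_id, cust_id, item_id, quantity, create_ts)
VALUES (nextval('dc_order_id'), currval('dc_cust_id'), currval('dc_item_id'), 1, CURRENT_TIMESTAMP);
-- discard the test rows
ROLLBACK;
```

If any statement fails here, it is far easier to diagnose than the same failure repeated by 50 concurrent pgbench clients.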
Load test 01 – Write intensive
I will be using the below script to perform a load test. At a high level, this test simulates writes by creating data in our three tables. This data will then be used by a later script to simulate a read-intensive workload.
Command to execute the script
Execute the script with the below command
nohup ./load-test-01.ksh > load-test-01.out 2>&1 &
Contents of the script
Below are the contents of the script, with comments explaining each setting:
#!/bin/bash
#
## database endpoint
#
export PGB_HOST=red-primary-writer.xxxxx.us-east-1.rds.amazonaws.com
#
## database port
#
export PGB_PORT=5432
#
## benchmark database
#
export PGB_DB=dc-pgbench
#
## The user name to connect as
#
export PGB_USER=dcpgbench
#
## Number of clients simulated, that is, number of concurrent database sessions
#
export PGB_CLIENTS=50
#
## Number of worker threads within pgbench
#
export PGB_THREADS=20
#
## filename containing the SQLs to be executed
#
export PGB_SQL_FILE=load_test_01.sql
#
## Run the test for this many seconds
#
export PGB_RUN_TIME=300
#
## Set the filename prefix for the log files created by --log
#
export PGB_LOG_PREFIX=dc-pgb
#
## Sampling rate, used when writing data into the log, to reduce the
## amount of log generated. 1.0 means all transactions will be logged,
## 0.05 means only 5% of the transactions will be logged
#
export PGB_SAMPLE_RATE=0.05
#
## make sure we are in the correct directory
#
cd /home/ec2-user/pgbench
#
## run the test
#
PGPASSWORD=mypassword pgbench --client=$PGB_CLIENTS --jobs=$PGB_THREADS --time=$PGB_RUN_TIME --username=$PGB_USER --host=$PGB_HOST --port=$PGB_PORT --file=$PGB_SQL_FILE --log --log-prefix=$PGB_LOG_PREFIX --sampling-rate=$PGB_SAMPLE_RATE --no-vacuum $PGB_DB
Contents of the SQL to generate write activity
The following SQL commands will be executed to generate load on the database. These SQL statements are saved in a file named load_test_01.sql, which is referenced by the script load-test-01.ksh.
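The script relies on psql's \gset meta-command, which pgbench also supports: it stores each column of the preceding query's result into a variable named by appending the column name to the given prefix. A quick illustration of the naming, runnable in an interactive psql session (the prefix cust_id_ is the one used in the script):

```sql
-- \gset with prefix "cust_id_" stores the result column, here named
-- "nextval", into the variable :cust_id_nextval
SELECT nextval('dc_cust_id') \gset cust_id_
\echo :cust_id_nextval
```

This is why the INSERT statements below reference :cust_id_nextval and :item_id_nextval rather than a variable matching the prefix alone.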
--
--get the next sequence values for the customer id and
--item id
--
SELECT nextval('dc_cust_id') \gset cust_id_
SELECT nextval('dc_item_id') \gset item_id_
--
--Create a pseudo-random value from the minutes and seconds of the
--current timestamp to be used as an order quantity
--
SELECT cast(substring(TO_CHAR(CURRENT_TIMESTAMP, 'YYYY-MM-DD HH12:MISS') from 15 for 4) as int) as order_qty \gset
--
--Insert a row into the customer table
--
INSERT INTO dc_customer
(
    cust_id,
    cust_name,
    create_ts
)
values
(
    :cust_id_nextval,
    concat('cust-',CURRENT_TIMESTAMP),
    CURRENT_TIMESTAMP
);
--
--Insert a row into the item table
--
INSERT INTO dc_item
(
    item_id,
    item_name,
    create_ts
)
values
(
    :item_id_nextval,
    concat('item-',CURRENT_TIMESTAMP),
    CURRENT_TIMESTAMP
);
--
--Insert a row into the order table
--
INSERT INTO dc_order
(
    order_id,
    cust_id,
    item_id,
    quantity,
    create_ts
)
values
(
    nextval('dc_order_id'),
    :cust_id_nextval,
    :item_id_nextval,
    :order_qty,
    CURRENT_TIMESTAMP
);
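The order quantity above is carved out of the digits of the current timestamp rather than generated with random(), which is a cheap way to vary values across transactions. The same idea sketched in shell, using the minutes and seconds of the current time (this snippet is just an illustration, not part of the benchmark):

```shell
# minutes+seconds of the current time as a 4-digit pseudo-random quantity,
# mirroring the TO_CHAR/substring trick in load_test_01.sql
qty=$(date +%M%S)
echo "order quantity: $qty"
```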
Load test 02 – Read intensive
I will be using the below script to perform a read load test. At a high level, this test will perform multiple reads on our three tables. The previous test wrote approximately 150,000 rows to each of the three test tables.
Command to execute the script
Execute the script with the below command
nohup ./load-test-02.ksh > load-test-02.out 2>&1 &
Contents of the script
Below are the contents of the script, with comments explaining each setting:
#!/bin/bash
#
## database endpoint
#
export PGB_HOST=red-primary-writer.xxxxx.us-east-1.rds.amazonaws.com
#
## database port
#
export PGB_PORT=5432
#
## benchmark database
#
export PGB_DB=dc-pgbench
#
## The user name to connect as
#
export PGB_USER=dcpgbench
#
## Number of clients simulated, that is, number of concurrent database sessions
#
export PGB_CLIENTS=200
#
## Number of worker threads within pgbench
#
export PGB_THREADS=20
#
## filename containing the SQLs to be executed
#
export PGB_SQL_FILE=load_test_02.sql
#
## Run the test for this many seconds
#
export PGB_RUN_TIME=300
#
## make sure we are in the correct directory
#
cd /home/ec2-user/pgbench
#
## run the test
#
PGPASSWORD=mypassword pgbench --client=$PGB_CLIENTS --jobs=$PGB_THREADS --time=$PGB_RUN_TIME --username=$PGB_USER --host=$PGB_HOST --port=$PGB_PORT --file=$PGB_SQL_FILE --no-vacuum $PGB_DB
Contents of the SQL to generate read activity
The plan for this script is to select random data from our three tables. These SQL statements will be saved in a file named load_test_02.sql, which is referenced by the script named load-test-02.ksh.
First, I establish the current value of the sequences with the below SQLs:
SELECT last_value FROM dc_cust_id; SELECT last_value FROM dc_item_id; SELECT last_value FROM dc_order_id;
Each of the sequences is currently at a value of approximately 151,000, hence I will use 150,000 as the upper bound in the random function below.
SELECT floor(random()*(150000)) as dc_cust_id \gset
--
--select the orders
--
select
    cus.cust_id
    ,cus.cust_name
    ,itm.item_id
    ,itm.item_name
    ,ord.order_id
    ,ord.quantity
    ,ord.create_ts
from dc_order ord
inner join dc_customer cus on cus.cust_id = ord.cust_id
inner join dc_item itm on itm.item_id = ord.item_id
where cus.cust_id = :dc_cust_id
order by
    cus.cust_id
    ,ord.create_ts
;
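Hardcoding 150,000 works for a one-off run, but the bound drifts out of date every time the write test runs again. A possible alternative, sketched below, derives the upper bound from the sequence's current high-water mark on each transaction, at the cost of one extra lightweight read:

```sql
-- pick a random customer id bounded by the sequence's current value
-- instead of a hardcoded constant
SELECT floor(random() * last_value)::bigint AS dc_cust_id
FROM dc_cust_id \gset
```

Either way, ids that fall in gaps (for example, rows consumed by rolled-back transactions) simply return zero rows, which is harmless for a read benchmark.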
Monitoring the activity – Performance Insights
The impact of the test can be measured with Performance Insights. I ran a test with PGB_CLIENTS set to 500 users for a period of 5 minutes. The Average active sessions (AAS) chart in Performance Insights displayed this spike as a result of the load placed on the database:
As expected, the top user was dcpgbench, the user running the workload:
The top SQL statements reported by Performance Insights also list the SQL that I was executing.