Selectively recovering data in an Aurora PostgreSQL database

Using automated backups and snapshots we can restore an AWS RDS database
– to the most current point in time
– to a prior point in time
– use the restored database going forward
– or restore particular tables

In summary, we have multiple options depending on the nature and extent of the data lost.

In this example, I will demonstrate a method to restore certain rows in one table in the original database.

Create a test table

CREATE TABLE dc_schema.test_table (
                object_id numeric NOT NULL,
                "owner" varchar(128) NULL,
                object_name varchar(128) NULL,
                CONSTRAINT test_table_pk PRIMARY KEY (object_id)
);

and inserted some data. For demonstration purposes, let us assume that the rows with object_id between 30,000 and 40,000 are critical to the application. Further, assume that someone accidentally deleted this data with the following SQL:

select *
from   dc_schema.test_table daac 
where  object_id between 30000 and 40000
;

We now need this data restored as it is critical to the operation of the application.

On the AWS RDS Console page, select the DB cluster, click on “Actions” and then on “Restore to point in time”. On the next page, populate the custom date and time to a point before the data was deleted. Most of the options will be populated except for a few such as the DB instance identifier.

Click on the orange button at the bottom right of the page to begin the recovery. This will take some time to occur.

Some points to keep in mind with respect to the restored database:

– You can use the restored DB instance as soon as its status is available. The DB instance continues to load data in the background. This is known as lazy loading.

– If you access data that hasn’t been loaded yet, the DB instance immediately downloads the requested data from Amazon S3, and then continues loading the rest of the data in the background

– To help mitigate the effects of lazy loading on tables to which you require quick access, you can perform operations that involve full-table scans, such as SELECT *. This allows Amazon RDS to download all of the backed-up table data from S3.

The restore process will create a new instance. While the “restore” is occurring, you can continue to use the original database. To simulate this, I ran a Python program to add another 100,000 rows to the table from which the data was accidentally deleted. The situation of creating keys in the range of the data that was deleted has to be handled by the application staff. In this particular example, I am creating keys higher than the range of keys that were deleted to ensure that there is no primary key collision.

After the restore was completed, I connected to the restored database to check if my data is present. After this was confirmed, I switched back to the original database from where the critical data was deleted and created a link to the recovered database with the following commands:

Create an extension:

CREATE EXTENSION dblink;

Create a dblink:

SELECT dblink_connect
(
  'link_to_recovery’, 
  'dbname=clientdb 
   port=5432  
   host= red-recovery-cluster…….rds.amazonaws.com 
   user=dbadmin
   password=password_here'
);

Execute the following SQL to select the data from the recovered database and insert it into the original database:

insert
into    dc_schema.test_table
SELECT  * 
from    dblink
(
'link_to_recovery', 
'SELECT object_id, owner, object_name 
 FROM   dc_schema.test_table 
 where  object_id between 30000 and 40000'
)
AS 
p(object_id numeric, owner varchar(128), object_name varchar(128))
;

Assuming that you have recovered all the data that was accidentally deleted, you can drop the recovered database. I will do this via AWS CLI commands running in CloudShell.

First, we delete the instance created under the cluster with the command:

aws rds delete-db-instance \
  --db-instance-identifier red-recovery-cluster \
  --skip-final-snapshot \
  --delete-automated-backups

After the instance is deleted, I deleted the cluster with the following command:

aws rds delete-db-cluster \
  --db-cluster-identifier red-recovery-cluster \
  --skip-final-snapshot

Please note that the above commands delete the database along with automated backups and without a final snapshot.