Oracle Active Data Guard on AWS RDS

On Aug 23, 2022, AWS announced support for managed Oracle Data Guard Switchover and Automated Backups for replicas. See the announcement at:

AWS RDS for Oracle now supports managed Oracle Data Guard

Oracle DBAs are familiar with creating a physical standby in an on-premises environment and opening it in read-only (i.e. Active Data Guard) mode. AWS has now automated the process of creating the standby and the process of switching over in the context of an RDS (i.e. managed database service). All the manual tasks of taking an RMAN backup, transferring to the standby, restoring and recovering the database, setting up Data Guard Manager, etc. can be accomplished by a few clicks on the AWS console. This blog describes the steps required to create and test this scenario.

Prerequisites
1. The database instance size must be db.t3.large at the minimum.
2. The primary database should be using an option group that is exclusive to this database.
3. Ensure you have the required KMS key in both regions
4. Create a parameter group in the target region that exactly matches the parameter group from the source region

Creating a replica
After you have created a primary database in any region:
1. Select the primary database and under the actions button, select “Create replica”.
2. Choose multi-AZ on the target also
3. Click on the create replica button. Replica creation will take approximately 1.5 hours. This may vary based on the size of the database.
4. Change the backup option on the replica. The replica will be created with the backup retention period set to “0” as shown below. This is contrary to the standard practice of not backing up a data guard standby. However, for the replica to switch over to primary at some point in the future, the retention period should be greater than “0”.

Performing a switchover across regions
A switchover is a role reversal between the primary database and one of its standby databases. A switchover guarantees no data loss. This is typically done for planned maintenance of the primary system. During a switchover, the primary database transitions to a standby role, and the standby database transitions to the primary role.

1. Confirm there is no lag. This can be done via the console under the “Replication” section under the “Lag” column or via SQL

SELECT ARCH.thread#                        "Thread",
       ARCH.sequence#                      "Last Sequence Received",
       APPL.sequence#                      "Last Sequence Applied",
       ( ARCH.sequence# - APPL.sequence# ) "Difference"
FROM   (SELECT thread#,
               sequence#
        FROM   v$archived_log
        WHERE  ( thread#, first_time ) IN (SELECT thread#,
                                                  Max(first_time)
                                           FROM   v$archived_log
                                           GROUP  BY thread#)) ARCH,
       (SELECT thread#,
               sequence#
        FROM   v$log_history
        WHERE  ( thread#, first_time ) IN (SELECT thread#,
                                                  Max(first_time)
                                           FROM   v$log_history
                                           GROUP  BY thread#)) APPL
WHERE  ARCH.thread# = APPL.thread#
ORDER  BY 1;

2. Confirm that there is no pending maintenance on either the primary of the replica

3. Initiate the switchover from the replica (not the primary!).

4. Select the RDS instance and click on “Switch over replica”. And agree to the warning panel that is displayed

5. The former replica will display the message “The Switchover to the read replica started” under “Logs & events”, “Recent events” section. (Remember to sort the messages such that the latest messages are displayed.)

6. After the switchover is complete, both databases will display “The Switchover to the read replica finished successfully”

7. Applications will have to reconnect to the databases

8. You can confirm that the data is replicating from the new primary via the console or SQL

Test reboot with multi-AZ failover – Primary
Initiate a reboot with failover on the current primary.

1. Select the RDS instance, under “Actions”, and click on “Reboot”. Check the option “Reboot With Failover?”

2. After about 8 to 10 minutes, the database will complete the multi-AZ failover. Under “Logs & events”, “Recent events” section, look for messages “Multi-AZ instance failover started” and “Multi-AZ instance failover completed”.

A similar test can be done on the Replica.

Observations
1. After a switchover, both the databases may show as Read-Only. This can be ignored

2. You cannot temporarily shut down the database for the customary 7 days as it has a read replica

3. Ensure that there is no pending maintenance before issuing a switchover

4. In the event that you are using a modified character set on your primary, the console may show the replica’s character set as “AL32UTF8”. However, in the database, it matches the primary in the database. This can be confirmed with

select  property_value 
from    DATABASE_properties 
where   property_name='NLS_CHARACTERSET'
;

5. The console will display a message that the switchover is complete. However, this is displayed prematurely. The actual switchover will take additional time. Monitor the status of the two database instances under the “Status” column.