The recovery I did not want to do…but ended up doing anyway! (Oracle Point in Time Recovery)

For many years now, I have been secretly dreading to ever have to recover one of the databases that I support as it has allocated space of 3.1 TB and used space of 1.3 TB. However, today the day finally arrived when a member of the application team ran a script that accidently deleted data from multiple tables. The impact of the deletes was made worse by cascading RI. Bottom line, we had to recover the database to a moment in time before the bad script was executed.

We checked the log history with the below SQL:

alter session set nls_date_format = 'dd-MON-yy hh24.mi.ss';

select * from V$LOG_HISTORY 
where first_time > to_date('15-OCT-18 18.00','dd-MON-yy hh24.mi')
order by first_time, thread#
;

The database was running on a 4 node RAC. We shut down the RAC with:

srvctl stop database –d PRODDB –o immediate

And confirmed that the database was down with:

srvctl status database –d PRODDB

When the database was down, started node 4 of the RAC up with:

STARTUP MOUNT;

And began the recovery with the below script (run as nohup as it would take about 4 hours to restore a database of this size)

#!/bin/ksh
PATH=$PATH:/oracle/home/product/bin
export ORACLE_SID=PRODDB4
export ORACLE_HOME=/oracle/home/product
rman <<EOF
connect target /
run
{
    ALLOCATE CHANNEL disk1 DEVICE TYPE DISK FORMAT '/backup02/PRODDB/backup';
    ALLOCATE CHANNEL disk2 DEVICE TYPE DISK FORMAT '/backup02/PRODDB/backup';
    ALLOCATE CHANNEL disk3 DEVICE TYPE DISK FORMAT '/backup03/PRODDB/backup';
    ALLOCATE CHANNEL disk4 DEVICE TYPE DISK FORMAT '/backup03/PRODDB/backup';
    SET UNTIL TIME "TO_DATE('15-OCT-2018 19:00:00','DD-MON-YYYY  HH24:MI:SS')"; 
    restore database;
    recover database;
    release channel disk1;
    release channel disk2;
    release channel disk3;
    release channel disk4;
}
 
exit;
EOF

The nohup.out will contain output such as:

channel disk3: restored backup piece 1
channel disk3: restore complete, elapsed time: 00:07:35
channel disk3: starting datafile backup set restore
channel disk3: specifying datafile(s) to restore from backup set
channel disk3: restoring datafile 00028 to +ASM/proddb/perfx_08.dbf
channel disk3: restoring datafile 00058 to +ASM/proddb/perfd_24.dbf
channel disk3: reading from backup piece /backup03/PRODDB/backup/PRODDB20181015cutfmu04_1_1.bus
channel disk2: piece handle=/backup02/PRODDB/backup/PRODDB20181015cstfmtrd_1_1.bus tag=HOTBKUPFULL
channel disk2: restored backup piece 1
channel disk2: restore complete, elapsed time: 00:07:45
channel disk2: starting datafile backup set restore
channel disk2: specifying datafile(s) to restore from backup set
channel disk2: restoring datafile 00005 to +ASM/proddb/prefd_20.dbf
channel disk2: restoring datafile 00025 to +ASM/proddb/perfx_05.dbf
channel disk2: reading from backup piece /backup03/PRODDB/backup/PRODDB20181015cvtfmu2q_1_1.bus
channel disk4: piece handle=/backup03/PRODDB/backup/PRODDB20181015cktfmtg5_1_1.bus tag=HOTBKUPFULL

You can also check the status with:

select inst_id, sid, round((elapsed_seconds/60),0) as elap, round((time_remaining/60),0) as remain, message 
from gv$session_longops 
where time_remaining > 0 
and target_desc = 'restore'
order by start_time asc
;

If you want to see what is occurring on each data file, you can comment out the “AND” as shown below:

select inst_id, sid, round((elapsed_seconds/60),0) as elap, round((time_remaining/60),0) as remain, message 
from gv$session_longops 
where time_remaining > 0 
--and target_desc = 'restore'
order by start_time asc
;

In the alert log, you should see messages like:

Tue Oct 16 02:19:39 2018
Incomplete Recovery applied until change 141794746611 time 10/15/2018 19:00:00
Media Recovery Complete (PRODDB4)
Completed: alter database recover logfile '+ASM/proddb/archivelog/4_20404_972607420.arc'
Tue Oct 16 02:22:36 2018

After the recover is complete, open the database with a resetlogs:

alter database open resetlogs

The alert log will show messages such as:

RESETLOGS after incomplete recovery UNTIL CHANGE 141794757937583
Archived Log entry 164482 added for thread 1 sequence 20270 ID 0x4d158c31 dest 1:
Archived Log entry 164483 added for thread 2 sequence 20003 ID 0x4d158c31 dest 1:
Archived Log entry 164484 added for thread 3 sequence 20643 ID 0x4d158c31 dest 1:
Archived Log entry 164485 added for thread 4 sequence 20420 ID 0x4d158c31 dest 1:
Resetting resetlogs activation ID 1293257777 (0x4d158c31)

The instance will be shut down and opened in read write mode. Since this is a RAC, I issued a

shut down immediate;

and restarted with SRVCTL

srvctl start database –d PRODDB

and monitored the alert logs to see a successful start of all nodes.

Important – Take a backup of the database as soon as possible as you no longer have a recovery point after the reset logs.

Author: Dean Capps

Database consultant at Amazon Web Services.