For many years now, I have been secretly dreading to ever have to recover one of the databases that I support as it has allocated space of 3.1 TB and used space of 1.3 TB. However, today the day finally arrived when a member of the application team ran a script that accidently deleted data from multiple tables. The impact of the deletes was made worse by cascading RI. Bottom line, we had to recover the database to a moment in time before the bad script was executed.
We checked the log history with the below SQL:
alter session set nls_date_format = 'dd-MON-yy hh24.mi.ss'; select * from V$LOG_HISTORY where first_time > to_date('15-OCT-18 18.00','dd-MON-yy hh24.mi') order by first_time, thread# ;
The database was running on a 4 node RAC. We shut down the RAC with:
srvctl stop database –d PRODDB –o immediate
And confirmed that the database was down with:
srvctl status database –d PRODDB
When the database was down, started node 4 of the RAC up with:
STARTUP MOUNT;
And began the recovery with the below script (run as nohup as it would take about 4 hours to restore a database of this size)
#!/bin/ksh PATH=$PATH:/oracle/home/product/bin export ORACLE_SID=PRODDB4 export ORACLE_HOME=/oracle/home/product rman <<EOF connect target / run { ALLOCATE CHANNEL disk1 DEVICE TYPE DISK FORMAT '/backup02/PRODDB/backup'; ALLOCATE CHANNEL disk2 DEVICE TYPE DISK FORMAT '/backup02/PRODDB/backup'; ALLOCATE CHANNEL disk3 DEVICE TYPE DISK FORMAT '/backup03/PRODDB/backup'; ALLOCATE CHANNEL disk4 DEVICE TYPE DISK FORMAT '/backup03/PRODDB/backup'; SET UNTIL TIME "TO_DATE('15-OCT-2018 19:00:00','DD-MON-YYYY HH24:MI:SS')"; restore database; recover database; release channel disk1; release channel disk2; release channel disk3; release channel disk4; } exit; EOF
The nohup.out will contain output such as:
channel disk3: restored backup piece 1 channel disk3: restore complete, elapsed time: 00:07:35 channel disk3: starting datafile backup set restore channel disk3: specifying datafile(s) to restore from backup set channel disk3: restoring datafile 00028 to +ASM/proddb/perfx_08.dbf channel disk3: restoring datafile 00058 to +ASM/proddb/perfd_24.dbf channel disk3: reading from backup piece /backup03/PRODDB/backup/PRODDB20181015cutfmu04_1_1.bus channel disk2: piece handle=/backup02/PRODDB/backup/PRODDB20181015cstfmtrd_1_1.bus tag=HOTBKUPFULL channel disk2: restored backup piece 1 channel disk2: restore complete, elapsed time: 00:07:45 channel disk2: starting datafile backup set restore channel disk2: specifying datafile(s) to restore from backup set channel disk2: restoring datafile 00005 to +ASM/proddb/prefd_20.dbf channel disk2: restoring datafile 00025 to +ASM/proddb/perfx_05.dbf channel disk2: reading from backup piece /backup03/PRODDB/backup/PRODDB20181015cvtfmu2q_1_1.bus channel disk4: piece handle=/backup03/PRODDB/backup/PRODDB20181015cktfmtg5_1_1.bus tag=HOTBKUPFULL
You can also check the status with:
select inst_id, sid, round((elapsed_seconds/60),0) as elap, round((time_remaining/60),0) as remain, message from gv$session_longops where time_remaining > 0 and target_desc = 'restore' order by start_time asc ;
If you want to see what is occurring on each data file, you can comment out the “AND” as shown below:
select inst_id, sid, round((elapsed_seconds/60),0) as elap, round((time_remaining/60),0) as remain, message from gv$session_longops where time_remaining > 0 --and target_desc = 'restore' order by start_time asc ;
In the alert log, you should see messages like:
Tue Oct 16 02:19:39 2018 Incomplete Recovery applied until change 141794746611 time 10/15/2018 19:00:00 Media Recovery Complete (PRODDB4) Completed: alter database recover logfile '+ASM/proddb/archivelog/4_20404_972607420.arc' Tue Oct 16 02:22:36 2018
After the recover is complete, open the database with a resetlogs:
alter database open resetlogs
The alert log will show messages such as:
RESETLOGS after incomplete recovery UNTIL CHANGE 141794757937583 Archived Log entry 164482 added for thread 1 sequence 20270 ID 0x4d158c31 dest 1: Archived Log entry 164483 added for thread 2 sequence 20003 ID 0x4d158c31 dest 1: Archived Log entry 164484 added for thread 3 sequence 20643 ID 0x4d158c31 dest 1: Archived Log entry 164485 added for thread 4 sequence 20420 ID 0x4d158c31 dest 1: Resetting resetlogs activation ID 1293257777 (0x4d158c31)
The instance will be shut down and opened in read write mode. Since this is a RAC, I issued a
shut down immediate;
and restarted with SRVCTL
srvctl start database –d PRODDB
and monitored the alert logs to see a successful start of all nodes.
Important – Take a backup of the database as soon as possible as you no longer have a recovery point after the reset logs.