The recovery I did not want to do…but ended up doing anyway! (Oracle Point in Time Recovery)

For many years now, I have been secretly dreading to ever have to recover one of the databases that I support as it has allocated space of 3.1 TB and used space of 1.3 TB. However, today the day finally arrived when a member of the application team ran a script that accidently deleted data from multiple tables. The impact of the deletes was made worse by cascading RI. Bottom line, we had to recover the database to a moment in time before the bad script was executed.

We checked the log history with the below SQL:

alter session set nls_date_format = 'dd-MON-yy hh24.mi.ss';

select * from V$LOG_HISTORY 
where first_time > to_date('15-OCT-18 18.00','dd-MON-yy hh24.mi')
order by first_time, thread#
;

The database was running on a 4 node RAC. We shut down the RAC with:

srvctl stop database –d PRODDB –o immediate

And confirmed that the database was down with:

srvctl status database –d PRODDB

When the database was down, started node 4 of the RAC up with:

STARTUP MOUNT;

And began the recovery with the below script (run as nohup as it would take about 4 hours to restore a database of this size)

#!/bin/ksh
PATH=$PATH:/oracle/home/product/bin
export ORACLE_SID=PRODDB4
export ORACLE_HOME=/oracle/home/product
rman <<EOF
connect target /
run
{
    ALLOCATE CHANNEL disk1 DEVICE TYPE DISK FORMAT '/backup02/PRODDB/backup';
    ALLOCATE CHANNEL disk2 DEVICE TYPE DISK FORMAT '/backup02/PRODDB/backup';
    ALLOCATE CHANNEL disk3 DEVICE TYPE DISK FORMAT '/backup03/PRODDB/backup';
    ALLOCATE CHANNEL disk4 DEVICE TYPE DISK FORMAT '/backup03/PRODDB/backup';
    SET UNTIL TIME "TO_DATE('15-OCT-2018 19:00:00','DD-MON-YYYY  HH24:MI:SS')"; 
    restore database;
    recover database;
    release channel disk1;
    release channel disk2;
    release channel disk3;
    release channel disk4;
}
 
exit;
EOF

The nohup.out will contain output such as:

channel disk3: restored backup piece 1
channel disk3: restore complete, elapsed time: 00:07:35
channel disk3: starting datafile backup set restore
channel disk3: specifying datafile(s) to restore from backup set
channel disk3: restoring datafile 00028 to +ASM/proddb/perfx_08.dbf
channel disk3: restoring datafile 00058 to +ASM/proddb/perfd_24.dbf
channel disk3: reading from backup piece /backup03/PRODDB/backup/PRODDB20181015cutfmu04_1_1.bus
channel disk2: piece handle=/backup02/PRODDB/backup/PRODDB20181015cstfmtrd_1_1.bus tag=HOTBKUPFULL
channel disk2: restored backup piece 1
channel disk2: restore complete, elapsed time: 00:07:45
channel disk2: starting datafile backup set restore
channel disk2: specifying datafile(s) to restore from backup set
channel disk2: restoring datafile 00005 to +ASM/proddb/prefd_20.dbf
channel disk2: restoring datafile 00025 to +ASM/proddb/perfx_05.dbf
channel disk2: reading from backup piece /backup03/PRODDB/backup/PRODDB20181015cvtfmu2q_1_1.bus
channel disk4: piece handle=/backup03/PRODDB/backup/PRODDB20181015cktfmtg5_1_1.bus tag=HOTBKUPFULL

You can also check the status with:

select inst_id, sid, round((elapsed_seconds/60),0) as elap, round((time_remaining/60),0) as remain, message 
from gv$session_longops 
where time_remaining > 0 
and target_desc = 'restore'
order by start_time asc
;

If you want to see what is occurring on each data file, you can comment out the “AND” as shown below:

select inst_id, sid, round((elapsed_seconds/60),0) as elap, round((time_remaining/60),0) as remain, message 
from gv$session_longops 
where time_remaining > 0 
--and target_desc = 'restore'
order by start_time asc
;

In the alert log, you should see messages like:

Tue Oct 16 02:19:39 2018
Incomplete Recovery applied until change 141794746611 time 10/15/2018 19:00:00
Media Recovery Complete (PRODDB4)
Completed: alter database recover logfile '+ASM/proddb/archivelog/4_20404_972607420.arc'
Tue Oct 16 02:22:36 2018

After the recover is complete, open the database with a resetlogs:

alter database open resetlogs

The alert log will show messages such as:

RESETLOGS after incomplete recovery UNTIL CHANGE 141794757937583
Archived Log entry 164482 added for thread 1 sequence 20270 ID 0x4d158c31 dest 1:
Archived Log entry 164483 added for thread 2 sequence 20003 ID 0x4d158c31 dest 1:
Archived Log entry 164484 added for thread 3 sequence 20643 ID 0x4d158c31 dest 1:
Archived Log entry 164485 added for thread 4 sequence 20420 ID 0x4d158c31 dest 1:
Resetting resetlogs activation ID 1293257777 (0x4d158c31)

The instance will be shut down and opened in read write mode. Since this is a RAC, I issued a

shut down immediate;

and restarted with SRVCTL

srvctl start database –d PRODDB

and monitored the alert logs to see a successful start of all nodes.

Important – Take a backup of the database as soon as possible as you no longer have a recovery point after the reset logs.

Python to the rescue – comparing text (fields) in a file

Python has increasingly become my go-to language for solving the annoying little problems that occur on a day to day basis. A good book for beginners is “Automate the Boring Stuff with Python: Practical Programming for Total Beginners” by Al Sweigart.

I have a process that runs multi streamed on a server. The input parameters and the log files containing the output are read from and written to different directories which are based on the stream name. Each log file is parsed at the end of the process to ensure that all the input parameters have been processed. I noticed that two streams were encountering issues as the stream name did not match the directory name in the input parameters. Rather than manually find the mismatch, I found the mismatch in a couple of minutes with the below python code:

#!/usr/bin/python
import pdb
import os
import sys
import subprocess
import commands
import time
import socket
import locale
locale.setlocale(locale.LC_ALL, 'en_US')
#
##-----------------------------------------------------------------------------
## Main Logic
##-----------------------------------------------------------------------------
#
all_parms=open('/opt/oracle/all_parms.txt','r')

for line in all_parms:
        #
        ## Filter out some non-data lines
        #
        if  '/dbbackup' in line:
                stream_name,fld_01,fld_02,fld_03,fld_04,temp,fld_05,fld_06=line.split("|")
                cmd_01,cmd_02,cmd_03,location=temp.split("/")
                if  stream_name != location:
                        print "Found a mismatch",stream_name,location
                        print line
                        print " "

The input file looks like:

Stream_name|field01|field02|field03|field04|/directory/name/stream_name|field05|field06

Oracle 12c – Copy password file in ASM on primary to ASM on physical standby

Prior to Oracle 12c, the password file was stored in ORACLE_HOME/dbs directory. In case of a RAC environment (without ACFS) we would copy this file to each node of the RAC. In the case of a RAC standby this file would also be copied to all the nodes of the standby RAC. 12c now stores this file in ASM which means we only need one copy of the file per RAC cluster.

If you display the database configuration with command

srvctl config database -d CUSTPR

one of the displayed parameters will be the location of the file in ASM:

Password file: +DG_ORA_DATA/CUSTPR/orapwCUSTPR

Despite the above file name, the actual physical file will be in a different location pointed to by a link such as:

ASMCMD> ls -l +DG_ORA_DATA/CUSTPR/orapwCUSTPR
Type      Redund  Striped  Time             Sys  Name
PASSWORD  UNPROT  COARSE   SEP 28 17:00:00  N    orapwCUSTPR => +DG_ORA_DATA/CUSTPR/PASSWORD/pwdCUSTPR.347.967084532

Storing this file in ASM adds a small complexity in moving the file to a standby RAC that houses the physical (data guard) standby as we can no longer just FTP the file directly. The file has to be copied from ASM to a regular unix file system, transferred to a destination file system and then put into ASM. This is accomplished with the ORAPWD command.

On the primary RAC, copy the file to a unix file system with the command

This command is issued from within ASMCMD

pwcopy +DG_ORA_DATA/CUSTPR/orapwCUSTPR /oracle_home/prod/12/db_1/dbs/orapwCUSTDR

Note that the destination file has been renamed to the file (database) name required on the standby; orapwCUSTPR changed to orapwCUSTDR.

After the file has been extracted from ASM, transfer it with any method such as SFTP, SCP etc. to a destination unix folder. After it is transferred, copy it to ASM with the below command

This command is issued from within ASMCMD

pwcopy /oracle_home/prod/12/db_1/dbs/ orapwCUSTDR +DG_UIIOMPDCLU33_ORA1/A7VZRPR/PASSWORD/ orapwCUSTDR

As a precautionary measure, I shutdown the standby RAC with the appropriate SRVCTL commands. After the above copy was completed, I restarted the standby and tested connectivity from the primary with

sqlplus sys@CUSTDR as sysdba