Lost connection to your session during a database restore

If you are restoring a database in RMAN and you lose your session due to a network blip, you can resume the restore after the network connection is reestablished.

For example, the restore is being run as follows:

[prodsrvr4-RACNODE4] rman target=/
Recovery Manager: Release 11.2.0.4.0 - Production on Wed Feb 14 10:01:36 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
connected to target database: PRODDB (DBID=1057984567, not open)
RMAN> restore database;
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00120 to +DG_PRD2/proddb/customer_52.dbf
channel ORA_DISK_1: reading from backup piece /dbbackup06/PRODDB/backup/PRODDB201802137lsr3u55_1_1.bus

and you lose network connectivity, you can log back on and restart (i.e. resume) as follows:

[prodsrvr4-RACNODE4] rman target=/

Recovery Manager: Release 11.2.0.4.0 - Production on Wed Feb 14 10:01:36 2018

Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.

connected to target database: PRODDB (DBID=1057984567, not open)

RMAN> restore database;

Starting restore at 14-FEB-18
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=177 instance=RACNODE4 device type=DISK

skipping datafile 1; already restored to file +DG_PRD2/proddb/system01.dbf
skipping datafile 140; already restored to file +DG_PRD1/proddb/order_65.dbf
.
.
.
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00069 to +DG_PRD2/proddb/customer_29.dbf
channel ORA_DISK_1: restoring datafile 00203 to +DG_PRD2/proddb/undotbs102.dbf
channel ORA_DISK_1: reading from backup piece /dbbackup05/PRODDB/backup/PRODDB201802136bsr3ph4_1_1.bus
channel ORA_DISK_1: piece handle=/dbbackup05/PRODDB/backup/PRODDB201802136bsr3ph4_1_1.bus tag=HOTBKUPFULL
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:15:55

Running the SQL Health Check (SQLHC) report

If you have a SQL performance issue, Oracle Support may request you to run the SQL Health Check script. This is a pretty cool process, so I thought I would document it.

Download the SQLHC.zip archive from the Oracle Support web site. At the time of writing this blog post, the document was Doc ID 1366133.1.

Extract the zip archive. I did this on my laptop. There are three files: sqldx.sql, sqlhc.sql, and sqlhcxec.sql.

Place these three files in a directory on your server.

Find the SQL ID of the problem SQL. This can be accomplished via AWR, ADDM, etc. In my case, the SQL was running for more than five minutes, so I was able to get the SQL ID quite easily.

Run the SQLHC tool as follows:
Sign on as sysdba
Issue the command as follows:
START sqlhc.sql T 3t3ypamk8wu40

The first parameter "T" indicates that my site has the SQL Tuning pack.
(Valid values are Tuning (T), Diagnostics (D), or None (N).) If your site has both Tuning and Diagnostics licenses, specify T (the Oracle Tuning pack includes Oracle Diagnostics).

The second parameter is the sql_id of the problem SQL statement.

The process will display a lot of information on the terminal and write the output to a file named sqlhc_date_time_sqlid.zip. For example:
sqlhc_20180212_180710_3t3ypamk8wu40.zip
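The report name encodes the run date, time, and SQL ID. A small Python sketch that pulls those pieces out; the pattern is inferred from the example name above and is an assumption, not an official format:

```python
import re

# Pattern inferred from the example sqlhc_20180212_180710_3t3ypamk8wu40.zip;
# not an official Oracle naming spec.
def parse_sqlhc_name(name):
    m = re.fullmatch(r"sqlhc_(\d{8})_(\d{6})_(\w+)\.zip", name)
    if m is None:
        return None
    return {"date": m.group(1), "time": m.group(2), "sql_id": m.group(3)}

print(parse_sqlhc_name("sqlhc_20180212_180710_3t3ypamk8wu40.zip"))
```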

Attach this file to your open SR with Oracle.

ORA-01157 while activating a physical standby database

We had previously created a 2 node RAC physical standby database. Today, we were attempting to bring it up as a primary by issuing the commands:

startup mount;
ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE;

In the alert log we saw the error:

ORA-01157: cannot identify/lock data file 2 - see DBWR trace file
ORA-01110: data file 2: '+DG_OLD_SERVER/dbname/ts_name_data_file_01.dbf'
ORA-17503: ksfdopn:2 Failed to open file +DG_OLD_SERVER/dbname/ts_name_data_file_01.dbf
ORA-15001: diskgroup "+DG_OLD_SERVER" does not exist or is not mounted

The database was attempting to lock a file with the name of the old primary's ASM disk group. I checked in v$datafile:

 
select    a.name
from      v$datafile a
order by  a.file#
;

and all the file names were prefixed with the old primary's ASM disk group name. A check in ASM on the new server revealed that the files existed in ASM prefixed with the new server's disk group name. I suspect that while building the standby, the command below was not issued after the recover database step:

switch datafile all; 

We brought up the database in mount mode and issued commands to rename the files from the old ASM disk group to the new ASM disk group:

ALTER DATABASE RENAME FILE '+DG_OLD_SERVER/dbname/ts_name_data_file_01.dbf'
  TO '+DG_NEW_SERVER/dbname/ts_name_data_file_01.dbf';
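With hundreds of data files, typing these renames by hand is error-prone. A minimal Python sketch that generates the statements from the v$datafile names; the disk group and path names are the ones from this example and should be adapted to your environment:

```python
# Swap the old disk group prefix for the new one on each datafile path.
# Disk group and path names here are from this example only; adjust to suit.
OLD_DG = "+DG_OLD_SERVER"
NEW_DG = "+DG_NEW_SERVER"

def rename_statement(old_path):
    if not old_path.startswith(OLD_DG):
        raise ValueError(f"unexpected path: {old_path}")
    new_path = NEW_DG + old_path[len(OLD_DG):]
    return f"ALTER DATABASE RENAME FILE '{old_path}' TO '{new_path}';"

# Paths as returned by: select name from v$datafile order by file#;
for p in ["+DG_OLD_SERVER/dbname/ts_name_data_file_01.dbf",
          "+DG_OLD_SERVER/dbname/ts_name_data_file_02.dbf"]:
    print(rename_statement(p))
```

Spool the output to a script, review it, and run it while the database is mounted.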

After all the files were renamed, I checked in v$datafile to confirm the new names. After this we were able to activate the physical standby database as a primary.

Update – February 05, 2018
I built another standby and checked the data file names after the recover database step. The file names were correctly changed as per the DB_FILE_NAME_CONVERT parameter, so I cannot explain what went wrong in the original build. In a normal build, however, there is no need to issue the "switch datafile all" command.

Tracing TNSPING to identify where the failure is occurring (TNS-12535)

If you need to trace the tnsping command, add the two lines below to the sqlnet.ora file:

TNSPING.TRACE_LEVEL=ADMIN
TNSPING.TRACE_DIRECTORY=/apps/opt/oracle/product/12.1.0.2/db_1/network/admin

Then run the tnsping command:

[oracle@server]$ tnsping DEANDR

TNS Ping Utility for Linux: Version 12.1.0.2.0 - Production on 30-JAN-2018 19:57:30

Copyright (c) 1997, 2014, Oracle.  All rights reserved.

Used parameter files:
oracle_home/network/admin/sqlnet.ora

Used TNSNAMES adapter to resolve the alias
Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP)(HOST = server.domain.com)(PORT = 2340)) (CONNECT_DATA = (SERVER = DEDICATED) (SID = DEANDR)))
TNS-12535: TNS:operation timed out

In this case, I had set the trace files to be written to the ORACLE_HOME/network/admin directory. A file named tnsping.trc will be created and will contain the details of the tnsping attempt.

CRS failure – failure occurred at: sskgpcreates (ORA-27302, ORA-27301, ORA-27300, ORA-27154, CRS-5017)

I was trying to bring up a new RAC physical standby node and encountered the below error:

[racnode2-DEVDBTPN2] srvctl start database -d DEVDBTPN 
PRCR-1079 : Failed to start resource ora.devdbtpn.db
CRS-5017: The resource action "ora.devdbtpn.db start" encountered the following error: 
ORA-27154: post/wait create failed
ORA-27300: OS system dependent operation:semget failed with status: 28
ORA-27301: OS failure message: No space left on device
ORA-27302: failure occurred at: sskgpcreates
. For details refer to "(:CLSN00107:)" in "/oracle/base/diag/crs/racnode1/crs/trace/crsd_oraagent_oracle.trc".
 
CRS-2674: Start of 'ora.devdbtpn.db' on 'racnode1' failed
CRS-2632: There are no more servers to try to place resource 'ora.devdbtpn.db' on that would satisfy its placement policy

The OS information was

Linux racnode1 2.6.32-696.13.2.el6.x86_64 #1 SMP Fri Sep 22 12:32:14 EDT 2017 x86_64 x86_64 x86_64 GNU/Linux

There were many other instances running on this RAC node. The issue turned out to be the kernel semaphore parameter settings.

Edit the sysctl.conf file as follows:

vi /etc/sysctl.conf

and change this line

kernel.sem=250 32000 100 128

to this

kernel.sem=250 64000 100 256

Basically, double the 2nd and 4th parameters (SEMMNS and SEMMNI). Then, to make the changes take effect, issue the command

sysctl -p

Stop and start the RAC DB with srvctl commands.
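The doubling above can be computed rather than typed. A quick Python sketch, assuming the standard four-field kernel.sem layout (SEMMSL, SEMMNS, SEMOPM, SEMMNI):

```python
# Double the 2nd (SEMMNS) and 4th (SEMMNI) fields of a kernel.sem line,
# leaving SEMMSL and SEMOPM untouched.
def double_sem(line):
    key, val = line.split("=")
    fields = val.split()
    fields[1] = str(int(fields[1]) * 2)  # SEMMNS: total semaphores system-wide
    fields[3] = str(int(fields[3]) * 2)  # SEMMNI: max semaphore sets
    return f"{key.strip()}={' '.join(fields)}"

print(double_sem("kernel.sem=250 32000 100 128"))
# prints: kernel.sem=250 64000 100 256
```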

Instructions to run the SQL Tuning Advisor

A brief write-up on running the SQL Tuning Advisor. It seems that every time I run an ADDM report, I see output that includes "Run SQL Tuning Advisor on the SELECT statement with SQL_ID".

          Findings and Recommendations
          ----------------------------
 
Finding 1: Top SQL Statements
Impact is .22 active sessions, 95% of total activity.
-----------------------------------------------------
SQL statements consuming significant database time were found. These
statements offer a good opportunity for performance improvement.
 
   Recommendation 1: SQL Tuning
   Estimated benefit is .22 active sessions, 95% of total activity.
   ----------------------------------------------------------------
   Action
      Run SQL Tuning Advisor on the SELECT statement with SQL_ID
      "gbwan1836hj3c".
      Related Object
         SQL statement with SQL_ID gbwan1836hj3c.
         SELECT time_key + (:"SYS_B_00" / :"SYS_B_01") period,
         in_value,
         out_value
         FROM  (SELECT time_key,

Create an SQL Tuning task as follows:

DECLARE
  l_sql_tune_task_id  VARCHAR2(100);
BEGIN
  l_sql_tune_task_id := DBMS_SQLTUNE.create_tuning_task (
                          sql_id      => 'gbwan1836hj3c',
                          scope       => DBMS_SQLTUNE.scope_comprehensive,
                          time_limit  => 500,
                          task_name   => 'gbwan1836hj3c_tuning_task',
                          description => 'Tuning task1 for statement gbwan1836hj3c');
  DBMS_OUTPUT.put_line('l_sql_tune_task_id: ' || l_sql_tune_task_id);
END;
/

Run the tuning task

EXEC DBMS_SQLTUNE.execute_tuning_task(task_name => 'gbwan1836hj3c_tuning_task');

The above will take a couple of minutes. Depending on the complexity of the query, the time_limit may need to be increased.

After the task is complete, query the output:

set long 65536
set longchunksize 65536
set linesize 100
select dbms_sqltune.report_tuning_task('gbwan1836hj3c_tuning_task') from dual;

I was using SQL Developer, and the output looked a bit strange. Copy and paste it into a text editor such as Notepad to see it properly.
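Because all three steps repeat a task name derived from the SQL ID, it is easy to template the whole sequence and keep the names consistent. A Python sketch (plain string templating, mirroring the blocks above; not an Oracle API):

```python
def tuning_script(sql_id, time_limit=500):
    """Build the create/execute/report steps for one SQL ID so the
    task name stays consistent across all three statements."""
    task = f"{sql_id}_tuning_task"
    create = (
        "DECLARE\n"
        "  l_task VARCHAR2(100);\n"
        "BEGIN\n"
        "  l_task := DBMS_SQLTUNE.create_tuning_task(\n"
        f"              sql_id     => '{sql_id}',\n"
        "              scope      => DBMS_SQLTUNE.scope_comprehensive,\n"
        f"              time_limit => {time_limit},\n"
        f"              task_name  => '{task}');\n"
        "END;\n"
        "/"
    )
    execute = f"EXEC DBMS_SQLTUNE.execute_tuning_task(task_name => '{task}');"
    report = f"SELECT DBMS_SQLTUNE.report_tuning_task('{task}') FROM dual;"
    return "\n".join([create, execute, report])

print(tuning_script("gbwan1836hj3c"))
```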

RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process

I was in the process of building a physical standby that had to be put on hold for a few days while other, more pressing priorities were handled. I thought it would be sufficient to change the parameter below:

NAME                                 TYPE        VALUE
log_archive_dest_state_2             string      DEFER

However, I then noticed that the time taken by the archived log backup script had increased from a couple of minutes to over three hours. This was because the archived logs were no longer being deleted as RMAN was displaying the below message:

channel disk2: backup set complete, elapsed time: 00:01:35
channel disk2: deleting archived log(s)
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=/archlogs/1_3426_959714474.arc thread=1 sequence=3426
RMAN-08137: WARNING: archived log not deleted, needed for standby or upstream capture process
archived log file name=/archlogs/1_3427_959714474.arc thread=1 sequence=3427

To resolve this issue, I had to remove the entry for the physical standby from the parameter below:

NAME                                 TYPE        VALUE
log_archive_dest_2                   string

After this was done, the next run of the archived log backup process completed in a few minutes and subsequent runs returned to the expected couple of minutes execution time.

Oracle 12c – Data Guard standby fallen behind – log files available on primary

I had a situation today wherein a standby had stopped applying logs about 20 days ago and the monitoring had somehow been ignored. As we still had the logs available on the primary database, I copied them to an external file system on the primary with this script:

#!/bin/ksh
for i in $(asmcmd ls +DG_ASM_PROD/PRODDB/ARCHIVELOG/2017_11_16); 
do
  asmcmd cp +DG_ASM_PROD/PRODDB/ARCHIVELOG/2017_11_16/$i /backup/dean
done
for i in $(asmcmd ls +DG_ASM_PROD/PRODDB/ARCHIVELOG/2017_11_17); 
do
  asmcmd cp +DG_ASM_PROD/PRODDB/ARCHIVELOG/2017_11_17/$i /backup/dean
done
.
.
.
.
for i in $(asmcmd ls +DG_ASM_PROD/PRODDB/ARCHIVELOG/2017_12_06); 
do
  asmcmd cp +DG_ASM_PROD/PRODDB/ARCHIVELOG/2017_12_06/$i /backup/dean
done
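Rather than writing one copy loop per day, the list of daily ARCHIVELOG directories can be generated from a date range and fed to asmcmd. A Python sketch, using the ASM paths from the script above:

```python
from datetime import date, timedelta

def archivelog_dirs(start, end, base="+DG_ASM_PROD/PRODDB/ARCHIVELOG"):
    """Yield one ASM ARCHIVELOG directory per day, inclusive of both ends."""
    d = start
    while d <= end:
        yield f"{base}/{d.strftime('%Y_%m_%d')}"
        d += timedelta(days=1)

# Feed each directory to asmcmd ls / asmcmd cp, as in the loops above.
dirs = list(archivelog_dirs(date(2017, 11, 16), date(2017, 12, 6)))
print(len(dirs), dirs[0], dirs[-1])
```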

After all the files were copied from ASM (+DG_ASM_PROD/PRODDB/ARCHIVELOG) to a non-ASM file system (/backup/dean), I SFTPed all the files by logging on to the standby and issuing the commands:

cd /backup/PRODDR/dean
sftp oracle@prodsrvr.verizon.com
get /backup/dean/* .

It probably would have been faster to zip the files first, but this was a quick-and-dirty solution and we have good network speed.

After the files had landed on the standby, I generated SQL statements to register them with the awk command below:

ls -l thread* | awk '{print "ALTER DATABASE REGISTER PHYSICAL LOGFILE '\''/backup/PRODDR/dean/"$9 "'\'';"}' > temp.sql
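The same statements can be generated in Python if you prefer it over awk. A sketch using the standby directory from this example; the log file names passed in are illustrative:

```python
import os

def register_statement(filename, directory="/backup/PRODDR/dean"):
    """Build the REGISTER command for one copied archived log."""
    path = os.path.join(directory, filename)
    return f"ALTER DATABASE REGISTER PHYSICAL LOGFILE '{path}';"

# One statement per copied log file (names here are illustrative):
for f in ["1_3426_959714474.arc", "1_3427_959714474.arc"]:
    print(register_statement(f))
```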

After the commands were ready in the temp.sql file, I stopped MRP with

alter database recover managed standby database cancel;

Then I registered all the logs by executing the SQL file:

SQL> @temp.sql

After all the files were registered, I restarted MRP with

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  DISCONNECT FROM SESSION;

Then I waited for the log apply to catch up.

Connected to an idle instance despite the fact that the database processes are running

For some odd reason, we were seeing a message indicating that the database was down:

[oracle@proddb1 trace]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Thu Nov 30 23:21:39 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> 

All the Oracle processes were running:

[oracle@proddb1 trace]$ ps -ef | grep -i mon
579      19405     1  0 Nov27 ?        00:00:21 ora_pmon_PRODDB1
579      19437     1  0 Nov27 ?        00:00:07 ora_smon_PRODDB1
579      19447     1  0 Nov27 ?        00:01:53 ora_mmon_PRODDB1
579      20369     1  0 Nov27 ?        00:00:02 ora_tmon_PRODDB1

The Oracle SID, home, base, etc. were set correctly:

[oracle@proddb1 trace]$ echo $ORACLE_BASE;echo $ORACLE_HOME;echo $ORACLE_SID;hostname;uname -a
/apps/opt/oracle/product
/apps/opt/oracle/product/12.1.0.2/db_1
PRODDB1
proddb1.vzbi.com
Linux proddb1.vzbi.com 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

The one unusual thing was that the "*mon" processes were owned by userid 579 rather than "oracle". I checked /etc/passwd:

[oracle@proddb1]$ cat /etc/passwd | grep -i oracle
oracle:x:1579:122:Oracle Support:/home/oracle:/bin/bash

The system administrator had changed the oracle userid from 579 to 1579 to allow FTP/SCP commands. This was done while Oracle was running. I killed the SMON process to shut down the database and brought it back up; however, the issue persisted. I had to remove the files below:

[oracle@proddb1 ~]$ ls -la /var/tmp/.oracle
total 8
drwxrwxrwt  2 root dba  4096 Nov 30 22:49 .
drwxrwxrwt. 5 root root 4096 Nov 27 16:18 ..
srwxrwxrwx  1  579 dba     0 Nov 22 19:55 s#13746.1
srwxrwxrwx  1  579 dba     0 Nov 22 19:55 s#13746.2
srwxrwxrwx  1  579 dba     0 Nov 30 22:49 s#18231.1
srwxrwxrwx  1  579 dba     0 Nov 30 22:49 s#18231.2
srwxrwxrwx  1  579 dba     0 Nov 10 19:42 s#20024.1
srwxrwxrwx  1  579 dba     0 Nov 10 19:42 s#20024.2
srwxrwxrwx  1  579 dba     0 Nov 22 18:26 s#23729.1
srwxrwxrwx  1  579 dba     0 Nov 22 18:26 s#23729.2
srwxrwxrwx  1  579 dba     0 Nov 22 18:16 s#3272.1
srwxrwxrwx  1  579 dba     0 Nov 22 18:16 s#3272.2
srwxrwxrwx  1  579 dba     0 Nov 22 21:10 s#8843.1
srwxrwxrwx  1  579 dba     0 Nov 22 21:10 s#8843.2
srwxrwxrwx  1  579 dba     0 Nov 30 22:49 sREGISTER_PROD

[oracle@proddb1 datapump]$ ls -l /dev/shm 
total 9348
-rwxrwx--- 1 579 dba   4096 Nov 27 21:11 JOXSHM_EXT_0_PRODDB1_458756
-rwxrwx--- 1 579 dba   4096 Nov 27 21:11 JOXSHM_EXT_100_PRODDB1_458756
-rwxrwx--- 1 579 dba   4096 Nov 27 21:11 JOXSHM_EXT_101_PRODDB1_458756
-rwxrwx--- 1 579 dba   4096 Nov 27 21:11 JOXSHM_EXT_102_PRODDB1_458756
-rwxrwx--- 1 579 dba   8192 Nov 27 21:11 JOXSHM_EXT_103_PRODDB1_458756
-rwxrwx--- 1 579 dba   4096 Nov 27 21:11 JOXSHM_EXT_104_PRODDB1_458756
-rwxrwx--- 1 579 dba   4096 Nov 27 21:11 JOXSHM_EXT_105_PRODDB1_458756

In addition to the above, the listener and other files/semaphores were owned by the “old” oracle id (579). Finally, we decided to bounce the server to get rid of all the old semaphores.

Losing an index after disabling a constraint

I had a table defined with a partitioned primary key index. In order to load the data, I issued

ALTER TABLE SCHEMA.TABLE_NAME DISABLE CONSTRAINT PK_TABLE_NAME;

After this ALTER was issued, the index vanished from DBA_INDEXES and DBA_IND_PARTITIONS, because disabling the constraint dropped its index. When I re-enabled the constraint, the index was recreated as a non-partitioned index. In order to get past this, I had to change the ALTER to

ALTER TABLE SCHEMA.TABLE_NAME DISABLE CONSTRAINT PK_TABLE_NAME KEEP INDEX;