Oracle 12c – Data Guard standby has fallen behind – log files still available on primary

I had a situation today in which a standby had stopped applying logs about 20 days earlier and the monitoring alerts had somehow been ignored. As we still had the archived logs available on the primary database, I copied them out of ASM to a regular file system on the primary with this script:

#!/bin/ksh
for i in $(asmcmd ls +DG_ASM_PROD/PRODDB/ARCHIVELOG/2017_11_16); 
do
  asmcmd cp +DG_ASM_PROD/PRODDB/ARCHIVELOG/2017_11_16/$i /backup/dean
done
for i in $(asmcmd ls +DG_ASM_PROD/PRODDB/ARCHIVELOG/2017_11_17); 
do
  asmcmd cp +DG_ASM_PROD/PRODDB/ARCHIVELOG/2017_11_17/$i /backup/dean
done
# ... one block per day, repeated for each remaining date directory ...
for i in $(asmcmd ls +DG_ASM_PROD/PRODDB/ARCHIVELOG/2017_12_06); 
do
  asmcmd cp +DG_ASM_PROD/PRODDB/ARCHIVELOG/2017_12_06/$i /backup/dean
done
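
In hindsight, the repeated per-day blocks could be collapsed into a single loop over the dated directories. A minimal sketch (untested here), assuming asmcmd ls at the ARCHIVELOG level returns only the dated subdirectories:

#!/bin/ksh
# Sketch: copy every archived log under every dated directory in one pass.
SRC=+DG_ASM_PROD/PRODDB/ARCHIVELOG
DEST=/backup/dean
for d in $(asmcmd ls $SRC); do
  d=${d%/}                            # asmcmd lists directories with a trailing slash
  for f in $(asmcmd ls $SRC/$d); do
    asmcmd cp $SRC/$d/$f $DEST
  done
done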

After all the files were copied from ASM (+DG_ASM_PROD/PRODDB/ARCHIVELOG) to a non-ASM file system (/backup/dean), I pulled them over with sftp by logging on to the standby and issuing:

cd /backup/PRODDR/dean
sftp oracle@prodsrvr.verizon.com
get /backup/dean/* .

It probably would have been faster to zip the files first, but this was a quick-and-dirty solution and we had good network speed.

After the files had landed on the standby, I generated the SQL statements to register them with the awk command below:

ls -l thread* | awk '{print "ALTER DATABASE REGISTER PHYSICAL LOGFILE '\''/backup/PRODDR/dean/"$9 "'\'';"}' > temp.sql
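
Each line of temp.sql ends up being a register statement of this form (the log file name below is purely illustrative):

$ head -1 temp.sql
ALTER DATABASE REGISTER PHYSICAL LOGFILE '/backup/PRODDR/dean/thread_1_seq_100234.2217.961360215';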

After the commands were ready in the temp.sql file, I stopped MRP with

alter database recover managed standby database cancel;

And registered all the logs by executing the generated SQL file:

SQL> @temp.sql

After all the files were registered, I restarted MRP with

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE DISCONNECT FROM SESSION;

And waited for log apply to catch up.
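
To watch the catch-up from the standby, a query along these lines shows the apply/transport lag and the highest applied sequence (a sketch; view and column names as of 12c):

$ sqlplus -s / as sysdba <<'EOF'
select name, value from v$dataguard_stats where name in ('apply lag','transport lag');
select max(sequence#) last_applied from v$archived_log where applied = 'YES';
EOF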

Odd listing of a file's contents when using an environment variable

I had a file similar to the one shown below

$ more dean.txt
2017-12-02 18:16:57 17279 [ERROR] ** TEST ERROR ** Error_code: 1007

When I ran the below command from the command line, it displayed the correct information:

$ cat dean.txt | grep -i error_code                              
2017-12-02 18:16:57 17279 [ERROR] ** TEST ERROR ** Error_code: 1007

However, when I set up an environment variable as:

$ DEAN=`cat dean.txt | grep -i error_code`                       

And I ran the command via the environment variable:

$ echo $DEAN                                                     

2017-12-02 18:16:57 17279 [ERROR] 0 bin COPYING COPYING.AGPLv3 COPYING.GPLv2 COPYING-jemalloc dean.txt docs include lib log_archives man my.cnf my.cnf.bak.20171121 my.cnf.dean.bak mysql-files mysql-test PATENTS README.md README.MySQL share support-files TEST ERROR 0 bin COPYING COPYING.AGPLv3 COPYING.GPLv2 COPYING-jemalloc dean.txt docs include lib log_archives man my.cnf my.cnf.bak.20171121 my.cnf.dean.bak mysql-files mysql-test PATENTS README.md README.MySQL share support-files Error_code: 1007

The output includes all the files in the current directory because the shell expands the unquoted “*” in the file’s contents into a list of file names (glob expansion).

One way around this is to remove the asterisks from the file's contents:

$ more dean.txt
2017-12-02 18:16:57 17279 [ERROR]  TEST ERROR  Error_code: 1007

And reset the environment variable:

$ DEAN=`cat dean.txt | grep -i error_code`

And execute the command:

$ echo $DEAN                              
2017-12-02 18:16:57 17279 [ERROR] TEST ERROR Error_code: 1007

The better workaround is to put the variable in double quotes, as shown below, which prevents the glob expansion.

$ more dean.txt
2017-12-02 18:16:57 17279 [ERROR] ** TEST ERROR ** Error_code: 1007

$ DEAN=`cat dean.txt | grep -i error_code`                       

$ echo "$DEAN"
2017-12-02 18:16:57 17279 [ERROR] ** TEST ERROR ** Error_code: 1007
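
As a side note, filename expansion can also be switched off entirely in ksh/bash with set -f (and re-enabled with set +f), although quoting the variable is the cleaner habit:

$ set -f
$ echo $DEAN
2017-12-02 18:16:57 17279 [ERROR] ** TEST ERROR ** Error_code: 1007
$ set +f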

Connected to an idle instance even though the database processes are running

For some odd reason, we were seeing a message indicating that the database was down:

[oracle@proddb1 trace]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Thu Nov 30 23:21:39 2017

Copyright (c) 1982, 2014, Oracle.  All rights reserved.

Connected to an idle instance.

SQL> 

All the Oracle background processes were running:

[oracle@proddb1 trace]$ ps -ef | grep -i mon
579      19405     1  0 Nov27 ?        00:00:21 ora_pmon_PRODDB1
579      19437     1  0 Nov27 ?        00:00:07 ora_smon_PRODDB1
579      19447     1  0 Nov27 ?        00:01:53 ora_mmon_PRODDB1
579      20369     1  0 Nov27 ?        00:00:02 ora_tmon_PRODDB1

ORACLE_SID, ORACLE_HOME, ORACLE_BASE, etc. were set correctly:

[oracle@proddb1 trace]$ echo $ORACLE_BASE;echo $ORACLE_HOME;echo $ORACLE_SID;hostname;uname -a
/apps/opt/oracle/product
/apps/opt/oracle/product/12.1.0.2/db_1
PRODDB1
proddb1.vzbi.com
Linux proddb1.vzbi.com 3.10.0-327.el7.x86_64 #1 SMP Thu Oct 29 17:29:29 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux

The one unusual thing was that the “*mon” processes were owned by numeric userid 579 rather than by “oracle”. I checked /etc/passwd:

[oracle@proddb1]$ cat /etc/passwd | grep -i oracle
oracle:x:1579:122:Oracle Support:/home/oracle:/bin/bash
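
Since the old uid no longer maps to any account, find's -nouser test is a quick way to list everything still owned by it (assuming nothing else was assigned uid 579):

$ find /apps/opt/oracle/product /var/tmp/.oracle /dev/shm -nouser -ls 2>/dev/null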

The systems administrator had changed the oracle userid from 579 to 1579 to allow FTP/SCP and similar commands, and this was done while Oracle was running. I killed the SMON process to bring the database down and then started it back up, but the issue persisted. I also had to remove the files below, which were still owned by the old uid:

[oracle@proddb1 ~]$ ls -la /var/tmp/.oracle
total 8
drwxrwxrwt  2 root dba  4096 Nov 30 22:49 .
drwxrwxrwt. 5 root root 4096 Nov 27 16:18 ..
srwxrwxrwx  1  579 dba     0 Nov 22 19:55 s#13746.1
srwxrwxrwx  1  579 dba     0 Nov 22 19:55 s#13746.2
srwxrwxrwx  1  579 dba     0 Nov 30 22:49 s#18231.1
srwxrwxrwx  1  579 dba     0 Nov 30 22:49 s#18231.2
srwxrwxrwx  1  579 dba     0 Nov 10 19:42 s#20024.1
srwxrwxrwx  1  579 dba     0 Nov 10 19:42 s#20024.2
srwxrwxrwx  1  579 dba     0 Nov 22 18:26 s#23729.1
srwxrwxrwx  1  579 dba     0 Nov 22 18:26 s#23729.2
srwxrwxrwx  1  579 dba     0 Nov 22 18:16 s#3272.1
srwxrwxrwx  1  579 dba     0 Nov 22 18:16 s#3272.2
srwxrwxrwx  1  579 dba     0 Nov 22 21:10 s#8843.1
srwxrwxrwx  1  579 dba     0 Nov 22 21:10 s#8843.2
srwxrwxrwx  1  579 dba     0 Nov 30 22:49 sREGISTER_PROD

[oracle@proddb1 datapump]$ ls -l /dev/shm 
total 9348
-rwxrwx--- 1 579 dba   4096 Nov 27 21:11 JOXSHM_EXT_0_PRODDB1_458756
-rwxrwx--- 1 579 dba   4096 Nov 27 21:11 JOXSHM_EXT_100_PRODDB1_458756
-rwxrwx--- 1 579 dba   4096 Nov 27 21:11 JOXSHM_EXT_101_PRODDB1_458756
-rwxrwx--- 1 579 dba   4096 Nov 27 21:11 JOXSHM_EXT_102_PRODDB1_458756
-rwxrwx--- 1 579 dba   8192 Nov 27 21:11 JOXSHM_EXT_103_PRODDB1_458756
-rwxrwx--- 1 579 dba   4096 Nov 27 21:11 JOXSHM_EXT_104_PRODDB1_458756
-rwxrwx--- 1 579 dba   4096 Nov 27 21:11 JOXSHM_EXT_105_PRODDB1_458756

In addition to the above, the listener and various other files and semaphores were still owned by the “old” oracle uid (579). In the end, we decided to bounce the server to clear out all the stale semaphores and shared memory.
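
For reference, a cleanup roughly along these lines (run as root with the database and listener down) would likely have avoided the reboot; this is a sketch only, not what we actually ran:

# re-own the software tree that was left on the old numeric uid
chown -R oracle:dba /apps/opt/oracle/product
# remove the stale IPC socket files and shared-memory files shown above
rm -f /var/tmp/.oracle/s*
rm -f /dev/shm/JOXSHM_EXT_*_PRODDB1_*
# remove shared memory segments and semaphores still owned by the old uid (579)
ipcs -m | awk '$3 == "579" {print $2}' | xargs -r -n1 ipcrm -m
ipcs -s | awk '$3 == "579" {print $2}' | xargs -r -n1 ipcrm -s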

Losing an index after disabling a constraint

I had a table defined with a partitioned primary key index. In order to load the data, I issued:

ALTER TABLE SCHEMA.TABLE_NAME DISABLE CONSTRAINT PK_TABLE_NAME;

After this ALTER was issued, the index vanished from DBA_IND_PARTITIONS and the index reference disappeared from DBA_CONSTRAINTS. When I re-enabled the constraint, the index was recreated as a non-partitioned index. To get past this, I had to change the ALTER to:

ALTER TABLE SCHEMA.TABLE_NAME DISABLE CONSTRAINT PK_TABLE_NAME KEEP INDEX;
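
After re-enabling the constraint, it is worth confirming that the index is still present and still partitioned; a quick check (SCHEMA/TABLE_NAME are the same placeholders used above):

$ sqlplus -s / as sysdba <<'EOF'
select index_name, partitioned, status
from   dba_indexes
where  owner = 'SCHEMA' and table_name = 'TABLE_NAME';
EOF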

scp error – no matching cipher found

I was performing an scp from an older server to a newer server with the command:

scp local_file remote_user@remote_server.something.com:/remote_directory

and got an error message:

no matching cipher found: client 3des-cbc,blowfish-cbc,cast128-cbc server aes128-ctr,aes192-ctr,aes256-ctr,aes256-cbc
lost connection

I was able to get around this by explicitly specifying a cipher that both sides support:

scp -o Ciphers=aes256-ctr local_file remote_user@remote_server.something.com:/remote_directory
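
To see what each side supports before picking a cipher, newer OpenSSH versions can list them directly (the error message above already shows both lists as well):

$ ssh -Q cipher                  # ciphers the local client supports (OpenSSH 6.3+)
$ sudo sshd -T | grep -i ciphers # effective cipher list on the server (run on the server as root)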