Cleaning up Oracle audit files resulting in “cannot execute [Argument list too long]”

Just a quick little Unix tip:

Over the Christmas/New Year’s holidays, the audit file destination for one of my applications started filling up, causing the database to reject new logons and generate pages to the on-call team. The root cause appeared to be a script running on an application server that, for some unknown reason, was attempting to log on and failing about 2,000 times a minute. This generated roughly the same number of audit files, filling the mount point and/or exhausting its inodes. The regular housekeeping script was unable to clean up the files as it was executing a command similar to:

rm *_ora_3*_20181231*.aud

This would result in the error:

-ksh: rm: /bin/rm: cannot execute [Argument list too long]

This being the holiday season, we were unable to find a contact to either stop or fix the application script. After some Google searches, I was able to rapidly delete the files with the below command:

find /app/oracle/admin/instance1/adump -maxdepth 1 -name "*.aud" -print0 | xargs -0 rm
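The error occurs because the shell expands the wildcard into an argument list that exceeds the kernel’s ARG_MAX limit. The find command instead pipes the file names to xargs, which batches them into multiple rm invocations that each stay under the limit, and the -print0/-0 pair keeps unusual file names (embedded spaces, newlines) intact. If your version of find supports -delete, a roughly equivalent alternative (a sketch, not the command we actually ran) is to let find remove the files itself:

find /app/oracle/admin/instance1/adump -maxdepth 1 -name "*_ora_3*_20181231*.aud" -delete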

Using Python to search for key words in text files

As I have previously blogged, Python is very useful for solving everyday tasks that you may encounter. This week, I was tasked with reviewing resumes of potential candidates. As some of the resumes were unexpectedly long (8 pages in one instance), I came up with a list of search terms and then used the below Python code to identify the presence of these terms in the resumes. The script generates a skills matrix of candidate name and skill as a CSV that can be opened in Excel and filtered as needed. Additionally, it counts the number of search terms found in each resume to rank candidates.

This process does not in any way replace reading the resumes. It is just a convenient way to generate a cross-reference of skills and candidates.

import os

# Search terms to look for in each resume (all lower case)
searchstrings = ('install', 'patch', 'upgrade', 'migrate', 'shell', 'scripts', 'rac', 'asm', 'performance', 'data guard', 'golden gate', 'import', 'export', 'pump', 'loader', 'rman', 'recovery', 'tde', 'db2', 'mysql', 'sybase', 'mongo', 'teradata', 'postgres', 'postgresql', 'cassandra', 'mssql', 'aws', 'jenkins', 'json', 'cloud', 'oci')
src_dir = 'C:/temp/Resumes/temp/'  # Base directory containing the resume text files

report_file_name = 'C:/temp/Resumes/temp.txt'
report_file = open(report_file_name, 'w')

for resume in os.listdir(src_dir):
    file_name = os.path.join(src_dir, resume)
    resume_file = open(file_name)
    # Write one "name,skill" line for every search term found on any line
    for line in resume_file.readlines():
        for word in searchstrings:
            if word in line.lower():
                report_file.write(resume + ',' + word + '\n')
    resume_file.close()

report_file.close()

#
## Sort the data and remove the duplicates (a term found on several lines of the
## same resume would otherwise be reported more than once); sorting also groups
## each candidate's rows together for the counting pass below
#
report_file = open(report_file_name, 'r')
content_set = sorted(set(report_file.readlines()))
report_file.close()

unique_file_name = 'C:/temp/Resumes/report.csv'
unique_file = open(unique_file_name, 'w')
for line in content_set:
    unique_file.write(line)

unique_file.close()

#
## Count the number of skills that each person has; the rows are sorted,
## so all of a candidate's rows are adjacent
#
unique_file_name = 'C:/temp/Resumes/report.csv'
with open(unique_file_name) as unique_file:
    line = unique_file.readline()
prev_name = line.split(",")[0]
skill_ctr = 0

unique_file = open(unique_file_name, 'r')
for line in unique_file:
    curr_name = line.split(",")[0]
    if curr_name == prev_name:
        skill_ctr = skill_ctr + 1
    else:
        print(prev_name + ' has ' + str(skill_ctr) + ' skills.')
        prev_name = curr_name
        skill_ctr = 1  # the current row already holds one skill for the new name
print(prev_name + ' has ' + str(skill_ctr) + ' skills.')

unique_file.close()
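As a hypothetical illustration (file name and contents invented for this example), if Jane_Doe.txt mentions RMAN, RAC and Data Guard, report.csv would contain:

Jane_Doe.txt,data guard
Jane_Doe.txt,rac
Jane_Doe.txt,rman

and the counting pass would print:

Jane_Doe.txt has 3 skills.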

The recovery I did not want to do…but ended up doing anyway! (Oracle Point in Time Recovery)

For many years now, I have been secretly dreading ever having to recover one of the databases that I support, as it has an allocated space of 3.1 TB and a used space of 1.3 TB. However, today the day finally arrived when a member of the application team ran a script that accidentally deleted data from multiple tables. The impact of the deletes was made worse by cascading referential integrity (RI) constraints. Bottom line, we had to recover the database to a point in time before the bad script was executed.

We checked the log history around the intended recovery point with the below SQL:

alter session set nls_date_format = 'dd-MON-yy hh24.mi.ss';

select * from V$LOG_HISTORY 
where first_time > to_date('15-OCT-18 18.00','dd-MON-yy hh24.mi')
order by first_time, thread#
;

The database was running on a four-node RAC. We shut down the database with:

srvctl stop database -d PRODDB -o immediate

And confirmed that the database was down with:

srvctl status database -d PRODDB

Once the database was down, I started instance 4 of the RAC in MOUNT mode:

STARTUP MOUNT;
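That command was issued from a SQL*Plus session on node 4; a minimal sketch of getting there, assuming the same environment settings as the restore script below:

export ORACLE_SID=PRODDB4
export ORACLE_HOME=/oracle/home/product
sqlplus / as sysdba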

I then began the recovery with the below script (run via nohup, as it would take about 4 hours to restore a database of this size):

#!/bin/ksh
PATH=$PATH:/oracle/home/product/bin
export ORACLE_SID=PRODDB4
export ORACLE_HOME=/oracle/home/product
rman <<EOF
connect target /
run
{
    ALLOCATE CHANNEL disk1 DEVICE TYPE DISK FORMAT '/backup02/PRODDB/backup';
    ALLOCATE CHANNEL disk2 DEVICE TYPE DISK FORMAT '/backup02/PRODDB/backup';
    ALLOCATE CHANNEL disk3 DEVICE TYPE DISK FORMAT '/backup03/PRODDB/backup';
    ALLOCATE CHANNEL disk4 DEVICE TYPE DISK FORMAT '/backup03/PRODDB/backup';
    SET UNTIL TIME "TO_DATE('15-OCT-2018 19:00:00','DD-MON-YYYY  HH24:MI:SS')"; 
    restore database;
    recover database;
    release channel disk1;
    release channel disk2;
    release channel disk3;
    release channel disk4;
}
 
exit;
EOF
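For reference, the script (saved under a hypothetical name here) was launched along these lines so it would keep running after the terminal session ended:

nohup ./restore_proddb.ksh > nohup.out 2>&1 &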

The nohup.out will contain output such as:

channel disk3: restored backup piece 1
channel disk3: restore complete, elapsed time: 00:07:35
channel disk3: starting datafile backup set restore
channel disk3: specifying datafile(s) to restore from backup set
channel disk3: restoring datafile 00028 to +ASM/proddb/perfx_08.dbf
channel disk3: restoring datafile 00058 to +ASM/proddb/perfd_24.dbf
channel disk3: reading from backup piece /backup03/PRODDB/backup/PRODDB20181015cutfmu04_1_1.bus
channel disk2: piece handle=/backup02/PRODDB/backup/PRODDB20181015cstfmtrd_1_1.bus tag=HOTBKUPFULL
channel disk2: restored backup piece 1
channel disk2: restore complete, elapsed time: 00:07:45
channel disk2: starting datafile backup set restore
channel disk2: specifying datafile(s) to restore from backup set
channel disk2: restoring datafile 00005 to +ASM/proddb/prefd_20.dbf
channel disk2: restoring datafile 00025 to +ASM/proddb/perfx_05.dbf
channel disk2: reading from backup piece /backup03/PRODDB/backup/PRODDB20181015cvtfmu2q_1_1.bus
channel disk4: piece handle=/backup03/PRODDB/backup/PRODDB20181015cktfmtg5_1_1.bus tag=HOTBKUPFULL

You can also check the status with:

select inst_id, sid, round((elapsed_seconds/60),0) as elap, round((time_remaining/60),0) as remain, message 
from gv$session_longops 
where time_remaining > 0 
and target_desc = 'restore'
order by start_time asc
;

If you want to see what is occurring on each data file, you can comment out the “AND” as shown below:

select inst_id, sid, round((elapsed_seconds/60),0) as elap, round((time_remaining/60),0) as remain, message 
from gv$session_longops 
where time_remaining > 0 
--and target_desc = 'restore'
order by start_time asc
;

In the alert log, you should see messages like:

Tue Oct 16 02:19:39 2018
Incomplete Recovery applied until change 141794746611 time 10/15/2018 19:00:00
Media Recovery Complete (PRODDB4)
Completed: alter database recover logfile '+ASM/proddb/archivelog/4_20404_972607420.arc'
Tue Oct 16 02:22:36 2018

After the recovery is complete, open the database with a RESETLOGS:

alter database open resetlogs;

The alert log will show messages such as:

RESETLOGS after incomplete recovery UNTIL CHANGE 141794757937583
Archived Log entry 164482 added for thread 1 sequence 20270 ID 0x4d158c31 dest 1:
Archived Log entry 164483 added for thread 2 sequence 20003 ID 0x4d158c31 dest 1:
Archived Log entry 164484 added for thread 3 sequence 20643 ID 0x4d158c31 dest 1:
Archived Log entry 164485 added for thread 4 sequence 20420 ID 0x4d158c31 dest 1:
Resetting resetlogs activation ID 1293257777 (0x4d158c31)

At this point the single instance is open in read/write mode. Since this is a RAC, I issued a

shutdown immediate;

restarted the database with SRVCTL

srvctl start database -d PRODDB

and monitored the alert logs to confirm a successful start of all nodes.

Important: take a backup of the database as soon as possible, as you no longer have a recovery point after the RESETLOGS.
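A minimal sketch of such a backup (channel allocations and format clauses omitted; adjust to your backup layout):

rman target / <<EOF
BACKUP DATABASE PLUS ARCHIVELOG;
EOF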

Python to the rescue – comparing text (fields) in a file

Python has increasingly become my go-to language for solving the annoying little problems that occur on a day-to-day basis. A good book for beginners is “Automate the Boring Stuff with Python: Practical Programming for Total Beginners” by Al Sweigart.

I have a process that runs multi-streamed on a server. The input parameters and the log files containing the output are read from and written to different directories based on the stream name. Each log file is parsed at the end of the process to ensure that all the input parameters have been processed. I noticed that two streams were encountering issues because the stream name did not match the directory name in the input parameters. Rather than hunt for the mismatch manually, I found it in a couple of minutes with the below Python code:

#!/usr/bin/python
#
##-----------------------------------------------------------------------------
## Main Logic
##-----------------------------------------------------------------------------
#
all_parms = open('/opt/oracle/all_parms.txt', 'r')

for line in all_parms:
    #
    ## Filter out some non-data lines
    #
    if '/dbbackup' in line:
        # The sixth pipe-delimited field holds the directory path
        stream_name, fld_01, fld_02, fld_03, fld_04, temp, fld_05, fld_06 = line.split("|")
        # The stream name should match the last component of the directory path
        cmd_01, cmd_02, cmd_03, location = temp.split("/")
        if stream_name != location:
            print("Found a mismatch: " + stream_name + " vs " + location)
            print(line)

all_parms.close()

The input file looks like the below, where the stream name in the first field should match the last component of the directory path in the sixth field:

Stream_name|field01|field02|field03|field04|/directory/name/stream_name|field05|field06
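For example (values invented for illustration), the script would flag a line such as:

STREAM_A|f1|f2|f3|f4|/dbbackup/parms/STREAM_B|f5|f6

because STREAM_A in the first field does not match the trailing directory component STREAM_B.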

Oracle 12c – Copy password file in ASM on primary to ASM on physical standby

Prior to Oracle 12c, the password file was stored in the ORACLE_HOME/dbs directory. In the case of a RAC environment (without ACFS), we would copy this file to each node of the RAC. In the case of a RAC standby, this file would also be copied to all the nodes of the standby RAC. 12c now stores this file in ASM, which means we only need one copy of the file per RAC cluster.

If you display the database configuration with the command

srvctl config database -d CUSTPR

one of the displayed parameters will be the location of the file in ASM:

Password file: +DG_ORA_DATA/CUSTPR/orapwCUSTPR

Despite the above file name, the actual physical file resides in a different location, referenced by an ASM alias:

ASMCMD> ls -l +DG_ORA_DATA/CUSTPR/orapwCUSTPR
Type      Redund  Striped  Time             Sys  Name
PASSWORD  UNPROT  COARSE   SEP 28 17:00:00  N    orapwCUSTPR => +DG_ORA_DATA/CUSTPR/PASSWORD/pwdCUSTPR.347.967084532

Storing this file in ASM adds a small complexity when moving the file to a standby RAC that houses the physical (Data Guard) standby, as we can no longer just FTP the file directly. The file has to be copied from ASM to a regular Unix file system, transferred to the destination file system, and then put back into ASM. This is accomplished with the ASMCMD pwcopy command.

On the primary RAC, copy the file to a Unix file system with the below command, issued from within ASMCMD:

pwcopy +DG_ORA_DATA/CUSTPR/orapwCUSTPR /oracle_home/prod/12/db_1/dbs/orapwCUSTDR

Note that the destination file has been renamed to the file (database) name required on the standby: orapwCUSTPR changed to orapwCUSTDR.

After the file has been extracted from ASM, transfer it with any method such as SFTP, SCP, etc. to a destination Unix folder. After it is transferred, copy it into ASM with the below command, again issued from within ASMCMD:

pwcopy /oracle_home/prod/12/db_1/dbs/orapwCUSTDR +DG_UIIOMPDCLU33_ORA1/A7VZRPR/PASSWORD/orapwCUSTDR

As a precautionary measure, I shut down the standby RAC with the appropriate SRVCTL commands. After the above copy was completed, I restarted the standby and tested connectivity from the primary with:

sqlplus sys@CUSTDR as sysdba

Excessive growth in SYSAUX tablespace

The SYSAUX tablespace on one of our production RACs had grown to over 28 GB and was continuing to alarm for space. I ran the below report (the “?” in the path expands to ORACLE_HOME):

@?/rdbms/admin/awrinfo.sql

which indicated that the space was being used by “SM/AWR”:

########################################################
(I) AWR Snapshots Information
########################################################

*****************************************************
(1a) SYSAUX usage - Schema breakdown (dba_segments)
*****************************************************
|
| Total SYSAUX size                     29,967.0 MB ( AUTOEXTEND ON for 1 out of 2 files )
| Fixed limit                              499.0 MB
| Auto Extent limit                     32,768.0 MB
|
| Schema  SYS          occupies         29,684.9 MB (  99.1% )
| Schema  SYSTEM       occupies             94.2 MB (   0.3% )
|
********************************************************
(1b) SYSAUX occupants space usage (v$sysaux_occupants)
********************************************************
|
| Occupant Name        Schema Name               Space Usage
| -------------------- -------------------- ----------------
| SM/AWR               SYS                       28,805.8 MB
| SM/OPTSTAT           SYS                          522.6 MB
| SM/ADVISOR           SYS                          234.1 MB
| LOGMNR               SYSTEM                        92.8 MB

Since we were keeping the AWR data for 30 days, we tried purging a few days of history at a time to see if that would make a difference, with the following SQL:

exec DBMS_STATS.PURGE_STATS(SYSDATE-29);

But that did not seem to reduce the space used. Purging all the statistics history did not seem to make any difference either:

exec DBMS_STATS.PURGE_STATS(DBMS_STATS.PURGE_ALL)
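Note that DBMS_STATS.PURGE_STATS removes optimizer statistics history (the SM/OPTSTAT occupant), so it would not be expected to shrink SM/AWR. AWR snapshots themselves are purged with DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE; a sketch, with hypothetical snapshot IDs:

-- Identify the range of snapshots eligible to be dropped
select min(snap_id), max(snap_id) from dba_hist_snapshot;

-- Drop snapshots 1000 through 2000 (IDs are illustrative)
exec DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(low_snap_id => 1000, high_snap_id => 2000);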

I suspect that we were encountering the issue documented in “Excessive AWR Growth From Partitioned Objects Such as SYS.WRH$_EVENT_HISTOGRAM Causing Sysaux to Grow (Doc ID 1912201.1)”.

I then decided to look at the sizes of the individual objects with the SQL:

select sum(bytes/1024/1024) Mb, segment_name,segment_type from dba_segments
where  tablespace_name = 'SYSAUX'
group by segment_name,segment_type order by 1 desc
;

This indicated that objects such as WRH$_EVENT_HISTOGRAM, WRH$_LATCH etc. were using up a lot of space.

Before you proceed with the following, please understand that you will be losing AWR data that could be needed for performance tuning, etc.

Based on the output from the above SQL, I started truncating the largest objects (tables) with SQL such as:

truncate table WRH$_EVENT_HISTOGRAM;
truncate table WRH$_LATCH;
truncate table WRH$_SQLSTAT;
truncate table WRH$_SYSSTAT;
truncate table WRH$_ACTIVE_SESSION_HISTORY;

After this completed, the space utilized was significantly reduced:

(1a) SYSAUX usage - Schema breakdown (dba_segments)
*****************************************************
|
| Total SYSAUX size                     12,036.6 MB ( AUTOEXTEND ON for 1 out of 2 files )
| Fixed limit                              499.0 MB
| Auto Extent limit                     32,768.0 MB
|
| Schema  SYS          occupies         11,754.4 MB (  97.7% )
| Schema  SYSTEM       occupies             94.2 MB (   0.8% )
|
********************************************************
(1b) SYSAUX occupants space usage (v$sysaux_occupants)
********************************************************
|
| Occupant Name        Schema Name               Space Usage
| -------------------- -------------------- ----------------
| SM/AWR               SYS                       11,394.5 MB
| SM/ADVISOR           SYS                          234.1 MB
| LOGMNR               SYSTEM                        92.8 MB

ORA-28374: typed master key not found in wallet with Oracle 12c

A user reported the below error:

Select * from users;
ORA-28374: typed master key not found in wallet

gv$encryption_wallet indicated that the wallet was open and backed up on both nodes. It turned out that we had two copies of the wallet files in different locations. In order to resolve the issue, we:
- stopped the RAC
- ensured that the wallet files matched and were in the correct location on both nodes
- updated the sqlnet.ora on both nodes to point to the correct wallet location (a sketch follows below)
and restarted the RAC. After this, the user was able to access the USERS table.
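A minimal sketch of the relevant sqlnet.ora entry, with a hypothetical wallet directory:

ENCRYPTION_WALLET_LOCATION =
  (SOURCE =
    (METHOD = FILE)
    (METHOD_DATA =
      (DIRECTORY = /oracle/admin/mydb/wallet)
    )
  )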

ORA-04091: table is mutating, trigger/function may not see it

A colleague of mine ran into the annoying mutating table trigger issue and solved it as follows. The requirement was to populate a column of TABLE_A with data from TABLE_B. There is no relationship between the two tables other than that their rows are inserted during the same unit of work.

Original trigger:

CREATE OR REPLACE TRIGGER TABLE_A_TRIGGER_01
before INSERT ON TABLE_A 
FOR EACH ROW 
begin 
update TABLE_A  set customer_id = 
(select customer_id from TABLE_B
where master_key = :new.master_key);
END; 
/

This resulted in error:

insert into TABLE_A select * from TABLE_A_TEST_DATA
            *
ERROR at line 1:
ORA-04091: table TABLE_A is mutating, trigger/function may not see it
ORA-06512: at "TABLE_A_TRIGGER_01", line 2
ORA-04088: error during execution of trigger 'TABLE_A_TRIGGER_01'

The workaround was to code it as follows:

CREATE OR REPLACE TRIGGER TABLE_A_TRIGGER_01
BEFORE INSERT ON TABLE_A
FOR EACH ROW
DECLARE
  V_CUSTOMER_ID number(9);
  V_MASTER_KEY  varchar2(16);
BEGIN
  V_MASTER_KEY := :new.master_key;
  select h.customer_id INTO V_CUSTOMER_ID
  from TABLE_B h
  where h.master_key = V_MASTER_KEY;
  :new.customer_id := V_CUSTOMER_ID;
END;
/
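One caveat with this approach: if TABLE_B has no row for a given master key, the SELECT INTO raises NO_DATA_FOUND and the insert fails. A sketch of a variant (not the code my colleague deployed) that leaves customer_id NULL in that case:

CREATE OR REPLACE TRIGGER TABLE_A_TRIGGER_01
BEFORE INSERT ON TABLE_A
FOR EACH ROW
DECLARE
  V_CUSTOMER_ID number(9);
BEGIN
  select h.customer_id INTO V_CUSTOMER_ID
  from TABLE_B h
  where h.master_key = :new.master_key;
  :new.customer_id := V_CUSTOMER_ID;
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    :new.customer_id := NULL;  -- no matching TABLE_B row
END;
/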