Recovering when a user issues an incorrect SQL statement in DB2 for z/OS

If a user inadvertently updates, inserts, or deletes data in a table during the day and you need to recover the table to a prior point in time, you can recover to a checkpoint rather than to an image copy or a quiesce point to reduce the amount of data loss. For example, if the user issued a "bad" SQL statement at 17:18 on March 29, 2018, you can print the checkpoint queue with the DB2 utility DSNJU004. The output will look similar to:

                    CHECKPOINT QUEUE                
                18:40:10 MARCH 29, 2018             
TIME OF CHECKPOINT       18:30:04 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDD4187405
END CHECKPOINT RBA              0000000023FDD4194000
END CHECKPOINT STCK             00D41A198B683F8E0A00
TIME OF CHECKPOINT       18:15:04 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDCE6AA615
END CHECKPOINT RBA              0000000023FDCE6B73E7
END CHECKPOINT STCK             00D41A16311B02B53A00
TIME OF CHECKPOINT       18:02:44 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDC976C806
END CHECKPOINT RBA              0000000023FDC9785E4F
END CHECKPOINT STCK             00D41A136F4ACE910800
TIME OF CHECKPOINT       17:47:44 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDC3A02EF2
END CHECKPOINT RBA              0000000023FDC3A11000
END CHECKPOINT STCK             00D41A1014FBDC3B0800
TIME OF CHECKPOINT       17:32:44 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDBDA9FBDC
END CHECKPOINT RBA              0000000023FDBDAAD967
END CHECKPOINT STCK             00D41A0CBAAD71430800
TIME OF CHECKPOINT       17:17:44 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDB7EAA020
END CHECKPOINT RBA              0000000023FDB7EC4622
END CHECKPOINT STCK             00D41A096060B5B53000

There is a checkpoint at 17:17 on March 29, 2018 that may be a good recovery point.

The table space can be recovered with:

//SYSIN    DD  *                        
  RECOVER  TABLESPACE proddb00.prodts33 
           TORBA      X'23FDB7EC4622'   
/*                                      

If you have an older-style table space that contains multiple tables, they will all be recovered to the same RBA. Rebuild all the associated indexes after the recovery completes.
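
If you do not have the list of affected indexes handy, the REBUILD INDEX utility accepts the ALL keyword at the table space level. A sketch of the SYSIN, reusing the example table space name from above (wrap it in your usual utility JCL):

//SYSIN    DD  *                                   
  REBUILD INDEX (ALL) TABLESPACE proddb00.prodts33 
/*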

Delete a backup file from RMAN

Yesterday I had a requirement to remove a backup file from the RMAN catalog because we did not want to recover to it. After some "research" on Google and in the RMAN documentation, I found the below method:

Connect to RMAN and issue the command:

list backup;

This will list all the backups cataloged in RMAN. Find the one that you want to “uncatalog”. For example, I wanted to remove the below file:

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
27249   Full    3.75G      DISK        00:00:00     28-MAR-18      
        BP Key: 27249   Status: AVAILABLE  Compressed: NO  Tag: TAG20180328
        Piece Name: /backup/move/dean/PRODDBbtsuuqm6_1_1.dbc
  List of Datafiles in backup set 27249
  File LV Type Ckp SCN    Ckp Time  Name
  ---- -- ---- ---------- --------- ----
  120     Full 132807363910 29-MAR-18 +DG_ORA64/proddb/perfx_52.dbf

I then issued the below command in RMAN:

CHANGE BACKUPPIECE 27249 UNCATALOG;

The response was:

uncataloged backup piece
backup piece handle=/backup/move/dean/PRODDBbtsuuqm6_1_1.dbc RECID=27249 STAMP=972007418
Uncataloged 1 objects

The file was no longer part of the subsequent RECOVER and RESTORE commands.
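
If you later need the piece back in the catalog, it can be re-registered. A sketch, using the same piece name as above:

CATALOG BACKUPPIECE '/backup/move/dean/PRODDBbtsuuqm6_1_1.dbc';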

ORA-31693 and ORA-06502 on an Import Datapump (IMPDP)

While importing data via an IMPDP job we encountered the error:

Import: Release 11.2.0.4.0 - Production on Tue Mar 20 13:59:30 2018
 
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
;;; 
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, Oracle Label Security,
OLAP, Data Mining and Real Application Testing options
Master table "SYS"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_FULL_01":  "/******** AS SYSDBA" directory=dpump dumpfile=dpump_data.dmp 
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . 
ORA-31693: Table data object "SCHEMA_NAME"."TABLE_NAME" failed to load/unload and is being skipped due to error:
ORA-06502: PL/SQL: numeric or value error
. . 
. . 
. . 
. . 
. . 
Job "SYS"."SYS_IMPORT_FULL_01" completed with 1 error(s) at Tue Mar 20 13:59:32 2018 elapsed 0 00:00:02

Per Oracle Support, this was documented in “Bug 18665004 – ORA-31693/ORA-6502 from Datapump import (impdp) into an existing table having a LONG or LONG RAW column (Doc ID 18665004.8)”. This table did have a “LONG” column.

We were not in a position to install the suggested fix and instead decided to try the workaround. We dropped the only index on the table and reran the import successfully. After the import, we recreated the index.
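
For reference, the workaround amounted to something along the below lines. The index name and column here are hypothetical placeholders; capture the real DDL with dbms_metadata.get_ddl before dropping anything:

-- capture the existing definition first (hypothetical index name)
select dbms_metadata.get_ddl('INDEX','TABLE_NAME_IX1','SCHEMA_NAME') from dual;

-- drop the index, rerun the impdp job, then recreate the index
drop index SCHEMA_NAME.TABLE_NAME_IX1;
create index SCHEMA_NAME.TABLE_NAME_IX1 on SCHEMA_NAME.TABLE_NAME (some_column);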

ORA-16047: DGID mismatch

I was building a new standby, and after I enabled log_archive_dest_state_2 on the primary, the below error messages were reported in the primary's alert log:

ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH SID='*';
Tue Mar 20 00:20:08 2018
Errors in file /oracle/home/trace/PRODDB_tt01_438.trc:
ORA-16047: DGID mismatch between destination setting and target database
Error 16047 for archive log file 6 to 'STDBYDB'
Tue Mar 20 00:20:08 2018
Errors in file /oracle/home/trace/PRODDB_tt01_438.trc:
ORA-16047: DGID mismatch between destination setting and target database
Tue Mar 20 00:20:08 2018
Errors in file /oracle/home/trace/PRODDB_tt01_438.trc:
ORA-16047: DGID mismatch between destination setting and target database
Tue Mar 20 00:20:08 2018
Errors in file /oracle/home/trace/PRODDB_arc1_8368.trc:
ORA-16047: DGID mismatch between destination setting and target database
Tue Mar 20 00:20:08 2018
PING[ARC1]: Heartbeat failed to connect to standby 'STDBYDB'. Error is 16047.

After a few minutes of consternation, I realized that I had an incorrect parameter on the standby:

SQL> show parameter db_unique_name;
 
NAME                                 TYPE        VALUE
db_unique_name                       string      PRODDB

Basically, I had inadvertently set the db_unique_name to PRODDB on both the primary and standby database. I changed the standby as follows:

alter system set db_unique_name=STDBYDB scope=spfile sid='*';

I then bounced the standby, and this resolved the issue.
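
A quick sanity check for this class of error is to confirm that the DB_UNIQUE_NAME attribute coded in log_archive_dest_2 on the primary matches the db_unique_name setting on the standby, for example:

On the primary:
SQL> show parameter log_archive_dest_2

On the standby (this should report STDBYDB in my case):
SQL> show parameter db_unique_name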

Testing Oracle’s Database Security Assessment Tool

I received an email suggesting that I run a "Security Assessment" using Oracle's Database Security Assessment Tool. Per Oracle, the tool

"analyzes database configurations, users, their entitlements, security policies and identifies where sensitive data resides to uncover security risks and improve the security posture of Oracle Databases within your organization."

The tool can be downloaded for free from support.oracle.com; see Oracle Database Security Assessment Tool (DBSAT) (Doc ID 2138254.1).

After you agree, a zip file will be downloaded to your laptop. I chose to unzip the file and then FTP the contents to a directory on my database server. (I suggest creating a separate directory.)

The contents of the directory will be:

[snoopy_test-TSTRAC2] ls -l
total 352
-rw-r--r-- 1 oracle dba  12433 Mar 15 18:11 dbsat
-rw-r--r-- 1 oracle dba  12579 Mar 15 18:11 dbsat.bat
drwxr-xr-x 5 oracle dba     40 Mar 15 18:15 Discover
-rw-r--r-- 1 oracle dba  28216 Mar 15 18:11 sat_analysis.py
-rw-r--r-- 1 oracle dba  43181 Mar 15 18:11 sat_collector.sql
-rw-r--r-- 1 oracle dba 247465 Mar 15 18:11 sat_reporter.py
drwxr-xr-x 2 oracle dba   4096 Mar 15 18:15 xlsxwriter

chmod the files to add execute permissions.
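
For example (at a minimum the dbsat driver script itself must be executable; adjust the list to your own standards):

[snoopy_test-TSTRAC2] chmod u+x dbsat dbsat.bat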

The tool has three components:
Collector  = collects data on the system being assessed
Reporter   = analyzes the output of the Collector
Discoverer = generates the Database Sensitive Data Assessment Report

Running the Collector
Run the collector with the below command. In my example, I ran this as sysdba with my ORACLE_HOME set for the RAC instance against which I intended to run the DBSAT tool:

./dbsat collect -n "/ as sysdba" /TSTRAC2/dean/dbsat/dbsat_collect

where:
-n                   = no encryption
/TSTRAC2/dean/dbsat/ = location for the output
dbsat_collect        = name of the report file

The output will be written to a file named dbsat_collect.json.

Running the Reporter
The reporter needs Python version 2.6 or later. This can be verified with

[snoopy_test-TSTRAC2] python -V
Python 2.7.5

Run the report with the below command:

./dbsat report -a -n /TSTRAC2/dean/dbsat/dbsat_collect
 
Note:
a. If you copy the above command from this post, paste it into Notepad first. I used Word to write this blog entry, and Word sometimes replaces the "-" with a non-UNIX-compatible character.
b. The file name from the Collector is specified without the .json extension.

The below files will be generated:
dbsat_collect_report.txt
dbsat_collect_report.html
dbsat_collect_report.xlsx

The html file is probably the nicest format to review and will contain a large amount of information classified by level of risk.

Using the Discoverer
To run this part of the tool, you have to set up the

/TSTRAC2/dean/dbsat/Discover/conf/sample_dbsat.config

file, which contains the database connection information and the location of the

sensitive_en.ini

file, which in turn contains the patterns that will be used to search database column names, column comments, etc. These patterns are specified as regular expressions combined with logical operators, as documented in Section 1.4.3.1.2, Pattern File Configuration, of the Database Security Assessment Tool User Guide.

Based on the parameters specified in the config files, the tool will then scan the database looking for columns, comments, etc. that match the specified patterns.
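
Once the config file is in place, the Discoverer is run with the same dbsat driver as the other components. The below is from my notes, so verify the exact flags against the user guide in Doc ID 2138254.1:

./dbsat discover -c /TSTRAC2/dean/dbsat/Discover/conf/sample_dbsat.config dbsat_discover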

Some useful scheduler-related commands

If you want to list all the jobs that are in the scheduler:

SELECT  A.job_name
       ,a.last_start_date
       ,a.next_run_date
       ,a.enabled
       ,a.*
FROM    dba_scheduler_jobs A
WHERE   job_name like ('%DAILY_JOB%')
;

To find out if a job ran successfully (or not):

select   a.job_name
        ,a.log_date
        ,a.status
        ,a.*
from     dba_SCHEDULER_JOB_LOG a
WHERE    a.job_name like ('%DAILY_JOB%')
order by
         a.job_name
        ,a.log_date desc
;

If you need to see the complete DDL of the job:

select dbms_metadata.get_ddl('PROCOBJ','SOME_APP_JOB','DEAN') from dual;

The parameters are:

PROCOBJ      = procedural object, which covers all objects in the scheduler
SOME_APP_JOB = Name of the job
DEAN         = Schema that contains the job

It does not seem to be possible to get the DDL for a job in the SYS schema. For example:

select dbms_metadata.get_ddl('PROCOBJ','FILE_WATCHER','SYS') from dual;

Will result in:

ORA-31603: object "FILE_WATCHER" of type PROCOBJ not found in schema "SYS"
ORA-06512: at "SYS.DBMS_METADATA", line 6069
ORA-06512: at "SYS.DBMS_METADATA", line 8666
ORA-06512: at line 1
31603. 00000 -  "object \"%s\" of type %s not found in schema \"%s\""
*Cause:    The specified object was not found in the database.
*Action:   Correct the object specification and try the call again.

One way to get around this is to copy the job to another schema as follows:

exec dbms_scheduler.copy_job('SYS.FILE_WATCHER','DEAN.FILE_WATCHER');

The parameters in the above call are 'old_schema.job_name' and 'new_schema.job_name'.

As an FYI, a copied job will be created in (i.e. copied into) the new schema as a DISABLED job.
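
With the copy in place, the DDL can be pulled from the new schema in the usual way:

select dbms_metadata.get_ddl('PROCOBJ','FILE_WATCHER','DEAN') from dual;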
After you get the DDL, you can drop the job with:

BEGIN
    DBMS_SCHEDULER.DROP_JOB(job_name => '"SYS"."FILE_WATCHER"',
                                defer => false,
                                force => true);
END;
/

As mentioned above, a copied job will be created in (i.e. copied into) the new schema as a DISABLED job. You can enable the job with:

exec dbms_scheduler.enable('DEAN.SOME_APP_JOB');

Stuck in a loop between restore and recover database

This week a colleague of mine was building a physical standby. The method she had chosen was to copy the backups from the primary server to the standby server and then restore the database, followed by a recover. The database was on the larger side, approximately 4 TB. After the restore completed successfully, she issued the recover database command and Oracle responded that data file 114 needed to be restored. At this point I got involved and issued a restore datafile 114 command, which completed successfully.

We then tried the recover database command and were informed that data file 177 needed to be restored. We issued a restore datafile 177 command, which also completed successfully. The next recover database command pointed to a different file that needed to be restored. This continued, alternating between a series of six data files.

At this point we looked at the v$datafile view with the below SQL:

select name, count(*) from v$datafile group by name having count(*) > 1;

The SQL indicated that there were three pairs of data files with identically named filenames. This had occurred because the source database had many more ASM disk groups than the target (i.e. the standby database). As a result of this mismatch in the number of ASM disk groups, three pairs of files ended up with identical names in the same ASM disk groups on the standby. Each time we restored one file of a duplicate pair, the other file of that pair would then need to be restored, causing us to loop continuously between restores and recovers.

We resolved the issue by issuing SET NEWNAME FOR DATAFILE commands to relocate the affected files to unique names and then restoring and recovering the database.
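
In outline, the fix looked something like the below. The file numbers are from the example above, but the disk group and file names shown here are illustrative rather than the real ones:

run
{
  # relocate the colliding files to unique names
  set newname for datafile 114 to '+DG_STBY1/proddb/datafile_114.dbf';
  set newname for datafile 177 to '+DG_STBY2/proddb/datafile_177.dbf';
  restore datafile 114, 177;
  switch datafile all;
  recover database;
}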

Lost connection to your session during a restore database

If you are restoring a database in RMAN and you lose the connection to your session due to a network blip, you can resume the restore after the network connection is reestablished.

For example, the restore is being run as follows:

[prodsrvr4-RACNODE4] rman target=/
Recovery Manager: Release 11.2.0.4.0 - Production on Wed Feb 14 10:01:36 2018
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
connected to target database: PRODDB (DBID=1057984567, not open)
RMAN> restore database;
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00120 to +DG_PRD2/proddb/customer_52.dbf
channel ORA_DISK_1: reading from backup piece /dbbackup06/PRODDB/backup/PRODDB201802137lsr3u55_1_1.bus

and you lose network connectivity, you can log back on and restart (i.e. resume) as follows:

 [prodsrvr4-RACNODE4] rman target=/
 
Recovery Manager: Release 11.2.0.4.0 - Production on Wed Feb 14 10:01:36 2018
 
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
 
connected to target database: PRODDB (DBID=1057984567, not open)
 
RMAN> restore database ;
 
Starting restore at 14-FEB-18
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=177 instance=RACNODE4 device type=DISK
 
skipping datafile 1; already restored to file +DG_PRD2/proddb/system01.dbf
skipping datafile 140; already restored to file +DG_PRD1/proddb/order_65.dbf
.
.
.
channel ORA_DISK_1: starting datafile backup set restore
channel ORA_DISK_1: specifying datafile(s) to restore from backup set
channel ORA_DISK_1: restoring datafile 00069 to +DG_PRD2/proddb/customer_29.dbf
channel ORA_DISK_1: restoring datafile 00203 to +DG_PRD2/proddb/undotbs102.dbf
channel ORA_DISK_1: reading from backup piece /dbbackup05/PRODDB/backup/PRODDB201802136bsr3ph4_1_1.bus
channel ORA_DISK_1: piece handle=/dbbackup05/PRODDB/backup/PRODDB201802136bsr3ph4_1_1.bus tag=HOTBKUPFULL
channel ORA_DISK_1: restored backup piece 1
channel ORA_DISK_1: restore complete, elapsed time: 00:15:55

Running the SQL Health Check (SQLHC) report

If you have a SQL performance issue, Oracle Support may request you to run the SQL Health Check script. This is a pretty cool process, so I thought I would document it.

Download the SQLHC.zip archive from the Oracle support web site. The document id at the time of writing this blog post was Doc ID 1366133.1.

Extract the zip archive. I did this on my laptop. There are three files: sqldx.sql, sqlhc.sql, and sqlhcxec.sql.

Place these three files in a directory on your server.

Find the sql_id of the problem SQL. This can be accomplished via AWR, ADDM, etc. In my case, the SQL was running for more than 5 minutes, so I was able to get the sql_id quite easily.
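
For a statement that is still running, a quick way to grab the sql_id is to join v$session to v$sql. A rough sketch (tighten the filters to suit your situation):

select   s.sid
        ,s.sql_id
        ,s.last_call_et
        ,q.sql_text
from     v$session s
        ,v$sql     q
where    s.sql_id           = q.sql_id
and      s.sql_child_number = q.child_number
and      s.status           = 'ACTIVE'
and      s.last_call_et     > 300
;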

Run the SQLHC tool as follows:
Sign on as sysdba and issue the command:
START sqlhc.sql T 3t3ypamk8wu40

The first parameter, "T", indicates that my site has the SQL Tuning pack. Valid values are Tuning (T), Diagnostics (D), or None (N); if your site has both Tuning and Diagnostics licenses, specify T (the Oracle Tuning pack includes Oracle Diagnostics).

The second parameter is the sql_id of the problem SQL statement.

The process will display a lot of information on the terminal and write the output to a file named sqlhc_date_time_sqlid.zip. For example:
sqlhc_20180212_180710_3t3ypamk8wu40.zip

Attach this file to your open SR with Oracle.

ORA-01157 while activating a physical standby database

We had previously created a two-node RAC physical standby database. Today, we were attempting to bring it up as a primary by issuing the commands:

startup mount;
ALTER DATABASE ACTIVATE PHYSICAL STANDBY DATABASE;

In the alert log we saw the error:

ORA-01157: cannot identify/lock data file 2 - see DBWR trace file
ORA-01110: data file 2: '+DG_OLD_SERVER/dbname/ts_name_data_file_01.dbf'
ORA-17503: ksfdopn:2 Failed to open file +DG_OLD_SERVER/dbname/ts_name_data_file_01.dbf
ORA-15001: diskgroup "+DG_OLD_SERVER" does not exist or is not mounted

The database was attempting to lock a file with a name referencing the old primary's ASM disk group. I checked in v$datafile:

 
select    a.name
from      v$datafile a
order by  a.file#
;

and all the file names were prefixed with the old primary's ASM disk group names. A check in ASM on the new server revealed that the files existed in ASM under the new server's disk group names. I suspect that while building the standby the below command was not issued after the recover database:

switch datafile all; 

We brought up the database in mount mode and issued commands to rename the files from the old ASM disk group to the new ASM disk group:

ALTER DATABASE RENAME FILE '+DG_OLD_SERVER/dbname/ts_name_data_file_01.dbf'        TO '+DG_NEW_SERVER/dbname/ts_name_data_file_01.dbf';
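
With a large number of files it is easier to generate the rename statements from v$datafile than to type them by hand; for example, using the same disk group names as above:

select    'ALTER DATABASE RENAME FILE ''' || a.name || ''' TO ''' ||
          replace(a.name, '+DG_OLD_SERVER', '+DG_NEW_SERVER') || ''';'
from      v$datafile a
order by  a.file#
;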

After all the files were renamed, I checked in v$datafile to confirm the new names. After this we were able to activate the physical standby database as a primary.

Update – February 05, 2018
I built another standby and checked the names of the data files after the recover database. The file names were correctly changed as per the DB_FILE_NAME_CONVERT parameter, so I cannot explain what went wrong in the original build. In that case there was no need to issue the "switch datafile all" command.