DB2 z/OS – A DBA's blog

AUTH_COMPATIBILITY value of null causing UNLOAD utility jobs to fail

This week we went to a new fix level that activated the AUTH_COMPATIBILITY in macro DSN6SPRM. The documentation states:

The AUTH_COMPATIBILITY parameter specifies whether current-release of previous-release authorization rules are used for specific Db2 operations.
 
Null (blank)
Indicates that AUTH_COMPATIBILITY has no effect and that there is no authorization check override.
 
SELECT_FOR_UNLOAD
When this value is specified, the UNLOAD utility checks if the user has the SELECT or UNLOAD privilege on the target table. If this value is not specified, the UNLOAD utility checks if the user has the UNLOAD privilege on the target table.

In our case, the zparm was set to null, however, our unload (utility not DSNTIAUL) jobs began to fail. The auth ids executing the unload utilities had SELECT access but not UNLOAD access. The above documentation indicated that the null value should have allowed the jobs to run but we saw failures.

Since this is a dynamic change, we switched from null to SELECT_FOR_UNLOAD and the previous functionality was restored.

Left aligning DSNTEP2 (aka batch SPUFI) output

A short blog about a trivial but annoying item. By default, the DSNTEP2 (aka batch SPUFI) output is center aligned. As I prefer the output to be left aligned for readability and parsing of output, I code the JCL as described below.

By default, the below code will center align the output:

//SYSTSIN   DD *                                         
 DSN SYSTEM(PRN3)                                        
 RUN  PROGRAM(DSNTEP2) PLAN(DSNTEP2)
//*                                                      
//SYSIN    DD  *                                         
  SELECT    COUNT(*)                                     
  FROM      Q.OBJECT_DATA                                
  ;

And the output looks like:
(Actual display is further to the right.I moved it to the left for readability.)

PAGE    1                                                                     
***INPUT STATEMENT:                                                           
   SELECT    COUNT(*)                                                         
   FROM      Q.OBJECT_DATA                                                    
   ;                                                                          
                                                 +----------------+ 
                                                 |                | 
                                                 +----------------+ 
                                               1_|           4042 | 
                                                 +----------------+ 
SUCCESSFUL RETRIEVAL OF          1 ROW(S)

To left align, code as:

Default (i.e. center) alignment:
//SYSTSIN   DD *                                         
 DSN SYSTEM(PRN3)                                        
 RUN  PROGRAM(DSNTEP2) PLAN(DSNTEP2) PARMS('/ALIGN(LHS)')
//*                                                      
//SYSIN    DD  *                                         
  SELECT    COUNT(*)                                     
  FROM      Q.OBJECT_DATA                                
  ;                                                      

And the output looks like:
PAGE    1                                
***INPUT STATEMENT:                      
   SELECT    COUNT(*)                    
   FROM      Q.OBJECT_DATA               
   ;                                     
       +----------------+                
       |                |                
       +----------------+                
     1_|           4042 |                
       +----------------+                
SUCCESSFUL RETRIEVAL OF          1 ROW(S)

REORG and contention – Forcibly terminating threads?

One of the challenges in performing online REORGs (i.e. SHRLEVEL CHANGE) is successfully completing the switch phase of the process. This problem seems to compound in direct relation to the number of distributed threads that connect to the database objects. I often see a pattern wherein a distributed thread acquires a lock or claim on an object and then does not perform a commit to release the lock or claim. This results in the REORG failing with a page out to the DBA.

I initially coded the REROG jobs with

REORG TABLESPACE   LIST CLUSTTS                            
      LOG           NO                        
      KEEPDICTIONARY                          
      ROWFORMAT BRF                           
      SORTDATA                                
      SORTKEYS                                
      NOSYSREC                                
      COPYDDN      (LPCOPY)                   
      RECOVERYDDN  (RBCOPY)                   
      SHRLEVEL      CHANGE                    
      DRAIN_WAIT    10 RETRY 3  RETRY_DELAY 10
      DEADLINE      NONE                      
      MAXRO         60                        
      DRAIN         WRITERS   <---
      LONGLOG       CONTINUE                  
      DELAY         0                         
      TIMEOUT       TERM                      
      FASTSWITCH    YES                       
      UNLOAD        CONTINUE                  
      STATISTICS    TABLE INDEX ALL           
      UPDATE ALL

Assuming that this would be the least invasive approach. This was based on the below information from the Utility Guide:

WRITERS
Specifies that Db2 drains only the writers during the 
log phase after the MAXRO threshold is reached and 
then issues DRAIN ALL on entering the switch phase.

I changed the options to include DRAIN ALL FORCE ALL:

REORG TABLESPACE   LIST CLUSTTS                            
      LOG           NO                        
      KEEPDICTIONARY                          
      ROWFORMAT BRF                           
      SORTDATA                                
      SORTKEYS                                
      NOSYSREC                                
      COPYDDN      (LPCOPY)                   
      RECOVERYDDN  (RBCOPY)                   
      SHRLEVEL      CHANGE                    
      DRAIN_WAIT    10 RETRY 3  RETRY_DELAY 10
      DEADLINE      NONE                      
      MAXRO         60 
      DRAIN         ALL    <---                   
      FORCE         ALL    <---           
      LONGLOG       CONTINUE                  
      DELAY         0                         
      TIMEOUT       TERM                      
      FASTSWITCH    YES                       
      UNLOAD        CONTINUE                  
      STATISTICS    TABLE INDEX ALL           
      UPDATE ALL

The results were as follow:

If the DDF thread had only performed a select without a commit, it was cancelled with the below message in the DB2 master address space:

DSNL027I  -PR01 SERVER DISTRIBUTED AGENT WITH  828                   
           LUWID=JF0C80D3.DC4E.D8C6D35F5B78=129996                   
                                                                     
THREAD-INFO=ABC123Y:67.19.138.241:ABC123y:db2jcc_application:DYNAMIC:
5719:*:<::67.19.138.241.55378.D8C6D35F5B78>                          
           RECEIVED ABEND=04E                                        
           FOR REASON=00E50013

If the DDF thread had performed an update without a commit, it was not cancelled. Instead, the REORG fails with the below message in the job output:

DSNURDRN - RESOURCE NOT AVAILABLE, REASON=X'00C200EA', ON 
DBNAME.TSNAME PROHIBITS PROCESSING

I would have preferred that the DRAIN ALL FORCE ALL options would have cancelled the thread in both cases.

STROBing a transaction

In the last few weeks, we have been challenged with a system that was supposed to process orders was not processing them in a timely fashion. TMON indicated that the process was sitting in an IC wait suggesting that the delay was intentional. However, the concern was that the process was waiting on some unknown resource such as the database. The process consisted of a CICS transaction that called a number of COBOL programs and accessed an ADABAS database.

In an attempt to identify the actual line of code that was causing the delay, we turned on STROBE. Since this is a CICS transaction, you have to run an “ADD ACTIVE” measurement as the CICS region is an executing job. Set the job name to the CICS region, the session duration to 10 minutes, and the target sample size to 20,000 (Change these numbers as required for your context). In addition, select the “Data Collectors” option with a “Y”. On the Data Collectors panel, select the CICS+ option with a “Y”.

On the CICS Data Collector Options, select “Collect Region Data” with a “Y” and provide the name of the transaction in the “Detail by Transaction” section of the panel.

At this point in time, the measurement should be active. Monitor the measurement of completion and then generate a profile in the foreground. You can do this in batch but it is a relatively trivial operation so it can be performed quickly in the foreground.

After the profile is generated, the transaction summary (#TUS) report indicated that 599 seconds of the measurement were attributable to one transaction. Further, all of this time was attribute to a single program. STROBE was able to provide us with the actual CICS command “EXEC CICS DELAY” and the offset of the compile a listing. We were able to take this offset error look into the compile listing and identify that the line of command that was generating the delay was an

EXEC CICS
     DELAY FOR SECONDS (number_of_seconds>
END-EXEC

This proved that the delay in the process was intentional and not the result of a database or other resource contention.

Identify the current version/level of DB2 z/OS via an SQL statement

After the database has been upgraded to DB2 12, you can use the below SQLs to identify the current level:

select catalog_level from sysibm.sysdummy1;                   
---------+---------+---------+---------+---------+---------+--
CATALOG_LEVEL                                                 
---------+---------+---------+---------+---------+---------+--
V12R1M500                                                     
DSNE610I NUMBER OF ROWS DISPLAYED IS 1                        
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100   
---------+---------+---------+---------+---------+---------+--

Another option:

SELECT GETVARIABLE('SYSIBM.VERSION') FROM SYSIBM.SYSDUMMY1    
---------+---------+---------+---------+---------+---------+--
                                                              
---------+---------+---------+---------+---------+---------+--
DSN12010                                                      
DSNE610I NUMBER OF ROWS DISPLAYED IS 1                        
DSNE616I STATEMENT EXECUTION WAS SUCCESSFUL, SQLCODE IS 100

REXX RC=1 SQLSTATE=01506 on DSNREXX PREPARE statement after migrating to DB2 12

A colleague had a REXX process in place that had been working with DB2 11. After the upgrade to DB2 12, the REXX exec started failing with RC=1 on a PREPARE statement. The SQL statement being prepared contained logic to generate the boundaries used in the predicates referencing a time stamp column in a table:

  FROM creator.table_name       
WHERE TIME_STAMP_COL BETWEEN             
      'EndTS' - 2 HOURS-01 MONTHS  
  AND 'EndTS' + 2 HOURS-01 MONTHS  
   OR TIME_STAMP_COL BETWEEN             
      'EndTS' - 2 HOURS-02 MONTHS  
  AND 'EndTS' + 2 HOURS-02 MONTHS  
   OR TIME_STAMP_COL BETWEEN             
      'EndTS' - 2 HOURS-03 MONTHS  
  AND 'EndTS' + 2 HOURS-03 MONTHS  
   OR TIME_STAMP_COL BETWEEN             
      'EndTS' - 2 HOURS-04 MONTHS  
  AND 'EndTS' + 2 HOURS-04 MONTHS  
GROUP BY DATE(TIME_STAMP_COL)

The REXX code PREPAREs this statement with

address DSNREXX "EXECSQL PREPARE S40 FROM :SQL"

In DB2 11, this would return RC=0. After the upgrade to DB2 12, this started returning RC=1. The SQLCA contained:

SQLCODE = 0
SQLERRD.1 = 0
SQLERRD.2 = 0
SQLERRD.3 = 999
SQLERRD.4 = 1142028335
SQLERRD.5 = 0
SQLERRD.6 = 0
SQLWARN.1 =
SQLWARN.2 =
SQLWARN.3 =
SQLWARN.4 = 
SQLWARN.5 =           
SQLWARN.6 = W
SQLWARN.7 =  
SQLWARN.8 =
SQLWARN.9 =
SQLWARN.10 =          
SQLERRP = DSN
SQLSTATE = 01506 <---
SQLERRMC =

Per IBM

01506 = An adjustment was made to a DATE or TIMESTAMP value to correct an invalid date resulting from an arithmetic operation.

The issue was caused because the SQL would generate an invalid date for February. In the past, the PREPARE would have ignored this.

Recovering a tablespaces with a LOB Tablespace/Auxiliary table

Due to a miscommunication with the development team, the DBA team accidentally refreshed the data in an environment that was currently being used for a test cycle. The users requested that we recover the database back to the most recent copy which we had taken earlier that morning. I created a REXX routine to generate the approximately 600 recover jobs that were needed to perform this as fast as possible. Some of my jobs failed because the base table had a LOB column that was pointing to an auxiliary table in a LOB TABLESPACE.

The table structure was defined as follows (DDL has been edited to remove options not relevant):

     CREATE TABLE                                             
       PRODA.MY_TABLE_01                                      
        (                                                     
        ACCOUNT_NAME   CHAR(50) NOT NULL WITH DEFAULT          
         FOR SBCS DATA                                        
       ,ACCOUNT_TXT   CLOB(1 M) NOT NULL WITH DEFAULT          
         FOR SBCS DATA                                        
        )                                                     
         IN PRODAB00.TSPACE57  
         CCSID EBCDIC                                         
         NOT VOLATILE                                         
         APPEND NO                                                                           
     ;                                
     CREATE                           
       LOB TABLESPACE TSPACE58        
         IN DPDNBB00                  
         USING STOGROUP PRODASMS      
         PRIQTY -1                    
         SECQTY -1                    
         ERASE NO                         
         GBPCACHE SYSTEM                  
         NOT LOGGED                       
         DSSIZE 4 G                       
         BUFFERPOOL BP32K1                
         LOCKSIZE ANY                     
         LOCKMAX SYSTEM                   
         CLOSE NO                         
     ;                                    
     CREATE AUXILIARY TABLE               
         PRODA.MY_TABLE_01_AUX_01         
         IN PRODAB00.TSPACE58             
         STORES PRODA.MY_TABLE_01         
         APPEND NO                        
         COLUMN ACCOUNT_TXT                
         PART 1                           
     ;

The initial recover was coded as follows:

//SYSIN    DD  *                                   
  RECOVER  TABLESPACE  PRODAB00.TSPACE57           
           TORBA       X'0000000015D79FB6B7E0'     
/*

This generated the error:

DSNUGUTC -  RECOVER TABLESPACE PRODAB00.TSPACE57 TORBA X'0000000015D79FB6B7E0'
DSNU1316I -DBXC 102 15:41:03.83 DSNUCAIN - THE FOLLOWING TABLESPACES ARE MISSING FROM THE RECOVERY LIST
PRODAB00.TSPACE58

I changed the JCL as follows:

//SYSIN    DD  *                                  
  RECOVER  TABLESPACE  PRODAB00.TSPACE57          
           TABLESPACE  PRODAB00.TSPACE58          
           TORBA       X'0000000015D79FB6B7E0'    
/*

And the recover completed:

DSNUCBMD - RECOVER TABLESPACE PRODAB00.TSPACE57   START
DSNUCBAL - THE IMAGE COPY DATA SET HLQ.PRODAB00.TSPACE57.NVE.F1010801 WITH DATE=20180411 AND TIME=071133 IS PARTICIPATING IN RECOVERY OF TABLESPACE PRODAB00.TSPACE57

DSNUCBMD - RECOVER TABLESPACE PRODAB00.TSPACE58   START                    
DSNUCBAL - THE IMAGE COPY DATA SET HLQ.PRODAB00.TSPACE58.NVE.F1010801 WITH DATE=20180411 AND TIME=071133 IS PARTICIPATING IN RECOVERY OF TABLESPACE PRODAB00.TSPACE58

Recovering in the event a user issues an incorrect SQL in DB2 for z/OS

If a user inadvertently updates, inserts or deletes data in a table during the day and you need to recover the table to a prior point in time, you can recover to a checkpoint rather than an Image Copy or a quiesce point to reduce the amount of data loss. For example, if the user issued a “bad” query at 17:18 on March 29, 2018, you can print the checkpoint queue with DB2 utility DSNJU004. The output will look similar to:

                    CHECKPOINT QUEUE                
                18:40:10 MARCH 29, 2018             
TIME OF CHECKPOINT       18:30:04 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDD4187405
END CHECKPOINT RBA              0000000023FDD4194000
END CHECKPOINT STCK             00D41A198B683F8E0A00
TIME OF CHECKPOINT       18:15:04 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDCE6AA615
END CHECKPOINT RBA              0000000023FDCE6B73E7
END CHECKPOINT STCK             00D41A16311B02B53A00
TIME OF CHECKPOINT       18:02:44 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDC976C806
END CHECKPOINT RBA              0000000023FDC9785E4F
END CHECKPOINT STCK             00D41A136F4ACE910800
TIME OF CHECKPOINT       17:47:44 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDC3A02EF2
END CHECKPOINT RBA              0000000023FDC3A11000
END CHECKPOINT STCK             00D41A1014FBDC3B0800
TIME OF CHECKPOINT       17:32:44 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDBDA9FBDC
END CHECKPOINT RBA              0000000023FDBDAAD967
END CHECKPOINT STCK             00D41A0CBAAD71430800
TIME OF CHECKPOINT       17:17:44 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDB7EAA020
END CHECKPOINT RBA              0000000023FDB7EC4622
END CHECKPOINT STCK             00D41A096060B5B53000

There is a checkpoint at 17:17 on March 29, 2018 that may be a good recovery point.

The tablespace can be recovered with

//SYSIN    DD  *                        
  RECOVER  TABLESPACE proddb00.prodts33 
           TORBA      X'23FDB7EC4622'   
/*

If you have an old table space with multiple tables, they will all be recovered to the RBA. Rebuild all the associated indices after the recovery completes.

Refreshing a database from one LPAR/Subsystem to another

I am often tasked with cloning a production DB2 database from one LPAR/Subsystem into another LPAR/Subsystem. This has to be accomplished with minimum impact to the production database. In order to accomplish this, I created a process based on REXX generating the required JCL. In summary, my approach is:

Generate a list of the latest image copy files for the production database via SQL/SPUFI

A REXX routine reads in the above list and generates one unload JCL per table space (we have 1 table per table space) that
– unloads the Image copy
– edits the punch file to change the load parameters
– submits MFT processes for the punch and data
– submits the respective load job on the target

The REXX also creates the required load JCL that has logic to
– display the status of the database before and after the load
– Load the data
– Repair any check pending
– Perform an image copy (There is a requirement to recover the table multiple times during the testing cycle)

The PDS containing the load JCL is copied to the target before starting any of the unloads. As each unload finishes, it starts the transfer process about the data and punch files. After the transfer of the files is complete the process automatically starts the load.

The process also contains logic to throttle the unload and load processes via the use of jes agents. LOB tables are handle via cursor loads.

Using this process I am able to copy approximately 600 tables containing approximately 2.25 billion rows of data (about 1 TB compressed) in about 10 hours without impacting the production system.

Finding the row that returned the -803

One of our users created a somewhat complex update statement that was intended to update about 1600 rows in a table. The update statement was coded to eliminate the possibility of duplicate rows, however we continued to encounter -803 responses.

SQLCODE = -803, ERROR:  AN INSERTED OR UPDATED VALUE IS INVALID       
BECAUSE INDEX IN INDEX SPACE UNQNDX1 CONSTRAINS COLUMNS OF THE TABLE 
SO NO TWO ROWS CAN CONTAIN DUPLICATE VALUES IN THOSE COLUMNS.         
RID OF EXISTING ROW IS X'00000FC90F'.

We were somewhat at a loss to identify which row was causing this error. The below SQL was able to identify the row that was causing the issue. This also identified the logic error in the update SQL.

SELECT  RID(owner.table_name), owner.table_name.*  
  FROM  owner.table_name                            
WHERE   RID(owner.table_name) = 1034511             
;

The hex value reported above was X’00000FC90F’. Use the below calculator to convert this to decimal:

http://www.rapidtables.com/convert/number/hex-to-decimal.htm