AUTH_COMPATIBILITY value of null causing UNLOAD utility jobs to fail

This week we went to a new fix level that activated the AUTH_COMPATIBILITY in macro DSN6SPRM. The documentation states:

The AUTH_COMPATIBILITY parameter specifies whether current-release of previous-release authorization rules are used for specific Db2 operations.
 
Null (blank)
Indicates that AUTH_COMPATIBILITY has no effect and that there is no authorization check override.
 
SELECT_FOR_UNLOAD
When this value is specified, the UNLOAD utility checks if the user has the SELECT or UNLOAD privilege on the target table. If this value is not specified, the UNLOAD utility checks if the user has the UNLOAD privilege on the target table.

In our case, the zparm was set to null, however, our unload (utility not DSNTIAUL) jobs began to fail. The auth ids executing the unload utilities had SELECT access but not UNLOAD access. The above documentation indicated that the null value should have allowed the jobs to run but we saw failures.

Since this is a dynamic change, we switched from null to SELECT_FOR_UNLOAD and the previous functionality was restored.

Left aligning DSNTEP2 (aka batch SPUFI) output

A short blog about a trivial but annoying item. By default, the DSNTEP2 (aka batch SPUFI) output is center aligned. As I prefer the output to be left aligned for readability and parsing of output, I code the JCL as described below.

By default, the below code will center align the output:

//SYSTSIN   DD *                                         
 DSN SYSTEM(PRN3)                                        
 RUN  PROGRAM(DSNTEP2) PLAN(DSNTEP2)
//*                                                      
//SYSIN    DD  *                                         
  SELECT    COUNT(*)                                     
  FROM      Q.OBJECT_DATA                                
  ;                                                      

And the output looks like:
(Actual display is further to the right.I moved it to the left for readability.)

PAGE    1                                                                     
***INPUT STATEMENT:                                                           
   SELECT    COUNT(*)                                                         
   FROM      Q.OBJECT_DATA                                                    
   ;                                                                          
                                                 +----------------+ 
                                                 |                | 
                                                 +----------------+ 
                                               1_|           4042 | 
                                                 +----------------+ 
SUCCESSFUL RETRIEVAL OF          1 ROW(S)                                     

To left align, code as:

Default (i.e. center) alignment:
//SYSTSIN   DD *                                         
 DSN SYSTEM(PRN3)                                        
 RUN  PROGRAM(DSNTEP2) PLAN(DSNTEP2) PARMS('/ALIGN(LHS)')
//*                                                      
//SYSIN    DD  *                                         
  SELECT    COUNT(*)                                     
  FROM      Q.OBJECT_DATA                                
  ;                                                      

And the output looks like:

PAGE    1                                
***INPUT STATEMENT:                      
   SELECT    COUNT(*)                    
   FROM      Q.OBJECT_DATA               
   ;                                     
       +----------------+                
       |                |                
       +----------------+                
     1_|           4042 |                
       +----------------+                
SUCCESSFUL RETRIEVAL OF          1 ROW(S)

REORG and contention – Forcibly terminating threads?

One of the challenges in performing online REORGs (i.e. SHRLEVEL CHANGE) is successfully completing the switch phase of the process. This problem seems to compound in direct relation to the number of distributed threads that connect to the database objects. I often see a pattern wherein a distributed thread acquires a lock or claim on an object and then does not perform a commit to release the lock or claim. This results in the REORG failing with a page out to the DBA.

I initially coded the REROG jobs with

REORG TABLESPACE   LIST CLUSTTS                            
      LOG           NO                        
      KEEPDICTIONARY                          
      ROWFORMAT BRF                           
      SORTDATA                                
      SORTKEYS                                
      NOSYSREC                                
      COPYDDN      (LPCOPY)                   
      RECOVERYDDN  (RBCOPY)                   
      SHRLEVEL      CHANGE                    
      DRAIN_WAIT    10 RETRY 3  RETRY_DELAY 10
      DEADLINE      NONE                      
      MAXRO         60                        
      DRAIN         WRITERS   <---
      LONGLOG       CONTINUE                  
      DELAY         0                         
      TIMEOUT       TERM                      
      FASTSWITCH    YES                       
      UNLOAD        CONTINUE                  
      STATISTICS    TABLE INDEX ALL           
      UPDATE ALL                

Assuming that this would be the least invasive approach. This was based on the below information from the Utility Guide:

WRITERS
Specifies that Db2 drains only the writers during the 
log phase after the MAXRO threshold is reached and 
then issues DRAIN ALL on entering the switch phase.

I changed the options to include DRAIN ALL FORCE ALL:

REORG TABLESPACE   LIST CLUSTTS                            
      LOG           NO                        
      KEEPDICTIONARY                          
      ROWFORMAT BRF                           
      SORTDATA                                
      SORTKEYS                                
      NOSYSREC                                
      COPYDDN      (LPCOPY)                   
      RECOVERYDDN  (RBCOPY)                   
      SHRLEVEL      CHANGE                    
      DRAIN_WAIT    10 RETRY 3  RETRY_DELAY 10
      DEADLINE      NONE                      
      MAXRO         60 
      DRAIN         ALL    <---                   
      FORCE         ALL    <---           
      LONGLOG       CONTINUE                  
      DELAY         0                         
      TIMEOUT       TERM                      
      FASTSWITCH    YES                       
      UNLOAD        CONTINUE                  
      STATISTICS    TABLE INDEX ALL           
      UPDATE ALL                

The results were as follow:

If the DDF thread had only performed a select without a commit, it was cancelled with the below message in the DB2 master address space:

DSNL027I  -PR01 SERVER DISTRIBUTED AGENT WITH  828                   
           LUWID=JF0C80D3.DC4E.D8C6D35F5B78=129996                   
                                                                     
THREAD-INFO=ABC123Y:67.19.138.241:ABC123y:db2jcc_application:DYNAMIC:
5719:*:<::67.19.138.241.55378.D8C6D35F5B78>                          
           RECEIVED ABEND=04E                                        
           FOR REASON=00E50013

If the DDF thread had performed an update without a commit, it was not cancelled. Instead, the REORG fails with the below message in the job output:

DSNURDRN - RESOURCE NOT AVAILABLE, REASON=X'00C200EA', ON 
DBNAME.TSNAME PROHIBITS PROCESSING

I would have preferred that the DRAIN ALL FORCE ALL options would have cancelled the thread in both cases.

Recovering a tablespaces with a LOB Tablespace/Auxiliary table

Due to a miscommunication with the development team, the DBA team accidentally refreshed the data in an environment that was currently being used for a test cycle. The users requested that we recover the database back to the most recent copy which we had taken earlier that morning. I created a REXX routine to generate the approximately 600 recover jobs that were needed to perform this as fast as possible. Some of my jobs failed because the base table had a LOB column that was pointing to an auxiliary table in a LOB TABLESPACE.

The table structure was defined as follows (DDL has been edited to remove options not relevant):

     CREATE TABLE                                             
       PRODA.MY_TABLE_01                                      
        (                                                     
        ACCOUNT_NAME   CHAR(50) NOT NULL WITH DEFAULT          
         FOR SBCS DATA                                        
       ,ACCOUNT_TXT   CLOB(1 M) NOT NULL WITH DEFAULT          
         FOR SBCS DATA                                        
        )                                                     
         IN PRODAB00.TSPACE57  
         CCSID EBCDIC                                         
         NOT VOLATILE                                         
         APPEND NO                                                                           
     ;                                
     CREATE                           
       LOB TABLESPACE TSPACE58        
         IN DPDNBB00                  
         USING STOGROUP PRODASMS      
         PRIQTY -1                    
         SECQTY -1                    
         ERASE NO                         
         GBPCACHE SYSTEM                  
         NOT LOGGED                       
         DSSIZE 4 G                       
         BUFFERPOOL BP32K1                
         LOCKSIZE ANY                     
         LOCKMAX SYSTEM                   
         CLOSE NO                         
     ;                                    
     CREATE AUXILIARY TABLE               
         PRODA.MY_TABLE_01_AUX_01         
         IN PRODAB00.TSPACE58             
         STORES PRODA.MY_TABLE_01         
         APPEND NO                        
         COLUMN ACCOUNT_TXT                
         PART 1                           
     ;                                    

The initial recover was coded as follows:

//SYSIN    DD  *                                   
  RECOVER  TABLESPACE  PRODAB00.TSPACE57           
           TORBA       X'0000000015D79FB6B7E0'     
/*                                                 

This generated the error:

DSNUGUTC -  RECOVER TABLESPACE PRODAB00.TSPACE57 TORBA X'0000000015D79FB6B7E0'
DSNU1316I -DBXC 102 15:41:03.83 DSNUCAIN - THE FOLLOWING TABLESPACES ARE MISSING FROM THE RECOVERY LIST
PRODAB00.TSPACE58

I changed the JCL as follows:

//SYSIN    DD  *                                  
  RECOVER  TABLESPACE  PRODAB00.TSPACE57          
           TABLESPACE  PRODAB00.TSPACE58          
           TORBA       X'0000000015D79FB6B7E0'    
/*                                                

And the recover completed:

DSNUCBMD - RECOVER TABLESPACE PRODAB00.TSPACE57   START
DSNUCBAL - THE IMAGE COPY DATA SET HLQ.PRODAB00.TSPACE57.NVE.F1010801 WITH DATE=20180411 AND TIME=071133 IS PARTICIPATING IN RECOVERY OF TABLESPACE PRODAB00.TSPACE57

DSNUCBMD - RECOVER TABLESPACE PRODAB00.TSPACE58   START                    
DSNUCBAL - THE IMAGE COPY DATA SET HLQ.PRODAB00.TSPACE58.NVE.F1010801 WITH DATE=20180411 AND TIME=071133 IS PARTICIPATING IN RECOVERY OF TABLESPACE PRODAB00.TSPACE58

Recovering in the event a user issues an incorrect SQL in DB2 for z/OS

If a user inadvertently updates, inserts or deletes data in a table during the day and you need to recover the table to a prior point in time, you can recover to a checkpoint rather than an Image Copy or a quiesce point to reduce the amount of data loss. For example, if the user issued a “bad” query at 17:18 on March 29, 2018, you can print the checkpoint queue with DB2 utility DSNJU004. The output will look similar to:

                    CHECKPOINT QUEUE                
                18:40:10 MARCH 29, 2018             
TIME OF CHECKPOINT       18:30:04 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDD4187405
END CHECKPOINT RBA              0000000023FDD4194000
END CHECKPOINT STCK             00D41A198B683F8E0A00
TIME OF CHECKPOINT       18:15:04 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDCE6AA615
END CHECKPOINT RBA              0000000023FDCE6B73E7
END CHECKPOINT STCK             00D41A16311B02B53A00
TIME OF CHECKPOINT       18:02:44 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDC976C806
END CHECKPOINT RBA              0000000023FDC9785E4F
END CHECKPOINT STCK             00D41A136F4ACE910800
TIME OF CHECKPOINT       17:47:44 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDC3A02EF2
END CHECKPOINT RBA              0000000023FDC3A11000
END CHECKPOINT STCK             00D41A1014FBDC3B0800
TIME OF CHECKPOINT       17:32:44 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDBDA9FBDC
END CHECKPOINT RBA              0000000023FDBDAAD967
END CHECKPOINT STCK             00D41A0CBAAD71430800
TIME OF CHECKPOINT       17:17:44 MARCH 29, 2018    
BEGIN CHECKPOINT RBA            0000000023FDB7EAA020
END CHECKPOINT RBA              0000000023FDB7EC4622
END CHECKPOINT STCK             00D41A096060B5B53000

There is a checkpoint at 17:17 on March 29, 2018 that may be a good recovery point.

The tablespace can be recovered with

//SYSIN    DD  *                        
  RECOVER  TABLESPACE proddb00.prodts33 
           TORBA      X'23FDB7EC4622'   
/*                                      

If you have an old table space with multiple tables, they will all be recovered to the RBA. Rebuild all the associated indices after the recovery completes.