One of the challenges in performing online REORGs (i.e. SHRLEVEL CHANGE) is successfully completing the switch phase of the process. This problem seems to compound in direct relation to the number of distributed threads that connect to the database objects. I often see a pattern wherein a distributed thread acquires a lock or claim on an object and then does not perform a commit to release the lock or claim. This results in the REORG failing with a page out to the DBA.
I initially coded the REROG jobs with
REORG TABLESPACE LIST CLUSTTS LOG NO KEEPDICTIONARY ROWFORMAT BRF SORTDATA SORTKEYS NOSYSREC COPYDDN (LPCOPY) RECOVERYDDN (RBCOPY) SHRLEVEL CHANGE DRAIN_WAIT 10 RETRY 3 RETRY_DELAY 10 DEADLINE NONE MAXRO 60 DRAIN WRITERS <--- LONGLOG CONTINUE DELAY 0 TIMEOUT TERM FASTSWITCH YES UNLOAD CONTINUE STATISTICS TABLE INDEX ALL UPDATE ALL
Assuming that this would be the least invasive approach. This was based on the below information from the Utility Guide:
WRITERS Specifies that Db2 drains only the writers during the log phase after the MAXRO threshold is reached and then issues DRAIN ALL on entering the switch phase.
I changed the options to include DRAIN ALL FORCE ALL:
REORG TABLESPACE LIST CLUSTTS LOG NO KEEPDICTIONARY ROWFORMAT BRF SORTDATA SORTKEYS NOSYSREC COPYDDN (LPCOPY) RECOVERYDDN (RBCOPY) SHRLEVEL CHANGE DRAIN_WAIT 10 RETRY 3 RETRY_DELAY 10 DEADLINE NONE MAXRO 60 DRAIN ALL <--- FORCE ALL <--- LONGLOG CONTINUE DELAY 0 TIMEOUT TERM FASTSWITCH YES UNLOAD CONTINUE STATISTICS TABLE INDEX ALL UPDATE ALL
The results were as follow:
If the DDF thread had only performed a select without a commit, it was cancelled with the below message in the DB2 master address space:
DSNL027I -PR01 SERVER DISTRIBUTED AGENT WITH 828 LUWID=JF0C80D3.DC4E.D8C6D35F5B78=129996 THREAD-INFO=ABC123Y:67.19.138.241:ABC123y:db2jcc_application:DYNAMIC: 5719:*:<::67.19.138.241.55378.D8C6D35F5B78> RECEIVED ABEND=04E FOR REASON=00E50013
If the DDF thread had performed an update without a commit, it was not cancelled. Instead, the REORG fails with the below message in the job output:
DSNURDRN - RESOURCE NOT AVAILABLE, REASON=X'00C200EA', ON DBNAME.TSNAME PROHIBITS PROCESSING
I would have preferred that the DRAIN ALL FORCE ALL options would have cancelled the thread in both cases.
What is this heresy?
DB2?
Have you gone over to “the dark side of the force”?
Lol
It has been a dark side kind of week 🙂