One of the challenges in performing online REORGs (i.e. SHRLEVEL CHANGE) is successfully completing the switch phase of the process. This problem seems to compound in direct relation to the number of distributed threads that connect to the database objects. I often see a pattern wherein a distributed thread acquires a lock or claim on an object and then does not perform a commit to release the lock or claim. This results in the REORG failing with a page out to the DBA.
I initially coded the REROG jobs with
REORG TABLESPACE LIST CLUSTTS
LOG NO
KEEPDICTIONARY
ROWFORMAT BRF
SORTDATA
SORTKEYS
NOSYSREC
COPYDDN (LPCOPY)
RECOVERYDDN (RBCOPY)
SHRLEVEL CHANGE
DRAIN_WAIT 10 RETRY 3 RETRY_DELAY 10
DEADLINE NONE
MAXRO 60
DRAIN WRITERS <---
LONGLOG CONTINUE
DELAY 0
TIMEOUT TERM
FASTSWITCH YES
UNLOAD CONTINUE
STATISTICS TABLE INDEX ALL
UPDATE ALL
Assuming that this would be the least invasive approach. This was based on the below information from the Utility Guide:
WRITERS
Specifies that Db2 drains only the writers during the
log phase after the MAXRO threshold is reached and
then issues DRAIN ALL on entering the switch phase.
I changed the options to include DRAIN ALL FORCE ALL:
REORG TABLESPACE LIST CLUSTTS
LOG NO
KEEPDICTIONARY
ROWFORMAT BRF
SORTDATA
SORTKEYS
NOSYSREC
COPYDDN (LPCOPY)
RECOVERYDDN (RBCOPY)
SHRLEVEL CHANGE
DRAIN_WAIT 10 RETRY 3 RETRY_DELAY 10
DEADLINE NONE
MAXRO 60
DRAIN ALL <---
FORCE ALL <---
LONGLOG CONTINUE
DELAY 0
TIMEOUT TERM
FASTSWITCH YES
UNLOAD CONTINUE
STATISTICS TABLE INDEX ALL
UPDATE ALL
The results were as follow:
If the DDF thread had only performed a select without a commit, it was cancelled with the below message in the DB2 master address space:
DSNL027I -PR01 SERVER DISTRIBUTED AGENT WITH 828
LUWID=JF0C80D3.DC4E.D8C6D35F5B78=129996
THREAD-INFO=ABC123Y:67.19.138.241:ABC123y:db2jcc_application:DYNAMIC:
5719:*:<::67.19.138.241.55378.D8C6D35F5B78>
RECEIVED ABEND=04E
FOR REASON=00E50013
If the DDF thread had performed an update without a commit, it was not cancelled. Instead, the REORG fails with the below message in the job output:
DSNURDRN - RESOURCE NOT AVAILABLE, REASON=X'00C200EA', ON
DBNAME.TSNAME PROHIBITS PROCESSING
I would have preferred that the DRAIN ALL FORCE ALL options would have cancelled the thread in both cases.