Upgrade failure (11.2.0.4 to 19c) followed by ORA-29702: error occurred in Cluster Group Service operation & ORA-00704, ORA-00604 and ORA-00904

Update – April 10, 2021
We encountered a failure during an upgrade from 12.1 to 19c: the SYSTEM tablespace ran out of space, causing the upgrade to fail. When we attempted to mount the database on the 12c home, we encountered the errors

 ORA-00704: bootstrap process failure
 ORA-00604: error occurred at recursive SQL level 1
 ORA-00904: "SPARE10": invalid identifier

and

ORA-29702: error occurred in Cluster Group Service operation

We mounted the database in the Oracle home and performed the following steps:

startup nomount;

alter database mount;

alter database open;

flashback database to restore point PRE_UPGRADE;

alter database open resetlogs;

shutdown immediate;

After that, we continued at “After consulting Google, we found the new process is to run a manual upgrade as follows:” in the original post below. The upgrade ran for about 90 minutes across 107 phases. Most phases ran between 10 and 150 seconds, except phase 53 (1,135 seconds) and phase 98 (1,388 seconds).
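For reference, the PRE_UPGRADE restore point mentioned above must be a guaranteed restore point created before the upgrade starts. A minimal sketch of creating and verifying one (the name is simply what we used):

create restore point PRE_UPGRADE guarantee flashback database;

select name, guarantee_flashback_database, time
from v$restore_point;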

Original post continues below:
We were attempting to upgrade an Oracle database from 11.2.0.4 to 19c on an Oracle Exadata cluster in Oracle Cloud Infrastructure. Approximately 44% into the upgrade, the process failed and performed a rollback. We suspect some invalid objects caused this. However, when we attempted to start the database at 11.2.0.4 (the old Oracle home), we encountered the error message “ORA-29702: error occurred in Cluster Group Service operation”. Oracle suggested that we bounce CRS across all nodes together (i.e. non-rolling) to resolve this. However, the database still did not start and displayed the message:

[actpsrvr-ACPTDB1] srvctl start database -d ACPTDB
PRCR-1079 : Failed to start resource ora.ACPTDB.db
CRS-5017: The resource action "ora.ACPTDB.db start" encountered the following error:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-39700: database must be opened with UPGRADE option
Process ID: 251124
Session ID: 775 Serial number: 1
. For details refer to "(:CLSN00107:)" in "/grid19/crs/trace/crsd_oraagent_oracle.trc".
 
CRS-2674: Start of 'ora.ACPTDB.db' on 'acptsrvr' failed
CRS-2632: There are no more servers to try to place resource 'ora.ACPTDB.db' on that would satisfy its placement policy
CRS-5017: The resource action "ora.ACPTDB.db start" encountered the following error:
ORA-01092: ORACLE instance terminated. Disconnection forced
ORA-00704: bootstrap process failure
ORA-39700: database must be opened with UPGRADE option
Process ID: 135914
Session ID: 775 Serial number: 1
. For details refer to "(:CLSN00107:)" in "/grid19/crs/trace/crsd_oraagent_oracle.trc".
 
CRS-2674: Start of 'ora.ACPTDB.db' on 'actpsrvr' failed

At this point, any attempt to start the database resulted in it starting in upgrade mode. So we switched the database to the new 19c home, issued a STARTUP UPGRADE on node 01, and then attempted to run catupgrd.sql, which displayed:

SQL> @?/rdbms/admin/catupgrd.sql
DOC>######################################################################
DOC>######################################################################
DOC>                                 ERROR
DOC>
DOC>
DOC>    As of 12.2, customers must use the parallel upgrade utility, catctl.pl,
DOC>    to invoke catupgrd.sql when upgrading the database dictionary.
DOC>    Running catupgrd.sql directly from SQL*Plus is no longer supported.
DOC>
DOC>    For Example:
DOC>
DOC>
DOC>          catctl
DOC>
DOC>          or
DOC>
DOC>          cd $ORACLE_HOME/rdbms/admin
DOC>          $ORACLE_HOME/perl/bin/perl catctl.pl catupgrd.sql
DOC>
DOC>    Refer to the Oracle Database Upgrade Guide for more information.
DOC>
DOC>
DOC>######################################################################
DOC>######################################################################
DOC>#
Disconnected from Oracle Database 19c EE Extreme Perf Release 19.0.0.0.0 - Production
Version 19.8.0.0.0

After consulting Google, we found the new process is to run a manual upgrade as follows:

cd $ORACLE_HOME/bin
./dbupgrade

This kicks off a Perl script that performs the upgrade. The log files are in:

$ORACLE_HOME/cfgtoollogs/ACPTDB/upgrade20210226194351

The upgrade takes about 90 minutes, and the active log file in the above directory keeps switching, so keep an eye on which one you are tailing.
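Because the active log keeps rotating, a small shell loop that re-picks the newest catupgrd log on each pass saves some aggravation. A rough sketch (run from the log directory above; the parallel workers write to catupgrd0.log, catupgrd1.log, and so on):

cd $ORACLE_HOME/cfgtoollogs/ACPTDB/upgrade20210226194351
# Every 30 seconds, tail whichever log was written to most recently
while true; do
  tail -n 20 "$(ls -t catupgrd*.log | head -1)"
  sleep 30
done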

After the upgrade was finished, I attempted to start with SRVCTL and got the below error:

[actpsrvr-ACPTDB1] srvctl status database -d ACPTDB
PRCD-1229 : An attempt to access configuration of database ACPTDB was rejected because its version 11.2.0.4.0 differs from the program version 19.0.0.0.0. Instead run the program from /11.2.0/dbhome_2.

This is documented by Oracle in note “After manual database upgrade, srvctl commands fail with PRCD-1027, PRCD-1229 (Doc ID 1281852.1)”

CAUSE
Oracle Clusterware keys for the database still refer to the old ORACLE_HOME.
 
SOLUTION
1. Upgrade the Oracle Clusterware keys for the database by running the "srvctl upgrade database" command.
 
Run srvctl from the new release $ORACLE_HOME to upgrade the database keys. For example:
 
$ORACLE_HOME/bin/srvctl upgrade database -d DB_NAME -o NEW_ORACLE_DB_HOME
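For our database, the command looked roughly like the following (the 19c home path here is illustrative, not our exact path):

$ORACLE_HOME/bin/srvctl upgrade database -d ACPTDB -o /u01/app/oracle/product/19.0.0.0/dbhome_9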

After we issued the above command, the database started normally with SRVCTL commands. Oddly, it cut about 100 redo logs in quick succession before it calmed down and started reporting routine errors on invalid objects. We then recompiled all the objects, and the upgrade was complete.

Creating an EC2 instance in AWS via CloudFormation

This blog post explains how to create (“spin up” in the cool lingo 🙂 ) an EC2 instance in AWS using CloudFormation. I will explain each line or section of code before the code itself. The complete CloudFormation template is available at the end of this post. The template is written in JSON; it could also have been written in YAML.

This is the opening curly brace; just make sure you have one at the end to close it out. I strongly suggest using a JSON formatter and syntax checker.

{

These lines identify the AWS template format version and a description for documentation purposes:

    "AWSTemplateFormatVersion":"2010-09-09",
    "Description":"Linux EC2 to test with",

The Parameters section contains the parameters that will be passed from my Jenkins job to this CloudFormation template. Each parameter can have a type, a default value, and a description.

    "Parameters":{
        "CFstackname":{
            "Type":"String"
        },
        "AppID":{
            "Type":"String",
            "Default":"ABC",
            "Description":"Name of the application. Default=ABC"
        },
        "Role":{
            "Type":"String",
            "Default":"App",
            "Description":"Enter the Role name. Default=App"
        },
        "UserID":{
            "Type":"String",
            "Description":"Enter the userid of the owner(s)"
        }
    },

This next section specifies the resources to be created by this CloudFormation template.

    "Resources":{

The type of the resource to be created is an EC2 instance

        "EC2Instance":{
            "Type":"AWS::EC2::Instance",

This section contains the properties to describe the resource.

            "Properties":{

This property specifies the instance type, which in this case is a t3.small.

                "InstanceType":"t3.small",

The DisableApiTermination property controls whether the instance can be terminated using the console, CLI, or API. While writing this blog post, I learned that DisableApiTermination does not prevent Amazon EC2 Auto Scaling from terminating an instance. Just a useful piece of information.

                "DisableApiTermination":"false",

This specifies the network interfaces. AssociatePublicIpAddress is set to false in this case. The SubnetId is your subnet, i.e. a logical subdivision of an IP network. The GroupSet lists the security groups you want attached to this EC2 instance; these need to be predefined.

                "NetworkInterfaces":[
                    {
                        "AssociatePublicIpAddress":"false",
                        "DeviceIndex":"0",
                        "SubnetId":"my-subnet-01",
                        "GroupSet":[
                            "sg-for-me-01",
                            "sg-for-me-02"
                        ]
                    }
                ],

This next property specifies the Amazon Machine Image (AMI) to use to create the instance. In my case, I used a pre-created AMI because it was already encrypted with my KMS key.

                "ImageId":"ami-dean-01",

Here I’m specifying the block devices (disks) to be associated with this machine. The volume size is specified in GiB. The storage is deleted when the EC2 instance is terminated, and the disk is encrypted with the KMS key specified.

                "BlockDeviceMappings":[
                    {
                        "DeviceName":"/dev/sda1",
                        "Ebs":{
                            "VolumeSize":"100",
                            "DeleteOnTermination":true,
                            "VolumeType":"gp2",
                            "Encrypted":"true",
                            "KmsKeyId":"my magical KMS key"
                        }
                    }
                ],

These are the tags to be associated with this EC2 instance. In this example the values are hard-coded, but they could instead reference the parameters at the top of the template (see the sketch after the Resources section). Tagging is free-form to the extent that you can specify any string as the key and any value as the value.

                "Tags":[
                    {
                        "Key":"UserID",
                        "Value":"xyz01"
                    },
                    {
                        "Key":"Name",
                        "Value":"My Linux Machine"
                    },
                    {
                        "Key":"Role",
                        "Value":"App"
                    }

Close all the curly braces or else bad things will happen 🙂

 
                ]
            }
        }
    },
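As an aside, the tag values do not have to be hard-coded as they are above; a tag entry can pull its value from the Parameters section with Ref. A sketch of what the UserID tag would look like in that style:

                    {
                        "Key":"UserID",
                        "Value":{
                            "Ref":"UserID"
                        }
                    }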

This section defines the outputs I am expecting: the instance ID, followed by the Availability Zone (AZ) and the private IP.

    "Outputs":{
        "InstanceId":{
            "Description":"InstanceId of the newly created EC2 instance",
            "Value":{
                "Ref":"EC2Instance"
            }
        },
        "AZ":{
            "Description":"Availability Zone of the newly created EC2 instance",
            "Value":{
                "Fn::GetAtt":[
                    "EC2Instance",
                    "AvailabilityZone"
                ]
            }
        },
        "PrivateIP":{
            "Description":"PrivateIP of the newly created EC2 instance",
            "Value":{
                "Fn::GetAtt":[
                    "EC2Instance",
                    "PrivateIp"
                ]
            }
        }
    }
}

Complete CloudFormation template

{
    "AWSTemplateFormatVersion":"2010-09-09",
    "Description":"Linux EC2 to test with",
    "Parameters":{
        "CFstackname":{
            "Type":"String"
        },
        "AppID":{
            "Type":"String",
            "Default":"ABC",
            "Description":"Name of the application. Default=ABC"
        },
        "Role":{
            "Type":"String",
            "Default":"App",
            "Description":"Enter the Role name. Default=App"
        },
        "UserID":{
            "Type":"String",
            "Description":"Enter the userid of the owner(s)"
        }
    },
    "Resources":{
        "EC2Instance":{
            "Type":"AWS::EC2::Instance",
            "Properties":{
                "InstanceType":"t3.small",
                "DisableApiTermination":"false",
                "NetworkInterfaces":[
                    {
                        "AssociatePublicIpAddress":"false",
                        "DeviceIndex":"0",
                        "SubnetId":"my-subnet-01",
                        "GroupSet":[
                            "sg-for-me-01",
                            "sg-for-me-02"
                        ]
                    }
                ],
                "ImageId":"ami-dean-01",
                "BlockDeviceMappings":[
                    {
                        "DeviceName":"/dev/sda1",
                        "Ebs":{
                            "VolumeSize":"100",
                            "DeleteOnTermination":true,
                            "VolumeType":"gp2",
                            "Encrypted":"true",
                            "KmsKeyId":"my magical KMS key"
                        }
                    }
                ],
                "Tags":[
                    {
                        "Key":"UserID",
                        "Value":"xyz01"
                    },
                    {
                        "Key":"Name",
                        "Value":"My Linux Machine"
                    },
                    {
                        "Key":"Role",
                        "Value":"App"
                    }
 
                ]
            }
        }
    },
    "Outputs":{
        "InstanceId":{
            "Description":"InstanceId of the newly created EC2 instance",
            "Value":{
                "Ref":"EC2Instance"
            }
        },
        "AZ":{
            "Description":"Availability Zone of the newly created EC2 instance",
            "Value":{
                "Fn::GetAtt":[
                    "EC2Instance",
                    "AvailabilityZone"
                ]
            }
        },
        "PrivateIP":{
            "Description":"PrivateIP of the newly created EC2 instance",
            "Value":{
                "Fn::GetAtt":[
                    "EC2Instance",
                    "PrivateIp"
                ]
            }
        }
    }
}
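To actually create a stack from this template outside of Jenkins, the AWS CLI works fine. A minimal sketch, assuming the template is saved as ec2.json and the CLI is configured (CFstackname and UserID have no defaults, so they must be supplied):

aws cloudformation create-stack \
    --stack-name my-test-ec2 \
    --template-body file://ec2.json \
    --parameters ParameterKey=CFstackname,ParameterValue=my-test-ec2 \
                 ParameterKey=UserID,ParameterValue=xyz01

# Once the stack reports CREATE_COMPLETE, read the outputs (instance ID, AZ, private IP)
aws cloudformation describe-stacks --stack-name my-test-ec2 --query 'Stacks[0].Outputs'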

ORA-31604: invalid transform NAME parameter “MODIFY” for object type PROCACT_INSTANCE in function ADD_TRANSFORM

While performing an import data pump operation into an Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production we encountered the below error:

Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
Master table "SYS"."SYS_IMPORT_FULL_15" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_FULL_15":  "/******** AS SYSDBA" directory=exp_dir dumpfile=resolvecode.dmp remap_table=myuser.accounting_codes:accounting_codes_prod 
Processing object type TABLE_EXPORT/TABLE/PROCACT_INSTANCE
ORA-39126: Worker unexpected fatal error in KUPW$WORKER.LOAD_MD_TRANSFORMS [] 
ORA-31604: invalid transform NAME parameter "MODIFY" for object type PROCACT_INSTANCE in function ADD_TRANSFORM
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 86
ORA-06512: at "SYS.KUPW$WORKER", line 9710

This issue is documented on Oracle support at

Impdp Fails With ORA-39126 ORA-31604 Invalid Transform NAME Parameter "MODIFY" For Object Type PROCACT_INSTANCE in function ADD_TRANSFORM (Doc ID 1596495.1)

The solution is to change the import data pump command from:

impdp \'/ as sysdba \' dumpfile=resolvecode.dmp directory=exp_dir remap_table=myuser.accounting_codes:accounting_codes_prod 

to:

impdp \'/ as sysdba \' dumpfile=resolvecode.dmp directory=exp_dir remap_table=myuser.accounting_codes:accounting_codes_prod exclude=PROCACT_INSTANCE

The use of exclude=PROCACT_INSTANCE will exclude the procedural instance objects, such as resource manager plans and consumer groups.

There is also a patch available, but the addition of exclude=PROCACT_INSTANCE solved my issue faster than patching.

CRS-6706: Oracle Clusterware Release patch level (‘xxxxxxxx’) does not match Software patch level (‘xxxxxxxxx’). Oracle Clusterware cannot be started

After applying the October 2020 patch to a 12c GI, we were attempting to bring up CRS and got the below error message:

CRS-6706: Oracle Clusterware Release patch level (‘xxxxxxxx’) does not match Software patch level (‘xxxxxxxxx’). Oracle Clusterware cannot be started

This is described in Oracle Support document 1639285.1.
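To see the mismatch for yourself, you can compare the patch level the clusterware expects with the patch level of the local software home on each node (a quick check; both queries are available in recent 12c releases):

# Patch level recorded for the release (what the clusterware expects)
crsctl query crs releasepatch
 
# Patch level of the local grid software home
crsctl query crs softwarepatch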

We stopped CRS on all nodes and performed the below on the RAC node that reported the error:

Sudo to root, navigate to the <grid home>/crs/install directory, and run:

./rootcrs.pl -unlock

The output will look like:

Using configuration parameter file: ./crsconfig_params
2021/01/15 00:52:44 CLSRSC-4012: Shutting down Oracle Trace File Analyzer (TFA) Collector.
 
2021/01/15 00:52:55 CLSRSC-4013: Successfully shut down Oracle Trace File Analyzer (TFA) Collector.
 
2021/01/15 00:52:56 CLSRSC-347: Successfully unlock /grid_home/12.2.0.2

After the above, run the patch command:

./rootcrs.pl -patch

The output will look like:

Using configuration parameter file: ./crsconfig_params
2021/01/15 00:53:12 CLSRSC-4015: Performing install or upgrade action for Oracle Trace File Analyzer (TFA) Collector.
 
2021/01/15 00:53:12 CLSRSC-4005: Failed to patch Oracle Trace File Analyzer (TFA) Collector. Grid Infrastructure operations will continue.
 
2021/01/15 00:53:24 CLSRSC-329: Replacing Clusterware entries in file 'oracle-ohasd.conf'
 
CRS-4123: Oracle High Availability Services has been started.
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'pdcc04-d03'
CRS-2673: Attempting to stop 'ora.drivers.acfs' on 'pdcc04-d03'
CRS-2677: Stop of 'ora.drivers.acfs' on 'pdcc04-d03' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'pdcc04-d03' has completed
CRS-4133: Oracle High Availability Services has been stopped.

After this, we were able to start CRS on all nodes of the RAC.

AUTH_COMPATIBILITY value of null causing UNLOAD utility jobs to fail

This week we moved to a new fix level that activated the AUTH_COMPATIBILITY parameter in macro DSN6SPRM. The documentation states:

The AUTH_COMPATIBILITY parameter specifies whether current-release or previous-release authorization rules are used for specific Db2 operations.
 
Null (blank)
Indicates that AUTH_COMPATIBILITY has no effect and that there is no authorization check override.
 
SELECT_FOR_UNLOAD
When this value is specified, the UNLOAD utility checks if the user has the SELECT or UNLOAD privilege on the target table. If this value is not specified, the UNLOAD utility checks if the user has the UNLOAD privilege on the target table.

In our case, the zparm was set to null; however, our unload (the utility, not DSNTIAUL) jobs began to fail. The auth IDs executing the unload utilities had SELECT access but not UNLOAD access. The documentation above indicated that the null value should have allowed the jobs to run, but we saw failures.

Since this is a dynamic change, we switched from null to SELECT_FOR_UNLOAD, and the previous behavior was restored.
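The longer-term fix, rather than leaning on the compatibility zparm, is presumably to grant the newer UNLOAD privilege to the auth IDs that run the utility. A sketch with a hypothetical schema, table, and auth ID:

-- Grant the UNLOAD table privilege to the batch auth ID
GRANT UNLOAD ON TABLE PRODSCHM.CUSTOMER TO UNLDBTCH;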

Export datapump encounters error ORA-39126 and ORA-01690

We ran across the following error while running a metadata-only export data pump on an 11.2.0.4 database:

ORA-39126: Worker unexpected fatal error in KUPW$WORKER.FETCH_XML_OBJECTS [PROCACT_SCHEMA:"SYSMAN"]
ORA-01690: sort area size too small
 
ORA-06512: at "SYS.DBMS_SYS_ERROR", line 95
ORA-06512: at "SYS.KUPW$WORKER", line 9715
 
----- PL/SQL Call Stack -----
object       line   object
handle       number name
0x11d4c2bf0   21979 package body SYS.KUPW$WORKER
0x11d4c2bf0    9742 package body SYS.KUPW$WORKER
0x11d4c2bf0   11838 package body SYS.KUPW$WORKER
0x11d4c2bf0    2808 package body SYS.KUPW$WORKER
0x11d4c2bf0   10422 package body SYS.KUPW$WORKER
0x11d4c2bf0    1824 package body SYS.KUPW$WORKER
0x11df95480       2 anonymous block
 
Job "SYS"."SYS_EXPORT_FULL_21" stopped due to fatal error at Wed Dec 23 18:13:11 2020 elapsed 0 00:05:34

According to Oracle Support Doc ID 1358890.1, this is a known bug that occurs after applying patch set 11.2.0.2. Doc ID 1358890.1 offers several solutions, including upgrading to 12.1, applying patch set 11.2.0.4, or applying an interim patch. Another workaround is to

update sys.metaview$ set properties = properties + 1024 where viewname = 'KU$_PROCACT_SYS_VIEW';
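If you go this route, it is worth capturing the current value first so the change can be backed out later (a sketch; remember to commit the update):

select viewname, properties
from sys.metaview$
where viewname = 'KU$_PROCACT_SYS_VIEW';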

We tried the update option, including increasing the value added up to 4096. However, this did not resolve the error. After some trial and error, we were able to complete the export with the below command:

expdp \'/ as sysdba \' directory=exp_dir dumpfile=rpt.dmp exclude=statistics exclude=schema:\"IN \(\'SYSMAN\',\'SYSTEM\',\'VRRPT1NE\',\'SYS\'\)\" full=y content=metadata_only

Changes to SYS user’s password not reflected in column PTIME of SYS.USER$ table

Today I noticed that changes to the SYS user’s password were not being reflected in the PTIME column of SYS.USER$. While this may seem trivial, this column can be used to identify passwords that have not been changed recently. The issue is documented by Oracle in “After Changing SYS Password, DBA_USERS.EXPIRY_DATE Not Updated For SYS User (Doc ID 2518310.1)” and Bug 28538439. In summary, after applying the July 2020 patch, you need to apply Patch 28538439 and perform the steps below:

This was performed on a RAC database running Release 12.1.0.2.0.

After changing the SYS user’s password, the SYS.USER$ table displayed the below information:

set line 300
col name format a15
SELECT    name, ctime, ptime
FROM      sys.user$
where     name like 'SYS%'
order by  name
;  
 
NAME            CTIME     PTIME
--------------- --------- ---------
SYS             07-AUG-18 10-JUN-19 <-- Incorrect date

Check to see if patch 28538439 has been applied:

opatch lsinv -all | grep 28538439
     21117072, 22611167, 23182225, 25139545, 
     26196152, 27487279, 28538439 <-- Patch has been applied

The solution is to set the below parameter to “TRUE”:

show parameter _enable_ptime_update_for_sys;
 
No output is displayed, indicating that the 
parameter has not been set

Note that this parameter cannot be changed dynamically with a simple ALTER SYSTEM command; it has to be present in the spfile at startup.
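In principle, the parameter can also be written straight to the spfile and picked up at the next restart; a sketch (underscore parameters must be double-quoted):

alter system set "_enable_ptime_update_for_sys" = TRUE scope=spfile sid='*';

We took the pfile route instead, as follows.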

Create a temporary pfile from the spfile:

create pfile='/tmp/PRD1_pfile.txt' from spfile='+ORA_DATA_1/PRD1/PARAMETERFILE/spfile';

Edit the temporary pfile and add in the parameter:

vi /tmp/PRD1_pfile.txt
 
add the below line
 
*._enable_ptime_update_for_sys = TRUE

Stop the database and check the status:

srvctl stop database -d PRD1 -o immediate
 
srvctl status database -d PRD1

Start up one node of the RAC in nomount to recreate the spfile from the pfile:

sqlplus / as sysdba
 
startup nomount pfile='/tmp/PRD1_pfile.txt';
 
create spfile='+ORA_DATA_1/PRD1/PARAMETERFILE/spfile' from pfile='/tmp/PRD1_pfile.txt';
 
shutdown immediate;

Start up the RAC (all nodes) and check the status:

srvctl start database -d PRD1
srvctl status database -d PRD1

Check that the parameter to resolve Bug 28538439 has been set:

show parameter _enable_ptime_update_for_sys;
 
Output will be:
 
NAME                         TYPE    VALUE
---------------------------- ------- -----
_enable_ptime_update_for_sys boolean TRUE

This is not retroactive to the previous change to the SYS user’s password. The next time the SYS user’s password is changed, it will be reflected in the SYS.USER$ table:

set line 300
col name format a15
SELECT    name, ctime, ptime
FROM      sys.user$
where     name like 'SYS%'
order by  name
;  
 
NAME            CTIME     PTIME
--------------- --------- ---------
SYS             07-AUG-18 03-DEC-20 <-- Correct date

Oracle GoldenGate – Conditional replication based on the value of a column

A colleague had a requirement to replicate rows from a table only when a column was updated to a specific value, and came up with this solution. The table in question was a configuration table containing parameters and values that were loaded in advance of a code deployment. After the code was deployed, a parameter was activated by updating the column REPLICATE_FLAG to a value of “y”.

On the source side, the table is defined in the .prm file as follows:

TABLE TABLE_OWNER.CONFIG_TABLE, WHERE (REPLICATE_FLAG = 'y' OR REPLICATE_FLAG = 'Y');

This causes GoldenGate to replicate the row only after it is updated to a “y”. On the target side, either the row already exists, in which case GoldenGate performs an update, or the row does not exist, in which case GoldenGate performs an insert. To handle both conditions, the *.prm file uses the INSERTMISSINGUPDATES parameter.

insertmissingupdates
MAP TABLE_OWNER_PROD.CONFIG_TABLE , TARGET TABLE_OWNER_TARGET.CONFIG_TABLE;
noinsertupdates
MAP TABLE_OWNER_PROD.OTHER_TABLE , TARGET TABLE_OWNER_TARGET.OTHER_TABLE;

Since you do not want this behavior (missing updates converted to inserts) applied to the other tables being replicated, you can either turn it off with NOINSERTUPDATES, as shown above, or place the table that needs the special handling at the end of the parameter file.
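To sanity-check the filter and INSERTMISSINGUPDATES together, an update like the one below on the source should arrive on the target as an update if the row already exists there, or be converted to an insert if it does not (PARAM_NAME is a hypothetical column):

update table_owner.config_table
set replicate_flag = 'Y'
where param_name = 'FEATURE_X';
 
commit;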

This was tested on Oracle GoldenGate Version 19.1.0.0.4 and Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 – 64bit Production

Left aligning DSNTEP2 (aka batch SPUFI) output

A short blog post about a trivial but annoying item. By default, DSNTEP2 (aka batch SPUFI) output is center aligned. As I prefer left-aligned output for readability and parsing, I code the JCL as described below.

By default, the below code will center align the output:

//SYSTSIN   DD *                                         
 DSN SYSTEM(PRN3)                                        
 RUN  PROGRAM(DSNTEP2) PLAN(DSNTEP2)
//*                                                      
//SYSIN    DD  *                                         
  SELECT    COUNT(*)                                     
  FROM      Q.OBJECT_DATA                                
  ;                                                      

And the output looks like:
(The actual display is further to the right; I moved it to the left for readability.)

PAGE    1                                                                     
***INPUT STATEMENT:                                                           
   SELECT    COUNT(*)                                                         
   FROM      Q.OBJECT_DATA                                                    
   ;                                                                          
                                                 +----------------+ 
                                                 |                | 
                                                 +----------------+ 
                                               1_|           4042 | 
                                                 +----------------+ 
SUCCESSFUL RETRIEVAL OF          1 ROW(S)                                     

To left align, code as:

Left (LHS) alignment:
//SYSTSIN   DD *                                         
 DSN SYSTEM(PRN3)                                        
 RUN  PROGRAM(DSNTEP2) PLAN(DSNTEP2) PARMS('/ALIGN(LHS)')
//*                                                      
//SYSIN    DD  *                                         
  SELECT    COUNT(*)                                     
  FROM      Q.OBJECT_DATA                                
  ;                                                      

And the output looks like:

PAGE    1                                
***INPUT STATEMENT:                      
   SELECT    COUNT(*)                    
   FROM      Q.OBJECT_DATA               
   ;                                     
       +----------------+                
       |                |                
       +----------------+                
     1_|           4042 |                
       +----------------+                
SUCCESSFUL RETRIEVAL OF          1 ROW(S)

REORG and contention – Forcibly terminating threads?

One of the challenges in performing online REORGs (i.e. SHRLEVEL CHANGE) is successfully completing the switch phase of the process. The problem seems to compound in direct relation to the number of distributed threads connecting to the database objects. I often see a pattern where a distributed thread acquires a lock or claim on an object and then does not commit to release it. This results in the REORG failing and a page going out to the on-call DBA.

I initially coded the REORG jobs with:

REORG TABLESPACE   LIST CLUSTTS                            
      LOG           NO                        
      KEEPDICTIONARY                          
      ROWFORMAT BRF                           
      SORTDATA                                
      SORTKEYS                                
      NOSYSREC                                
      COPYDDN      (LPCOPY)                   
      RECOVERYDDN  (RBCOPY)                   
      SHRLEVEL      CHANGE                    
      DRAIN_WAIT    10 RETRY 3  RETRY_DELAY 10
      DEADLINE      NONE                      
      MAXRO         60                        
      DRAIN         WRITERS   <---
      LONGLOG       CONTINUE                  
      DELAY         0                         
      TIMEOUT       TERM                      
      FASTSWITCH    YES                       
      UNLOAD        CONTINUE                  
      STATISTICS    TABLE INDEX ALL           
      UPDATE ALL                

I assumed this would be the least invasive approach, based on the below information from the Utility Guide:

WRITERS
Specifies that Db2 drains only the writers during the 
log phase after the MAXRO threshold is reached and 
then issues DRAIN ALL on entering the switch phase.

I changed the options to include DRAIN ALL FORCE ALL:

REORG TABLESPACE   LIST CLUSTTS                            
      LOG           NO                        
      KEEPDICTIONARY                          
      ROWFORMAT BRF                           
      SORTDATA                                
      SORTKEYS                                
      NOSYSREC                                
      COPYDDN      (LPCOPY)                   
      RECOVERYDDN  (RBCOPY)                   
      SHRLEVEL      CHANGE                    
      DRAIN_WAIT    10 RETRY 3  RETRY_DELAY 10
      DEADLINE      NONE                      
      MAXRO         60 
      DRAIN         ALL    <---                   
      FORCE         ALL    <---           
      LONGLOG       CONTINUE                  
      DELAY         0                         
      TIMEOUT       TERM                      
      FASTSWITCH    YES                       
      UNLOAD        CONTINUE                  
      STATISTICS    TABLE INDEX ALL           
      UPDATE ALL                

The results were as follows:

If the DDF thread had only performed a select without a commit, it was cancelled with the below message in the DB2 master address space:

DSNL027I  -PR01 SERVER DISTRIBUTED AGENT WITH  828                   
           LUWID=JF0C80D3.DC4E.D8C6D35F5B78=129996                   
                                                                     
THREAD-INFO=ABC123Y:67.19.138.241:ABC123y:db2jcc_application:DYNAMIC:
5719:*:<::67.19.138.241.55378.D8C6D35F5B78>                          
           RECEIVED ABEND=04E                                        
           FOR REASON=00E50013

If the DDF thread had performed an update without a commit, it was not cancelled. Instead, the REORG failed with the below message in the job output:

DSNURDRN - RESOURCE NOT AVAILABLE, REASON=X'00C200EA', ON 
DBNAME.TSNAME PROHIBITS PROCESSING

I would have preferred that the DRAIN ALL FORCE ALL options cancel the thread in both cases.
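In the meantime, when the switch phase times out on an uncommitted updater, the fallback is manual: display the claimers on the object and cancel the offending thread. A sketch with placeholder database, table space, and token values (the token comes from the DISPLAY THREAD output):

-DISPLAY DATABASE(DBNAME) SPACENAM(TSNAME) CLAIMERS
-DISPLAY THREAD(*) TYPE(ACTIVE)
-CANCEL THREAD(12345)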
