STROBing a transaction

In the last few weeks, we were challenged with a system that was supposed to process orders but was not processing them in a timely fashion. TMON indicated that the process was sitting in an IC wait, suggesting that the delay was intentional. However, the concern was that the process was waiting on some unknown resource, such as the database. The process consisted of a CICS transaction that called a number of COBOL programs and accessed an ADABAS database.

In an attempt to identify the actual line of code that was causing the delay, we turned on STROBE. Since this is a CICS transaction, you have to run an “ADD ACTIVE” measurement, as the CICS region is an executing job. Set the job name to the CICS region, the session duration to 10 minutes, and the target sample size to 20,000 (change these numbers as required for your context). In addition, select the “Data Collectors” option with a “Y”. On the Data Collectors panel, select the CICS+ option with a “Y”.

On the CICS Data Collector Options, select “Collect Region Data” with a “Y” and provide the name of the transaction in the “Detail by Transaction” section of the panel.

At this point, the measurement should be active. Monitor the measurement for completion and then generate a profile in the foreground. You can do this in batch, but it is a relatively trivial operation, so it can be performed quickly in the foreground.

After the profile was generated, the transaction summary (#TUS) report indicated that 599 seconds of the measurement were attributable to one transaction. Further, all of this time was attributed to a single program. STROBE was able to provide us with the actual CICS command (“EXEC CICS DELAY”) and the offset into the compile listing. We were able to take this offset, look into the compile listing, and identify that the line of code generating the delay was an

EXEC CICS
     DELAY FOR SECONDS(number_of_seconds)
END-EXEC

This proved that the delay in the process was intentional and not the result of a database or other resource contention.

Pro*C – PCC-F-02104, Unable to connect to Oracle

If you are attempting to compile Pro*C code and get the error “PCC-F-02104, Unable to connect to Oracle”, perform the following steps:

Instead of connecting to the Oracle database as

sqlplus userid/password@somelistener.domain.com:1521/MYPRODDB

and then attempting to compile, edit your tnsnames.ora to add the entry below (change it appropriately to reflect your database SID or service, listener, port, etc.):

MY_DB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = SOMELISTENER.DOMAIN.COM)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SID = MYPRODDB)
    )
  )
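Before going further, you can confirm that the new alias resolves, assuming the tnsping utility is available on the client:

tnsping MY_DB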

Export this entry as follows

export TWO_TASK=MY_DB

And now connect as

sqlplus userid/password

i.e. without specifying the usual connect string.
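With TWO_TASK set, the Pro*C precompiler can likewise connect for semantic checking without a connect string. A minimal sketch (the file name myprog.pc is hypothetical):

proc iname=myprog.pc sqlcheck=semantics userid=userid/password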

In case you are wondering about the name “TWO_TASK”, Tom Kyte explains it best:
“because in days gone by – the 1980’s – two tasks (having two tasks working together) just made ‘sense’”.

Copying a file from a Linux on-prem server to an AWS RDS Oracle instance

Today we had a requirement to copy a file from a Linux on-prem server to an AWS RDS Oracle instance. We did this using a DB link and the UTL_FILE package. For the rest of this post, “ONP” will refer to the on-prem instance, and “ADS” will refer to the AWS RDS Oracle instance.

On the AWS RDS Oracle instance, create a DB link to the on-prem instance

create database link AWS_TO_ON_PREM CONNECT TO ONPUSER
IDENTIFIED BY SECRET-PASSWORD
USING '(DESCRIPTION =
    (ADDRESS_LIST =
      (ADDRESS = (PROTOCOL = TCP)(HOST = ONPREM.HOST.COM)(PORT = 1776))
    )
    (CONNECT_DATA =
      (SERVICE_NAME = ONPDB)
    )
  )'
   ;

On the AWS RDS Oracle instance, create a directory to store the file

exec rdsadmin.rdsadmin_util.create_directory('ADS_DIRECTORY');

Check that the directory got created

Select * from dba_directories;

As AWS manages the path internally, the above SQL will return the actual directory path.
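For example, to see just the new directory and the path AWS assigned to it:

SELECT directory_name, directory_path
FROM   dba_directories
WHERE  directory_name = 'ADS_DIRECTORY';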

On the on-prem instance, create a directory pointing to the location of the file that you want to copy to AWS

CREATE OR REPLACE DIRECTORY
FOR_AWS AS '/onprem/host/file/location';

GRANT READ, WRITE ON DIRECTORY FOR_AWS TO ONPUSER;

Now to copy the file from the on-prem instance to the AWS instance. This is accomplished via the below PL/SQL, which is to be executed while connected to the AWS RDS Oracle instance. In our case, the data was binary (raw) data:

declare
  ONP    utl_file.file_type@AWS_TO_ON_PREM;
  ADS    utl_file.file_type;
  ldata  raw(32767);
begin
  ONP := utl_file.fopen@AWS_TO_ON_PREM(location=>'FOR_AWS', filename=>'file_for_aws.bin', open_mode=>'rb');
  ADS := utl_file.fopen(location=>'ADS_DIRECTORY', filename=>'file_for_aws.bin', open_mode=>'wb');
  begin
    loop
      begin
        utl_file.get_raw@AWS_TO_ON_PREM(ONP, ldata, 32767);
        utl_file.put_raw(ADS, ldata, true);
      exception
        when no_data_found then
          exit;
      end;
    end loop;
  end;
  utl_file.fclose@AWS_TO_ON_PREM (ONP);
  utl_file.fclose(ADS);  
exception
  when others then
    utl_file.fclose@AWS_TO_ON_PREM(ONP);
    utl_file.fclose(ADS);
    raise;
end;
/

After the above is complete, check that the file size in AWS matches the file size from on-prem with this SQL:

select    filesize, mtime 
from      table(rdsadmin.rds_file_util.listdir('ADS_DIRECTORY')) 
where     filename='file_for_aws.bin'
;
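The size of the source file on the on-prem server can be obtained for comparison with:

ls -l /onprem/host/file/location/file_for_aws.bin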

If you need to delete a file in AWS, you can use this command

exec utl_file.fremove('directory-name','file-name');

Is it worth turning on compression?

I had a need to turn on auditing of SYS operations and FGA on some tables. Additionally, I preferred to have the audit destination be the DB. I had moved both SYS.AUD$ and SYS.FGA_LOG$ out of SYSAUX to their own tablespace. After collecting some data, I was curious to see if compressing the tables would allow me to store more data. This was on a 12.1.0.2.0 – 64bit Unix environment.

Run the GET_COMPRESSION_RATIO procedure as follows:

SET SERVEROUTPUT ON
DECLARE
  l_blkcnt_cmp    PLS_INTEGER;
  l_blkcnt_uncmp  PLS_INTEGER;
  l_row_cmp       PLS_INTEGER;
  l_row_uncmp     PLS_INTEGER;
  l_cmp_ratio     NUMBER;
  l_comptype_str  VARCHAR2(32767);
BEGIN
  DBMS_COMPRESSION.get_compression_ratio (
    scratchtbsname  => 'USERS',
    ownname         => 'SYS',
    objname         => 'AUD$',
    subobjname      => NULL,
    comptype        => DBMS_COMPRESSION.comp_advanced,
    blkcnt_cmp      => l_blkcnt_cmp,
    blkcnt_uncmp    => l_blkcnt_uncmp,
    row_cmp         => l_row_cmp,
    row_uncmp       => l_row_uncmp,
    cmp_ratio       => l_cmp_ratio,
    comptype_str    => l_comptype_str,
    subset_numrows  => DBMS_COMPRESSION.comp_ratio_allrows,
    objtype         => DBMS_COMPRESSION.objtype_table
  );

  DBMS_OUTPUT.put_line('Number of blocks used (compressed)       : ' ||  l_blkcnt_cmp);
  DBMS_OUTPUT.put_line('Number of blocks used (uncompressed)     : ' ||  l_blkcnt_uncmp);
  DBMS_OUTPUT.put_line('Number of rows in a block (compressed)   : ' ||  l_row_cmp);
  DBMS_OUTPUT.put_line('Number of rows in a block (uncompressed) : ' ||  l_row_uncmp);
  DBMS_OUTPUT.put_line('Compression ratio                        : ' ||  l_cmp_ratio);
  DBMS_OUTPUT.put_line('Compression type                         : ' ||  l_comptype_str);
END;
/

The output looks like:

Number of blocks used (compressed)       : 252
Number of blocks used (uncompressed)     : 459
Number of rows in a block (compressed)   : 70
Number of rows in a block (uncompressed) : 38
Compression ratio                        : 1.8
Compression type                         : "Compress Advanced"

This indicates that we can store about twice the number of rows in the same amount of space.
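If the projected savings justify it, advanced compression could then be enabled with something like the statement below. This is only a sketch: rebuilding a SYS-owned audit table is invasive, so validate the approach against the relevant Oracle support notes first.

ALTER TABLE sys.aud$ MOVE ROW STORE COMPRESS ADVANCED;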

Setting up Fine Grained Auditing (FGA)

I had a requirement to track the activity on a list of tables and decided to try out Fine Grained Auditing (FGA). This was done in a 12.1.0.2.0 – 64bit Unix environment.

Note that the SYS user is not audited by FGA.

Setting up FGA

Check the location of the audit trail

show parameter audit;

If not set to DB, change as follows (11g):

alter system set audit_trail='DB_EXTENDED' scope=SPFILE sid='*';

In 12c the command seems to have changed to:

alter system set audit_trail=DB,EXTENDED scope=SPFILE sid='*';

Bounce the database

The tables that contain the auditing data are created in SYSAUX. It is good practice to move them to their own tablespace.

SELECT   table_name
        ,tablespace_name 
FROM     dba_tables 
WHERE    owner='SYS' 
  AND    table_name IN ('AUD$','FGA_LOG$')
;

The standard auditing data is written into SYS.AUD$, but a better-formatted version is in the view DBA_AUDIT_TRAIL. Similarly, SYS.FGA_LOG$ is best viewed through DBA_FGA_AUDIT_TRAIL.

Create a tablespace

CREATE TABLESPACE "AUDAUX" DATAFILE 
  '+DG_ORA1/datafile/audaux_1.dbf' SIZE 2G
  LOGGING ONLINE PERMANENT BLOCKSIZE 8192
  EXTENT MANAGEMENT LOCAL UNIFORM SIZE 524288 DEFAULT 
 NOCOMPRESS  SEGMENT SPACE MANAGEMENT AUTO
;

Turn off auto extend as this can fill up fast:

alter database datafile '+DG_ORA1/datafile/audaux_1.dbf' autoextend off;

Move SYS.AUD$ to the new tablespace with

BEGIN
  DBMS_AUDIT_MGMT.set_audit_trail_location(
    audit_trail_type           => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD,
    audit_trail_location_value => 'AUDAUX');
END;
/

Move SYS.FGA_LOG$ to the new tablespace with:

BEGIN
  DBMS_AUDIT_MGMT.set_audit_trail_location(
    audit_trail_type           => DBMS_AUDIT_MGMT.AUDIT_TRAIL_FGA_STD,
    audit_trail_location_value => 'AUDAUX');
END;
/

Create an audit policy as follows. In this case, I am auditing inserts, updates, and deletes on this table.

EXEC DBMS_FGA.ADD_POLICY
(
  object_schema=>'BILLING',
  object_name=>'INVENTORY_MASTER',
  policy_name=>'AUDIT_BILLING_INVENTORY_MASTER',
  statement_types=>'insert, update, delete'
)
;

If needed, the policy can be dropped with

EXEC DBMS_FGA.DROP_POLICY
(
  object_schema=>'BILLING',
  object_name=>'INVENTORY_MASTER',
  policy_name=>'AUDIT_BILLING_INVENTORY_MASTER'
)
;

Check the current policies and their status

SELECT *
--object_schema,object_name,policy_name,policy_column,enabled,sel,ins,upd,del 
FROM   dba_audit_policies
;

The above example turned on FGA for all columns in the table. FGA can also be used for a specific column as shown below:

EXEC DBMS_FGA.ADD_POLICY
(
  object_schema=>'BILLING',
  object_name=>'INVENTORY_MASTER',
  policy_name=>'AUDIT_BILLING_INVENTORY_MASTER',
  audit_column=>'bonus',
  statement_types=>'insert, update, delete'
)
;

The columns that are part of an FGA policy can be checked with:

SELECT  * 
FROM    dba_audit_policy_columns;

The information being collected by the FGA can be viewed with:

ALTER SESSION SET nls_date_format='dd-Mon-yyyy hh24:mi:ss';

select    *
FROM      DBA_FGA_AUDIT_TRAIL
order by  timestamp
;

Cleaning up FGA
Please read until the end before running these commands:

Check if the DBMS_AUDIT_MGMT.INIT_CLEANUP has already been run on the database:

SET serveroutput ON SIZE 999999
BEGIN
  IF DBMS_AUDIT_MGMT.IS_CLEANUP_INITIALIZED(DBMS_AUDIT_MGMT.AUDIT_TRAIL_DB_STD)
  THEN
    DBMS_output.put_line('True');
  ELSE
    DBMS_output.put_line('False');
  END IF;
END;
/

If the response is “FALSE”, run the DBMS_AUDIT_MGMT.INIT_CLEANUP as follows:

In this example, the cleanup is being set to run every 1 hour

BEGIN
  DBMS_AUDIT_MGMT.init_cleanup(
    audit_trail_type         => DBMS_AUDIT_MGMT.AUDIT_TRAIL_ALL,
    default_cleanup_interval => 1 /* hours */);
END;
/

If you get an error such as:

Error starting at line : 7 in command -
BEGIN
  DBMS_AUDIT_MGMT.init_cleanup(
    audit_trail_type         => DBMS_AUDIT_MGMT.AUDIT_TRAIL_ALL,
    default_cleanup_interval => 24 /* hours */);
END;
Error report -
ORA-46263: The audit trail is already initialized for cleanup
ORA-06512: at "SYS.DBMS_AUDIT_MGMT", line 61
ORA-06512: at "SYS.DBMS_AUDIT_MGMT", line 757
ORA-06512: at line 2

Uninitialize the cleanup with:

BEGIN
 DBMS_AUDIT_MGMT.DEINIT_CLEANUP(
  AUDIT_TRAIL_TYPE  => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD);
END;
/

Followed by the above command to initialize the cleanup.

I spent some time looking into DBMS_SCHEDULER, DBMS_AUDIT_MGMT.SET_LAST_ARCHIVE_TIMESTAMP, DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD, etc., but it seemed overly complicated for the goal. The same can be accomplished with a cron script:

DELETE 
FROM   SYS.fga_log$ 
WHERE  NTIMESTAMP# < to_date('21-SEP-20 18.42.00','dd-MON-yy hh24:mi:ss')
;
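A minimal sketch of the cron-driven script around that DELETE, assuming it runs on the database server with OS authentication and a 30-day retention (both placeholders):

#!/bin/bash
# purge_fga.sh - remove FGA audit rows older than 30 days
sqlplus -s / as sysdba << EOF
  DELETE FROM sys.fga_log\$
  WHERE  ntimestamp# < systimestamp - INTERVAL '30' DAY;
  COMMIT;
  exit;
EOF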

Remember to gather statistics and/or add indices to improve performance:

exec dbms_stats.gather_table_stats (ownname=>'SYS', tabname=>'AUD$' , estimate_percent=>10, cascade=>TRUE, degree=>5);

exec dbms_stats.gather_table_stats (ownname=>'SYS', tabname=>'FGA_LOG$' , estimate_percent=>10, cascade=>TRUE, degree=>5);

printf wrapping/tabbing on spaces in variables

If you need to print the contents of variables that happen to contain spaces via printf you would typically code as follows:

#!/bin/bash
export var_01='This is variable 01'
export var_02='continuing in variable 02'
printf '%-30s %-30s\n' $var_01 $var_02 >> temp.txt

However, this will output as follows:

This                           is                            
variable                       01                            
continuing                     in                            
variable                       02   

Instead of the expected:

This is variable 01            continuing in variable 02   

To prevent this, enclose the variables in double quotes:

#!/bin/bash
export var_01='This is variable 01'
export var_02='continuing in variable 02'
printf '%-30s %-30s\n' "$var_01" "$var_02" >> temp.txt

Storing the result from a v$ table into a variable in a script

Typically while selecting from a V$ table in a Unix script, I have to escape the “$” character as shown below:

sqlplus -s $sqlplus_command << EOF > backup_check_temp.txt
  select  value 
  from    V\$RMAN_CONFIGURATION 
  where   NAME = 'RETENTION POLICY'
  ;  
  exit;
EOF

The above will successfully run the SQL and write the output in the spool file.

However, if instead of the output in a spool file, you want the output in a Unix variable for further processing, you would change the code to

export ret_period=`sqlplus -s $sqlplus_command << EOF
                   select   value 
                   from     V\$RMAN_CONFIGURATION 
                   where    NAME = 'RETENTION POLICY'
                   ;
EOF`
if  [ -z "$ret_period" ]; then
    export ret_period='Not found'
fi    

This will fail with:

+ export 'ret_period=                select value from V where NAME = '\''RETENTION POLICY'\''
                                       *
ERROR at line 1:
ORA-00942: table or view does not exist'

The reason is that the second method of coding results in two levels of interpretation, each of which requires the “$” character to be escaped. To get the code working, it has to be coded as follows (note the two “\\” escape characters):

export ret_period=`sqlplus -s $sqlplus_command << EOF
                   select   value 
                   from     V\\$RMAN_CONFIGURATION 
                   where    NAME = 'RETENTION POLICY'
                   ;
EOF`
if  [ -z "$ret_period" ]; then
    export ret_period='Not found'
fi    
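As an aside, the $( ) form of command substitution does not add the extra backslash-processing pass that backticks do, so a single escape suffices there. A sketch of the same logic using $( ):

export ret_period=$(sqlplus -s $sqlplus_command << EOF
                   select   value
                   from     V\$RMAN_CONFIGURATION
                   where    NAME = 'RETENTION POLICY'
                   ;
EOF
)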

Flashback Data Archive

We had a requirement to track how, when, and by whom changes were made to the data in some of our sensitive tables. In the past, I would accomplish this by creating a trigger that would write before-and-after images into an audit table along with some other information, such as the time of the change, the current user making the change, the IP address of the user making the change, etc. With 11g, Oracle introduced “Total Recall”, which was subsequently renamed to Flashback Data Archive (FDA or FBA) in 12c. This new feature allows for long-term storage of “undo” data in a separate area known as the flashback archive.

Since my target database was on Oracle 12c, I decided to try out this new technology as opposed to a trigger solution.

Setup steps

Create a TABLESPACE and associated data file

  CREATE TABLESPACE "FDA" DATAFILE 
  '+DG_ORA1/datafile/fda_01.dbf' SIZE 1G
  NOLOGGING ONLINE PERMANENT BLOCKSIZE 8192
  EXTENT MANAGEMENT LOCAL AUTOALLOCATE DEFAULT 
 NOCOMPRESS  SEGMENT SPACE MANAGEMENT AUTO;

Create the flashback archive pointing to the above TABLESPACE. Additionally, set the quota and retention

CREATE FLASHBACK ARCHIVE DEFAULT OA_FDA TABLESPACE FDA
  QUOTA 1G RETENTION 30 day;

If you add more space to the TABLESPACE associated with the flashback archive, remember to increase the quota assigned above to the flashback archive. It is important to allocate enough space to this TABLESPACE as running out of space here will also impact the operations on the base tables.

In order to capture additional information about the user account, terminal, program, etc. that made the change, set the context level as shown below. There are other options for the level; however, I found “ALL” to be a good choice.

exec dbms_flashback_archive.set_context_level(level=> 'ALL');

Grant the schema owner access to the flashback archive

ALTER USER ACCT_OWNER QUOTA UNLIMITED ON FDA;
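Depending on which account enables archiving on the tables, an explicit grant on the archive itself may also be required (syntax per the Oracle documentation):

GRANT FLASHBACK ARCHIVE ON OA_FDA TO ACCT_OWNER;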

Add the tables that you want to monitor to the flashback archive

ALTER TABLE "ACCT_OWNER"."ACCT_MASTER"  FLASHBACK ARCHIVE OA_FDA;

For every table that is added to the flashback archive, Oracle creates three additional tables named

SYS_FBA_DDL_COLMAP_91245
SYS_FBA_HIST_91245
SYS_FBA_TCRV_91245

I am not exactly sure how the number is assigned, but each table added to the flashback archive will get a number and these three tables. They are created in the flashback data archive TABLESPACE created at the beginning of this process.

You can list the tables that are currently in the flashback data archive with the following SQL statement

SELECT     *
FROM       dba_flashback_archive_tables
ORDER BY   owner_name, table_name
;

Viewing the information if you have the key columns

The following SQL statement will display the changes to a given row of a table. Note that the SQL statement is run on the actual table and the optimizer uses the indexes on the actual table, hence code the predicates appropriately.

SELECT 
         versions_startscn
        ,versions_starttime
        ,versions_endtime
        ,versions_xid
        ,case versions_operation
              when 'D' then 'Delete'
              when 'I' then 'Insert'
              when 'U' then 'Update'
              else          versions_operation 
         end as Action    
        ,DBMS_FLASHBACK_ARCHIVE.get_sys_context
              (versions_xid, 'USERENV','SESSION_USER') AS session_user
        ,DBMS_FLASHBACK_ARCHIVE.get_sys_context
              (versions_xid, 'USERENV','HOST') AS host
        ,DBMS_FLASHBACK_ARCHIVE.get_sys_context
              (versions_xid, 'USERENV','module') AS module
        ,DBMS_FLASHBACK_ARCHIVE.get_sys_context
              (versions_xid, 'USERENV','os_user') AS os_user
        ,DBMS_FLASHBACK_ARCHIVE.get_sys_context
              (versions_xid, 'USERENV','ip_address') AS ip_address
        ,DBMS_FLASHBACK_ARCHIVE.get_sys_context
              (versions_xid, 'USERENV','AUTHENTICATED_IDENTITY') 
               as AUTHENTICATED_IDENTITY
        ,DBMS_FLASHBACK_ARCHIVE.get_sys_context
              (versions_xid, 'USERENV','DATABASE_ROLE') as test
        ,RO.*
FROM    "ACCT_OWNER"."ACCT_MASTER"  versions between 
                 timestamp (systimestamp - interval '1' day) and 
                 systimestamp RO
WHERE   key_value_col = 1622310
  and   versions_xid is not null
  order by 
        versions_starttime desc
;

Note:
a. that the table is aliased to RO at the end of the FROM clause (i.e. after the versions between part of the SQL). This allows me to see all the columns with RO.*.

b. in some cases I saw rows where the versions_xid was not populated and this causes the DBMS_FLASHBACK_ARCHIVE.get_sys_context to fail.

c. if you attempt to read data before any changes have been made and written to the flashback archive, you get a null error. Adjust the time period specified in the versions above.

d. there is a slight delay between modifying the table and seeing it in the flashback data archive. Changes to the flashback archive are written by a new process called the “fbda”. This is not a significant issue and I assume I was only able to notice it because I was testing on a quiet system where I was the only user.

Viewing the information if you DO NOT have the key columns
The above SQL will answer the question of who modified the data in a particular row. If you are attempting to identify all the changes that were made to a given table (i.e. you do not have key column values), the above SQL will not return in any reasonable amount of time. The issue is that each row in the result set initiates three calls to DBMS_FLASHBACK_ARCHIVE.get_sys_context for every column that uses the function. For example, if there are 1,000 rows in the flashback archive and I am using a get_sys_context column 7 times in the above SQL, there will be (1,000 rows * 7 columns * 3 calls) = 21,000 calls to DBMS_FLASHBACK_ARCHIVE.get_sys_context.

Also, note that the versions clause will list all of the rows that exist in the specified timeframe and not the rows that were modified in that timeframe. This will be discussed later below.

As a result of these gotchas, the below SQL performs better if you are searching for all the rows that were modified in a given timeframe.

SELECT 
         versions_startscn
        ,versions_starttime
        ,versions_endtime
        ,versions_xid
        ,case versions_operation
              when 'D' then 'Delete'
              when 'I' then 'Insert'
              when 'U' then 'Update'
              else          versions_operation 
         end as Action    
        ,RO.*
FROM    "ACCT_OWNER"."ACCT_MASTER"  versions between 
                 timestamp (systimestamp - interval '1' day) and 
                 systimestamp RO
  order by 
        versions_starttime desc
;

The above query assumes that you want to see the changes in chronological order descending. If you want to see the information by a key column, then order by that column. After you have identified the particular row that you’re interested in, use the key column value and run the previous SQLs with the DBMS_FLASHBACK_ARCHIVE.get_sys_context columns.

Altering a table – Adding a column
If you add a column, the rows that already existed in the flashback archive will show null values for the new column, which is pretty much expected behavior for any table.

Altering a table – Modifying a column
If you modify an existing column of the table that is currently in the flashback archive, you will see a new column named something like M_50389_old_column_name together with the previous column.

Dropping a table that is part of the flashback archive
Before dropping a table that is part of the flashback archive, you need to remove it from the flashback data archive; otherwise, you will get the error

drop table "ACCT_OWNER"."ACCT_MASTER_DEAN"
Error report -
ORA-55610: Invalid DDL statement on history-tracked table
55610. 00000 -  "Invalid DDL statement on history-tracked table"
*Cause:    An attempt was made to perform certain DDL statement that is
           disallowed on tables that are enabled for Flashback Archive.
*Action:   No action required.

Remove the table from the flashback archive as follows

ALTER TABLE "ACCT_OWNER"."ACCT_MASTER"         NO FLASHBACK ARCHIVE;

and then it can be dropped.

Any attempt to read the flashback data for a table that has been dropped results in the error

ORA-01466: unable to read data - table definition has changed
01466. 00000 -  "unable to read data - table definition has changed"
*Cause:    Query parsed after tbl (or index) change, and executed
           w/old snapshot
*Action:   commit (or rollback) transaction, and re-execute

Rows that exist vs. rows that were modified in the versions interval

When I started this exercise, I assumed that a query based on versions between timestamp “A” and timestamp “B” would only show me the rows that were modified between these two timestamps. However, my queries would return rows that had been modified outside of this range. This is documented at Oracle support in Doc id 2686158.1 which states “the VERSIONS BETWEEN clause is used to qualify rows which “existed” in the time range, and it is not something used to qualify rows which were “inserted” in the time range”. The column versions_starttime indicates the time of the actual change.

Performance implications
In a quiet system, I ran a workload of 10,000 inserts, 10,000 updates followed by 10,000 deletes. Each SQL statement was followed by a commit in an attempt to simulate an OLTP workload. After the AWR snapshot had rolled over to a new snapshot interval, I turned on the flashback archive and ran the same workload. An AWR difference report did not show any noticeable difference in resource consumption. However, this could have been due to the small size of my workload. Users testing in a somewhat busy nonproduction environment were also unable to identify any perceptible increase in elapsed time.

Locking
I naively assumed that since the rows were being written to the flashback archive via the FBDA process there would be no locking. However, we did see instances of locking. A colleague found the below documents on Oracle support

2423880.1 - FBDA: Massive Performance Problem After 
            enabling FDA (Doc ID 2423880.1)
            Request patch for bug 22239699
 
22239699  - Bug - FBDA TCRV delete maintenance colliding with 
                  archiving delete causing TX contention 
 
2344671.1 - Flashback Data Archive: Error ORA-1 and ORA-60 In 
            SQL Merge Into SYS_FBA_TCRV DEADLOCK ERRORS
 
2370465.1 - FDA - Flashback Data Archive Usage and Best 
            Practices (a.k.a. Total Recall)

Summary
In summary, it was relatively easy to set up the flashback archive and add or remove tables. Volume testing in a nonproduction environment has not shown any increase in resource consumption. Ensuring that there is enough space in the flashback archive tablespace is very important, as running out of space impacts the application.

DR test with Snapshot Standby

This blog post contains the steps required to convert the physical standby to an open (read/write) database and allow the application to perform a DR test without impacting the primary database and the users.

This test was conducted on a four-node primary RAC with a four-node physical standby, both running Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 – 64bit Production.

We did not make any changes to the primary. As a result, users were able to work on the primary, and it was able to ship logs to the standby. However, as we had reserved all the space on the standby for DB_RECOVERY_FILE_DEST_SIZE, the primary was unable to ship and issued the below error messages. For next time: DO NOT reserve all the space for DB_RECOVERY_FILE_DEST_SIZE. Instead, use perhaps 50% or 60% of the available space and allow the archive logs from the primary to continue to land on the standby.

These errors were reported in the alert logs of the primary database because we had reserved all the space on the standby:

Tue Jun 09 12:33:44 2020
TT00: Standby redo logfile selected for thread 4 sequence 144397 for destination LOG_ARCHIVE_DEST_2
Tue Jun 09 12:35:13 2020
TT00: Attempting destination LOG_ARCHIVE_DEST_2 network reconnect (3135)
TT00: Destination LOG_ARCHIVE_DEST_2 network reconnect abandoned
Tue Jun 09 12:35:13 2020
Errors in file /oracle_home/product/diag/rdbms/prprimy/PRPRIMY4/trace/PRPRIMY4_tt00_53355.trc:
ORA-03135: connection lost contact
Error 3135 for archive log file 8 to 'PRSTNBY'

All of the below commands/SQLs were run on the standby RAC.

Check the recovery file destination:

SELECT  NAME,FLOOR (space_limit/1024/1024) "Size MB",CEIL (space_used/1024/1024) "Used MB"
FROM    v$recovery_file_dest
;

NAME                              Size MB    Used MB
------------------------------ ---------- ----------
+DG_ARCHIVE                          5535        408

Confirm space in ASM:

ASMCMD [+] > lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576    208896   204136                0          204136              0             N  DG_ARCHIVE/

Modify the below parameters:

alter system set DB_RECOVERY_FILE_DEST_SIZE= '190G' scope=both sid='*';
alter system set DB_RECOVERY_FILE_DEST='+DG_ARCHIVE' scope=both sid='*';
alter system set undo_retention = 21600 scope=both sid='*';

Stop managed recovery process:

alter database recover managed standby database cancel;

Stop the database via srvctl:

srvctl stop database -d PRSTNBY -o immediate

In older versions of Oracle, we used to turn on flashback and activate the standby database. This is no longer required because the CONVERT TO SNAPSHOT STANDBY will turn on flashback. At this point the value of flashback will be:

select FLASHBACK_ON from gv$database;

FLASHBACK_ON
------------------
NO
NO
NO
NO

alter database flashback on;    <--- this was deemed not needed, so we didn't execute this
select flashback_on from gv$database;
select * from gv$tablespace where FLASHBACK_ON='NO';

On the standby, bring up one node in mount status to convert to snapshot:

startup nomount
alter database mount standby database;
select name, guarantee_flashback_database from gv$restore_point;

At this point, the above SQL will return “no rows selected”.

Convert to snapshot:

ALTER DATABASE CONVERT TO SNAPSHOT STANDBY;

Check the flashback status:

select name, guarantee_flashback_database from gv$restore_point;

NAME                                            GUA
---------------------------------------------   ---
SNAPSHOT_STANDBY_REQUIRED_06/09/2020 12:35:05   YES


select flashback_on from gv$database;

FLASHBACK_ON
------------------
RESTORE POINT ONLY

Sanity check - Stop here!! -- verify that the restore point EXISTS, then open the database.

Try to open the one node that we are working on to make sure that the open works successfully:

ALTER DATABASE OPEN;

select status, instance_name, database_role, open_mode from gv$database, gv$instance;

STATUS       INSTANCE_NAME    DATABASE_ROLE    OPEN_MODE
------------ ---------------- ---------------- --------------------
MOUNTED      PRSTNBY4         SNAPSHOT STANDBY MOUNTED

Shutdown immediate as we had only started one node to convert to snapshot

SHUTDOWN IMMEDIATE;

Bring up the database via SRVCTL in mount mode

srvctl start database -d PRSTNBY -o mount

Log on to each node (instance) and open that instance

ALTER DATABASE OPEN;

Inform users to point their application server to the former standby database and begin the DR Exercise.

After the applications inform us that they have completed their DR Exercise, continue with the below steps.

Stop the database via SRVCTL:

srvctl stop database -d PRSTNBY -o immediate

Confirm all nodes are down

srvctl status database -d PRSTNBY

Bring up one node to convert back to physical standby. Reduce the DB_RECOVERY_FILE_DEST_SIZE, or else we will face the error:

Waiting for all non-current ORLs to be archived...
Tue Jun 09 17:50:17 2020
Waiting for the ORL for thread 1 sequence 1 to be archived...
ARC3: Archiving not possible: error count exceeded

because the archive destination is being reserved for the DB_RECOVERY_FILE_DEST_SIZE:

startup mount
ALTER DATABASE CONVERT TO PHYSICAL STANDBY;
alter database mount standby database;
alter system set DB_RECOVERY_FILE_DEST_SIZE= '5535M' scope=both sid='*';
alter system set undo_retention = 14400 scope=both sid='*';
shutdown immediate;

Start the standby via SRVCTL

srvctl start database -d PRSTNBY

Confirm that MRP is running and log apply is progressing.
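A sketch of one way to check this, using gv$managed_standby:

SELECT inst_id, process, status, thread#, sequence#
FROM   gv$managed_standby
WHERE  process LIKE 'MRP%'
;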

Issues with multiple outer joins causing high elapsed time

Apologies in advance: I cannot share the original query because the information is proprietary. I hope to explain as much as possible to make this useful.

One of our applications had been converted from Sybase to Oracle Database 18c Enterprise Edition Release 18.0.0.0.0. This (obfuscated) query used to run in under 2 seconds in Sybase but was taking between 15 and 25 seconds to execute in Oracle. The 25-second result tended to be on the first execution, with subsequent executions running in 15 to 17 seconds. I put aside the 25-second result as being attributable to the creation of the access path and focused on the 15 to 17 second executions.

Original query

SELECT 
       many,
       columns,
       from,
       all,
       the,
       joined,
       tables
FROM   customer_order ao
       full outer join sales_notes sn
                    ON ao.order_id = sn.order_id
       full outer join order_contacts poc
                    ON ao.order_id = poc.order_id
       full outer join contact c
                    ON poc.contact = c.contact,
       site_order so
       full outer join order_info oori
                    ON so.site = oori.site
       full outer join tax_info t
                    ON so.site = t.site
                       AND so.order_id = t.order_id
       full outer join order_notes son
                    ON so.site = son.site
       full outer join tax_notes tn
                    ON so.site = tn.site
       full outer join tax_order_detail tod
                    ON so.site = tod.site
       full outer join tax_sub_order tso
                    ON so.site = tso.site
       full outer join other_vax_order ovo
                    ON so.site = ovo.site,
       address a
WHERE  so.site          IS NOT NULL
  AND  so.site          = 'some value here'
  AND  so.site          = ao.site
  AND  so.acct_order_id = ao.acct_order_id
  AND  so.site          = a.site
ORDER  BY 
       so.site,
       so.acct_order_id 
;

All of these tables were below 100k rows with no complexity of LOB columns or anything out of the ordinary. The cost estimate for this SQL was 365,047,136.

The filter predicates in the WHERE clause on so.site will always result in fewer than 10 rows. Site is the primary index on the site table. All the other tables have an index to satisfy the join columns. The tuning advisor reported that no better plan was available.

In order to understand what was occurring, I ran an explain on the below SQL:

SELECT 
       many,
       columns,
       from,
       all,
       the,
       joined,
       tables
FROM   customer_order ao,
       site_order so,
       address a
WHERE  so.site          IS NOT NULL
  AND  so.site          = 'some value here'
  AND  so.site          = ao.site
  AND  so.acct_order_id = ao.acct_order_id
  AND  so.site          = a.site
ORDER  BY 
       so.site,
       so.acct_order_id 
;

The optimizer estimated the cost at 9 and the query ran consistently in less than a second. I then experimented with adding the other FULL OUTER JOIN tables one by one. The addition of any two tables as FULL OUTER JOINS did not impact the query cost. Additionally, these other two tables would be joined as NESTED LOOP joins. The addition of a third table as a FULL OUTER JOIN would raise the query cost significantly and also use a hash join.

After some trial and error, I added the optimizer hint shown below.

Modified query

SELECT /*+ USE_NL(so ovo) */
       many,
       columns,
       from,
       all,
       the,
       joined,
       tables
FROM   customer_order ao
       full outer join sales_notes sn
                    ON ao.order_id = sn.order_id
       full outer join order_contacts poc
                    ON ao.order_id = poc.order_id
       full outer join contact c
                    ON poc.contact = c.contact,
       site_order so
       full outer join order_info oori
                    ON so.site = oori.site
       full outer join tax_info t
                    ON so.site = t.site
                       AND so.order_id = t.order_id
       full outer join order_notes son
                    ON so.site = son.site
       full outer join tax_notes tn
                    ON so.site = tn.site
       full outer join tax_order_detail tod
                    ON so.site = tod.site
       full outer join tax_sub_order tso
                    ON so.site = tso.site
       full outer join other_vax_order ovo
                    ON so.site = ovo.site,
       address a
WHERE  so.site          IS NOT NULL
  AND  so.site          = 'some value here'
  AND  so.site          = ao.site
  AND  so.acct_order_id = ao.acct_order_id
  AND  so.site          = a.site
ORDER  BY 
       so.site,
       so.acct_order_id 
;

And the cost went down to around 70 with the query consistently performing in under two seconds. I just happened to be testing the order of joins with the table aliased as OVO and hence the hint lists OVO. Almost any other combination results in the same (i.e. improved) access path.
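To compare the cost and join methods before and after adding the hint, the plan can be inspected with DBMS_XPLAN; a sketch (substitute the full query for the placeholder):

EXPLAIN PLAN FOR
<modified query from above>;

SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);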
