Mysterious contention!

One of our applications began to experience "ORA-02049: timeout: distributed transaction waiting for lock" errors during its processing. The application was connecting via a service name to a four-node RAC cluster. The basic function of the process was to read an input file and insert the data into multiple tables in Oracle. The tables were interconnected via multiple referential integrity constraints.

The problem was reproducible: it happened every time the process started. We rebuilt all the associated indices, ran the insert that was hitting the lock as a singleton insert from the command line, and so on, but were not able to resolve or identify the issue. We looked in all the usual places for blocking sessions and never found one. The information from gv$session looked something like what is shown below. The unusual thing was that no blocking session was reported at any point during the contention periods.

[Screenshot ncps01: gv$session information for the waiting sessions]
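
The check described above (sessions stuck on a row lock wait with no blocker recorded) can be sketched roughly as follows. This is only an illustration, not the query we actually ran: the column names are real gv$session columns, but the query text, the dictionary keys, and the helper name `unexplained_waiters` are assumptions made for the example.

```python
# Event name as Oracle reports it in gv$session.event.
ROW_LOCK_EVENT = "enq: TX - row lock contention"

# Hypothetical query (an assumption, not the original one) that would
# produce the rows consumed by unexplained_waiters() below.
WAITERS_SQL = """
SELECT inst_id, sid, serial#, username, event,
       blocking_instance, blocking_session, seconds_in_wait
FROM   gv$session
WHERE  event = 'enq: TX - row lock contention'
"""


def unexplained_waiters(rows):
    """Return row-lock waiters for which no blocking session is recorded.

    `rows` is a list of dicts keyed by lower-case column name, with
    NULL columns represented as None.
    """
    return [
        r for r in rows
        if r["event"] == ROW_LOCK_EVENT and r["blocking_session"] is None
    ]
```

In our case every waiter would have landed in this "unexplained" list, which is what made the problem so hard to pin down.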

 

We suspected that this might be some kind of global cache contention but could not see any in the AWR reports for the period. The AWR report indicated that the database was spending its time on the “enq: TX – row lock contention” wait event.

[Screenshot ncps02: AWR report for the contention period]

 

In order to continue processing, we modified the service to run only on the first node of the RAC and killed all the sessions that were connected at the time, basically forcing the application to reconnect all its sessions, and normal processing resumed. Within less than 20 hours we had completely eliminated the backlog of input files that had built up during the contention.
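
As a rough sketch of the "kill all connected sessions" step, the helper below builds one kill statement per session using Oracle's RAC-aware form of ALTER SYSTEM KILL SESSION ('sid,serial#,@inst_id'). The function name `kill_statements` and the dictionary keys are assumptions for illustration; the statements are only generated here, not executed, and the change to the service itself is not shown.

```python
def kill_statements(sessions):
    """Build one ALTER SYSTEM KILL SESSION statement per session.

    `sessions` is a list of dicts with the keys "inst_id", "sid" and
    "serial#" (hypothetical key names mirroring gv$session columns).
    """
    return [
        "ALTER SYSTEM KILL SESSION '{sid},{serial},@{inst}' IMMEDIATE".format(
            sid=s["sid"], serial=s["serial#"], inst=s["inst_id"]
        )
        for s in sessions
    ]
```

Reviewing the generated statements before running them is a cheap safeguard against killing sessions that belong to another service.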

We opened an SR with Oracle and, after we provided all the trace logs etc., they found that we were probably hitting Bug 13361419 : ‘ENQ: TX – ROW LOCK CONTENTION’ WHEN REF INTEGRITY SPANS BRANCHES. From the bug:

“Contention associated with referential integrity between tables when activity associated with pk / fk relationship goes down separate branches of the cluster and the fk branch is blocked : ‘enq: TX – row lock contention’.

This issue was not observed with non RAC environment .”

As of now we are continuing to process with the service pointed at only the first node of the RAC.

Author: Dean Capps

Database consultant at Amazon Web Services.
