ORA-15032, ORA-15017 and ORA-15040 issues with an ASM disk group

While creating a new Oracle database with ASM on an AWS EC2 instance, I encountered the below errors while attempting to mount one of the ASM disk groups:

ALTER DISKGROUP VOTE_DISK mount
*
ERROR at line 1:
ORA-15032: not all alterations performed
ORA-15017: diskgroup "VOTE_DISK" cannot be mounted
ORA-15040: diskgroup is incomplete

When I checked v$asm_diskgroup, I saw:

SQL> select NAME,TOTAL_MB,FREE_MB from v$asm_diskgroup;  

NAME                             TOTAL_MB    FREE_MB
------------------------------ ---------- ----------
ASMDATA1                            716796     716726
DBVOT                                    0          0
ASMDATA2                            716796     716726
ASMARCH                             716796     716726

The goal was to create the VOTE_DISK disk group on the DBVOT volume, which was in turn mapped to the /dev/bqdx disk. The AWS Console indicated that /dev/bqdx was attached to the EC2 instance, but for some reason it was not visible in the OS. I issued the command:

/usr/bin/echo -e "o\nn\np\n1\n\n\nt\n8e\nw" | /usr/sbin/fdisk /dev/bqdx
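(The piped answers drive fdisk non-interactively: o creates a new empty DOS partition table, n/p/1 creates primary partition 1, the two empty answers accept the default first and last sectors, t and 8e change the partition type to Linux LVM, and w writes the changes to disk.)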

Which resulted in:

Welcome to fdisk (util-linux 2.23.2).

Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.


Command (m for help): Building a new DOS disklabel with disk identifier 0x29ba6b8d.

Command (m for help): Partition type:
   p   primary (0 primary, 0 extended, 4 free)
   e   extended
Select (default p): Partition number (1-4, default 1): First sector (2048-104857599, default 2048): Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-104857599, default 104857599): Using default value 104857599
Partition 1 of type Linux and of size 50 GiB is set

Command (m for help): Selected partition 1
Hex code (type L to list all codes): Changed type of partition 'Linux' to 'Linux LVM'

Command (m for help): The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

However, when I attempted to create the ASM disk, I got the error:

root@10-118-134-71:/root # /usr/sbin/oracleasm createdisk DBVOT /dev/bqdx1
Device "/dev/bqdx1" is already labeled for ASM disk "DBVOT"

I then attempted to drop the disk group

SQL> drop diskgroup DBVOT;
drop diskgroup DBVOT
*
ERROR at line 1:
ORA-15039: diskgroup not dropped
ORA-15001: diskgroup "DBVOT" does not exist or is not mounted

I then tried with the FORCE option and succeeded

SQL> drop diskgroup DBVOT force including contents;

Diskgroup dropped.

I then deleted the disk

root@10-118-134-71:/root # /usr/sbin/oracleasm deletedisk DBVOT 
Clearing disk header: done
Dropping disk: done

and checked to make sure it was no longer visible

[oracle@10-118-134-71 ~]$ /usr/sbin/oracleasm listdisks
DB1
DG10
DG11
DG12
DG2
DG3
DG4
DG5
DG6
DG7
DG8
DG9

Then I recreated the disk

root@10-118-134-71:/root # /usr/sbin/oracleasm createdisk DBVOT /dev/bqdx1
Writing disk header: done
Instantiating disk: done

and listed the disks to confirm

root@10-118-134-71:/root # /usr/sbin/oracleasm listdisks
DG1
DG10
DG11
DG12
DG2
DG3
DG4
DG5
DG6
DG7
DG8
DG9
DBVOT

Then I created the disk group in ASM:

SQL> CREATE DISKGROUP DBVOT EXTERNAL REDUNDANCY DISK '/dev/oracleasm/disks/DBVOT';

Diskgroup created.
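At this point the new disk group can be verified with the same query used above; the TOTAL_MB and FREE_MB figures should no longer be zero:

select NAME, TOTAL_MB, FREE_MB from v$asm_diskgroup;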

TNS-12535, ns secondary err code: 12560 and users unable to log on

This afternoon the users of one of my applications reported that they were unable to sign on using a particular user account. The user account in question was open and unexpired. The users were able to sign on to other accounts without any issues. The alert logs on all nodes of the RAC were being flooded with messages such as:

Fatal NI connect error 12170.

  VERSION INFORMATION:
        TNS for Linux: Version 11.2.0.4.0 - Production
        Oracle Bequeath NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
        TCP/IP NT Protocol Adapter for Linux: Version 11.2.0.4.0 - Production
  Time: 15-JUN-2018 19:17:55
  Tracing not turned on.
  Tns error struct:
    ns main err code: 12535
    
TNS-12535: TNS:operation timed out
    ns secondary err code: 12560
    nt main err code: 505
    
TNS-00505: Operation timed out
    nt secondary err code: 110
    nt OS err code: 0
  Client address: (ADDRESS=(PROTOCOL=tcp)(HOST=xxx.xxx.xxx.xx)(PORT=nnn))
Fri Jun 15 19:18:37 2018

Despite bouncing all nodes of the RAC, flushing the shared pool, etc., the users were still not able to connect via this particular user account. Existing connections with this account, however, continued to work.

After some false leads, a colleague noticed that there were a number of library cache locks with the user account set to “null”. These were being generated by a set of new servers that the application was attempting to configure using the user account that was experiencing the issue. After these new servers were shut down and the database bounced, normal functionality was restored.
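For reference, sessions waiting on library cache locks can be spotted with a query along the following lines (a sketch; adjust the columns to taste):

select inst_id, sid, serial#, username, event, seconds_in_wait
from gv$session
where event like 'library cache%';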

Applying an incremental copy to make a physical standby current

One of our standby RACs had fallen behind in the log apply process:

DESTINATION           THREAD#    APPLIED   NOTAPPLD      DELTA
------------------ ---------- ---------- ---------- ----------
1 - From Primary            1       1187       1231         44
                            2       1520       1572         52
******************            ---------- ---------- ----------
Hash                                2707       2803         96

MRP was waiting on the below log number

   INST_ID PROCESS   STATUS          THREAD#  SEQUENCE#     BLOCK#
---------- --------- ------------ ---------- ---------- ----------
         2 MRP0      WAIT_FOR_GAP          2       1521          0

This log file was no longer available on the primary, and hence I had to use an incremental copy to get the standby in sync with the primary. At this point I cancelled MRP on the standby, shut down the RAC via SRVCTL, and started node 2 as a standalone in mount mode.

On the primary, I identified the first change number (SCN) of the log immediately preceding the one the standby had reported as missing (i.e. the n-1th log).
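The SCN can be found with a query along the following lines (a sketch; thread 2, sequence 1521 was the gap reported above, so sequence 1520 is the one of interest):

select distinct first_change#
from v$archived_log
where thread# = 2
and sequence# = 1520;

Using this change number, I created an incremental backup: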

backup incremental from scn 14521210213 database format '/backup/dean/backup_%U.bak';

I also created a backup of the control file:

backup current controlfile for standby format '/backup/dean/stnd_ctl_file.ctl';

These files were transferred to the standby via FTP commands.

After the files are transferred to the standby, sign into SQLPLUS and stop MRP with

alter database recover managed standby database cancel;

Stop all nodes of the RAC with

srvctl status database -d PRODDB
srvctl stop database -d PRODDB -o immediate

Bring up the database as a single node in SQLPLUS

startup nomount;
Alter database mount standby database;

Exit from SQLPLUS, start RMAN, and catalog the directory where the backups were FTPed with:

catalog start with '/backup/dean';

Then recover the database with:

recover database noredo;

After the recovery was complete, I restored the control file:

shutdown immediate;
startup nomount;
restore standby controlfile from '/backup/dean/stnd_ctl_file.ctl';

Since the data file locations on the standby were different from the primary, the standby control file had to be updated. Start the database in mount mode:

alter database mount standby database;

At this point, the instance thinks the data file locations are those of the primary, because the restored control file came from the primary. Switch the data file locations in RMAN with the below commands (the catalog command should point to the data file location on the standby):

catalog start with '+DG_CLU_ORA1/D99Xsn';
switch database to copy;

In my experience, the “switch database to copy;” will switch the undo, sysaux, users and system data files successfully. The others (i.e. application-specific database files) tend to fail with the below error:

RMAN> switch database to copy;

RMAN-00571: ===========================================================
RMAN-00569: =============== ERROR MESSAGE STACK FOLLOWS ===============
RMAN-00571: ===========================================================
RMAN-03002: failure of switch to copy command at 08/27/2018 17:40:53
RMAN-06571: datafile 41 does not have recoverable copy

Check with REPORT SCHEMA to ensure that the data file exists on the standby:

RMAN>  report schema;

RMAN-06139: WARNING: control file is not current for REPORT SCHEMA
Report of database schema for database with db_unique_name IPTSDV

List of Permanent Datafiles
===========================
File Size(MB) Tablespace           RB segs Datafile Name
---- -------- -------------------- ------- ------------------------
1    0        SYSTEM               ***     +DG_CLU_ORA1/D99Xsn/system.259.841589723
2    0        SYSAUX               ***     +DG_CLU_ORA1/D99Xsn/sysaux.260.841589731
.
.
.
41   1024     emp_stats_DAT_001_DC  ***     +DG_CLU_ORA1/D99Xsn/emp_stats_dat_001_dc.2405.983050989

If the file does exist, catalog it with:

RMAN> catalog datafilecopy '+DG_CLU_ORA1/D99Xsn/emp_stats_dat_001_dc.2405.983050989';

cataloged datafile copy
datafile copy file name=+DG_CLU_ORA1/D99Xsn/emp_stats_dat_001_dc RECID=197 STAMP=985283856

The switch database to copy should work after that:

RMAN> switch database to copy;

datafile 1 switched to datafile copy "+DG_CLU_ORA1/D99Xsn/system.1853.983049983"
datafile 2 switched to datafile copy "+DG_CLU_ORA1/D99Xsn/sysaux.1793.983050037"
datafile 3 switched to datafile copy "+DG_CLU_ORA1/D99Xsn/undotbs1.1805.983050783"
.
.
.
datafile 41 switched to datafile copy "+DG_CLU_ORA1/D99Xsn/emp_stats_dat_001_dc.2405.983050989"

After this, I started MRP on the last node with:

alter database recover managed standby database using current logfile disconnect from session;

The standby database was now current with the primary

DESTINATION           THREAD#    APPLIED   NOTAPPLD      DELTA
------------------ ---------- ---------- ---------- ----------
1 - From Primary            1       1233       1234          1
                            2       1575       1575          0
******************            ---------- ---------- ----------
Hash                                2808       2809          1

To make sure everything was in order, I performed a log switch on the primary and monitored to ensure that the resulting archived log was applied on the standby.
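A typical way to do this (shown as an illustration) is to force a log switch on the primary and then watch the applied sequence numbers on the standby:

alter system archive log current;

select thread#, max(sequence#)
from v$archived_log
where applied = 'YES'
group by thread#;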

Another issue I faced was that the log files were pointing to the primary’s log file location. This can be changed by running the output of the below SQL:

set line 200
select 'ALTER DATABASE RENAME FILE ' || '''' || member || '''' || ' to ' || '''' || replace(member,'primary_log_location','standby_log_location') || '''' || ';'
from v$logfile where member like '%primary db name%';

Output will look like:

ALTER DATABASE RENAME FILE '+DG1/primary/ONLINELOG/group_2_a.log' to '+DG3/standby/ONLINELOG/group_2_a.log';

Splitting a partition

I encountered a strange setup wherein a table was created as partitioned, but with only a MAXVALUE partition, resulting in essentially a single-partition table. An extract of the original DDL was as follows:

  CREATE TABLE "SCHEMA"."TABLE_NAME"
   ("VERSION_NUM" VARCHAR2(5 BYTE),
    .
    .
    .
    "PARTITION_COLUMN" DATE,
    .
    .
    (PARTITION "PARTITION_MAX"
     TABLESPACE "DE_SUSP_IND_001") ENABLE
   ) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255
  STORAGE(
  BUFFER_POOL DEFAULT FLASH_CACHE DEFAULT CELL_FLASH_CACHE DEFAULT)
  TABLESPACE "TABLE_DAT_001"
  PARTITION BY RANGE ("PARTITION_COLUMN")
 (PARTITION "PARTITION_MAX" VALUES LESS THAN (MAXVALUE) SEGMENT CREATION IMMEDIATE)
  ENABLE ROW MOVEMENT;

I needed to split this into two partitions such that all rows with a value less than 2017-11-12 would end up in a separate partition. I split the partition as shown:

ALTER TABLE "SCHEMA"."TABLE_NAME"
      SPLIT PARTITION "PARTITION_MAX" AT (TO_DATE('2017-11-12','YYYY-MM-DD')) 
      INTO (PARTITION "PARTITION_20171111", PARTITION "PARTITION_MAX") UPDATE INDEXES PARALLEL
;

The above resulted in two partitions,

PARTITION "PARTITION_20171111"  VALUES LESS THAN (TO_DATE(' 2017-11-12 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))

And

PARTITION "PARTITION_MAX"  VALUES LESS THAN (MAXVALUE) SEGMENT CREATION IMMEDIATE

After this was complete, I gathered statistics to check that the row count in the partitions matched the actual distribution of data.
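A sketch of the kind of commands used (the schema and table names here are placeholders):

exec dbms_stats.gather_table_stats(ownname => 'SCHEMA', tabname => 'TABLE_NAME', granularity => 'PARTITION', cascade => TRUE);

select partition_name, num_rows
from dba_tab_partitions
where table_owner = 'SCHEMA'
and table_name = 'TABLE_NAME';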

Moving temp table spaces on a RAC with users online

Please see the blog post entitled “Moving undo table spaces on a RAC with users online” for some additional background. In summary, we had to free up space on a certain ASM disk group. We had about 175 GB allocated to temp and decided to move it with users actively accessing the database.

First, take the file offline:

ALTER DATABASE TEMPFILE '+ASM_DG_01/PRODPRM/temp01.dbf' OFFLINE;

Copy the file from the old ASM disk group to the new disk group. Set your Oracle home to the ASM instance and then start ASMCMD.
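In my environment that amounted to something along these lines (the SID is illustrative; use your own ASM instance name):

. oraenv      (enter the ASM SID, e.g. +ASM1, when prompted)
asmcmd

Then, from the ASMCMD prompt: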

cp +ASM_DG_01/PRODPRM/temp01.dbf +ASM_DG_04/PRODPRM/temp01.dbf

(Note that this is a copy, not a move.)

Rename the temp file in the database in sqlplus. If you have changed the Oracle home to the ASM instance, remember to change back to your RAC instance’s Oracle home:

ALTER DATABASE RENAME FILE '+ASM_DG_01/PRODPRM/temp01.dbf' TO '+ASM_DG_04/PRODPRM/temp01.dbf';

Bring it back online:

ALTER DATABASE TEMPFILE '+ASM_DG_04/PRODPRM/temp01.dbf' ONLINE;

Now delete the file from the old ASM disk group. Set your Oracle home to the ASM instance and start ASMCMD again:

rm +ASM_DG_01/PRODPRM/temp01.dbf

The space utilized in the ASM disk group should now be reduced.
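The reduction can be confirmed with a quick query against v$asm_diskgroup:

select name, total_mb, free_mb from v$asm_diskgroup;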

Moving undo table spaces on a RAC with users online

We were running out of space on an ASM disk group and had to free up space ASAP. Since our UNDO tablespaces are sized at about 200 GB and were on the ASM disk group that was full, we decided to move them.

This was the primary database with a physical (Data Guard) standby, so I needed to turn off standby file management:

ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT='MANUAL';

Create a new undo tablespace for RAC node 01 in the new ASM disk group:

create undo tablespace UNDOTBS1A datafile '+ASM_DG_01/PRODPRM/undotbs1a.dbf' size 20G;

Alter the system to point RAC node 01 at the new undo tablespace:

ALTER SYSTEM SET UNDO_TABLESPACE=UNDOTBS1A SCOPE=BOTH SID='PRODPRM1';

Since this is a multi-node RAC, we can take node 01 offline to disconnect any sessions still attached to the old undo tablespace:

srvctl stop instance -d PRODPRM -i PRODPRM1 -o immediate

To be safe, make sure the RAC node is down with:

srvctl status database -d PRODPRM

Also, check the alert log on this particular node to make sure it was a clean shutdown.
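It is also worth confirming that no active transactions remain in the old undo tablespace before dropping it. A query along the following lines can be used (a sketch; the tablespace name matches this example):

select s.inst_id, s.sid, s.serial#, r.segment_name
from dba_rollback_segs r, gv$transaction t, gv$session s
where r.segment_id = t.xidusn
and t.inst_id = s.inst_id
and t.ses_addr = s.saddr
and r.tablespace_name = 'UNDOTBS1';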

With the RAC node down, drop the old undo tablespace:

drop tablespace UNDOTBS1 including contents and datafiles;

Bring up RAC node 01 and it will be using the new undo table space:

srvctl start instance -d PRODPRM -i PRODPRM1

Turn on standby file management

ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT='AUTO';

Delete a backup file from RMAN

Yesterday I had a requirement to remove a backup file from the RMAN catalog because we did not want to recover to it. After some “research” on Google and in the RMAN documentation, I found the below method:

Connect to RMAN and issue the command:

list backup;

This will list all the backups cataloged in RMAN. Find the one that you want to “uncatalog”. For example, I wanted to remove the below file:

BS Key  Type LV Size       Device Type Elapsed Time Completion Time
------- ---- -- ---------- ----------- ------------ ---------------
27249   Full    3.75G      DISK        00:00:00     28-MAR-18      
        BP Key: 27249   Status: AVAILABLE  Compressed: NO  Tag: TAG20180328
        Piece Name: /backup/move/dean/PRODDBbtsuuqm6_1_1.dbc
  List of Datafiles in backup set 27249
  File LV Type Ckp SCN    Ckp Time  Name
  ---- -- ---- ---------- --------- ----
  120     Full 132807363910 29-MAR-18 +DG_ORA64/proddb/perfx_52.dbf

I then issued the below command in RMAN:

change backuppiece 27249 uncatalog;

Response was

uncataloged backup piece
backup piece handle=/backup/move/dean/PRODDBbtsuuqm6_1_1.dbc RECID=27249 STAMP=972007418
Uncataloged 1 objects

The file was no longer part of the subsequent RECOVER and RESTORE commands.

ORA-31693 and ORA-06502 on an Import Datapump (IMPDP)

While importing data via an IMPDP job we encountered the error:

Import: Release 11.2.0.4.0 - Production on Tue Mar 20 13:59:30 2018
 
Copyright (c) 1982, 2011, Oracle and/or its affiliates.  All rights reserved.
;;; 
Connected to: Oracle Database 11g Enterprise Edition Release 11.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, Oracle Label Security,
OLAP, Data Mining and Real Application Testing options
Master table "SYS"."SYS_IMPORT_FULL_01" successfully loaded/unloaded
Starting "SYS"."SYS_IMPORT_FULL_01":  "/******** AS SYSDBA" directory=dpump dumpfile=dpump_data.dmp 
Processing object type TABLE_EXPORT/TABLE/TABLE_DATA
. . 
ORA-31693: Table data object "SCHEMA_NAME"."TABLE_NAME" failed to load/unload and is being skipped due to error:
ORA-06502: PL/SQL: numeric or value error
. . 
. . 
. . 
. . 
. . 
Job "SYS"."SYS_IMPORT_FULL_01" completed with 1 error(s) at Tue Mar 20 13:59:32 2018 elapsed 0 00:00:02

Per Oracle Support, this was documented in “Bug 18665004 – ORA-31693/ORA-6502 from Datapump import (impdp) into an existing table having a LONG or LONG RAW column (Doc ID 18665004.8)”. This table did have a “LONG” column.

We were not in a position to install the suggested fix and instead decided to try the workaround. We dropped the only index on the table and reran the import successfully. After the import, we recreated the index.
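As a side note, tables with LONG or LONG RAW columns can be identified with a query along these lines (the schema name is a placeholder):

select owner, table_name, column_name
from dba_tab_columns
where data_type in ('LONG', 'LONG RAW')
and owner = 'SCHEMA_NAME';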

ORA-16047: DGID mismatch

I was building a new standby, and after I enabled log_archive_dest_state_2 on the primary, the below error message was reported in the primary’s alert log:

ALTER SYSTEM SET log_archive_dest_state_2='ENABLE' SCOPE=BOTH SID='*';
Tue Mar 20 00:20:08 2018
Errors in file /oracle/home/trace/PRODDB_tt01_438.trc:
ORA-16047: DGID mismatch between destination setting and target database
Error 16047 for archive log file 6 to 'STDBYDB'
Tue Mar 20 00:20:08 2018
Errors in file /oracle/home/trace/PRODDB_tt01_438.trc:
ORA-16047: DGID mismatch between destination setting and target database
Tue Mar 20 00:20:08 2018
Errors in file /oracle/home/trace/PRODDB_tt01_438.trc:
ORA-16047: DGID mismatch between destination setting and target database
Tue Mar 20 00:20:08 2018
Errors in file /oracle/home/trace/PRODDB_arc1_8368.trc:
ORA-16047: DGID mismatch between destination setting and target database
Tue Mar 20 00:20:08 2018
PING[ARC1]: Heartbeat failed to connect to standby 'STDBYDB'. Error is 16047.

After a few minutes of consternation, I realized that I had an incorrect parameter on the standby:

SQL> show parameter db_unique_name;
 
NAME                                 TYPE        VALUE
db_unique_name                       string      PRODDB

Basically, I had inadvertently set the db_unique_name to PRODDB on both the primary and standby database. I changed the standby as follows:

alter system set db_unique_name=STDBYDB scope=spfile sid='*';

And bounced the standby. This resolved the issue.
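The underlying requirement is that the DB_UNIQUE_NAME attribute in the primary’s log_archive_dest_2 (and the DG_CONFIG list in log_archive_config, if set) matches the db_unique_name of the standby. A quick way to compare the two sides is:

show parameter db_unique_name
show parameter log_archive_config
show parameter log_archive_dest_2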

Testing Oracle’s Database Security Assessment Tool

I received an email suggesting that I run a “Security Assessment” using Oracle’s Database Security Assessment Tool. Per Oracle’s description, the tool

“analyzes database configurations, users, their entitlements, security policies and identifies where sensitive data resides to uncover security risks and improve the security posture of Oracle Databases within your organization.”

The tool can be downloaded for free from support.oracle.com; see Oracle Database Security Assessment Tool (DBSAT) (Doc ID 2138254.1).

After you agree to the license, a zip file will be downloaded to your laptop. I chose to unzip the file and then FTP the contents to a directory on my database server. (I suggest creating a separate directory.)

The contents of the directory will be:

[snoopy_test-TSTRAC2] ls -l
total 352
-rw-r--r-- 1 oracle dba  12433 Mar 15 18:11 dbsat
-rw-r--r-- 1 oracle dba  12579 Mar 15 18:11 dbsat.bat
drwxr-xr-x 5 oracle dba     40 Mar 15 18:15 Discover
-rw-r--r-- 1 oracle dba  28216 Mar 15 18:11 sat_analysis.py
-rw-r--r-- 1 oracle dba  43181 Mar 15 18:11 sat_collector.sql
-rw-r--r-- 1 oracle dba 247465 Mar 15 18:11 sat_reporter.py
drwxr-xr-x 2 oracle dba   4096 Mar 15 18:15 xlsxwriter

chmod the files to add execute permissions.
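For example (at a minimum, the dbsat driver script itself needs to be executable):

chmod u+x dbsat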

The tool has 3 components:
Collector collects data on the system being assessed
Reporter analyzes the above output
Discoverer generates the Database Sensitive Data Assessment Report

Running the Collector
Run the collector with the below command. In my example, I ran this as sysdba with my ORACLE_HOME set to the RAC instance against which I intended to run the DBSAT tool:

./dbsat collect -n "/ as sysdba" /TSTRAC2/dean/dbsat/dbsat_collect
                 |               |                   | 
No encryption ---+               |                   |
                                 |                   | 
location for the output ---------+                   |
                                                     |
name of report file ---------------------------------+

The output will be written to a file named dbsat_collect.json

Running the Reporter
The reporter needs Python version 2.6 or later. This can be verified with

[snoopy_test-TSTRAC2] python -V
Python 2.7.5

Run the report with the below command:

./dbsat report -a -n /TSTRAC2/dean/dbsat/dbsat_collect
 
Note: 
a. Paste the above command into Notepad first. I used Word to write this blog entry, and Word sometimes replaces the “-” with a non-Unix-compatible character.
b. The file name from the Collector is specified without the extension.

The below files will be generated:
dbsat_collect_report.txt
dbsat_collect_report.html
dbsat_collect_report.xlsx

The html file is probably the nicest format to review and will contain a large amount of information classified by level of risk.

Using the Discoverer
To run this part of the tool, you have to set up the

/TSTRAC2/dean/dbsat/Discover/conf/sample_dbsat.config
 
File which contains the information on the database connection and the location of the
 
sensitive_en.ini
 
file which contains the patterns that will be used to query database column names, column comments etc. These patterns are specified as regex and logical operators as documented in Section 1.4.3.1.2 Pattern File Configuration of the Database Security Assessment Tool User Guide.

Based on the parameters specified in the config files, the tool will then scan the database looking for columns, comments, etc. that match the patterns specified.
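Once the config files are in place, the Discoverer is run with a command along the lines below (a sketch from memory; the -c flag and the output file argument are assumptions, so check the usage printed by ./dbsat for the exact syntax of your release):

./dbsat discover -c /TSTRAC2/dean/dbsat/Discover/conf/sample_dbsat.config /TSTRAC2/dean/dbsat/dbsat_discover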