Sunday, August 25, 2019

Oracle RAC Restoring Voting disk and OCR

4:50 PM Posted by Dilli Raj Maharjan

Voting Disk

A voting disk is a file that resides on shared storage and manages cluster membership: it records which nodes are members of the cluster. Each voting disk must be accessible by all nodes in the cluster.
The voting disk is used as a central reference for all nodes and keeps the heartbeat information between nodes. If a node is unable to ping the voting disk, the cluster immediately recognizes the communication failure and evicts the node from the cluster. The voting disk also arbitrates cluster ownership between the nodes in case of failure. A cluster can have a minimum of 1 and a maximum of 15 voting disks.
A node must be able to access more than half of the voting disks to stay in the cluster, so the number of voting disk failures that can be tolerated is the same for (2n-1) and 2n voting disks, where n can be 1, 2 or 3. For example, both 3 and 4 voting disks tolerate only one failure. Hence, to avoid a redundant voting disk, an odd number (2n-1) of voting disks is desirable.
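
The arithmetic can be checked with a small shell sketch. It assumes the majority rule (a node must still reach more than half of the configured voting disks); the function name tolerated_failures is made up for illustration and is not part of any Oracle tool.

```shell
#!/bin/sh
# Hypothetical helper: number of voting disk failures a cluster can survive,
# assuming a node must still reach a strict majority of the N voting disks.
tolerated_failures() {
    n=$1
    echo $(( (n - 1) / 2 ))
}

# Compare odd and even counts side by side.
for n in 1 2 3 4 5 6; do
    echo "$n voting disk(s): tolerates $(tolerated_failures "$n") failure(s)"
done
```

Each even count tolerates exactly as many failures as the odd count just below it, which is why the extra disk buys nothing.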

# View voting disk location
crsctl query css votedisk


# Backup voting disk

The voting disk data is automatically backed up in the OCR as part of any configuration change, so you do not have to back up the voting disk manually.

# Adding voting disk.

From Oracle Database 11g Release 2 onwards, you cannot add a voting disk directly when the voting disks are stored in ASM. Instead, create a new disk group with the desired redundancy and relocate the voting disks to it. The number of voting disks then follows the redundancy of the new disk group.

# Deleting voting disk

Adding and deleting individual voting disks is not allowed on ASM. To reduce the number of voting disks, create a new disk group with a lower redundancy level and relocate the voting disks to it.


Note:
There is 1 voting disk if the DG uses external redundancy
There are 3 voting disks if the DG uses normal redundancy
There are 5 voting disks if the DG uses high redundancy


# Relocate voting disk, or recover voting disk

crsctl replace votedisk +<DiskGroup>

Modifying redundancy level of Diskgroup containing voting disk

Let's say we have a DG named VDISK that is configured with external redundancy.
To increase the redundancy level to normal or high, go through the following steps.
  1. Create a disk group with the desired redundancy.
  2. Add another disk to the disk group and mark it as a quorum disk. A quorum disk is one small disk (500 MB should be on the safe side, since the voting file is only about 280 MB in size) that holds one mirror of the voting file. With normal redundancy you need one quorum disk: the other two disks each hold one voting file as well as the database files striped across them, while the quorum disk holds only its voting file. With high redundancy you need two quorum disks. QUORUM disks can contain the voting file for Cluster Synchronization Services (CSS). REGULAR disks, that is, disks in non-quorum failure groups, can contain any files.
  3. Relocate the voting disk from the existing disk group to the newly created disk group.

Checking current voting disk location

Checking available asm disks


Create asm disk group with desired redundancy.

Set diskgroup compatible.asm attribute to 11.2.0.0.0


Add a quorum disk to the disk group. A DG with high redundancy needs 2 quorum disks.


Validate asm diskgroup. 

Relocate votedisk to newly created Diskgroup

Query and validate changes.
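
The steps above can be sketched as a dry run for the normal redundancy case. The script only prints the statements, so it is safe to run anywhere; the disk group name NEWDG, the failure group names fg1 to fg3 and the disk paths are hypothetical placeholders, and the exact statements should be checked against your own release before use.

```shell
#!/bin/sh
# Dry-run sketch: prints the statements instead of executing them.
# The disk group name, failure group names and disk paths are placeholders.
print_relocation_plan() {
    dg=$1; disk1=$2; disk2=$3; quorum=$4
    cat <<EOF
-- in the ASM instance (sqlplus / as sysasm)
CREATE DISKGROUP $dg NORMAL REDUNDANCY
  FAILGROUP fg1 DISK '$disk1'
  FAILGROUP fg2 DISK '$disk2'
  ATTRIBUTE 'compatible.asm' = '11.2.0.0.0';
ALTER DISKGROUP $dg ADD QUORUM FAILGROUP fg3 DISK '$quorum';
-- then, from the shell
crsctl replace votedisk +$dg
crsctl query css votedisk
EOF
}

print_relocation_plan NEWDG /dev/oracleasm/disks/NEWDG1 \
    /dev/oracleasm/disks/NEWDG2 /dev/oracleasm/disks/NEWDG3
```

Printing the plan first makes it easy to review the disk paths before anything touches the cluster.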

OCR (Oracle Cluster Registry)

The OCR (Oracle Cluster Registry) resides on shared storage and is accessed by all nodes in the cluster. It maintains information about the cluster configuration and about the cluster databases.
The OCR contains information such as which database instances run on which nodes and which services run on which database. The OCR is created during Grid Infrastructure installation. It stores the information needed to manage Oracle Clusterware and its components, such as RAC databases, listeners, VIPs, SCAN IPs and services. A cluster can have a minimum of 1 and a maximum of 5 copies of the OCR.

# Check OCR file details

ocrcheck

OCR Backup:

Oracle automatically backs up the OCR every 4 hours on the master node. You can also take a backup yourself with the ocrconfig utility (-manualbackup or -export).
Oracle 11g R2 and higher releases simplified OCR and voting file management by storing the OCR and voting files in ASM (Automatic Storage Management). ASM automatically maintains the number of voting files based on the redundancy of the underlying disk group, further reducing manual file management tasks for the DBA. Additionally, the Clusterware stack initiates periodic automatic backups of these files.


To determine OCR file location
more /etc/oracle/ocr.loc



Adding new location
ocrconfig -add +<DiskGroup>


Deleting location
ocrconfig -delete +<DiskGroup>


View ocr backup location
ocrconfig -showbackup



Manually backup
ocrconfig -manualbackup



Export a logical backup of the OCR
ocrconfig -export ocr_backup_$(date +%Y_%m_%d).dmp


Restore OCR from a backup file
ocrconfig -restore <backup_file>



The following features are new from Oracle 11g R2 onward.
  1. The OCR and voting disk can be stored on ASM or a certified cluster file system.
  2. The voting disk and OCR can be dynamically added or replaced.
  3. The voting disk and OCR can be kept in the same disk group or in different disk groups.
  4. Automatic backups of the voting disk and OCR are kept together in a single file.
  5. Automatic backups of the voting disk and OCR are taken every four hours, at the end of the day and at the end of the week.
  6. Administrative access: root or sudo privileges are required to manage them.

Step-by-step restore of the OCR and voting disk when the DG containing them fails

If there is no voting disk, or the disk group containing the voting disk fails to mount because of insufficient disk members, the only way to recover the OCR and voting disk is to create a new DG and restore into it. Error messages like the ones below appear in the Clusterware alert log.

[gpnpd(3183)]CRS-2328:GPNPD started on node rac01. 
2019-08-25 12:45:39.548
[cssd(3253)]CRS-1713:CSSD daemon is started in clustered mode
2019-08-25 12:45:41.343
[ohasd(3025)]CRS-2767:Resource state recovery not attempted for 'ora.diskmon' as its target state is OFFLINE
2019-08-25 12:45:43.641
[cssd(3253)]CRS-1714:Unable to discover any voting files, retrying discovery in 15 seconds; Details at (:CSSNM00070:) in /u01/app/11.2.0/grid/log/rac01/cssd/ocssd.log
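
A quick way to confirm this condition is to search the alert log for CRS-1714. The sketch below is a minimal example; the function name voting_files_missing is made up for illustration, and the default log path is an assumption based on the Grid home and node name shown above, so pass your own path if it differs.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given Clusterware alert log contains
# CRS-1714 (CSSD could not discover any voting files).
voting_files_missing() {
    grep -q 'CRS-1714' "$1"
}

# The default path is an assumption; pass your own alert log as argument 1.
log=${1:-/u01/app/11.2.0/grid/log/rac01/alertrac01.log}
if [ -r "$log" ] && voting_files_missing "$log"; then
    echo "CSSD cannot find any voting files; a restore is needed"
fi
```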


Follow the steps below to restore the OCR and voting disk.

1. Create a new disk group with the desired redundancy. The ASM attribute compatible.asm should be 11.2.0.0.0 or higher, and there should be enough quorum failure groups for the redundancy level: 1 quorum failure group for normal redundancy and 2 quorum failure groups for high redundancy.



2. Stop crs with -f force option.
crsctl stop crs -f


3. Start crs in exclusive mode without the CRS daemon (crsd). You can then check the status of the stack with crsctl stat res -t -init.
crsctl start crs -excl -nocrs



4. Check the OCR location in the ocr.loc file. This file names the disk group where the OCR resides. If the cluster were running, we could use the ocrconfig command to change the location; since the cluster is offline, the file has to be edited manually. Replace the old disk group name with the newly created disk group.

cat /etc/oracle/ocr.loc
vi /etc/oracle/ocr.loc
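
If you prefer a scripted edit over vi, the change can be sketched as below. This assumes the file holds a line of the form ocrconfig_loc=+OLDDG; the helper name set_ocr_diskgroup is hypothetical, and the script keeps a .bak copy of the original file next to it.

```shell
#!/bin/sh
# Hypothetical helper: point ocrconfig_loc at a new disk group and keep a
# backup copy of the original file. Assumes the file contains a line of the
# form "ocrconfig_loc=+OLDDG"; all other lines are left unchanged.
set_ocr_diskgroup() {
    file=$1; newdg=$2
    cp -p "$file" "$file.bak" &&
    sed "s|^ocrconfig_loc=.*|ocrconfig_loc=+$newdg|" "$file.bak" > "$file"
}

# Usage (as root): set_ocr_diskgroup /etc/oracle/ocr.loc TDISK
```

Writing the result back into the existing file, instead of replacing it, keeps the file's owner and permissions as they were.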


5. Check the OCR backup location with the command below.
ocrconfig -showbackup


6. Restore the OCR from backup with the command below, passing one of the backup files reported in the previous step.
ocrconfig -restore <backup_file>
ocrconfig -restore /u01/app/11.2.0/grid/cdata/rac-scan/backup_20190825_121238.ocr



If you get the error "PROT-35: The configured Oracle Cluster Registry locations are not accessible", check the ASM compatibility of the disk group with the SQL command below in the ASM instance. It should be 11.2.0.0.0 or higher.

Check ASM compatibility
select name, compatibility from v$asm_diskgroup;

Modify ASM compatibility
alter diskgroup TDISK set attribute 'compatible.asm'='11.2.0.0.0';


7. Move the voting disk to the newly created disk group with the command below.
crsctl replace votedisk +TDISK


If you encounter the error "CRS-4000: Command Replace failed, or completed with errors.", check whether the quorum disk(s) are available. Normal redundancy requires 1 quorum disk and high redundancy requires 2 quorum disks.

set lines 200 pages 200
col path for a40
col group_name for a10
select a.GROUP_NUMBER, b.name group_name, a.DISK_NUMBER, a.PATH, a.TOTAL_MB, a.FREE_MB, a.failgroup_type
from v$asm_disk a, v$asm_diskgroup b where a.group_number = b.group_number order by 1;

Add quorum disk to diskgroup as required
alter diskgroup TDISK
add quorum disk '/dev/oracleasm/disks/TDISK3';


8. Stop crs with the -f force option.
crsctl stop crs -f

9. Start crs normally on all nodes.
crsctl start crs


10. Check crs status.
crsctl stat res -t