Tuesday, September 30, 2008

CRS related commands

Oracle provides useful commands to manage CRS (Cluster Ready Services):

crs_stat -t --> Shows HA resource / service status (hard to read)
crsstat --> Output of crs_stat -t formatted nicely (see Metalink note 259301.1)
crs_stop -all --> Stops all registered resources (but keeps CRS running!)
crs_start -all --> Starts all registered resources
crsctl check crs --> Verifies that CSS, CRS, and EVM are functioning
crsctl stop crs --> Stops crs and all other services
crsctl start crs --> Starts crs and all other services

Two commands
crsctl disable crs --> Prevents CRS from starting on reboot
crsctl enable crs --> Enables CRS start on reboot
will update the file /etc/oracle/scls_scr/Node_name/root/crsstart which contains the string “enable” or “disable” as appropriate.
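
As a minimal sketch of how that flag file could be checked from a script: the helper names below are hypothetical, and the only assumption taken from the text above is that the file holds the single word "enable" or "disable".

```python
from pathlib import Path


def parse_crsstart(text):
    """Interpret the content of the crsstart file.

    Returns True for "enable" and False for "disable".
    Anything else raises ValueError, because the text above
    only describes these two values.
    """
    value = text.strip().lower()
    if value == "enable":
        return True
    if value == "disable":
        return False
    raise ValueError(f"unexpected crsstart content: {value!r}")


def crs_autostart_enabled(crsstart_file):
    """Read the given crsstart file (path supplied by the caller) and parse it."""
    return parse_crsstart(Path(crsstart_file).read_text())
```

The caller passes the full path for its own node, for example the crsstart file under /etc/oracle/scls_scr shown above.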

The command "ps -ef | grep d.bin" lists the three main CRS background processes: evmd.bin, crsd.bin, and ocssd.bin (the sample output below also shows oprocd.bin). They are normally started by init during the operating system boot process. They can also be started and stopped manually with the command /etc/init.d/init.crs {start | stop | enable | disable} (or /etc/init.crs {start | stop | enable | disable}).

oracle 2498 2091 0 Aug 18 - 8:18 /u01/crs/oracle/product/crs/bin/evmd.bin
root 2580 1927 0 Aug 18 - 705:21 /u01/crs/oracle/product/crs/bin/crsd.bin reboot
oracle 2662 2542 0 Aug 18 - 557:13 /u01/crs/oracle/product/crs/bin/ocssd.bin
root 2785 2951 0 Aug 18 - 1:59 /u01/crs/oracle/product/crs/bin/oprocd.bin run -t 1000 -m 500 -f
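
The same check can be scripted. The sketch below is an illustration only (the function names are made up here): it takes the text printed by "ps -ef" and reports which of the three main daemons are missing.

```python
import os

# The three main CRS daemons named in the text above.
CRS_DAEMONS = ("evmd.bin", "crsd.bin", "ocssd.bin")


def running_crs_daemons(ps_output):
    """Return the set of CRS daemon names found in `ps -ef` output."""
    found = set()
    for line in ps_output.splitlines():
        for field in line.split():
            # Compare only the file name part of each field, so the
            # "grep d.bin" process itself is never counted as a daemon.
            name = os.path.basename(field)
            if name in CRS_DAEMONS:
                found.add(name)
    return found


def missing_crs_daemons(ps_output):
    """Return a sorted list of expected daemons that are not running."""
    return sorted(set(CRS_DAEMONS) - running_crs_daemons(ps_output))
```

An empty list from missing_crs_daemons() means all three daemons were seen in the output.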

Here is a short description of each of the CRS daemon processes (Note 259301.1):

CRSD:
- Engine for HA operation
- Manages 'application resources'
- Starts, stops, and fails 'application resources' over
- Spawns separate 'actions' to start/stop/check application resources
- Maintains configuration profiles in the OCR
- Stores current known state in the OCR.
- Runs as root
- Is restarted automatically on failure
OCSSD:
- OCSSD is part of RAC and Single Instance with ASM
- Provides access to node membership
- Provides group services
- Provides basic cluster locking
- Integrates with existing vendor clusterware, when present
- Can also run without integration with vendor clusterware
- Runs as Oracle.
- Failure exit causes machine reboot. --> This is a feature to prevent data corruption in event of a split brain.
EVMD:
- Generates events when things happen
- Spawns a permanent child evmlogger
- Evmlogger, on demand, spawns children
- Scans callout directory and invokes callouts.
- Runs as Oracle.
- Restarted automatically on failure

Once the above processes are running, they will automatically start the following services in the following order if they are enabled.

- The nodeapps (gsd, VIP, ons, listener) are brought online.
- The ASM instances are brought online.
- The database instances are brought online.
- Any defined services are brought online.

Outputs from running some commands:

[root:/dssdb1]# crsctl stop crs
Stopping resources.
This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.Shutting down CSS daemon.
Shutdown request successfully issued.
[root:/dssdb1]#

[root:/dssdb1]# crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly
[root:/dssdb1]#

[oracle:/dssdb1]$ crsctl check crs
CSS appears healthy
CRS appears healthy
EVM appears healthy
[oracle:/dssdb1]$

If you see any message other than the lines above, you can use "ps -ef | grep d.bin" and "crs_stat -t" to check the status of each daemon and service.
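
For monitoring scripts, the comparison against the three expected lines can be automated. This is a rough sketch with a hypothetical function name; it assumes only the three "appears healthy" messages shown above and treats anything else as unhealthy.

```python
# The three lines printed by "crsctl check crs" when everything is fine,
# copied from the sample output above.
EXPECTED_LINES = (
    "CSS appears healthy",
    "CRS appears healthy",
    "EVM appears healthy",
)


def unhealthy_components(check_output):
    """Return the components (CSS, CRS, EVM) whose healthy line is missing."""
    seen = {line.strip() for line in check_output.splitlines()}
    return [msg.split()[0] for msg in EXPECTED_LINES if msg not in seen]
```

An empty list means all three components reported healthy; a non-empty list is the cue to run the "ps" and "crs_stat -t" checks by hand.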

Oracle also provides a utility, CLUVFY, in $ORACLE_HOME/bin to verify CRS:
[oracle:/dssdb1]$ cluvfy stage -post crsinst -n all -verbose

Wednesday, September 17, 2008

Database views for backuppiece info

1. On the RMAN catalog database
SELECT bp_key backuppiece_key, bs_key backupset_key, recid, set_count, incremental_level, handle, media, completion_time, status,
decode(backup_type, 'D', 'FULL', 'I', 'INCREMENTAL', 'L', 'LOGS') type
FROM rc_backup_piece
WHERE db_key = (select max(db_key) FROM rc_database WHERE name = 'PDWS')
-- and recid=2876
-- and status = 'X'
ORDER BY completion_time desc
;

After you get the backupset_key from this query, you can use "RMAN> list backupset backupset_key" or "RMAN> list backuppiece backuppiece_key" to see its contents (data files and/or archivelogs).

Once a backup piece has been deleted by "delete expired backup" or "delete obsolete", its record is removed from this view, so all information about the backup piece in RMAN is gone.

View RC_BACKUP_SET gives similar information, but is less useful.

2. On the target database
SELECT recid, set_count, handle, status, tag, media, completion_time, deleted
FROM v$BACKUP_PIECE
-- where recid=2876
order by completion_time desc;

This view keeps more records (maybe three months; how long control file records are kept is governed by the CONTROL_FILE_RECORD_KEEP_TIME parameter). However, after a backup piece has been deleted by "delete expired backup" or "delete obsolete", the HANDLE column of its record becomes blank, so the view is only useful if you want to find the tape ID from the MEDIA column.

Catalog the RMAN backuppieces

If the RMAN records have already been deleted (by "delete expired backup" or "delete obsolete"), you need to catalog the backup pieces again once you have the tape that contains them.

The same applies to disk. After a file is deleted by RMAN, it no longer exists on disk. If you saved a copy of the backup and later want to know what is in it, you have to use the RMAN "catalog" command so that RMAN reads the file header and records the details of the backup in the controlfile. You can then run the "list backup" command to get the details of the backup.

I believe Oracle is more confident about getting backup pieces back into RMAN when they are on disk than when they are on tape. If the backup is on disk, see Note 727655.1 on getting it back. Here is an unpublished document (Note 550082.1) by Oracle on how to catalog tape backup pieces. I have not had a chance to test it.

~~~~~~~~~~~~~~~~~~~~~~~~
Applies to:

Oracle Server - Enterprise Edition - Version: 10.1 to 11.1
Information in this document applies to any platform.
Applies to databases release 10g and further

Goal

Starting with 10g, it's possible to use the rman CATALOG command to add backuppieces stored in disk to the rman repository.
You may need to catalog backup pieces in the following situations:
. You copy or move a backup piece with an operating system utility and want it to be usable by RMAN.
. The RMAN metadata for the backup piece was removed, but the backup piece still exists. This situation can occur if you ran the DELETE command on a backup piece that was only temporarily unavailable.
. You make a NOCATALOG backup on one database host in a Data Guard environment and move the backup piece to the same location on a different database host. In this case, the recovery catalog has no record of the original backup piece.
. You do not use a recovery catalog and must re-create the control file, thereby losing all RMAN repository data. Cataloging your backups makes them available again.
. When control file autobackup is disabled, you back up the control file and then back up the archived redo logs. You can restore and mount the control file, but must catalog the backup pieces containing the archived redo logs backed up after the control file.

But it's not possible to use the CATALOG command for backup pieces stored in TAPE. This note explains how to add backuppieces stored in TAPE to the repository

Solution

From 10.1, there is an undocumented command that allows to catalog tape backup pieces:
CATALOG DEVICE TYPE 'SBT_TAPE' BACKUPPIECE '';

* Prerequisites
1. Use automatic channel configuration. It's mandatory to configure one sbt_tape device channel in your rman automatic configuration parameters;
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS ''
2. It's necessary to know the backup piece file name in the tape and the backup piece file needs to be available and accessible.

* How to
Once there is a tape channel configured for accessing to the tape, the rman CATALOG command can be used to insert in RMAN catalog the tape backup piece:
CATALOG DEVICE TYPE 'SBT_TAPE' BACKUPPIECE '';

* Examples
- This is an example using Oracle Secure Backup (OSB):
1. Define a tape channel in the RMAN automatic configuration:
RMAN> CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS 'SBT_LIBRARY=/usr/local/oracle/backup/lib/libobk.so,ENV=(OB_MEDIA_FAMILY=RMAN-DEFAULT)';
2. Check that channel configuration is correct
RMAN> show all;
....
CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS 'SBT_LIBRARY=/usr/local/oracle/backup/lib/libobk.so,ENV=(OB_MEDIA_FAMILY=RMAN-DEFAULT)';
....
3. Catalog the backup piece
RMAN> CATALOG DEVICE TYPE 'SBT_TAPE' BACKUPPIECE '0pivagf8_1_1';

- The following will catalog a backuppiece on netbackup:
RMAN> CONFIGURE CHANNEL DEVICE TYPE 'SBT_TAPE' PARMS 'ENV=(NB_ORA_CLASS=oraclebkup),SBT_LIBRARY=/usr/openv/netbackup/bin/libobk.so';
RMAN> CATALOG DEVICE TYPE 'SBT_TAPE' BACKUPPIECE 'lij1qaa3_1_1';

Monday, September 15, 2008

Crosscheck archivelog all

The RMAN commands "crosscheck archivelog all" and "change archivelog all crosscheck" check whether the archivelogs are physically available on disk. If an archivelog is no longer on disk, its status in the RMAN catalog and controlfile is changed from "A" (available) to "X" (expired).

The RMAN command "delete noprompt expired archivelog all" only deletes the entries for expired archivelogs from the RMAN catalog and controlfile (not from disk, because "X" means the archivelog has already been removed at the OS level).

To see the list of expired archivelogs, run "delete expired archivelog all" and then answer "no" on confirmation.
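
The two-step behaviour described above (crosscheck only changes the status, delete expired only removes records) can be pictured with a toy model. This is not how RMAN is implemented; the function names and the dictionary layout are invented purely for illustration.

```python
def crosscheck(records, files_on_disk):
    """Toy crosscheck: mark each recorded archivelog 'A' or 'X'.

    `records` maps archivelog name -> status; `files_on_disk` is the
    collection of names that still exist at the OS level.
    Nothing is removed from disk or from the records here.
    """
    present = set(files_on_disk)
    return {name: ("A" if name in present else "X") for name in records}


def delete_expired(records):
    """Toy delete expired: drop only the records marked 'X'."""
    return {name: status for name, status in records.items() if status != "X"}
```

Running crosscheck() followed by delete_expired() on a record set leaves only the archivelogs that are still on disk, which mirrors steps c) and d) in the OS-level cleanup procedure.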

Two ways to get archivelog space back from disk:
1. If the archivelogs have never been backed up to tape, back them up and delete them in one command:
RMAN> backup archivelog until logseq delete all input;

2. If you know the archivelogs have a good backup on the tape, you can remove them at OS level (Note 249452.1):
a) delete unwanted archive log files from disk ( rm /del )
b) connect to rman
c) RMAN> crosscheck archivelog all;
d) RMAN> delete expired archivelog all;

Note that "crosscheck backup of archivelog all" means crosschecking backups of archivelog files on the tape, and "delete expired backup of archivelog all" deletes the backup pieces from the MMD database. (??)

Crosscheck Backup

The RMAN command "crosscheck backup" verifies whether the RMAN backup pieces are still on tape. If a backup piece is no longer available on tape, its status is changed from "A" (available) to "X" (expired). The RMAN command "delete force expired backup" then permanently removes the entries for the backup pieces that have a status of "X".

I believe "crosscheck backup" only checks the MMD (media management device) database. Even if the tape that holds the backup piece is not physically in the tape drive, the command still considers the backup piece available as long as it has not been deleted by the "delete obsolete" command (and is within the retention period).

I ran "list backup" and "list backupset", and every backup piece listed by both commands also appears in the list of backup pieces that "crosscheck backup" found available.

What could make the list of obsolete backups and the list of expired backups different?

"delete force expired backup" permanently deletes the RMAN backups that have a status of "X". With the "force" clause, the corresponding records in the RMAN data dictionary are deleted even if RMAN cannot find the corresponding backup pieces on tape. My job running "delete force expired backup" against Veritas NetBackup hung when the tape was not in the tape drive, but the same command without the "force" clause worked fine. It seems that "delete expired backup" only deletes entries from the MMD database.

If you just want to see the list of expired backups, use "delete expired backup" and then answer "no" on confirmation, or query the STATUS column (with 'X') of view RC_BACKUP_PIECE.

Sometimes, you may see errors:

RMAN-06207: WARNING: 41 objects could not be deleted for SBT_TAPE channel(s) due
RMAN-06208: to mismatched status. Use CROSSCHECK command to fix status
RMAN-06210: List of Mismatched objects
RMAN-06211: ==========================
RMAN-06212: Object Type Filename/Handle
RMAN-06213: ----- --------------------------------
RMAN-06214: Backup Piece ukjamesq_1_1
RMAN-06214: Backup Piece uljamesr_1_1

The RMAN-06207 and RMAN-06208 errors indicate that the RMAN backup pieces are no longer on the tape. To avoid the two errors, run the following RMAN commands:

RMAN> allocate channel for maintenance type 'SBT_TAPE';
RMAN> crosscheck backup;
RMAN> delete expired backup;

However, it seems that "crosscheck backup" only looks back over a certain time frame. Fairly old backup pieces may not have their status checked by the command, and the two errors keep showing up for them in the RMAN log. On some occasions, I had to query the BS_KEY column of view RC_BACKUP_PIECE on the catalog database and then, after allocating a maintenance channel, run "RMAN> delete FORCE NOPROMPT BACKUPSET bs_key#;" to delete the backup set. The keyword "force" is necessary in the run below:

RMAN> delete force NOPROMPT BACKUPSET 218128;
List of Backup Pieces
BP Key BS Key Pc# Cp# Status Device Type Piece Name
------- ------- --- --- ----------- ----------- ----------
218353 218128 1 1 AVAILABLE SBT_TAPE 36j9fc7c_1_1

deleted backup piece
backup piece handle=36j9fc7c_1_1 recid=102 stamp=647475436
Deleted 1 objects
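
When the mismatch list is long, the handles can be collected from the RMAN log before looking up their BS_KEY values in RC_BACKUP_PIECE. The sketch below is an illustration with a hypothetical function name; it assumes the log lines look like the RMAN-06214 "Backup Piece" lines shown earlier.

```python
import re

# Matches lines such as: "RMAN-06214: Backup Piece ukjamesq_1_1"
_PIECE_LINE = re.compile(r"^RMAN-06214:\s+Backup Piece\s+(\S+)$")


def mismatched_backup_pieces(rman_log):
    """Return the backup piece handles reported on RMAN-06214 lines."""
    handles = []
    for line in rman_log.splitlines():
        match = _PIECE_LINE.match(line.strip())
        if match:
            handles.append(match.group(1))
    return handles
```

The returned handles can then be used in the WHERE clause of a query against the HANDLE column of RC_BACKUP_PIECE.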

One strange error I got is that I could not use a tape channel to run the command on one instance:

RMAN> run {
2> ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE';
3> crosscheck backup;
4> RELEASE CHANNEL ch01;
5> }

allocated channel: ch01
channel ch01: sid=79 devtype=SBT_TAPE
channel ch01: VERITAS NetBackup for Oracle - Release 6.0 (2006110304)
released channel: ch01
RMAN-00571: ===================================
RMAN-00569: ==== ERROR MESSAGE STACK FOLLOWS ===
RMAN-00571: ===================================
RMAN-03002: failure of crosscheck command at 09/15/2008 12:40:34
RMAN-06091: no channel allocated for maintenance (of an appropriate type)

Friday, September 12, 2008

Find the Archivelog names by using the SCN

During database recovery, you may have an SCN and need to know the names of the archivelogs that contain it. Here is the SQL to find them:

column first_change# format 9,999,999,999
column next_change# format 9,999,999,999

alter session set nls_date_format='DD-MON-RRRR HH24:MI:SS';

select name, thread#, sequence#, status, first_time, next_time, first_change#, next_change#
from v$archived_log
where 35297312527 between first_change# and next_change#;

If you see 'D' in the STATUS column, the archive log has been deleted from the disk. You may need to restore it from the tape.
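
One detail of the query is worth spelling out. BETWEEN is inclusive at both ends, while NEXT_CHANGE# is the first change of the following log, so an SCN that falls exactly on a log boundary matches two rows. The sketch below uses made-up rows and hypothetical names to show the stricter half-open comparison (first_change# <= scn < next_change#).

```python
from typing import NamedTuple


class ArchivedLog(NamedTuple):
    """A few columns of v$archived_log, with illustrative values only."""
    name: str
    sequence: int
    first_change: int
    next_change: int


def logs_covering_scn(logs, scn):
    """Return the logs whose SCN range contains `scn`.

    The range is half-open: next_change belongs to the next log.
    """
    return [log for log in logs if log.first_change <= scn < log.next_change]


# Made-up example rows; the names and SCN values are not from a real database.
example_logs = [
    ArchivedLog("arch_1_45164.arc", 45164, 100, 200),
    ArchivedLog("arch_1_45165.arc", 45165, 200, 300),
]
```

With these rows, logs_covering_scn(example_logs, 200) returns only sequence 45165, whereas the BETWEEN form would return both logs. In a RAC database the result can still contain one log per thread.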

The SEQUENCE# value usually appears in the archivelog name. You can use an RMAN command to restore the archivelogs:

restore archivelog from logseq=45164 until logseq=45179;

Or, use commands to check the backup status:

list backup of archivelog all completed after 'SYSDATE - 21';
list backup of archivelog from logseq=45164 until logseq=45179;

RMAN "delete noprompt obsolete;"

The "delete obsolete" or "delete noprompt obsolete" command deletes archivelog files that are past the retention policy from disk and also deletes any obsolete backups on disk or tape.

allocate channel for maintenance type 'SBT_TAPE';
delete noprompt obsolete;
release channel;

The output has two parts:

. The first part lists the obsolete backups and copies (including archive logs).
. The second part confirms which archive logs and backup pieces have actually been deleted, ending with the statement "Deleted xx objects".

When flash recovery area (FRA) is used, Oracle will automatically remove archive logs when space pressure is seen in the FRA. "delete obsolete" command will not remove obsolete archivelog files from the FRA, and so it only reports the obsolete archive logs without "Deleted xx objects" under them.

After an obsolete backup piece has been deleted by the command, the LIST command can no longer see it. For example, I see the following for backup piece 635938 in the log:

Backup Set 635930 12-SEP-08
Backup Piece 635938 12-SEP-08 vcjqcgn8_1_1

deleted backup piece
backup piece handle=vcjqcgn8_1_1 recid=2021 stamp=665207528

Now, LIST on them returns errors:

RMAN> list backupset 635930;
RMAN-00571: ==================================
RMAN-00569: === ERROR MESSAGE STACK FOLLOWS ===
RMAN-00571: ==================================
RMAN-03002: failure of list command at 09/12/2008 15:32:05
RMAN-06004: ORACLE error from recovery catalog database:
RMAN-20215: backup set not found
RMAN-06159: error while looking up backup set

RMAN> list backuppiece 635938;
RMAN-00571: ==================================
RMAN-00569: === ERROR MESSAGE STACK FOLLOWS ===
RMAN-00571: ==================================
RMAN-03002: failure of list command at 09/12/2008 15:32:34
RMAN-06004: ORACLE error from recovery catalog database:
RMAN-20260: backup piece not found in the recovery catalog
RMAN-06092: error while looking up backup piece

You can use "report obsolete" to find the obsolete backups. If you want to find what will become obsolete after the next backup run, use "report obsolete redundancy 6" if the retention redundancy is 7.
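
The reasoning behind the "redundancy 6" trick can be shown with a simplified model: a single file with one full backup per day, ignoring archivelogs and incremental backups. The function name is hypothetical, and this is only an approximation of how a redundancy-based retention policy behaves.

```python
def obsolete_backups(completion_days, redundancy):
    """Return the backups that fall outside the `redundancy` newest ones.

    `completion_days` is a list of comparable completion times
    (plain day numbers here); the result is ordered newest first.
    """
    newest_first = sorted(completion_days, reverse=True)
    return newest_first[redundancy:]
```

With eight daily backups taken on days 1 to 8, obsolete_backups(days, 7) reports only day 1, while obsolete_backups(days, 6) reports days 2 and 1. Day 2 is the backup that becomes obsolete once the next backup completes.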