Wednesday, January 30, 2008

Oracle Regular Expressions in 10G

I was requested to find all non-numeric values in the column ATTRIBUTE1 of table PA_BUDGET_LINES in 11i. Since the table has more than one million rows, I had to find a good query to do the task.

The new regular expressions helped me to get the job done quickly:

select * from pa_budget_lines
where attribute1 is not null
and (regexp_like(upper(attribute1), '*.[A-Z].*') or regexp_like(attribute1, '[^A-Z,0-9,.,-]'));
-- regexp_like(upper(attribute1), '[^0-9]') does not include the mix.

There are four functions available in both SQL and PL/SQL:

REGEXP_LIKE -- Determine whether pattern matches
REGEXP_SUBSTR -- Determine what string matches the pattern
REGEXP_INSTR -- Determine where the match occurred in the string
REGEXP_REPLACE -- Search and replace a pattern

Thursday, January 24, 2008

RMAN loging/tracing and RMAN schema version

1. You can save RMAN backup info into a file after connecting to the target. For example:

spool log to rman_output.log
set echo on;
list backup of database summary;
list backup of archivelog all;
list backup of controlfile;
list incarnation;
list copy of database;
crosscheck backup;
crosscheck copy;
show all;
report obsolete;
report need backup;
restore database preview;
other RMAN commands that generate errors;
spool log off

Before running the RMAN commands above, issue the following OS command to set the NLS_DATE_FORMAT:
set NLS_DATE_FORMAT=DD-MON-RRRR HH24:MI:SS

2. If you put RMAN commands in a OS script, you can enable RMAN loging and tracing by specifying file names on the RMAN start line:

rman catalog rman/rman@catdb target sys/pwd@db log=rman.log trace=rman.trc debug

3. To determine schema version of recovery catalog, connect to catalog database by the recover catalog user and then query RCVER table.

$ sqlplus rman/rman@catdb

SQL > SELECT * FROM rcver;

VERSION
------------
11.01.00

If the table displays multiple rows, then the highest version in the RCVER table is the current catalog schema version. The table stores only the major version numbers and not the patch numbers. For example, assume that the rcver table displays the following rows:

VERSION
------------
08.01.07
09.02.00
10.02.00

Saturday, January 19, 2008

"failover to previous backup" during RMAN "restore database"

I issued following commands to restore database files

$ ramn

Recovery Manager: Release 10.2.0.3.0 - Production on Wed Jan 16 19:08:07 2008

Copyright (c) 1982, 2005, Oracle. All rights reserved.

RMAN> connect catalog rman@cat_db

recovery catalog database Password:
connected to recovery catalog database

RMAN> set dbid=3937750366

executing command: SET DBID
database name is "DB_NAME" and DBID is 3937750366

RMAN> connect target

connected to target database: DB_NAME (not mounted)

RMAN> RUN {
2> ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
3> ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE';
4> ALLOCATE CHANNEL ch02 TYPE 'SBT_TAPE';
5> ALLOCATE CHANNEL ch03 TYPE 'SBT_TAPE';
6> restore database;
7> RELEASE CHANNEL ch00;
8> RELEASE CHANNEL ch01;
9> RELEASE CHANNEL ch02;
10> RELEASE CHANNEL ch03;


I did not pay much attention to errors (caused by ASM) in RMAN:

ORA-19870: error reading backup piece bk_9tj5rapj_1_1_643672883
ORA-19504: failed to create file "+DATA/db_name/datafile/DW_table_idx"
ORA-17502: ksfdcre:4 Failed to create file +DATA/db_name/datafile/DW_table_idx
ORA-15055: unable to connect to ASM instance
ORA-00020: maximum number of processes () exceeded
channel ch01: restored backup piece 1
piece handle=bk_9uj5rapk_1_1_643672884 tag=TAG20080110T220118
channel ch01: restore complete, elapsed time: 00:53:26
failover to previous backup

After the restore, I had difficulty in bringing the database up. The below query result made me realize that the restoration totally failed (per Metalink note 465478.1) because the backup was a cold one and the query should return just one row.

SQL> select status, checkpoint_change#,
to_char(checkpoint_time, 'DD-MON-YYYY HH24:MI:SS') as checkpoint_time, count(*)
from v$datafile_header
group by status, checkpoint_change#, checkpoint_time
order by status, checkpoint_change#, checkpoint_time;

STATUS CHECKPOINT_CHANGE# CHECKPOINT_TIME COUNT(*)
------- ------------------ -------------------- ----------
ONLINE 3.1620E+10 08-JAN-2008 22:01:22 20
ONLINE 3.1620E+10 10-JAN-2008 22:00:31 20
ONLINE 3.1620E+10 13-JAN-2008 21:01:39 38

I had to re-run the restoration, and used "restore database from tag=TAG20080113T210228;" to make sure RMAN only restores the data files from the right tag which can be got from "list backup;" in RMAN.

Additional notes:
1. Before re-restoring, I did not manually delete datafiles restored by the 1st time in the ASM. RMAN recoglize and skips the good ones and only overwrites the bad ones.

2. It is necessary to make sure the status in V$DATAFILE_HEADER for all files is ONLINE, and it is normal to see the status in V$DATAFILE is RECOVER before running command "recover database;" or "alter database open resetlogs;".

3. Run below command to restore individual file:
RMAN> restore datafile 7;
or
RMAN> restore datafile 7 from tag=TAG20080113T210228;

4. After the database is recovered, when RMAN catalog connects to the database (target) again, it will resynchronize the recovery catalog automatically:

RMAN> connect target

new incarnation of database registered in recovery catalog
starting full resync of recovery catalog
full resync complete
connected to target database: DB_NAME (DBID=3937750366)

Wednesday, January 16, 2008

Oracle ASM alert log does not report truth

LUNS allocated to 1st database server were also mistakenly allocated to 2nd database server from a SAN. When restoring data to the 2nd database server, half of hard disks for the 1st database server were ovenwitten. We had 5 Oracle instances and an ASM (10.2.0.3) for storage installed on the 1st database server. But, alert_+ASM.log did not report any error or warning on data loss.

When that occuried, database alert logs and .trc files received huge entries reporting corruped data files and corrupted control files from 5 instances running on the ASM. They quickly made the partition where $ORACLE_BASE resides 100% full. The big confusion was that each time after I removed large .trc files, I still got 100% full in the partition space beceuse all instances were so busy on reporting errors to trace files. I was even unable to shut down the databases by "shutdown immediate" bacause all control files were "corrupted", and so did not know the root cause was the corruptd files or the $ORACLE_BASE was full.

Until the database server was rebooted and I tried to start up the ASM instance, I realized that there was a problem related to ASM because of below errors:

SQL> startup;

ASM instance started
Total System Global Area 130023424 bytes
Fixed Size 2071104 bytes
Variable Size 102786496 bytes
ASM Cache 25165824 bytes

ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "39" is missing
ORA-15042: ASM disk "38" is missing
ORA-15042: ASM disk "37" is missing
ORA-15042: ASM disk "36" is missing
ORA-15042: ASM disk "35" is missing
ORA-15042: ASM disk "34" is missing
ORA-15042: ASM disk "33" is missing
ORA-15042: ASM disk "31" is missing
ORA-15042: ASM disk "30" is missing
ORA-15042: ASM disk "29" is missing
ORA-15042: ASM disk "28" is missing
ORA-15042: ASM disk "27" is missing

Soon late, the System Admins confirmed to us with the SAN problem. What a rough day.

Tuesday, January 15, 2008

List of tapes needed for restoring DB from a backup

My restore job hangged for hours before I found that none of the needed tapes were in the tape drive (library). When I requested tpae Operations to bring the tapes back from offsite, they asked me for the list of tapes needed to recovering the database.

This query on the RMAN catalog database helps to answer the question, but you might check with Tape administration to confirm the list:

select distinct media, to_char(start_time, 'MM-DD-YY')
from rc_backup_piece b, rc_database d
where b.db_id = d.dbid and d.name='PDB'
and b.start_time > sysdate - 3
order by to_char(start_time, 'MM-DD-YY');

Here is another query for getting details on tape speed (in 10G):

select db_id, media, start_time, completion_time, status, elapsed_seconds,
bytes, (bytes /(1024*1024))/(elapsed_seconds/60) MB_PER_MINS
from rc_backup_piece b, rc_database d
where b.db_id = d.dbid and d.name='PDB'
and b.start_time > sysdate - 3
-- and status != 'A'
order by to_char(start_time, 'MM-DD-YY');

If the date is beyond the RMAN Redundancy period of the database, you may not get the tape list from above views. You have to run a query on the target database itself as DBA:

select distinct SYS_CONTEXT('USERENV','DB_NAME') DB, to_char(start_time, 'MM-DD-YY') bk_date, media tape
from v$backup_piece
where start_time > sysdate - 13
order by to_char(start_time, 'MM-DD-YY') desc
;

Note if you go back too many days, the tape may be overwritten by the tape retention policy which makes the restore impossible.

Monday, January 14, 2008

RMAN: find what has been backed up

$ rman target / catalog rman/rman_pwd@mcat
(run this on the target db server to connect to RMAN first)

1. list command
RMAN> list backup of database completed after 'sysdate-3';

-- Show how many times a datafile has been backed up (and valid)
RMAN> list backup of datafile nnn;

-- Show how many times an archivelog has been backed up (in RAC).
RMAN> list backup of archivelog sequence nnnn thread 2;

2. Restore preview

The "restore database preview" gives a report of all the backup pieces and archivelogs that are required to restore/recover the database, by checking the target database controlfile or RMAN catalog. It usually gives the start sequence and end sequence of archivelogs for planning your restore and recovery operation.

3. Restore validate (Note:466221.1)

RMAN command "restore database validate" will read and validate the physical RMAN backuppieces whether they are on tape or on disk and "validate backupset BS_KEY" will test the backups block by block. For example,

RMAN> run {
allocate channel c1 TYPE 'SBT_TAPE';
restore archivelog from sequence xxx until sequence yyy validate;
}

Replace the xxx, yyy with the start and end archivelog sequence reported by "restore database preview" command.

RMAN> run {
set until time "to_date('2008-01-09:21:05:15', 'YYYY-MM-DD:HH24:MI:SS')" ;
allocate channel t1 type 'sbt_tape';
restore database validate;
release channel t1; }

RMAN> run {
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE';
ALLOCATE CHANNEL ch02 TYPE 'SBT_TAPE';
ALLOCATE CHANNEL ch03 TYPE 'SBT_TAPE';
restore database validate;
RELEASE CHANNEL ch00;
RELEASE CHANNEL ch01;
RELEASE CHANNEL ch02;
RELEASE CHANNEL ch03;
}

The result should like:

Starting restore at 14-JAN-08
channel ch00: starting validation of datafile backupset
channel ch01: starting validation of datafile backupset
channel ch02: starting validation of datafile backupset
channel ch03: starting validation of datafile backupset
channel ch00: reading from backup piece bk_a3j634f9_1_1_643928553
channel ch01: reading from backup piece bk_a4j634f9_1_1_643928553
channel ch02: reading from backup piece bk_a5j634f9_1_1_643928553
channel ch03: reading from backup piece bk_a2j634f8_1_1_643928552
channel ch00: restored backup piece 1
piece handle=bk_a3j634f9_1_1_643928553 tag=TAG20080113T210228
channel ch00: validation complete, elapsed time: 02:07:46
channel ch01: restored backup piece 1
piece handle=bk_a4j634f9_1_1_643928553 tag=TAG20080113T210228
channel ch01: validation complete, elapsed time: 02:10:21
channel ch03: restored backup piece 1
piece handle=bk_a2j634f8_1_1_643928552 tag=TAG20080113T210228
channel ch03: validation complete, elapsed time: 02:14:26
channel ch02: restored backup piece 1
piece handle=bk_a5j634f9_1_1_643928553 tag=TAG20080113T210228
channel ch02: validation complete, elapsed time: 02:14:27
Finished restore at 14-JAN-08

4. Backup validate: to validate that all database files and archived redo logs can be backed up.
RMAN> run {
ALLOCATE CHANNEL ch1 TYPE 'sbt_tape';
BACKUP VALIDATE DATABASE
ARCHIVELOG ALL;
}

5. Control File and Spfile

RMAN> list backup of controlfile;

If you see it, use below command to restore it:
RMAN> run {
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE';
RESTORE CONTROLFILE FROM 'bk_a6j63cae_1_1_643936590';
}

allocated channel: ch00
channel ch00: sid=319 devtype=SBT_TAPE
channel ch00: VERITAS NetBackup for Oracle - Release 6.0 (2006110304)

allocated channel: ch01
channel ch01: sid=318 devtype=SBT_TAPE
channel ch01: VERITAS NetBackup for Oracle - Release 6.0 (2006110304)

Starting restore at 14-JAN-08

channel ch01: skipped, autobackup already found
channel ch00: restoring control file
channel ch00: restore complete, elapsed time: 00:02:25
output filename=+ARCH/db_name/controlfile/current.269.644253071
output filename=+DATA/db_name/controlfile/current.357.644253071
Finished restore at 14-JAN-08
released channel: ch00
released channel: ch01

RMAN> list backup of spifile;

RMAN> run {
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE';
restore spfile from 'bk_a7j63cc6_1_1_643936646';
RELEASE CHANNEL ch00;
RELEASE CHANNEL ch01;
}

or

RMAN> run {
ALLOCATE CHANNEL ch00 TYPE 'SBT_TAPE';
ALLOCATE CHANNEL ch01 TYPE 'SBT_TAPE';
RESTORE SPFILE to '/u01/oracle/admin/db_name/pfile/spfile.ora' FROM 'bk_a7j63cc6_1_1_643936646';
}

6. Archive Logs
RMAN> list backup of archivelog all;
RMAN> list backup of archivelog all completed after 'sysdate -1';
RMAN> list copy of archivelog all;
RMAN> list backup of archivelog from scn 7966093 until scn 7966100;
RMAN> list backup of archivelog from logseq 1220;
RMAN> list backupset 802;

7. show all: to view all configuration parameters.
RMAN> show all;

8. Database views
On the target db server, views can also be used to confirm the backups of the individual database. If you see column COMPLETION_TIME is null, the backup is never completed (such as disk was full). The LIST and REPORT commands are based on those views:

V$BACKUP_SET
V$BACKUP_PIECE
V$BACKUP_REDOLOG
V$BACKUP_DATAFILE


On the RMAN catalog database, you can find views RC_XXXXX are very useful. For example, RC_BACKUP_SET_SUMMARY provides aggregate information about available backup sets for each database registered in the recovery catalog (Oracle® Database Backup and Recovery Reference10g Release 2 (10.2), Part Number B14194-03). It is good for finding what the RMAN catalog server is backing up.

select db_name, max(newest_backup_time), sum(num_backupsets)
from rc_backup_set_summary
group by db_name order by db_name;