Wednesday, January 16, 2008

Oracle ASM alert log does not report the truth

LUNs on the SAN that were allocated to our first database server were mistakenly also allocated to a second database server. When data was restored to the second server, half of the disks belonging to the first server were overwritten. The first server ran five Oracle instances, with an ASM instance (10.2.0.3) providing their storage. Yet alert_+ASM.log did not report a single error or warning about the data loss.

When that happened, the alert logs and .trc files of all five instances running on ASM were flooded with entries reporting corrupted data files and corrupted control files. These quickly filled the partition holding $ORACLE_BASE to 100%. The big confusion was that each time I removed large .trc files, the partition was back at 100% almost immediately, because every instance was busy writing errors to its trace files. I could not even shut the databases down with "shutdown immediate" because all the control files were "corrupted", so I could not tell whether the root cause was the corrupted files or the full $ORACLE_BASE partition.
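When a partition keeps refilling like this, it helps to see at a glance how full it is and which trace files are the biggest offenders. Below is a minimal Python sketch of that check, using only the standard library; the function names partition_usage_pct and largest_trace_files are hypothetical helpers, not Oracle tools, and it assumes traces are plain files ending in .trc somewhere under $ORACLE_BASE.

```python
import os
import shutil


def partition_usage_pct(path):
    """Percentage of the filesystem containing `path` that is in use."""
    usage = shutil.disk_usage(path)  # named tuple (total, used, free), in bytes
    return 100.0 * usage.used / usage.total


def largest_trace_files(root, n=10, suffix=".trc"):
    """Return up to `n` (size_bytes, path) pairs for the largest files under
    `root` whose names end in `suffix`, biggest first.

    Hypothetical helper for illustration; it simply walks the directory tree.
    """
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                full = os.path.join(dirpath, name)
                try:
                    found.append((os.path.getsize(full), full))
                except OSError:
                    # A file can disappear between listing and stat while
                    # instances are still writing or cleaning up traces.
                    continue
    found.sort(reverse=True)  # largest size first
    return found[:n]
```

For example, `largest_trace_files(os.environ["ORACLE_BASE"])` lists the ten largest trace files. One caveat: on Unix-like systems, the space used by a deleted file is not released while a process still holds it open, so a listing like this can under-report what is actually consuming the partition.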

Only after the database server was rebooted and I tried to start the ASM instance did I realize the problem was related to ASM, because of the following errors:

SQL> startup;

ASM instance started
Total System Global Area 130023424 bytes
Fixed Size 2071104 bytes
Variable Size 102786496 bytes
ASM Cache 25165824 bytes

ORA-15032: not all alterations performed
ORA-15040: diskgroup is incomplete
ORA-15042: ASM disk "39" is missing
ORA-15042: ASM disk "38" is missing
ORA-15042: ASM disk "37" is missing
ORA-15042: ASM disk "36" is missing
ORA-15042: ASM disk "35" is missing
ORA-15042: ASM disk "34" is missing
ORA-15042: ASM disk "33" is missing
ORA-15042: ASM disk "31" is missing
ORA-15042: ASM disk "30" is missing
ORA-15042: ASM disk "29" is missing
ORA-15042: ASM disk "28" is missing
ORA-15042: ASM disk "27" is missing
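With a list like this, it can help to extract exactly which disk numbers ASM considers missing, for instance to compare against the LUNs the SAN admins are checking. Here is a minimal Python sketch, assuming the SQL*Plus startup output has been saved as text; missing_asm_disks is a hypothetical helper that only matches the ORA-15042 message format shown above.

```python
import re

# Matches lines such as: ORA-15042: ASM disk "39" is missing
_MISSING_DISK = re.compile(r'ORA-15042: ASM disk "(\d+)" is missing')


def missing_asm_disks(startup_output):
    """Return the sorted ASM disk numbers reported missing via ORA-15042.

    Hypothetical helper: it parses text only and does not query ASM.
    """
    return sorted(int(m.group(1)) for m in _MISSING_DISK.finditer(startup_output))
```

Applied to the output above, this yields 12 disks: 27 through 39, except 32.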

Soon afterwards, the system admins confirmed the SAN problem to us. What a rough day.
