Friday, January 24, 2014

11G database does not start up or ORA-600 error during startup

First of all, two ways to connect to the database:
a. Make a bequeath connection to the database
    sqlplus / as sysdba
b. Make a connection to the database via listener:
    sqlplus userid/pwd@tnsname

Case:  Database does not start after a fail-over test did not work on a two-nodes cluster.

SQL> startup nomount
ORA-24324: service handle not initialized
ORA-24323: value not allowed
ORA-03113: end-of-file on communication channel

1. checked and did not find any background processes related to this $ORACLE_SID
   $ ps -ef | grep ora_ | grep $ORACLE_SID
  
   Note: Doc ID 794293.1 recommends two more steps to check other OS processes.
         I did not do those because there are other Oracle instances were running on the same server.

2. $ sqlplus sqlplus "/ as sysdba"
   SQL> shutdown abort

3. Remove/rename three files belong to the $ORACLE_SID under $ORACLE_HOME/dbs, such as lk{$ORACLE_SID}

4. SQL> startup nomount;
   SQL>  alter database mount;
   SQL>  alter database open;


============================================


Case: ORA-00600: internal error code, arguments: [3712], [1], [1], [574], [3989605836], [574], [3989605835],[]
LGWR: terminating instance due to error 600
Instance terminated by LGWR, pid = 6487

SQL> startup mount
SQL> select group#, status from v$log;

    GROUP# STATUS
---------- ----------------
         1 UNUSED
         2 CURRENT
         3 UNUSED
         4 UNUSED

SQL> alter database clear unarchived logfile group 1;
Database altered.

SQL> alter database clear unarchived logfile group 3;
Database altered.

SQL> alter database clear unarchived logfile group 4;
Database altered.

SQL> select group#, status from v$log;

    GROUP# STATUS
---------- ----------------
         1 UNUSED
         2 CURRENT
         3 UNUSED
         4 UNUSED

SQL> alter database clear unarchived logfile group 2;
Database altered.

SQL> select group#, status from v$log;

    GROUP# STATUS
---------- ----------------
         1 CURRENT
         2 UNUSED
         3 UNUSED
         4 UNUSED

SQL> shutdown immediate
SQL> startup
SQL> select group#, status from v$log;

    GROUP# STATUS
---------- ----------------
         1 CURRENT
         2 INACTIVE
         3 INACTIVE
         4 INACTIVE

===========================================
Case: ora-600 errors after SAN server crashed

SQL> startup;
ORACLE instance started.  

Total System Global Area 3240689664 bytes
Fixed Size                  2225608 bytes
Variable Size            1543506488 bytes
Database Buffers         1610612736 bytes
Redo Buffers               84344832 bytes
Database mounted.
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [],
[], [], [], [], [], [], [], []

Fix: It needed media recovery

SQL> startup mount;
ORACLE instance started.  

Total System Global Area 3240689664 bytes
Fixed Size                  2225608 bytes
Variable Size            1543506488 bytes
Database Buffers         1610612736 bytes
Redo Buffers               84344832 bytes
Database mounted.

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

===========================================
Case:  database did not start up

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1],
[54278], [610128], [610297], [], [], [], [], [], [], []

Fix: recover control file. Doc. 1296264.1 gives two methods. One of them is:

SQL>Startup mount ;
SQL>Show parameter control_files

Query 1
------------
SQL> select a.member, a.group#, b.status from v$logfile a ,v$log b where a.group#=b.group# and b.status='CURRENT' ;

Note down the name of the redo log

SQL> Shutdown abort ;

Take a OS Level backup of the controlfile (This is to ensure we have a backup of current state of controlfile)

SQL> Startup mount ;
SQL> recover database using backup controlfile until cancel ;

Enter location of redo log shown as current in Query 1 when prompted for recovery

Hit Enter

SQL> Alter database open resetlogs ;


============================================
Case: Error in alert log file - ORA-00600: internal error code, arguments: [4194], [#], [#], [], [], [], [], []

This error indicates that a mismatch has been detected between redo records and rollback (undo) records.

ARGUMENTS:
Arg [a] - Maximum Undo record number in Undo block
Arg [b] - Undo record number from Redo block

Fix: It is related undo segment corruption. Doc ID 1428786.1 :

Best practice to create a new undo tablespace.
This method includes segment check.

Create pfile from spfile to edit
SQL> create pfile from spfile;

1. Shutdown the instance
2. set the following parameters in the pfile
    undo_management = manual
    event = '10513 trace name context forever, level 2'
3. SQL> startup restrict pfile=<initsid.ora>
4. SQL> select tablespace_name, status, segment_name from dba_rollback_segs where status != 'OFFLINE';
 

This is critical - we are looking for all undo segments to be offline - System will always be online.

If any are 'PARTLY AVAILABLE' or 'NEEDS RECOVERY' - Please open an issue with Oracle Support or update the current SR.  There are many options from this moment and Oracle Support Analyst can offer different solutions for the bad undo segments.

If all offline then continue to the next step

5. Create new undo tablespace - example
SQL> create undo tablespace <new undo tablespace> datafile <datafile> size 2000M;
6. Drop old undo tablespace
SQL> drop tablespace <old undo tablespace> including contents and datafiles;
7. SQL> shutdown immediate;
8. SQL> startup mount;
9. modify the pfile with the new undo tablespace name
SQL> alter system set undo_tablespace = '<new tablespace>' scope=pfile;
10. SQL> shutdown immediate;
11. SQL> startup;
Then, startup using the normal spfile