ERP on DB

Thursday, June 12, 2014

du / find to check space & Linux commands

More OS commands are used as an EBS apps admin than a database DBA. More notes are added to this post after I feel they are useful.

~~~~~~~~~~~~~~~~~~~~~~~ du ~~~~~~~~~~~~~~~~~~~~~~~
Below command will show the disk space used by first level of each folder in RH Linux. It is a good way to find which folder fills up the disk space.

$ du -h --max-depth=1

To get size of current directory:

$ du -sh * <== List space used by each file & folder under it

$ du -sh (or, $ du -sk ) <== Get the total space used in current location
$ du -k . <== List all details under current location. Not useful.

But, file size reported by "du -sh filename.txt" is much less than the size from "ls -al filename.txt". The difference is big for a large file. A good thing is both "du --apparent-size -m filename.txt" and "ls -lh filename.txt" agree on file size on my RHEL servers.

~~~~~~~~~~~~~~~~~~~~~ find big files ~~~~~~~~~~~~~~~~~
Find the big files under a directory in Linux, which is useful when disk space is filling up:

$ find /home -size +1000000 -exec ls -alh {} \;

For example
$ find /u01/app -type f -size +1000000 -exec ls -alh {} \;
-rw-rw-r-- 1 ebs oracle 953M Aug 9 2011 /path/to/ftp/out/gl_journal_to_adam.txt

-rw-r----- 1 ebs oracle 1.8G Sep 27 2013 /path/to/ora/10.1.3/Apache/ssl_engine_log
-rw-r--r-- 1 ebs oracle 8.3G Mar 30 13:24 /path/to/ora/10.1.3/j2ee/oacore/oacore_default_group_1/application.log

You can search for exact file size, or just for bigger (+) or smaller (–) files. For example all bigger than 512k files would be found with command:

$ find /home -type f -size +512k -exec ls -lh {} \;

units:
    b – for 512-byte blocks (this is the default if no suffix is used)
    c – for bytes
    w – for two-byte words
    k – for Kilobytes (units of 1024 bytes)
    M – for Megabytes (units of 1048576 bytes)
    G – for Gigabytes (units of 1073741824 bytes)
~~~~~~~~~~~~~~~~~~~~ find the total files ~~~~~~~~~~~~~~~~~~~
Find the total number of files under a directory:

$ find folder_name -type f | wc -l
$ find folder_name -maxdepth 1 -type f | wc -l    <== to exclude subdirectories

~~~~~~~~~~~~~~~~~~~~ find then sort ~~~~~~~~~~~~~~~~~~~~~~
Find and sort by timestamp:

$ find /path/. -name '*.*' -printf "%T+\t%s\t%p\n" | sort | more

%T+ - modification time (and date)
%s - size
%p - path of file

~~~~~~~~~~~~~~~~~~~~ find then replace ~~~~~~~~~~~~~~~~~~~
Use below line to replace wrong file owner under a directory:
# find /u01 -user wrongUserID -print | wc -l

# find /u01 -user wrongUserID -exec chown rightUserID:userGroup {} \;

To find/list all files under current directory tree that is not owned by you (applMgr):

$ find . ! -user applMgr

$ find . ! -user applMgr -exec ls -altr {} \;

~~~~~~~~~~~~~~~~~~~ find then delete ~~~~~~~~~~~~~~~~~~~~
Use find to delete files:
One day, CM log file and Output file can not be opened by browser with error "Authentication failed." I found the disk space was full with 11GB of Apache folder. Use "find" to delete old files (-r will also delete sub-folders):

$ cd $LOG_HOME/ora/10.1.3/Apache
$ find *.* -type f -mtime +7 -exec rm -rf {} \;

NOTES: If there is a huge pile of files, above command may get error. Try a different way by placing * inside the single quotes so that it is used as a matching wildcard and not for shell filename expansion:

$ find . -name '*.tmp' | wc -l
$ find . -name '*.tmp' -mtime +7 -exec rm -f {} \;

Below first line worked for me (in cron also) to delete files from a folder with more than 15K files. But the 2nd line failed.
$ /usr/bin/find /u01/path/utl_dir/. -name '*.tmp' -mtime +60 -exec rm -f {} \;

$ /usr/bin/find /u01/path/utl_dir/*.tmp -mtime +60 -exec rm -f {} \;
-ksh: /usr/bin/find: /usr/bin/find: cannot execute [Argument list too long]

~~~~~~~~~~~~~~~~~~~~~ zip ~~~~~~~~~~~~~~~~~~~~~~~~~~
To Zip up a folder, including all subdirectories:
$ cd /path/to/parent_folderName <= go to the upper folder first
$ zip -r file_name.zip folderName

this is useful when use "admrgpch -s source_path -d target_path -manifest manifest.txt" to merge patches and then copy the merged folder to different node.

~~~~~~~~~~~~~~~~~~ grep / egrep ~~~~~~~~~~~~~~~~~~~~~~~
To find multiple words in a text file:

$ /bin/grep -i -e 'drop ' -e 'alter ' -e 'grant ' -e 'commit ' ascii_file.sql

$ egrep -i 'fail|error' long_file_name.log

~~~~~~~~~~~~~~~~~~~ lsof ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
lsof to find open files. it is useful when the system acts strangely

$ lsof /home | wc -l
$ lsof | grep deleted | wc -l (files in being deleted queue)

~~~~~~~~~~~~~~~~~~ port ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To find what is using a port in Linux:

Below line will tell if a port is used or not on the server:

$ netstat -tuanp | grep 6230

(Not all processes could be identified, non-owned process info

will not be shown, you would have to be root to see it all.)

tcp 0 0 167.69.109.82:6230 0.0.0.0:* LISTEN 31876/opmn

Then the process owner (or root) can use below line to see more details on using the port:

$ lsof -i :6230

COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME

opmn 31876 ebsdev 9u IPv4 42886593 0t0 TCP server3d.domain.com:6230 (LISTEN)

Below line may also works (Note: "lsof" may show result only to root or process owner):

$ ls -l /proc/31876/exe <= use the PID from above line
lrwxrwxrwx 1 ebiz user 0 Mar 5 11:19 /proc/31876/exe -> /u05/app/EBSDEV/apps/tech_st/10.1.3/opmn/bin/opmn

NOTES: even a port is free and available on a server, it can be blocked by firewall and such to prevent access from other servers.
~~~~~~~~~~~~~~~~~ screen ~~~~~~~~~~~~~~~~~~~~~~~

First of all, make sure "screen" is installed (in /usr/bin).

To start a "screen" session for a long-run job:

$ screen

$ adop phase=fs_clone

... ...

To list running screen/background session:

$ screen -list (Or, $ screen -l)

There are screens on:

1653.pts-5.server2d (Attached)

23515.pts-3.server2d (Attached)

2 Sockets in /var/run/screen/S-userID.

To re-attach a session:

$ screen -r -d 23515

Session ID: 16

Node: server2d

Phase: fs_clone

Log: $ADOP_LOG_HOME/16/xxxxxx_201956/adop.log

... ...

~~~~~~~~~~~~~~~~~ rsync ~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To copy a folder to another folder on the SAME server and preserve exactly all attributes. For example, in EBS R12.2, to synchronize custom folder xxap from PATCH file system to RUN file system:

$ rsync -a --delete $PATCH_BASE/EBSapps/appl/xxap $RUN_BASE/EBSapps/appl

Notes: add rsync -a --delete %s_current_base%/EBSapps/appl/xxap %s_other_base%/EBSapps/appl to file $APPL_TOP_NE/ad/custom/adop_sync.drv

If contents of a folder are static, add rsync -zr %s_current_base%/EBSapps/appl/xxe4a %s_other_base%/EBSapps/appl to file $APPL_TOP_NE/ad/custom/adop_sync.drv

The option -a is very powerful, it will copy the directory and all its sub-directories recursively, and preserve symbolic links, modification times, group, ownership, and permissions.

-a option is a combination of a bunch of other options:

-r Recurse into directories

-l Copy symlinks as symlinks

-p Preserve file permissions

-t Preserve modification times

-g Preserve group

-o Preserve owner

-D Preserve device and special files

"-- delete" will DELETE files in Target folder but not in Source folder.

Below line will copy the folder structure with same permissions to current location, without copying files:
$ cd /path/to/target
$ rsync -av -f"+ */" -f"- *" /path1/to/source .

Copy a folder to a different server (and preserve exactly all attributes):

$ cd /to/path/source_FOLDER

$ rsync --progress -avze ssh applmgr@nodeName:/to/path/target_FOLDER

~~~~~~~~~~~~~~~~~~ Stickybit ~~~~~~~~~~~~~~~~~~~~~

If your ID owns /path/to/LOG directory and you want to allow users in other Groups to rename the files in that directory, you have to remove Stickybit on LOG directory (by chmod -t) if it is enabled, because enabled Stickybit makes other users unable to rename/delete the files under LOG directory.

$ ls -al /path/to/LOG

drwxrwxrwt 10 user group 134 Aug 14 09:57

$ chmod -t /path/to/LOG # remove Stickybit

$ ls -al /path/to/LOG

drwxrwxrwx 10 user group 134 Aug 14 09:57

Right permission is needed for others to write/modify your file. To get 664 permissions (-rw-rw-r--) on all files created by your ID, add one line to .bashrc or .kshrc or .profile of your OS ID, or add the line to the shell script for generating the file:

set umask to 0002 (or 002 ?)

umask u=rwx,g=rwx,o=rx

~~~~~~~~~~~~~~~~~ chmod -R ~~~~~~~~~~~~~~~~~~~~

Run "chmod -R" to change permission on a folder and all subfolders & files under it. For example, to make a folder and all subfolders & all files under it "readable":

$ chmod -R +r /path/to/LOG

~~~~~~~~~~~~~~~~~ list folders ~~~~~~~~~~~~~~~~~

- To list all folders (sub-directories called "fs_clone") under a location:

$ find . -name fs_clone -type d -print

- Get the list of directories. Below were tested in K Shell
$ echo $SHELL
/bin/ksh
1.
$ find /u01/app/patches/* -prune -type d (Or go to the directory first, $ find * -prune -type d )
/u01/app/patches/11902981
/u01/app/patches/13006289
/u01/app/patches/16367827
Note: if just want the list of files, use "find * -prune -type f"

2. -- List directories older than two days
$ find * -prune -type d -mtime +2

3.
$ ls -l |egrep "^drw" |awk '{print $9}'
-- Loop through the list to do whatever you want
$ for x in `ls -l |egrep "^drw" |awk '{print $9}'`
do
echo $x;
done

4.
-- Note: FS is used to define the delimiter in AWK
The default is defined by OS env variable $FS
$ ls -1d */ | awk 'BEGIN {FS="/"};{print $1}'

Monday, April 7, 2014

Apply Oracle database 11G Security Patch Update (SPU) on AIX

January 2014 Database SPU Installation steps:

1.   Check/install opatch:
Above patches require opatch version 11.2.0.3. Check current version of opatch and install correct version if required. To check:
$ cd /app/oracle/product/OPatch
$ ./opatch version
To install opatch newer version 11.2.0.3, download patch 6880880 and unzip it:
$ cd /app/oracle/product
$ rm -rf OPatch

$ unzip opatch_p6880880_112000_AIX64-5L.zip
$ rm opatch_p6880880_112000_AIX64-5L.zip
$ export PATH=/app/oracle/product/OPatch:$PATH
$ opatch version <= to verify opatch version. It should be 11.2.0.3

2.   FTP/unzip SPU patch:
$ mkdir $ORACLE_BASE/patches/spu_jan14_11203
$ cd $ORACLE_BASE/patches/spu_jan14_11203
$ unzip p17478415_112030_AIX64-5L.zip
$ rm p17478415_112030_AIX64-5L.zip

3.   Check/set environment:

- Set correct ORACLE_HOME
Set ORACLE_BASE
$ export ORACLE_BASE=/app/oracle

Set LIBPATH
$ export LIBPATH=$ORACLE_HOME/lib

Set LD_LIBRARY_PATH
$ export LD_LIBRARY_PATH=$ORACLE_HOME/lib

- Ensure that the $PATH has the following executables: make, ar, ld, and nm.
$ which make
$ which ar
$ which ld
$ which nm

The location of these executables depends on your operating system. On many operating systems, they are located in /usr/ccs/bin, in which case you can set your PATH as follows:

      $ export PATH=/usr/ccs/bin:$PATH

- Check jdk version: It should be version 1.4.2 or higher.

$ java -version

If java version is lower, check if version 1.4.2 or higher is available on the host.
    $ lslpp -L | grep Java
              (or)
    $ lslpp -L | grep Java14
    $ export PATH=/usr/java14/bin:$PATH

If available, adjust PATH accordingly and check again.

If java version 1.4.2 or higher is not available, work with System Admin to have it installed.

- Check location of oracle inventory in /etc/oraInst.loc
- Run oracle inventory

   $ cd /app/oracle/product/OPatch
    $ ./opatch lsinventory
    Make sure that Oracle version is 11.2.0.3.

- Check XDB: XDB must be VALID.
SQL> select status from dba_registry where comp_id='XDB'

If XDB is invalid, re-install DB – reference Oracle Support note 1292089.1

4.   List invalid objects:

For each database on the server, save a list of invalid objects for comparison after SPU patch.
SQL> select owner, object_name, object_type from dba_objects
         where status= 'INVALID';

5.    Stop database replication tool if any.

6.    Stop database monitoring tools, such as Grid Control agent.

7.   Shutdown databases for the Oracle Home being patched and stop listener.

8.   Check processes:
Make sure that there are no oracle processes running except aioserver process.
$ ps -fu'oracle’ | grep -v aioserver
$ ps -ef | grep sqlplus

Make sure that there are no oraggs processes running except aioserver process.
$ ps -fu'oragg’

9.   Check shared file in use:
Following commands should not return any file name:
$ genld -l | grep $ORACLE_HOME
$ genkld | grep $ORACLE_HOME

If these return any file name, have system administrator run following command
$ /usr/sbin/slibclean

Check again.

10.   Apply patch

$ cd $ORACLE_BASE/patches/spu_jan14_11203/17478415
$ /app/oracle/product/OPatch/opatch napply -skip_subset -skip_duplicate

An observation: During installation on one host, it might as for Oracle Support login ID / password to make an internet connection to Oracle Support website for any latest information relating patch. Just press return.

Patch will possible complete with warnings, as listed below. Per Oracle these can safely be ignored:
ld: 0711-319 WARNING: Exported symbol not defined: ****
Symbol xxxxxx is already exported.
ld: 0711-773 WARNING: Object ***, imported symbol ***
Symbol was expected to be local.
ld: 0711-783 WARNING: TOC overflow. TOC size: Maximum size:
0711-224 WARNING: Duplicate symbol

Check the following log files in $ORACLE_BASE/cfgtoollogs/catbundle for any errors:
    catbundle_CPU_<database SID>_APPLY_<TIMESTAMP>.log
    catbundle_CPU_<database SID>_GENERATE_<TIMESTAMP>.log

11.   Check inventory again:
$ /app/oracle/product/OPatch/opatch lsinventory

12.   Load modified SQL files into databases:

For each database instance running on the Oracle home being patched, connect to the database using SQL*Plus. Connect as SYSDBA and run the catbundle.sql script as follows:
$ cd $ORACLE_HOME/rdbms/admin
$ sqlplus /nolog
SQL> CONNECT / AS SYSDBA
SQL> STARTUP
SQL> @catbundle.sql cpu apply
SQL> @utlrp.sql
SQL> QUIT

Note: Start the LISTENER

For information about the catbundle.sql script, see My Oracle Support Note 605795.1 Introduction to Oracle Database catbundle.sql.

Check the following log files in $ORACLE_BASE/cfgtoollogs/catbundle for any errors:
catbundle_CPU_<database SID>_APPLY_<TIMESTAMP>.log
catbundle_CPU_<database SID>_GENERATE_<TIMESTAMP>.log

13.   Invalid objects check
For each database on the server, make a list of invalid objects and compare with the list saved before patching. Take appropriate action if there are new invalid objects

14.   Check XDB
If XDB is invalid, there is a possibility that LIBPATH has not been set correctly.

To check status of XDB:
SQL> select status from DBA_REGISTRY where comp_id='XDB'

If status is INVALID:
$ export LIBPATH=$ORACLE_HOME/lib
Shutdown database
Startup database
Run utlrp

If XDB is still invalid, re-install DB – reference Oracle Support note 1292089.1

15.   Final check on registry:
$ sqlplus /nolog
SQL> CONNECT / AS SYSDBA
SQL> SELECT * FROM sys.registry$history where comments='CPUJan2014';
    It should return one line for CPUJan2014.

Thursday, February 27, 2014

Move objects from one schema to another schema

All database objects of a home-grown business application were created in a wrong schema. After year and year, it became a very large database with more than 2,000GB data. The problem never got corrected until getting urgent calls for violating company auditing standard. I used transport tablespaces concept in Doc ID 371556.1 to move those objects, mainly tables, to a new dedicated schema within the same 11G database.

Pre-steps:

1. create new schema owner EDI in the database

create user edi identified by "ediPassword"
default tablespace TBS_MED_DT1
temporary tablespace temp
quota unlimited on TBS_MED_DT1
profile STANDARD;

alter user edi quota unlimited on TBS_LRG_DT1;    -- do the same with other tablespaces
grant ALTER ANY MATERIALIZED VIEW to edi;
grant CREATE SESSION to edi;
grant GLOBAL QUERY REWRITE to edi;
grant CREATE TABLE to edi;

2. Create DIRECTORY for datapump
SQL> create or replace directory dpump_dir as '/path/to/export';
SQL> grant read, write on directory dpump_dir to system;

3. Identify 5 tablespaces that hold all targeted objects for moving. ORADBA is the schema where objects will be moved out. Save the counts
.
SQL> select distinct tablespace_name, count(*) from dba_tables
where owner = 'ORADBA' group by tablespace_name;

SQL> select distinct tablespace_name, count(*) from dba_indexes where owner = 'ORADBA' group by tablespace_name;

4. Get and save the list of data files for IMPDP script (142 data files)

SQL> select ''''||file_name|| ''',' from dba_data_files where tablespace_name in
('TBS_LRG_DT1','TBS_MED_DT1','TBS_DT_LOB','ORADBA_IX1','ORADBA_IX2');

5. compile invalid objects and record down the invalid list
SQL> @?/rdbms/admin/utlrp.sql
SQL> select owner, object_type, object_name from dba_objects where status = 'INVALID';
... ... ...
29 rows selected.

6. statements for re-creating sequences
SQL> select 'create sequence edi.' || sequence_name || ' start with ' || last_number || ' INCREMENT BY 1;' FROM dba_sequences where sequence_owner = 'ORADBA';

7. statement for re-creating synonyms
select 'create public synonym ' || synonym_name || ' for EDI.' || table_name || ';' from dba_synonyms where table_owner = 'ORADBA' and owner='PUBLIC'
union
select 'create synonym ' || owner||'.'||synonym_name || ' for EDI.' || table_name ||';' from dba_synonyms where table_owner = 'ORADBA' and owner!='PUBLIC'

8. query dba_tab_privs to get all necessary "grant" statements.
9. extract affected codes of packages, functions, triggers, views, materialized views, db links.

Moving steps:

1. Prepare for exporting the tablespace (Doc 371556.1)

SQL> execute sys.dbms_tts.transport_set_check(
'TBS_LRG_DT1,TBS_MED_DT1,TBS_DT_LOB,ORADBA_IX1,ORADBA_IX2', true);

SQL> select * from sys.transport_set_violations;

2. Make fixes to 6 violations from Step 1. Such as:

SQL> ALTER TABLE ORADBA.AIS_DATA DROP PRIMARY KEY CASCADE drop index;
SQL> alter table ORADBA.AIS_DATA move tablespace TBS_DT_LOB;
SQL> ALTER TABLE ORADBA.AI_DATA_SET ADD
CONSTRAINT AIS_DATA_PK
PRIMARY KEY (KEY, FORMAT, STATUS_CD)
USING INDEX tablespace ORADBA_IX1 ;
... ... ...

Repeat Step 1 until no violation in sys.transport_set_violations

3. Drop two materialized views from TBS_MED_DT1, and drop other objects

SQL> drop materialized view oradba.xxxxxx;
SQL> drop MATERIALIZED VIEW LOG ON ORADBA.STATUS_TAB;

Drop triggers and synonyms
SQL> @drop_trigger.sql
SQL> @drop_synonym.sql

4. Alter 5 tablespaces to READ only

$ more tbs_read_only.sql
alter tablespace TBS_LRG_DT1 read only;
alter tablespace TBS_MED_DT1 read only;
alter tablespace TBS_DT_LOB read only;
alter tablespace ORADBA_IX1 read only;
alter tablespace ORADBA_IX2 read only;

SQL> @tbs_read_only.sql

confirm by
SQL> select * from dba_tablespaces where status = 'READ ONLY';

4. Run EXPDP script: No "TRANSPORT_FULL_CHECK=Y"

$ more expdp_5tbs.par
## $ expdp system/PSWD parfile=expdp_8tbs.par
DUMPFILE=expdp_5tbs.dmp
LOGFILE=expdp_5tbs.log
DIRECTORY=dpump_dir
TRANSPORT_TABLESPACES=TBS_LRG_DT1,TBS_MED_DT1,TBS_DT_LOB,ORADBA_IX1,ORADBA_IX2
# TRANSPORT_FULL_CHECK=Y

$ expdp system/PSWD parfile=expdp_5tbs.par

5. Drop tablespaces. The KEY is all datafiles will not be deleted and untouched!

$ more tbs_drop.sql
drop tablespace TBS_LRG_DT1 INCLUDING CONTENTS;
drop tablespace TBS_MED_DT1 INCLUDING CONTENTS;
drop tablespace TBS_DT_LOB INCLUDING CONTENTS;
drop tablespace ORADBA_IX1 INCLUDING CONTENTS;
drop tablespace ORADBA_IX2 INCLUDING CONTENTS;

SQL> @tbs_drop.sql

confirm by
SQL> select * from dba_data_files where tablespace_name in
('TBS_LRG_DT1','TBS_MED_DT1','TBS_DT_LOB','ORADBA_IX1','ORADBA_IX2');
0 row returned

6. Run IMPDP script

$ more impdp_5tbs.par
# impdp system/PSWD parfile=impdp_8tbs.par
#
DUMPFILE=expdp_5tbs.dmp
LOGFILE=impdp_5tbs.log
DIRECTORY=dpump_dir
TRANSPORT_DATAFILES='/path/to/data/sml_data_001.dbf',
'/path/to/index/sml_index_001.dbf',
'/path/to/index/med_index_001.dbf',
'..... list all 142 files, See Pre-setp 4',
REMAP_SCHEMA=ORADBA:EDI

$ impdp system/PSWD parfile=impdp_8tbs.par

7. Bring them back "READ WRITE"

alter tablespace TBS_LRG_DT1 read write;
alter tablespace TBS_MED_DT1 read write;
alter tablespace TBS_DT_LOB read write;
alter tablespace ORADBA_IX1 read write;
alter tablespace ORADBA_IX2 read write;

SQL> @tbs_read_write.sql

Verify:
SQL> select * from dba_tablespaces where status = 'READ ONLY';

-- Make sure the object counts match what the database had before the move
SQL> select distinct tablespace_name, count(*) from dba_tables where owner = 'EDI' group by tablespace_name;
SQL> select distinct tablespace_name, count(*) from dba_indexes where owner = 'EDI' group by tablespace_name;
-- The old schema still keeps some objects as needed
SQL> select distinct tablespace_name, count(*) from dba_tables where owner = 'ORADBA' group by tablespace_name;

TABLESPACE_NAME    COUNT(*)
------------------------------------- ----------
                                        2
TOOLS                                  34

SQL> select distinct tablespace_name, count(*) from dba_indexes where owner = 'ORADBA' group by tablespace_name;

TABLESPACE_NAME   COUNT(*)
------------------------------------ ----------
TOOLS                                   3

8. Compile objects
SQL> @?/rdbms/admin/utlrp.sql

9.Sync a MVIEW log
SQL> exec SYS.DBMS_SNAPSHOT_UTL.SYNC_UP_LOG('EDI','STATU_TAB');
PL/SQL procedure successfully completed.

10. Re-create views, sequences.
11. Re-create synonyms, triggers, etc
12. Run grant statements.

Friday, January 24, 2014

11G database does not start up or ORA-600 error during startup

First of all, two ways to connect to the database:
a. Make a bequeath connection to the database
sqlplus / as sysdba
b. Make a connection to the database via listener:
sqlplus userid/pwd@tnsname

Case: Database does not start after a fail-over test did not work on a two-nodes cluster.

SQL> startup nomount

ORA-24324: service handle not initialized

ORA-24323: value not allowed

ORA-03113: end-of-file on communication channel

1. checked and did not find any background processes related to this $ORACLE_SID

$ ps -ef | grep ora_ | grep $ORACLE_SID

Note: Doc ID 794293.1 recommends two more steps to check other OS processes.

I did not do those because there are other Oracle instances were running on the same server.

2. $ sqlplus sqlplus "/ as sysdba"

SQL> shutdown abort

3. Remove/rename three files belong to the $ORACLE_SID under $ORACLE_HOME/dbs, such as lk{$ORACLE_SID}

4. SQL> startup nomount;

SQL> alter database mount;

SQL> alter database open;

============================================

Case: ORA-00600: internal error code, arguments: [3712], [1], [1], [574], [3989605836], [574], [3989605835],[]

LGWR: terminating instance due to error 600

Instance terminated by LGWR, pid = 6487

SQL> startup mount

SQL> select group#, status from v$log;

GROUP# STATUS

---------- ----------------

1 UNUSED

2 CURRENT

3 UNUSED

4 UNUSED

SQL> alter database clear unarchived logfile group 1;

Database altered.

SQL> alter database clear unarchived logfile group 3;

Database altered.

SQL> alter database clear unarchived logfile group 4;

Database altered.

SQL> select group#, status from v$log;

GROUP# STATUS

---------- ----------------

1 UNUSED

2 CURRENT

3 UNUSED

4 UNUSED

SQL> alter database clear unarchived logfile group 2;

Database altered.

SQL> select group#, status from v$log;

GROUP# STATUS

---------- ----------------

1 CURRENT

2 UNUSED

3 UNUSED

4 UNUSED

SQL> shutdown immediate

SQL> startup

SQL> select group#, status from v$log;

GROUP# STATUS

---------- ----------------

1 CURRENT

2 INACTIVE

3 INACTIVE

4 INACTIVE

===========================================

Case: ora-600 errors after SAN server crashed

SQL> startup;
ORACLE instance started.

Total System Global Area 3240689664 bytes
Fixed Size                  2225608 bytes
Variable Size            1543506488 bytes
Database Buffers         1610612736 bytes
Redo Buffers               84344832 bytes
Database mounted.
ORA-00600: internal error code, arguments: [kcratr_scan_lastbwr], [], [], [],
[], [], [], [], [], [], [], []

Fix: It needed media recovery

SQL> startup mount;
ORACLE instance started.

Total System Global Area 3240689664 bytes
Fixed Size                  2225608 bytes
Variable Size            1543506488 bytes
Database Buffers         1610612736 bytes
Redo Buffers               84344832 bytes
Database mounted.

SQL> recover database;
Media recovery complete.
SQL> alter database open;

Database altered.

===========================================
Case: database did not start up

SQL> alter database open;
alter database open
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [kcratr_nab_less_than_odr], [1],
[54278], [610128], [610297], [], [], [], [], [], [], []

Fix: recover control file. Doc. 1296264.1 gives two methods. One of them is:

SQL>Startup mount ;
SQL>Show parameter control_files

Query 1
------------
SQL> select a.member, a.group#, b.status from v$logfile a ,v$log b where a.group#=b.group# and b.status='CURRENT' ;

Note down the name of the redo log

SQL> Shutdown abort ;

Take a OS Level backup of the controlfile (This is to ensure we have a backup of current state of controlfile)

SQL> Startup mount ;
SQL> recover database using backup controlfile until cancel ;

Enter location of redo log shown as current in Query 1 when prompted for recovery

Hit Enter

SQL> Alter database open resetlogs ;

============================================
Case: Error in alert log file - ORA-00600: internal error code, arguments: [4194], [#], [#], [], [], [], [], []

This error indicates that a mismatch has been detected between redo records and rollback (undo) records.

ARGUMENTS:
Arg [a] - Maximum Undo record number in Undo block
Arg [b] - Undo record number from Redo block

Fix: It is related undo segment corruption. Doc ID 1428786.1 :

Best practice to create a new undo tablespace.
This method includes segment check.

Create pfile from spfile to edit
SQL> create pfile from spfile;

1. Shutdown the instance
2. set the following parameters in the pfile
    undo_management = manual
    event = '10513 trace name context forever, level 2'
3. SQL> startup restrict pfile=<initsid.ora>
4. SQL> select tablespace_name, status, segment_name from dba_rollback_segs where status != 'OFFLINE';

This is critical - we are looking for all undo segments to be offline - System will always be online.

If any are 'PARTLY AVAILABLE' or 'NEEDS RECOVERY' - Please open an issue with Oracle Support or update the current SR. There are many options from this moment and Oracle Support Analyst can offer different solutions for the bad undo segments.

If all offline then continue to the next step

5. Create new undo tablespace - example
SQL> create undo tablespace <new undo tablespace> datafile <datafile> size 2000M;
6. Drop old undo tablespace
SQL> drop tablespace <old undo tablespace> including contents and datafiles;
7. SQL> shutdown immediate;
8. SQL> startup mount;
9. modify the pfile with the new undo tablespace name
SQL> alter system set undo_tablespace = '<new tablespace>' scope=pfile;
10. SQL> shutdown immediate;
11. SQL> startup;
Then, startup using the normal spfile

Friday, December 13, 2013

Clean WRH$_ACTIVE_SESSION_HISTORY in SYSAUX

Tablespace SYSAUX in a 11g database grows quickly. Run Oracle script awrinfo.sql to find what is using the space. One section of the report shows that ACTIVE_SESSION_HISTORY is the biggest consumer.

SQL> @$ORACLE_HOME/rdbms/admin/awrinfo.sql

COMPONENT MB    SEGMENT_NAME           - % SPACE_USED SEGMENT_TYPE
--------- ----- ------------------------------ -------------- ---------------

ASH       408.0 WRH$_ACTIVE_SESSION_HISTORY    -  98%         TABLE PARTITION
                .WRH$_ACTIVE_21536658_7139e
ASH        41.0 WRH$_ACTIVE_SESSION_HISTORY_PK -  98%         INDEX PARTITION
                .WRH$_ACTIVE_21536658_7139
FIXED      36.0 WRH$_SYSMETRIC_HISTORY         -  98%         TABLE
FIXED      26.0 WRH$_SYSMETRIC_HISTORY_INDEX   -  98%         INDEX

--
-- Check the Snapshot retention, it works fine (only 8 snapshots exist).
--
SQL> SELECT snap_interval, retention, most_recent_purge_time
    FROM sys.wrm$_wr_control;

SNAP_INTERVAL     RETENTION         MOST_RECENT_PURGE_TIME
----------------- ----------------- -------------------------
+00000 01:00:00.0 +00007 00:00:00.0 27-NOV-13 12.06.06.995 AM

SQL> select dbid, count(*) from SYS.WRM$_SNAPSHOT group by dbid;

      DBID   COUNT(*)
---------- ----------
  21536658          8

-- One interesting thing is DBA_HIST_SNAPSHOT and SYS.WRM$_SNAPSHOT are different
SQL> select * from DBA_HIST_SNAPSHOT;

no rows selected

--
-- Run below query to identify the big tables in SYSAUX
--
SQL> SELECT *
      FROM (  SELECT bytes / 1024 / 1024 dim_Mb, segment_name, segment_type
                FROM dba_segments
               WHERE tablespace_name = 'SYSAUX'
            ORDER BY dim_Mb DESC)
     WHERE ROWNUM < 5;

    DIM_MB SEGMENT_NAME                   SEGMENT_TYPE
---------- ------------------------------ ------------
       544 WRH$_ACTIVE_SESSION_HISTORY    TABLE PARTITION
       128 I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST INDEX
        88 WRI$_OPTSTAT_HISTGRM_HISTORY   TABLE
        64 WRH$_SYSMETRIC_HISTORY         TABLE

--
-- Follow Oracle DOC ID 387914.1 to clean table WRH$_ACTIVE_SESSION_HISTORY
--
SQL> select table_name,partition_name
  2  from dba_tab_partitions
  3  where table_name = 'WRH$_ACTIVE_SESSION_HISTORY';

TABLE_NAME                     PARTITION_NAME
------------------------------ ------------------------------
WRH$_ACTIVE_SESSION_HISTORY    WRH$_ACTIVE_21536658_7139
WRH$_ACTIVE_SESSION_HISTORY    WRH$_ACTIVE_SES_MXDB_MXSN

SQL> select count(*) from WRH$_ACTIVE_SESSION_HISTORY;

  COUNT(*)
----------
   1653076

SQL> alter session set "_swrf_test_action" = 72;

Session altered.

SQL> 
set serveroutput on 
declare 
CURSOR cur_part IS 
SELECT partition_name from dba_tab_partitions 
WHERE table_name = 'WRH$_ACTIVE_SESSION_HISTORY'; 

query1 varchar2(200); 
query2 varchar2(200); 

TYPE partrec IS RECORD (snapid number, dbid number); 
TYPE partlist IS TABLE OF partrec; 

Outlist partlist; 
begin 
dbms_output.put_line('PARTITION NAME SNAP_ID DBID'); 
dbms_output.put_line('--------------------------- ------- ----------'); 

for part in cur_part loop 
query1 := 'select min(snap_id), dbid from 
sys.WRH$_ACTIVE_SESSION_HISTORY partition ('||part.partition_name||') group by dbid'; 
execute immediate query1 bulk collect into OutList; 

if OutList.count > 0 then 
for i in OutList.first..OutList.last loop 
dbms_output.put_line(part.partition_name||' Min '||OutList(i).snapid||' '||OutList(i).dbid); 
end loop; 
end if; 

query2 := 'select max(snap_id), dbid 
from sys.WRH$_ACTIVE_SESSION_HISTORY partition ('||part.partition_name||') group by dbid'; 
execute immediate query2 bulk collect into OutList; 

if OutList.count > 0 then 
for i in OutList.first..OutList.last loop 
dbms_output.put_line(part.partition_name||' Max '||OutList(i).snapid||' '||OutList(i).dbid); 
dbms_output.put_line('---'); 
end loop; 
end if; 

end loop; 
end; 
/

PARTITION NAME                SNAP_ID DBID
----------------------------- ------- ----------
WRH$_ACTIVE_21536658_7139 Min 7139    21536658
WRH$_ACTIVE_21536658_7139 Max 7881    21536658
---

PL/SQL procedure successfully completed.

-- After above PL/SQL block ran, it became three partitions
SQL> select table_name,partition_name
      from dba_tab_partitions
     where table_name = 'WRH$_ACTIVE_SESSION_HISTORY';

TABLE_NAME                     PARTITION_NAME
------------------------------ ------------------------------
WRH$_ACTIVE_SESSION_HISTORY    WRH$_ACTIVE_21536658_7139
WRH$_ACTIVE_SESSION_HISTORY    WRH$_ACTIVE_21536658_7884
WRH$_ACTIVE_SESSION_HISTORY    WRH$_ACTIVE_SES_MXDB_MXSN

-- Run drop Procedure
SQL> exec DBMS_WORKLOAD_REPOSITORY.DROP_SNAPSHOT_RANGE(low_snap_id =>7139,high_snap_id =>7881);

PL/SQL procedure successfully completed.

-- Now, WRH$_ACTIVE_SESSION_HISTORY has no rows
SQL> select count(*) from WRH$_ACTIVE_SESSION_HISTORY;

  COUNT(*)
----------
         0

SQL> SELECT *
      FROM (  SELECT bytes / 1024 / 1024 dim_Mb, segment_name, segment_type
                FROM dba_segments
               WHERE tablespace_name = 'SYSAUX'
            ORDER BY dim_Mb DESC)
     WHERE ROWNUM < 5;  

    DIM_MB SEGMENT_NAME                   SEGMENT_TYPE
---------- ------------------------------ -----------
       128 I_WRI$_OPTSTAT_H_OBJ#_ICOL#_ST INDEX
        88 WRI$_OPTSTAT_HISTGRM_HISTORY   TABLE
        64 WRH$_SYSMETRIC_HISTORY         TABLE
        55 I_WRI$_OPTSTAT_H_ST            INDEX
   
-- But partitions were NOT dropped
SQL> select table_name,partition_name
     from dba_tab_partitions
     where table_name = 'WRH$_ACTIVE_SESSION_HISTORY';

TABLE_NAME                     PARTITION_NAME
------------------------------ ------------------------------
WRH$_ACTIVE_SESSION_HISTORY    WRH$_ACTIVE_21536658_7139
WRH$_ACTIVE_SESSION_HISTORY    WRH$_ACTIVE_21536658_7884
WRH$_ACTIVE_SESSION_HISTORY    WRH$_ACTIVE_SES_MXDB_MXSN

--
-- Follow DOC ID 287679.1 to shrink the tablespace
--
SQL> SELECT COUNT(1) Orphaned_ASH_Rows
    FROM wrh$_active_session_history a
    WHERE NOT EXISTS
      (SELECT 1
      FROM wrm$_snapshot
      WHERE snap_id       = a.snap_id
      AND dbid            = a.dbid
      AND instance_number = a.instance_number
      );

ORPHANED_ASH_ROWS
-----------------
                0

-- To reclaim the freed space (since row movement is enabled on WRH$_ACTIVE_SESSION_HISTORY by default)
SQL> alter table WRH$_ACTIVE_SESSION_HISTORY shrink space;

Table altered.

SQL> column OCCUPANT_NAME format a15
SQL> SELECT occupant_name,
      occupant_desc,
      space_usage_kbytes
    FROM v$sysaux_occupants
    WHERE occupant_name LIKE '%AWR%';

OCCUPANT_NAME OCCUPANT_DESC                                          SPACE_USAGE_KBYTES
------------- ------------------------------------------------------ ------------------
SM/AWR        Server Manageability - Automatic Workload Repository   300416

--
-- Check again after a few days: 
-- It seems the old partition was dropped automatically and a new one was created.
SQL> select table_name,partition_name
     from dba_tab_partitions
     where table_name = 'WRH$_ACTIVE_SESSION_HISTORY'; 

TABLE_NAME                     PARTITION_NAME
------------------------------ ------------------------------
WRH$_ACTIVE_SESSION_HISTORY    WRH$_ACTIVE_21536658_7885
WRH$_ACTIVE_SESSION_HISTORY    WRH$_ACTIVE_SES_MXDB_MXSN

-- And new entries are written into table WRH$_ACTIVE_SESSION_HISTORY
SQL> select count(*) from WRH$_ACTIVE_SESSION_HISTORY;

  COUNT(*)
----------
     31569

-- while the Snapshot retention keeps unchanged.
SQL> SELECT snap_interval, retention, most_recent_purge_time
    FROM sys.wrm$_wr_control;

SNAP_INTERVAL     RETENTION         MOST_RECENT_PURGE_TIME
----------------- ----------------- -------------------------
+00000 01:00:00.0 +00007 00:00:00.0 11-DEC-13 12.00.29.165 AM