ERP on DB

Tuesday, September 21, 2010

SQLPLUS Hang

Discoverer users reports to me that they can not run Discoverer Plus reports. On the "Request Progress" screen after entering Discoverer login information, they get the little clock has been around multiple times without making progress.

To address the issue, I stopped the Discoverer services on the server, and then failed to start them. To make sure there is no problem on the Discoverer configuration, I tested Sql*Plus on the server and found it hung by just staying on the logging in:

SQL> connect userID/passwd@TNS_STRING

The strange thing was that if I logged onto the same Linux (SuSe 10) server as a different OS user using LDAP Authentication, userID could run Sql*Plus to log onto the TNS_STRING database without problem.

The trace on sqlnet.ora did not help much. Finally, an Oracle document (736509.1) suggests that the nscd (name service cache daemon) is reqired to run for Sql*Plus to work. After Systems Administrator started the nscd daemon, both Discoverer services and Sql*Plus worked well on the server.

NSCD should be started automatically during server reboot and keeps running all the time. Now, the question becomes what did stop the nscd running?

Wednesday, September 1, 2010

Rollback of Large Transaction by SMON

After killing a large running transaction or shadow process, or aborting the database, database seems to hang or smon and parallel query servers take all the available CPU. You may also see entry in trace log file (in 11gR2):

Parallel Transaction recovery coordinatorcaught exception 10388

During the Apps upgrade from EBS 11i to EBS R12 after the database was upgraded from 10g to 11gR2, Apps DBA stopped the Apps Upgrade patching and re-started it for a couple of times on Apps server. Then, we experenced the poor performance on the database for Apps patching jobs. We did not have ideas on what was the cause until we realized that rollback recovering was running in the database. There are two views to identify the undo recovering:

v$fast_start_transactions: contains one row for each one of the transactions that Oracle is recovering in Parallel.

v$fast_start_servers: provides information about all the recovery slaves performing parallel transaction recovery. This contains one row for each parallel query slave.

Note 414242.1 provides a query to estimate the finish time if recovering is going. Here is the result from my database

SQL> set linesize 100
SQL> alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS';

Session altered.

SQL> select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;

USN STATE Total Done ToDo Estimated time to complete
--- ------------------ ----------- ------- ------ ---------------------------------
41 RECOVERING 254103 303 253800 03-SEP-2010 03:33:56

SQL>

After the recovering completed, the query result becomes

USN STATE Total Done ToDo Estimated time to complete
--- ------------------ ----------- ------- ------ ---------------------------------
41 RECOVERED 112 112 0 03-SEP-2010 09:22:03

And after the recovering completed, there is no row in view v$fast_start_servers and no row returning from "select ktuxesiz from x$ktuxe where ktuxesta='ACTIVE' and ktuxecfl='DEAD';"

If you want to know waht is recovering, below query will help to identify the parallel child sessions:

select
decode(px.qcinst_id,NULL,username,
' - '||lower(substr(s.program,length(s.program)-4,4) ) ) "Username",
decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
to_char( px.server_set) "Slave Set",
to_char(s.sid) "SID",
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID",
px.req_degree "Requested DOP",
px.degree "Actual DOP"
from gv$px_session px, gv$session s
where px.sid=s.sid (+) and px.serial#=s.serial#
order by 5 , 1 desc ;

Then, using each SID to query gv$active_session_history (or gv$session_longops) to get more details. Its CURRENT_OBJ# column gives the object_id for the object which is being recovered.

"QC SID" from above query is the session ID for SMON. Dring transaction recovering, the EVENT column in v$session have value "Wait for stopper event to be increased" for the SMON session.

Note 238507.1 gives a way to disable Parallel Transaction Recovery.

If you can shutdown database, bounce the database after defining a parameter inside init.ora file (144332.1):

fast_start_parallel_rollback = false

FAST_START_PARALLEL_ROLLBACK defines the maximum number of processes which may exist for performing parallel rollback. If the value is FALSE, parallel rollback is disabled. If the value is LOW (default), 2 * CPU_COUNT number of processes may be used. If the value is HIGH, at most 4 * CPU_COUNT number of rollback servers are used for parallel rollback.

Monday, May 24, 2010

Datafile Needs Recovery

When there is a disk read/write problem, such as adapter issues connecting SAN luns to server or other server IO problems, Oracle data files may fail to get checkpoint. In this case, Oracle may place the datafile offline and write errors to the alert log:

Errors in file /u01/app/oracle/admin/dw/bdump/dw_ckpt_1081360.trc:
ORA-01171: datafile 7 going offline due to error advancing checkpoint
ORA-01110: data file 7 : '/u06/app/oracle/oradata/dw/dw_idx01.dbf'
ORA-01115: IO error reading block from file 7 (block # 1)
ORA-27063: number of bytes read/written is incorrect
IBM AIX RISC System/6000 Error: 5: I/O error
Additional information: -1
Additional information: 16384

Database view DBA_DATA_FILES will confirm that the datafile needs recovery:

SQL> select * from dba_data_files
where online_status != 'ONLINE';

FILE_NAME ONLINE_STATUS
/u06/app/oracle/oradata/dw/dw_idx01.dbf RECOVER
/u02/app/oracle/oradata/dw/system01.dbf SYSTEM

The fix to the problem is to run two "alter database ..." commands to bring the data file back online, after the disk IO issue has been firmly resolved. Here is the alert log when the two commands ran:

Fri May 7 23:25:30 2010
alter database recover datafile '/u06/app/oracle/oradata/dw/dw_idx01.dbf'
Fri May 7 23:25:30 2010
Media Recovery Start
parallel recovery started with 7 processes
Fri May 7 23:25:31 2010
Recovery of Online Redo Log: Thread 1 Group 2 Seq 70541 Reading mem 0
Mem# 0: /u03/app/oracle/oradata/dw/redo02a.log
Mem# 1: /u03/app/oracle/oradata/dw/redo02b.log
Fri May 7 23:25:34 2010
Recovery of Online Redo Log: Thread 1 Group 3 Seq 70542 Reading mem 0
Mem# 0: /u04/app/oracle/oradata/dw/redo03a.log
Mem# 1: /u04/app/oracle/oradata/dw/redo03b.log
Fri May 7 23:25:36 2010
Media Recovery Complete (dw)
Completed: alter database recover datafile '/u06/app/oracle/oradata/dw/dw_idx01.dbf'

Fri May 7 23:26:07 2010
alter database datafile '/u06/app/oracle/oradata/dw/dw_idx01.dbf' online
Fri May 7 23:26:07 2010
Starting control autobackup
Control autobackup written to DISK device
handle '/u01/app/oracle/product/10.2.0/db_1/dbs/c-1219972149-20100507-02'
Completed: alter database datafile '/u06/app/oracle/oradata/dw/dw_idx01.dbf' online

Thursday, May 13, 2010

Install Oracle Fusion Middleware 11gR1

Oracle Fusion Middleware expands Oracle Applications Server 10g to include more components that you may never need.

I studies the way of installing Oracle Dsicoverer 11.1.1 for Oracle E-business Realease 12 (1074326.1). Here are some key steps to make the installation work.

1. Indentify the installation files from download.

As of today 5/14/10, the location is
http://www.oracle.com/technology/software/products/middleware/htdocs/fmw_11_download.html

This page also has a link to downlaod WebLogic
http://www.oracle.com/technology/software/products/ias/htdocs/wls_main.html

File names in three areas:

- Repository Creation Utility (RCU, 11.1.1.2.1)
ofm_rcu_linux_11.1.1.2.1_disk1_1of1.zip

- Weblogic server (10.3.2)
wls1032_generic.jar

- Portal, Forms, Reports and Discoverer (11.1.1.2.0)
ofm_pfrd_linux_11.1.1.2.0_64_disk1_1of1.zip
ofm_pfrd_linux_11.1.1.2.0_64_disk2_1of1.zip
ofm_pfrd_linux_11.1.1.2.0_64_disk3_1of1.zip
ofm_pfrd_linux_11.1.1.2.0_64_disk4_1of1.zip

2. Need a database for repository

The supported database versions by Oracle Fusion Middleware 11gR1 are

Oracle 10.2.0.4+ (if it's a 10g)
Oracle 11.1.0.7+ (if it's an 11gR1)
Oracle 11.2.0.1+ (if it's an 11gR2)

3. Create repository schemas

Run RCU to install schemas for Oracle Portal and Oracle Discoverer. RCU only works on Linux platform. If the database is on other platforms, run RCU remotely connecting to the database.

4. Install WebLogic server

Run the .jar file to create new Middleware Home directory. Java must be available (by the PATH env variable) for the installtion, and its version must be high enough. Here is the way to check Java version:

$ ./java -version
java version "1.6.0_20"
Java(TM) SE Runtime Environment (build 1.6.0_20-b02)
Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode)

5. Install Oracle Portal, Forms, Reports and Discoverer

Following guid has screenshots on installation:

Installation Guide for Oracle Portal, Forms, Reports and Discoverer
11g Release 1 (11.1.1) E10421-02

Wednesday, April 21, 2010

init parameter PROCESSES

Initialization parameter PROCESSES defines the max number of user connections that can access the database at a same time. The default value for the parameter is 300.

If the max number exceeds the parameter value, new connection will not be able to connect to the database. Grid Control will report the connection problem and the database will give error in trace file:

ORA-00020: maximum number of processes (300) exceeded

In that case, the issues can be confirmed by number of OS sessions and by database view v$process:

$ echo $ORACLE_SID
$ ps -ef | grep $ORACLE_SID | wc -l
299

SQL> select count(*) from v$process;

COUNT(*)
----------
297

To increase the value, run an alter statement. Then a database bounce is necessary.

SQL> alter system set processes=500 scope=pfile;

I read that below three parameters are related by a formula (in 10G, but might not be always). If database reports ORA-00018 error, you may also consider modifying parameter PROCESSES (or find why the number of sessions jumped).

ORA-00018: maximum number of sessions exceeded

processes=x
sessions=x*1.1+5
transactions=sessions*1.1