Wednesday, September 1, 2010

Rollback of Large Transaction by SMON

After killing a large running transaction or shadow process, or aborting the database, database seems to hang or smon and parallel query servers take all the available CPU. You may also see entry in trace log file (in 11gR2):

Parallel Transaction recovery coordinatorcaught exception 10388

During the Apps upgrade from EBS 11i to EBS R12 after the database was upgraded from 10g to 11gR2, Apps DBA stopped the Apps Upgrade patching and re-started it for a couple of times on Apps server. Then, we experenced the poor performance on the database for Apps patching jobs. We did not have ideas on what was the cause until we realized that rollback recovering was running in the database. There are two views to identify the undo recovering:

v$fast_start_transactions: contains one row for each one of the transactions that Oracle is recovering in Parallel.

v$fast_start_servers: provides information about all the recovery slaves performing parallel transaction recovery. This contains one row for each parallel query slave.

Note 414242.1 provides a query to estimate the finish time if recovering is going. Here is the result from my database

SQL> set linesize 100
SQL> alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS';

Session altered.

SQL> select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;

USN STATE Total Done ToDo Estimated time to complete
--- ------------------ ----------- ------- ------ ---------------------------------
41 RECOVERING 254103 303 253800 03-SEP-2010 03:33:56

SQL>

After the recovering completed, the query result becomes

USN STATE Total Done ToDo Estimated time to complete
--- ------------------ ----------- ------- ------ ---------------------------------
41 RECOVERED 112 112 0 03-SEP-2010 09:22:03

And after the recovering completed, there is no row in view v$fast_start_servers and no row returning from "select ktuxesiz from x$ktuxe where ktuxesta='ACTIVE' and ktuxecfl='DEAD';"

If you want to know waht is recovering, below query will help to identify the parallel child sessions:

select
decode(px.qcinst_id,NULL,username,
' - '||lower(substr(s.program,length(s.program)-4,4) ) ) "Username",
decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
to_char( px.server_set) "Slave Set",
to_char(s.sid) "SID",
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID",
px.req_degree "Requested DOP",
px.degree "Actual DOP"
from gv$px_session px, gv$session s
where px.sid=s.sid (+) and px.serial#=s.serial#
order by 5 , 1 desc ;

Then, using each SID to query gv$active_session_history (or gv$session_longops) to get more details. Its CURRENT_OBJ# column gives the object_id for the object which is being recovered.

"QC SID" from above query is the session ID for SMON. Dring transaction recovering, the EVENT column in v$session have value "Wait for stopper event to be increased" for the SMON session.

Note 238507.1 gives a way to disable Parallel Transaction Recovery.

If you can shutdown database, bounce the database after defining a parameter inside init.ora file (144332.1):

fast_start_parallel_rollback = false

FAST_START_PARALLEL_ROLLBACK defines the maximum number of processes which may exist for performing parallel rollback. If the value is FALSE, parallel rollback is disabled. If the value is LOW (default), 2 * CPU_COUNT number of processes may be used. If the value is HIGH, at most 4 * CPU_COUNT number of rollback servers are used for parallel rollback.

1 comment:

halimdba said...

hi
very nice. its help me to find out the ghost.

Thanks
Muhammad Abdul Halim
http://halimdba.blogspot.com