ERP on DB: September 2010

Tuesday, September 28, 2010

CPU Usage by User Session

Oracle system views can be used to find the amount of CPU consumed by each database session. This script will display the CPU information:

SELECT
s.username, t.SID, p.spid DB_OS_ID,
t.VALUE/100 cpu_usage_seconds
FROM v$session s, v$sesstat t, v$statname n, v$process p
WHERE t.STATISTIC# = n.STATISTIC#
and n.NAME like '%CPU used by this session%'
and t.SID = s.SID
and s.paddr(+) = p.addr
and s.status='ACTIVE'
and s.username is not null
order by VALUE desc;

The OS commend "ps -ef" also gives CPU info in the 4th column of the output.

To troubleshoot performance issue, you can use below query and the session ID from above query to find the SQL statement for tuning candidate:

SELECT s.username, s.osuser, s.machine, s.program, s.status, s.sid, s.serial#, s.process client_pid, p.spid DB_os_pid, t.sql_text, t.disk_reads, t.executions
FROM v$session s, v$process p, v$sqlarea t
WHERE s.paddr = p.addr and s.sql_address = t.address(+)
and s.sid = &SID

In Oracle 11g, if the init parameter audit_trail is set to DB level by

SQL> alter system set audit_trail='DB' scope=spfile;

And then bounce the database, view dba_audit_session (or sys.aud$) has a column SESSION_CPU displaying session's CPU usage.

Friday, September 24, 2010

Memory used by a DB session

When a database session is slow, it may be helpful on finding the cause by checking the memory consumed by the session.

1. Identify the session ID (SID), if necessary. Such as:
select * from v$session where lower(program) like '%exe%';

2. Identify the process ID (DB_os_pid / spid) on the Database server:

-- If you know the database sesssion ID, run below query
Select s.username, s.osuser, s.machine, s.program, s.status, s.sid, s.serial#, s.process client_pid, p.spid DB_os_pid, t.sql_text, t.disk_reads, t.executions
from v$session s, v$process p, v$sqlarea t
where s.paddr = p.addr and s.sql_address = t.address(+)
and s.sid = &SID
;

-- If you know the process ID on the client machine (such as Apps server)
Select s.username, s.osuser, s.machine, s.program, s.status, s.sid, s.serial#, s.process client_pid, p.spid DB_os_pid, t.sql_text, t.disk_reads, t.executions
from v$session s, v$process p, v$sqlarea t
where s.paddr = p.addr and s.sql_address = t.address(+)
and s.process = '&clientpid' -- the process ID on the client/MT server
;

3. Run query to see how much memory is used by the session:
select spid, program, PGA_USED_MEM/1024/1024 memory_mb from v$process
where spid = DB_os_pid;

4. Run about query from time to time when the database session is ACTIVE. If the memory value keeps increasing, there may be a memory leak on the session.

5. On the OS level, run "ps -ef | grep 21561522" to confirm the session time and other characters (assuming 21561522 is the DB_os_pid).

Tuesday, September 21, 2010

SQLPLUS Hang

Discoverer users reports to me that they can not run Discoverer Plus reports. On the "Request Progress" screen after entering Discoverer login information, they get the little clock has been around multiple times without making progress.

To address the issue, I stopped the Discoverer services on the server, and then failed to start them. To make sure there is no problem on the Discoverer configuration, I tested Sql*Plus on the server and found it hung by just staying on the logging in:

SQL> connect userID/passwd@TNS_STRING

The strange thing was that if I logged onto the same Linux (SuSe 10) server as a different OS user using LDAP Authentication, userID could run Sql*Plus to log onto the TNS_STRING database without problem.

The trace on sqlnet.ora did not help much. Finally, an Oracle document (736509.1) suggests that the nscd (name service cache daemon) is reqired to run for Sql*Plus to work. After Systems Administrator started the nscd daemon, both Discoverer services and Sql*Plus worked well on the server.

NSCD should be started automatically during server reboot and keeps running all the time. Now, the question becomes what did stop the nscd running?

Wednesday, September 1, 2010

Rollback of Large Transaction by SMON

After killing a large running transaction or shadow process, or aborting the database, database seems to hang or smon and parallel query servers take all the available CPU. You may also see entry in trace log file (in 11gR2):

Parallel Transaction recovery coordinatorcaught exception 10388

During the Apps upgrade from EBS 11i to EBS R12 after the database was upgraded from 10g to 11gR2, Apps DBA stopped the Apps Upgrade patching and re-started it for a couple of times on Apps server. Then, we experenced the poor performance on the database for Apps patching jobs. We did not have ideas on what was the cause until we realized that rollback recovering was running in the database. There are two views to identify the undo recovering:

v$fast_start_transactions: contains one row for each one of the transactions that Oracle is recovering in Parallel.

v$fast_start_servers: provides information about all the recovery slaves performing parallel transaction recovery. This contains one row for each parallel query slave.

Note 414242.1 provides a query to estimate the finish time if recovering is going. Here is the result from my database

SQL> set linesize 100
SQL> alter session set NLS_DATE_FORMAT='DD-MON-YYYY HH24:MI:SS';

Session altered.

SQL> select usn, state, undoblockstotal "Total", undoblocksdone "Done", undoblockstotal-undoblocksdone "ToDo",
decode(cputime,0,'unknown',sysdate+(((undoblockstotal-undoblocksdone) / (undoblocksdone / cputime)) / 86400)) "Estimated time to complete"
from v$fast_start_transactions;

USN STATE Total Done ToDo Estimated time to complete
--- ------------------ ----------- ------- ------ ---------------------------------
41 RECOVERING 254103 303 253800 03-SEP-2010 03:33:56

SQL>

After the recovering completed, the query result becomes

USN STATE Total Done ToDo Estimated time to complete
--- ------------------ ----------- ------- ------ ---------------------------------
41 RECOVERED 112 112 0 03-SEP-2010 09:22:03

And after the recovering completed, there is no row in view v$fast_start_servers and no row returning from "select ktuxesiz from x$ktuxe where ktuxesta='ACTIVE' and ktuxecfl='DEAD';"

If you want to know waht is recovering, below query will help to identify the parallel child sessions:

select
decode(px.qcinst_id,NULL,username,
' - '||lower(substr(s.program,length(s.program)-4,4) ) ) "Username",
decode(px.qcinst_id,NULL, 'QC', '(Slave)') "QC/Slave" ,
to_char( px.server_set) "Slave Set",
to_char(s.sid) "SID",
decode(px.qcinst_id, NULL ,to_char(s.sid) ,px.qcsid) "QC SID",
px.req_degree "Requested DOP",
px.degree "Actual DOP"
from gv$px_session px, gv$session s
where px.sid=s.sid (+) and px.serial#=s.serial#
order by 5 , 1 desc ;

Then, using each SID to query gv$active_session_history (or gv$session_longops) to get more details. Its CURRENT_OBJ# column gives the object_id for the object which is being recovered.

"QC SID" from above query is the session ID for SMON. Dring transaction recovering, the EVENT column in v$session have value "Wait for stopper event to be increased" for the SMON session.

Note 238507.1 gives a way to disable Parallel Transaction Recovery.

If you can shutdown database, bounce the database after defining a parameter inside init.ora file (144332.1):

fast_start_parallel_rollback = false

FAST_START_PARALLEL_ROLLBACK defines the maximum number of processes which may exist for performing parallel rollback. If the value is FALSE, parallel rollback is disabled. If the value is LOW (default), 2 * CPU_COUNT number of processes may be used. If the value is HIGH, at most 4 * CPU_COUNT number of rollback servers are used for parallel rollback.