Thursday, September 12, 2019

Concurrent job hangs and stays at "Running" forever

To make all R12.1 services run on a single node, instead of on multiple apps-tier nodes, I followed 4 steps:

1. Edited $CONTEXT_FILE to set all 5 service-group variables to "enabled":

Root Service Group : s_root_status
Web Entry Point Services : s_web_entry_status
Web Application Services : s_web_applications_status
Batch Processing Services : s_batch_status
Other Service Group : s_other_service_group_status

2. Ran autoconfig on the single node.
3. Started all services, including concurrent managers, on the single node.
4. Kept all services down on all other nodes.
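Before running autoconfig in step 2, it can help to confirm that step 1 really took effect. Below is a minimal sketch, assuming each variable sits on a single line of the context file in the usual oa_var="..." form with the value as the element text. The function name check_service_groups is made up for this note.

```shell
#!/bin/sh
# Hypothetical helper: print the value of each of the 5 service-group
# variables found in a context file and return non-zero if any of them
# is not "enabled".
# Assumption: each variable is on one line, e.g.
#   <some_tag oa_var="s_batch_status">enabled</some_tag>
# Usage: check_service_groups "$CONTEXT_FILE"
check_service_groups() {
    ctx_file="$1"
    rc=0
    for var in s_root_status s_web_entry_status s_web_applications_status \
               s_batch_status s_other_service_group_status; do
        # Capture the element text on the line carrying oa_var="<var>".
        value=$(sed -n "s/.*oa_var=\"$var\"[^>]*>\([^<]*\)<.*/\1/p" "$ctx_file" | head -n 1)
        echo "$var=${value:-MISSING}"
        [ "$value" = "enabled" ] || rc=1
    done
    return $rc
}
```

A variable reported as MISSING only means the sed pattern did not match, so check the file by hand before assuming the entry is absent.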

Everything worked without a problem. But when I ran a concurrent request, such as "Active Users", the job hung and stayed in "Running" status forever. The query below shows that the request did reach the database:

SQL> SELECT a.request_id, d.sid, d.serial# ,d.osuser,d.process , c.SPID
FROM apps.fnd_concurrent_requests a, apps.fnd_concurrent_processes b, v$process c, v$session d
WHERE a.controlling_manager = b.concurrent_process_id AND c.pid = b.oracle_process_id
AND b.session_id=d.audsid AND a.request_id = &Request_ID AND a.phase_code = 'R';

REQUEST_ID  SID   SERIAL#  OSUSER  PROCESS  SPID
----------  ----  -------  ------  -------  --------
33925067    2345  40605    ebsdev  18208    42991822

And the database server has the matching process for that SPID:
$ ps -ef | grep 42991822
 oracle 42991822        1   0 16:31:07      -  0:00 oracleEBSDEV (LOCAL=NO)

After I cancelled the request from the Forms GUI, the Standard Manager crashed and posted a very misleading error in the manager's log:

12-MON-20XX 21:55:43
Request  : 33925067
Priority : 50
Program  : 0/20641
State    : T

12-MON-20XX 21:55:43
Attempting process termination for process 17331 on node nodeName
12-MAR-2021 21:55:43 - Could not submit job to kill request session 33925067:

No such process
An error occured in client-side routine afpsmckp for Service Manager FNDSM_NodeName_EBSDEV.  The routine returned code 1. Check for preceding errors and as well as the service manager log file for further details."

12-MON-20xx 21:57:44 - Could not submit job to kill concurrent process 260751: Oracle error 100: ORA-01403: no data found has been detected in FND_CONC_RAC_UTILS.SUBMIT_MANAGER_KILL_SESSION.
Found dead process: spid=(17331), cpid=(260751), ORA pid=(74), manager=(0/0)

The real problem is exactly as described in Doc ID 737445.1 (R12 Concurrent Requests Run Forever, rwrun Errors with REP-50125), and the fix is to delete the file $ORACLE_HOME/reports/conf/rwnetwork.conf. Only running the report manually, as in the command below, reveals the true error, which is written to the file areport.trc.

$ $INST_TOP/ora/10.1.2/bin/appsrwrun.sh userid=apps/appsPWD mode=character report=$FND_TOP/reports/US/FNDSCURS.rdf \
batch=yes destype=file desname=./areport.out desformat=$FND_TOP/reports/HPL pagesize=132x66 traceopts=trace_all tracefile=areport.trc tracemode=trace_replace 
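Once areport.trc exists, the two follow-up actions (finding the REP- code and getting rwnetwork.conf out of the way) can be sketched as below. Both function names are made up for this note, the trace file layout is not assumed beyond containing REP-nnnn codes somewhere in the text, and the second function renames the file instead of deleting it, which is my own cautious variation on the fix in the Doc ID.

```shell
#!/bin/sh
# Hypothetical helper: list the distinct REP- error codes in a trace file.
# It keeps the last code found on each line, then de-duplicates.
# Usage: rep_errors areport.trc
rep_errors() {
    sed -n 's/.*\(REP-[0-9][0-9]*\).*/\1/p' "$1" | sort -u
}

# Hypothetical helper: move rwnetwork.conf aside so it can be restored later.
# $1 is the Oracle home that contains reports/conf (an assumption: pass the
# same $ORACLE_HOME that the fix above refers to).
# Usage: disable_rwnetwork_conf "$ORACLE_HOME"
disable_rwnetwork_conf() {
    conf="$1/reports/conf/rwnetwork.conf"
    if [ -f "$conf" ]; then
        mv "$conf" "$conf.bak" && echo "moved $conf to $conf.bak"
    else
        echo "no rwnetwork.conf under $1/reports/conf"
    fi
}
```

If rep_errors prints REP-50125, the symptom matches the Doc ID above. Any other code probably points to a different cause.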

By the way, to completely remove a node from a multi-node setup, run these two steps BEFORE the 4 steps above:
a) SQL>  EXEC FND_CONC_CLONE.SETUP_CLEAN
b) Run autoconfig on the database server.

Then the following two tables should have no data:
SQL> select * from fnd_concurrent_queues;
SQL> select * from fnd_concurrent_processes;
And the following two tables should each have only one row:
SQL> select * from fnd_nodes;
SQL> select * from fnd_conflicts_domain;
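The four checks above can be combined into a single row-count query. This is a minimal sketch: the function name is made up, the expected counts are simply the ones stated above, and the unqualified table names assume a connection as the apps user.

```shell
#!/bin/sh
# Hypothetical helper: print one SQL statement that reports, for each table,
# the actual row count next to the count expected after the cleanup.
# Usage (assumed credentials): post_cleanup_check_sql | sqlplus -s apps/appsPWD
post_cleanup_check_sql() {
    first=1
    for spec in fnd_concurrent_queues:0 fnd_concurrent_processes:0 \
                fnd_nodes:1 fnd_conflicts_domain:1; do
        table=${spec%%:*}      # text before the colon
        expected=${spec##*:}   # text after the colon
        [ "$first" -eq 1 ] || echo "UNION ALL"
        echo "SELECT '$table' AS table_name, COUNT(*) AS actual_rows, $expected AS expected_rows FROM $table"
        first=0
    done
    echo ";"
}
```

Any row where actual_rows differs from expected_rows is worth a closer look with the individual queries above.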