Thursday, May 19, 2022

Apache (OHS) in R12.2 failed to stop and refused to start

After Linux server patching and a reboot, alerts reported that the httpd process for OHS (Oracle HTTP Server) in a production instance was not running on the server. I tried to stop and start it without luck. Strangely, the status output tied it to a pid owned by root (or, in other cases, to a pid that does not exist).

$ ps -ef | grep httpd         <== No httpd running

$ adapcctl.sh start
$ adopmnctl.sh status
Processes in Instance: EBS_web_EBSPROD_OHS1
---------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid | status
---------------------------------+--------------------+---------+---------
EBS_web_EBSPROD                  | OHS                |     919 | Stop

$ ps -ef | grep 919           <== Owned by root (or, process 919 does not exist at all)
root       919     2  0 09:32 ?        00:00:00 [xxxxxx]
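
A substring grep on the pid can also match unrelated lines, so it may help to look the pid up directly. In the output above, the parent pid 2 and the bracketed name suggest a kernel thread, which would mean the pid recorded for OHS before the reboot was simply reused by something else. Below is a minimal sketch, assuming a Linux /proc filesystem and GNU stat; describe_pid is a hypothetical helper name, not part of the EBS scripts.

```shell
# Hypothetical helper: report the owner and command name of one exact pid.
# Assumes Linux (/proc) and GNU coreutils stat.
describe_pid() {
    pid=$1
    if [ ! -d "/proc/$pid" ]; then
        echo "pid $pid: no such process"
        return 1
    fi
    owner=$(stat -c %U "/proc/$pid")     # user that owns the process
    name=$(cat "/proc/$pid/comm")        # short command name
    echo "pid $pid: owner=$owner command=$name"
}

# Usage (919 is the pid shown by adopmnctl.sh status above):
#   describe_pid 919
```

If the command name is not httpd (or httpd.worker), the pid shown in the status output is stale.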

$ iName=$(tr < $CONTEXT_FILE '<>' '  ' | awk '/"s_ohs_instance"/ {print $(NF-1)}' )
$ SUBiName=${iName%?????}
$ cd $FMW_HOME/webtier/instances/$iName/diagnostics/logs/OHS/$SUBiName

The log file shows these messages repeated many times:
--------
22/0X/05 02:47:03 Stop process
--------
$FMW_HOME/webtier/ohs/bin/apachectl stop: httpd (no pid file) not running
--------
22/0X/05 02:48:03 Stop process
--------
$FMW_HOME/webtier/ohs/bin/apachectl hardstop: httpd (no pid file) not running

The httpd.pid file should reside in this log folder. Its location is defined in httpd.conf under $FMW_HOME/webtier/instances/$iName/config/OHS/$SUBiName (or, $IAS_ORACLE_HOME/instances/$iName/config/OHS/$SUBiName) in R12.2. I believe the problem is that httpd.pid was removed BEFORE "adapcctl.sh stop" fully completed, maybe due to a Linux server crash or power-off. Normally, "adapcctl.sh stop" checks the file and then removes it. Because the file was gone, adapcctl.sh failed its status check and refused to start Apache.
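
To confirm where the pid file is expected, the setting can be read out of httpd.conf. A minimal sketch, assuming the location is set with Apache's standard PidFile directive; pidfile_directive is a hypothetical helper name. It prints the value as written, so any variables in the path are shown unexpanded, and a path containing spaces would be cut at the first space.

```shell
# Hypothetical helper: print the value of the last uncommented PidFile
# directive in an httpd.conf, with surrounding double quotes removed.
pidfile_directive() {
    conf=$1
    awk 'tolower($1) == "pidfile" { v = $2 }
         END { if (v != "") { gsub(/"/, "", v); print v } }' "$conf"
}

# Usage, with the variables set earlier in this post:
#   pidfile_directive "$FMW_HOME/webtier/instances/$iName/config/OHS/$SUBiName/httpd.conf"
```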

Additionally, opmn logs can be found in $FMW_HOME/webtier/instances/$iName/diagnostics/logs/OPMN/opmn

The workaround:

1. Stop/kill all opmn processes (WLS-related processes can keep running)
$ sh $ADMIN_SCRIPTS_HOME/adopmnctl.sh stop
$ ps -ef | grep opmn

2. Create an empty file
$ cd $FMW_HOME/webtier/instances/$iName/diagnostics/logs/OHS/$SUBiName
$ touch httpd.pid

3. Clear the OPMN states folder (important step)
$ cd $FMW_HOME/webtier/instances/$iName/config/OPMN/opmn
$ ls -al states
-rw-r----- 1 user group 19 Jun 21 18:57 .opmndat
-rw-r----- 1 user group 579 Jun 21 18:54 p1878855085
$ mv states states_BK
$ mkdir states
$ ls -al states

4. Now starting Apache should work
$ cd $ADMIN_SCRIPTS_HOME
$ ./adapcctl.sh start
$ ./adopmnctl.sh status
$ ps -ef | grep httpd | wc -l
4                  <== 3 httpd.worker processes running, plus the grep itself

5. Make sure everything works with a full stop and start
$ ./adstpall.sh apps/appsPWD
$ ./adstrtal.sh apps/appsPWD
$ ./adopmnctl.sh status
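
One caveat on step 3: if states_BK already exists from an earlier attempt, "mv states states_BK" moves the folder inside the old backup instead of renaming it. A minimal sketch of a safer variant, using a timestamped backup name; backup_and_reset_dir is a hypothetical helper name, and it should only be run after opmn is stopped as in step 1.

```shell
# Hypothetical helper: rename a directory to <dir>_BK_<timestamp>,
# recreate it empty, and print the backup path.
backup_and_reset_dir() {
    dir=$1
    backup="${dir}_BK_$(date +%Y%m%d_%H%M%S)"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    if [ -e "$backup" ]; then
        echo "backup already exists: $backup" >&2
        return 1
    fi
    mv "$dir" "$backup" && mkdir "$dir" && echo "$backup"
}

# Usage:
#   backup_and_reset_dir "$FMW_HOME/webtier/instances/$iName/config/OPMN/opmn/states"
```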

When Apache (OHS) starts up, it writes the process ID (PID) of the parent httpd process to the httpd.pid file. While Apache is running, httpd.pid should exist and should not be empty.
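
That rule can be turned into a quick health check. A minimal sketch, assuming Linux (/proc); pid_file_state is a hypothetical helper name. Note that "live" only means some process has that pid; it does not prove the process is httpd.

```shell
# Hypothetical helper: classify a pid file as
#   missing | empty | invalid | stale | live
pid_file_state() {
    pidfile=$1
    if [ ! -e "$pidfile" ]; then echo missing; return; fi
    if [ ! -s "$pidfile" ]; then echo empty; return; fi
    # Take the first line and strip whitespace
    pid=$(head -n 1 "$pidfile" | tr -d '[:space:]')
    case $pid in
        ''|*[!0-9]*) echo invalid; return ;;
    esac
    if [ -d "/proc/$pid" ]; then echo live; else echo stale; fi
}

# Usage:
#   pid_file_state "$FMW_HOME/webtier/instances/$iName/diagnostics/logs/OHS/$SUBiName/httpd.pid"
```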

Wednesday, May 4, 2022

FRM-40735 on some custom Forms & Forms trace

Some users (but not all) cannot open EBS forms and get a pop-up error:
FRM-40735: ON-ERROR trigger raised unhandled exception ORA-06508.

Before that happened, we got alerts that the disk space holding Forms temp files (defined by forms_tmpdir) was briefly full, apparently because "deleted" files were still being held in memory or somewhere else.

Logs from the Forms process are in $FMW_HOME/user_projects/domains/EBS_domain_${TWO_TASK}/servers/forms_server1/logs. But errors from individual forms may not be written there. The only way I found to get a lead on the root cause was to turn on FRD trace. See Oracle Doc ID 2796573.1 (EBS:FRD Trace in EBS 12.2).

1. Log into EBS > profile > system
 select "user" > give the username (who is going to reproduce the issue)
 Profile : ICX: Forms Launcher > click on find
 Set the profile for the user as https://hostname:[port_number]/forms/frmservlet?record=collect
 logout of EBS

2. Go to control panel > java > advanced
enable logging and show console

Now log in to EBS with the username (who is going to reproduce the issue)
(in the Java Console search for "record=collect" to confirm it is being used)
Open forms > reproduce the issue
 
3. Log into EBS server as OS owner of the application (putty session), get the file from the trace path
 
$ echo $FORMS_TRACE_DIR
$ cd $FORMS_TRACE_DIR
$ ls -lrt *collect*

In my case, I saw these messages in one of the "collect" files:
Opened file: $CUSTOM_TOP/12.0.0/forms/US/XXARXTWM.fmx
Error Message: FRM-40039: Cannot attach library ARXCOQIT while opening form ARXTWMAI.

ON-ERROR Trigger Fired:
Form: ARXTWMAI

State Delta:
ARXTWMAI, 21, Trigger, Entry, 758240456, ON-ERROR

ARXTWMAI, 22, Prog Unit, Entry, 758567456, /ARXTWMAI-6/P120_26_SEP_202102_21_28

Unhandled Exception ORA-06508
State Delta:

Error Message: FRM-40735: ON-ERROR trigger raised unhandled exception ORA-06508.
ARXTWMAI, 22, Trigger, Exit, 765768456, ON-ERROR

# 16 - ARXTWMAI:<null>.<null>.1648228229694425000

Nothing showed the true problem. But when I looked around the folder, I saw that file $AU_TOP/resource/ARXCWWIN.plx had a new timestamp and 0 bytes. After I copied the same file from another node to replace it, all the Forms errors went away.
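
Rather than spotting such a file by eye, the folder can be scanned for zero-byte files. A minimal sketch; find_empty_files is a hypothetical helper name, and the '*.plx' pattern is only an example filter.

```shell
# Hypothetical helper: list regular files of size 0 under a directory,
# optionally restricted to a name pattern (default: all files).
find_empty_files() {
    dir=$1
    pattern=${2:-*}
    find "$dir" -type f -name "$pattern" -size 0 -print
}

# Usage:
#   find_empty_files "$AU_TOP/resource" '*.plx'
```

Any file it reports can then be compared against the same path on another node.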

The very strange thing is why that file became 0 bytes! One possibility is that something unexpected happened when the folder holding the Forms temp files was full and the Linux admin removed some temp files held in memory by file ID.