Monday, October 27, 2014

Apache httpd services fail to start

The R12 Apache server does not start, yet the log file reports exit status 0, which normally means success.

Error
   --> Process (index=1,uid=41488097,pid=14132)
   failed to start a managed process after the maximum retry limit
   Log:
   /u02/app/EBSPROD/inst/apps/<CONTEXT name>/logs/ora/10.1.3/opmn/HTTP_Server~1.log
   10/26/2014-10:04:39 : : adapcctl.sh: exiting with status 0

 
The HTTP_Server~1.log file does not give any useful information either:
   14/10/26 10:04:39 Start process
   --------
   /u02/app/EBSPROD/inst/apps/<CONTEXT name>/ora/10.1.3/Apache/Apache/bin/apachectl startssl: execing httpd


After deleting the httpd.pid file in $INST_TOP/pids/10.1.3/Apache and adding the debug options below to the httpd.conf file in $ORA_CONFIG_HOME/10.1.3/Apache/Apache/conf, "adapcctl.sh start" was able to start the httpd services. See Doc ID 422419.1 for how to enable debug logs.

LogLevel debug
OraLogMode oracle
OraLogSeverity TRACE:32
OraLogDir /u02/app/EBSPROD/inst/apps/<CONTEXT name>/logs/ora/10.1.3/Apache/oracle
 
However, the real fix seems to be deleting the Apache folder under $LOG_HOME/ora/10.1.3, together with all the httpd log files in it, and then re-creating it (and its sub-folder oracle). I believe the cause was a log file that had exceeded the 2GB size limit.
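
Before deleting the whole folder, the 2GB theory can be checked by listing oversized files in the log tree. This is only a minimal sketch: the function name find_big_logs and its kilobyte threshold argument are made up for illustration. It relies on the standard behaviour of find, where a plain "-size +N" counts 512-byte blocks.

```shell
# find_big_logs DIR MIN_KB
# Print every regular file under DIR that is larger than MIN_KB kilobytes.
# Hypothetical helper, not part of the EBS scripts.
find_big_logs() {
    # find -size without a suffix counts 512-byte blocks, so 1 KB = 2 blocks
    find "$1" -type f -size +"$(( $2 * 2 ))" -print
}

# Example (assumes LOG_HOME is set by the EBS environment file):
#   find_big_logs "$LOG_HOME/ora/10.1.3" 2000000    # files over roughly 2GB
```

If the call prints nothing, no file under that directory is above the threshold and the cause is probably something else.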

Troubleshooting:

Logs for troubleshooting on startup:
$LOG_HOME/ora/10.1.3/Apache
$LOG_HOME/ora/10.1.3/opmn/     <= Check file HTTP_Server~1.log here if Apache does not start and writes no logs
$LOG_HOME/ora/10.1.3/j2ee/oacore/oacore_default_group_1/application.log      <= Note: in one case this file grew to 8GB because of 10.1.3.5.0 Container error entries (bug 10126440?)
$LOG_HOME/ora/10.1.3/j2ee/forms/forms_default_group_1/application.log

When you get "Page not found", these other files can be checked for problems with egrep -i 'fail|error':
$LOG_HOME/ora/10.1.3/opmn/oacore_default_group_1/oacorestd.err
$LOG_HOME/appl/rgf/javacache.log   (Java Object Cache (JOC) log, $APPLRGF)
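
The egrep check above can be wrapped in a small loop so that several logs are scanned in one go. A rough sketch only: count_log_errors is a made-up name, and it reports just the number of matching lines per file, so you still have to open the file to read the actual messages.

```shell
# count_log_errors FILE...
# For each readable file, print "<count> <file>", where count is the number of
# lines matching fail|error (case-insensitive). Unreadable files are skipped.
count_log_errors() {
    for f in "$@"; do
        [ -r "$f" ] || continue
        # grep -c prints 0 and returns non-zero when nothing matches
        n=$(grep -E -i -c 'fail|error' "$f") || true
        echo "$n $f"
    done
}

# Example (assumes LOG_HOME is set by the EBS environment file):
#   count_log_errors "$LOG_HOME/ora/10.1.3/opmn/oacore_default_group_1/oacorestd.err" \
#                    "$LOG_HOME/appl/rgf/javacache.log"
```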

- start APACHE by adapcctl.sh start
Make sure the commands below do not return any error:
./adapcctl.sh status
./adalnctl.sh status
If Apache is up and running with ssl enabled, the URL below should work:
https://siteName.domain.com:ssl_port
siteName.domain.com can be replaced by its IP address for troubleshooting (possibly with a "Certificate error", because the certificate was issued for the original site URL).
At this point, the re-direction to the login page will not work because it needs the oacore process running.

- start OACORE process (in 10.1.3 Oracle_Home) by adoacorectl.sh start
If the login page is unavailable immediately, check the opmn log first
$LOG_HOME/ora/10.1.3/opmn/default_group~oacore~default_group~1.log
If the login page gets internal errors or hangs, but below URLs work (in R12):
https://ip_address:ssl_port/OA_HTML/ServletPing
https://ip_address:ssl_port/OA_HTML/jsp/fnd/aoljtest.jsp   (it may not be configured in some sites)
https://siteName.domain.com:ssl_port/OA_HTML/jsp/fnd/aoljtest.jsp    -- brings up a test HTML form
check log files in $LOG_HOME/ora/10.1.3/j2ee/oacore/oacore_default_group_1 to get the real error.
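
The URL checks can also be done from the command line, which helps when no browser is available on the server. A sketch under assumptions: the function names ebs_ping_urls and check_urls are made up, curl must be installed, and -k is used so that the certificate mismatch on an IP address does not block the test.

```shell
# ebs_ping_urls HOST PORT
# Print the two test URLs from this post for one host and ssl port.
ebs_ping_urls() {
    base="https://$1:$2"
    echo "$base/OA_HTML/ServletPing"
    echo "$base/OA_HTML/jsp/fnd/aoljtest.jsp"
}

# check_urls HOST PORT
# Print "<http status> <url>" for each test URL (000 means no HTTP response).
check_urls() {
    ebs_ping_urls "$1" "$2" | while read -r url; do
        # -k: skip certificate validation, -s: quiet, -o: discard the page body
        code=$(curl -k -s -o /dev/null -w '%{http_code}' --max-time 10 "$url")
        echo "$code $url"
    done
}

# Example: check_urls ip_address ssl_port
```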

- start FORMS process by adformsctl.sh start
After the forms services are running, you should be able to open forms through the JRE popup. If forms do not open, check the logs in $LOG_HOME/ora/10.1.3/j2ee/forms/forms_default_group_1 for any error.

- start OAFM OC4J by adoafmctl.sh start
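
The start order used above (Apache, oacore, forms, oafm) can be put into one small function. This is just a sketch: start_web_tier is a made-up name, and it assumes the four control scripts all sit in the directory passed as the argument (on my understanding that is $ADMIN_SCRIPTS_HOME, but check your own environment).

```shell
# start_web_tier SCRIPT_DIR
# Run the four control scripts in the order used in this post and stop at the
# first one that returns a non-zero status.
start_web_tier() {
    for s in adapcctl.sh adoacorectl.sh adformsctl.sh adoafmctl.sh; do
        echo "starting $s"
        "$1/$s" start || { echo "$s failed" >&2; return 1; }
    done
}

# Example: start_web_tier "$ADMIN_SCRIPTS_HOME"
```

As the error at the top of this post shows, adapcctl.sh can exit with status 0 even when httpd did not come up, so a clean run here is not proof. Follow it with the status commands and the URL tests.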

If Apache is running on a server with IP 188.xx.67.77, a telnet to port s_webport (8020) returns the message below in my R12.1 environment:
$ telnet 188.xx.67.77 8020
Trying...
Connected to 188.xx.67.77.
Escape character is '^]'.
GET
  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
  <HTML><HEAD>
  <TITLE>400 Bad Request</TITLE>
  </HEAD><BODY>
  <H1>Bad Request</H1>
  Your browser sent a request that this server could not understand.<P>
  invalid request-URI <P>
  </BODY></HTML>
  Connection closed.

But "$ telnet 188.xx.67.77 4463" followed by GET returns only "Connection closed.". 4463 is the ssl_port in the ssl-enabled configuration, so the likely reason is that this port expects an SSL handshake rather than a plain-text request.

Most of the time, Apache itself works fine. The problem may come from a change in company security policy, a firewall, or F5 network settings. What you need is evidence showing that Apache is working.

NOTE1: After using "adapcctl.sh start" to start the Apache services, do not use "adapcctl.sh stop" to stop the httpd services, because it may leave the httpd.pid file on the file system and leave opmn processes running. Instead, use "adopmnctl.sh stopall" to stop both the opmn processes and the httpd processes.
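
To see whether a left-over httpd.pid is really stale before deleting it, the PID inside can be tested with kill -0. A minimal sketch: pid_file_is_stale is a made-up name, and it assumes you run it as the same OS user that owns the httpd processes (otherwise kill -0 can fail on a live process).

```shell
# pid_file_is_stale PIDFILE
# Return 0 when PIDFILE exists but no process with that PID is alive,
# 1 otherwise (file missing, or process still running).
pid_file_is_stale() {
    [ -f "$1" ] || return 1
    pid=$(cat "$1")
    # kill -0 sends no signal; it only tests whether the process exists
    kill -0 "$pid" 2>/dev/null && return 1
    return 0
}

# Example (assumes INST_TOP is set by the EBS environment file):
#   pid_file_is_stale "$INST_TOP/pids/10.1.3/Apache/httpd.pid" && echo "stale pid file"
```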

NOTE2: All Apache logs and opmn logs (or log directories) can be deleted safely, including
    $LOG_HOME/appl/admin/log/adapcctl.text
    $LOG_HOME/ora/10.1.3/opmn/HTTP_Server~1.log
See Doc ID 1964851.1 to clean the logs and enable debug.

NOTE3: If a port defined for Apache is not available, Apache will not start. One day, "adstrtal.sh" gave a message similar to the one above. After adding the debug options to the Apache conf file, I found the problem: "make_sock: could not bind to port 4482". The profile option APPS_SERVLET_AGENT usually holds the site URL (name and port).
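
Once debug logging is on, the blocked port can be pulled out of the Apache logs without reading them line by line. A rough sketch: bind_failed_ports is a made-up name, and it only looks for the exact "could not bind to port" wording quoted above.

```shell
# bind_failed_ports LOGFILE...
# Print each distinct port number found in a "could not bind to port N" line.
bind_failed_ports() {
    cat "$@" 2>/dev/null |
        grep 'could not bind to port' |
        sed 's/.*could not bind to port \([0-9][0-9]*\).*/\1/' |
        sort -u
}

# Example (assumes LOG_HOME is set by the EBS environment file):
#   bind_failed_ports "$LOG_HOME"/ora/10.1.3/Apache/*
# Then check what is holding the port, for example:
#   netstat -an | grep 4482
```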