Friday, March 11, 2022

R12.2 site is down with error "Failure of Web Server bridge"

EBS R12.2 site is not accessible and login page gives error: 

Failure of Web Server bridge:

No backend server available for connection: timed out after 10 seconds or idempotent set to OFF or method not idempotent.

The message does not tell much about the root cause. Most likely it happens when a large data set was pulled by EBS users from database, which might make oacore crash. We took a few actions to address the problem.

1. Update NodeManager parameters via s_append_nm_jvmargs in ${CONTEXT_FILE} (by default, it is not set).  For details, see Doc ID 2300621.1 (Node Manager Consumes High Virtual Memory).

$ grep s_append_nm_jvmargs $CONTEXT_FILE
         <append_nm_jvmargs oa_var="s_append_nm_jvmargs">-XX:PermSize=256m -XX:MaxPermSize=512m -Xms1024m -Xmx1024m</append_nm_jvmargs>

2. Increase memory for oacore. Use the following steps to customize the managed server configuration via the WebLogic Server Administration Console. See Doc ID 2104351.1 ("java.lang.OutOfMemoryError, msg=Java heap space" Error While Uploading In APCC).

$ grep s_wls_adminport $CONTEXT_FILE

1). Log on to the WebLogic Server Administration Console as weblogic (http://[host.domain.com]:s_wls_adminport/console). 
2). Click on Environment => Servers (link).  
3). Click on oacore_server1 - the managed server whose configuration needs to be updated. 
4). Under Configuration (tab) => Server Start  (tab) => You can see the Arguments Section.
5). Click on Lock and Edit button in the 'Change Center' panel.
6). Update the Heap space from
    -XX:PermSize=128m -XX:MaxPermSize=384m -Xms512m -Xmx512m -
-Djava.security.policy=/.../server/lib/weblogic.policy
to new set:
  -XX:PermSize=512m -XX:MaxPermSize=512m -Xms2048m -Xmx4096m -Djava.security.policy=/.../server/lib/weblogic.policy
7). Click the 'Save' button to save the configuration changes.
8). Once the customizations are complete and saved, click the 'Activate Changes' button in the 'Change Center' panel to activate the changes.

Do the same for oacore_server2 if it is a multi-node instance.
(Same steps can also be done by running script  $ perl $AD_TOP/patch/115/bin/adProvisionEBS.pl)

3. Set the Inactive Connection Timeout parameter to one hour. Steps are in Doc ID 1940996.1 (Oracle E-Business Suite 12.2 Data Source Connection Pool Diagnostics)

Login to the WebLogic Server console
Navigate to Services (tree link) => Data Sources (link) => EBSDataSource (page link) => Connection Pool (tab) => Advanced (Expand arrow on the bottom).
Click on the Lock and Edit Button.
Update the Inactive Connection Timeout to the desired value in seconds: 3600 (for one hour).
Then, click on Save, and the 'Activate Changes' button.

4. Steps to ensure the timeouts are in sync:

Login to WebLogic Admin Console, and click on 'Lock & Edit'
Under 'Domain Structure', click on 'Deployments'
Go to the Next page until you see the 'oacore'
Click on the '+' sign next to the word 'oacore'
Click on the module '/OA_HTML'
Click on the 'Configuration' tab
Set 'Session Timeout (in seconds)' to 1800
Click 'Save'
Under 'Domain Structure', click on 'Deployments'
Go to the Next page until you see the 'oacore' and click on it
Click on the 'Configuration' tab
Set 'Session Timeout (in seconds)' to 1800
Click 'Save'
Click 'Release Configuration'

Set the 'ICX Session Timeout' profile option value at the site level to 30. (Notes: later business users requested to change this profile option to 120, which made them out of sync. But EBS site worked fine after that). 
Restart services via adstpall.sh / adstrtal.sh

5. Check oacore connections 
When oacore went down, it had a high number of Active Counts in the Console. Navigate to to Services (tree Link) => Data Sources (tree Link) => EBSDataSource (page link) => Monitoring (tab), It got

Server  ActiveConnectionsCurrentCount
oacore_server1 218
oacore_server2 119

- Doc ID 1940996.1 (Oracle E-Business Suite 12.2 Data Source Connection Pool Diagnostics) gives details on how to collect data on connection leaks.
Log in to the WebLogic Console. Click on Services (tree link) => Data Sources (tree link) => EBSDataSource (page link) => Monitoring (tab)...

- Doc ID 1905593.1 (Managing Configuration of Oracle HTTP Server and Web Application Services in Oracle E-Business Suite Release 12.2) give a way to add additional oacore managed servers depending upon the user load. 1 oacore JVM can handle up to 150 users. (See 4.4.1 Adding a new managed server)

6. Perform the following to enable debug and help isolate the problem:
- Login to the WLS console 
- Environment => Servers => Click on oacore_server<x> (link) for which debug will be enabled => Debug (tab)
- Use Edit and Lock to open the configuration
- Open the Weblogic => Servlet and select line for DebugHttp
- Click on Enable, and then Activate Changes (no bounce is required)

Now the debug is enabled and will log the HTTP request in the $EBS_DOMAIN_HOME/servers/oacore_server<x>/logs/oacore_server<x>.log file

When the dump occurs in server log is enabled this should allow to track back the request(s) just before the dump occurred. Remember to disable it after troubleshooting period.

7. Collecting Fusion Middleware Log Files ( Doc ID 1362900.1 ) (for Oracle Support)

Run this script as the owner of the applications file system to gather the log files for Fusion Middleware components such as NodeManager, AdminServer, forms, oacore and oafm services:
zip -r /tmp/`uname -n`_`date +%m%d%y.%H%M`_FMW.zip $IAS_ORACLE_HOME/../wlserver_10.3/common/nodemanager $EBS_DOMAIN_HOME/servers/oa*/logs/*out* $EBS_DOMAIN_HOME/servers/oa*/logs/*log* $EBS_DOMAIN_HOME/servers/forms*/logs/*out* $EBS_DOMAIN_HOME/servers/forms*/logs/*log* $EBS_DOMAIN_HOME/servers/AdminServer/logs/*out* $EBS_DOMAIN_HOME/servers/AdminServer/logs/*log* $EBS_DOMAIN_HOME/sysman/log/* $EBS_DOMAIN_HOME/servers/oac*/adr/diag/ofm/EBS_domain*/oac*/incident/* $EBS_DOMAIN_HOME/servers/forms_s*/adr/diag/ofm/EBS_domain*/forms_s*/incident/*

Oracle Support recommended some patches for "potentially" fixing the problem. But we did not apply them. Poor database performance might also cause users' session timeouts and hit this error.

8. How to bring the site up quickly?

When the problem happens and EBS site is down, the priority is to bring EBS site up and available. First of all, check if oacore is running or not. Use below command to find the PID of oacore :

$ ps -ef | grep $LOGNAME | grep oacore | grep -v grep | awk '{print $2}'
(or, $ ps -fu $LOGNAME | grep oacore | grep -v grep | awk '{print $2}' )

To find the true problem, also check WLS console is up or not: http://[master_node.domain.com]:wls_adminport/console

If WLS console is available, click on Servers to check the status of oacore process, and recycle/start it under the Control tab. Or, run

$ ./admanagedsrvctl.sh status oacore_server1
You are running admanagedsrvctl.sh version 120.14.12020000.12
Enter the WebLogic Admin password:
oacore_server1 is shutdown
Server specific logs are located at $FMW_HOME/user_projects/domains/EBS_domain_$TWO_TASK/servers/oacore_server1/logs
admanagedsrvctl.sh: exiting with status 0

I did experience that after " ./admanagedsrvctl.sh stop oacore_server1 " ran successfully, " ./admanagedsrvctl.sh status oacore_server1 " showed oacore_server1 was still running. I had to use "Force Shutdown" in the Console to stop it.

If console is not available, most likely Admin Server is down. 
$ adadminsrvctl.sh status
You are running adadminsrvctl.sh version 120.10.12020000.11
Enter the WebLogic Admin password:
Enter the APPS Schema password:
 The AdminServer is not running
AdminServer logs are located at $FMW_HOME/user_projects/domains/EBS_domain_$TWO_TASK/servers/AdminServer/logs
adadminsrvctl.sh: exiting with status 0
adadminsrvctl.sh: check the logfile $LOG_HOME/appl/admin/log/adadminsrvctl.txt for more information ...

In this case, oacore does not start:
$ ./admanagedsrvctl.sh start oacore_server1
You are running admanagedsrvctl.sh version 120.14.12020000.12
Enter the WebLogic Admin password:
Calling txkChkEBSDependecies.pl to perform dependency checks for oacore_server1
*** ALL THE FOLLOWING FILES ARE REQUIRED FOR RESOLVING RUNTIME ERRORS
*** Log File = $LOG_HOME/appl/rgf/TXK/txkChkEBSDependecies_.../txkChkEBSDependecies_....log 
Perl script txkChkEBSDependecies.pl got executed successfully
Starting oacore_server1...
Server specific logs are located at 
$FMW_HOME/user_projects/domains/EBS_domain_${TWO_TASK}/servers/oacore_server1/logs
admanagedsrvctl.sh: exiting with status 1
admanagedsrvctl.sh: check the logfile $LOG_HOME/appl/admin/log/adoacorectl.txt for more information ...  

In another crash, admanagedsrvctl.sh did not take the password and hung, and even adstrtal.sh did the same. We had to kill all OS processes. Log file oacore_server1.out in $FMW_HOME/user_projects/domains/EBS_domain_${TWO_TASK}/servers/oacore_server1/logs had message:

<Error> <ServletContext-/OA_HTML> <BEA-000000> <chain failed
javax.servlet.ServletException: java.lang.OutOfMemoryError: GC overhead limit exceeded
at weblogic.servlet.internal.ServletStubImpl.execute(ServletStubImpl.java:342)
at weblogic.servlet.internal.TailFilter.doFilter(TailFilter.java:26)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
at oracle.apps.fnd.security.csrf.GuardFilter.doFilter(GuardFilter.java:311)
at weblogic.servlet.internal.FilterChainImpl.doFilter(FilterChainImpl.java:60)
Truncated. see log file for complete stacktrace
Caused By: java.lang.OutOfMemoryError: GC overhead limit exceeded

Memory setting can also be seen in the Console: Environment => Servers => Click on oacore_server1 => Monitoring => Performance

I saw warning "threadpool has stuck threads" in the Console, and also messages in log file oacore_server1.out from time to time. I assume they can be ignored:

<Error> <Net> <BEA-000903> <Failed to communicate with proxy: /80. Will try connection site_name.domain.com/443 now.java.net.ConnectException: Connection refused
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:339)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:200)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:182)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
Truncated. see log file for complete stacktrace
and FORCE_SUSPENDING FORCE_SHUTTING_DOWN

Also see Java error in log:
Forcibly releasing inactive/harvested connection "weblogic.jdbc.wrapper.PoolConnection_oracle_jdbc_driver_T4CConnection@21edc9b" back into the data source connection pool "EBSDataSource", currently reserved by: java.lang.Exception
        at weblogic.jdbc.common.internal.ConnectionEnv.setup(ConnectionEnv.java:367)
        at weblogic.common.resourcepool.ResourcePoolImpl.reserveResource(ResourcePoolImpl.java:379)
        at weblogic.common.resourcepool.ResourcePoolImpl.reserveResource(ResourcePoolImpl.java:345)
        at weblogic.jdbc.common.internal.ConnectionPool.reserve(ConnectionPool.java:469)
        at weblogic.jdbc.common.internal.ConnectionPool.reserve(ConnectionPool.java:363)
        at weblogic.jdbc.common.internal.ConnectionPoolManager.reserve(ConnectionPoolManager.java:125)
        at weblogic.jdbc.common.internal.RmiDataSource.getPoolConnection(RmiDataSource.java:469)
        at weblogic.jdbc.common.internal.RmiDataSource.getConnectionInternal(RmiDataSource.java:553)
        at weblogic.jdbc.common.internal.RmiDataSource.getConnection(RmiDataSource.java:513)
        at sun.reflect.GeneratedMethodAccessor106.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:606)
        at oracle.apps.fnd.security.DBConnObjWLSDSPool.getConnection(DBConnObjWLSDSPool.java:390)
        at oracle.apps.fnd.security.AppsConnectionManagerWLSDS.dbConnect(AppsConnectionManagerWLSDS.java:223)
        at oracle.apps.fnd.security.AppsConnectionManagerWLSDS.localAppsConnect(AppsConnectionManagerWLSDS.java:193)
        at oracle.apps.fnd.security.AppsConnectionManagerWLSDS.makeGuestConnection(AppsConnectionManagerWLSDS.java:68)
        at oracle.apps.fnd.security.DBConnObjDS.getLabelledConnection(DBConnObjDS.java:157)
        at oracle.apps.fnd.security.DBConnObj.<init>(DBConnObj.java:233)
        at oracle.apps.fnd.security.DBConnObjDS.<init>(DBConnObjDS.java:106)    3-27724233841

No comments: