Saturday, May 25, 2024

EBS forms failed by CrowdStrike

EBS Forms in our financial applications suddenly does not work. The message on the webpage is
Failure of Web Server bridge:
No backend server available for connection: timed out after 10 seconds or idempotent set to OFF or method not idempotent. 
 
The error message does not tell the true problem. When checking into services on OS level, I saw Oracle EBS Forms service was not running and also saw errors from startup script $ADMIN_SCRIPTS_HOME/adstrtal.sh:

Forms service failed to start. 
The Node Manager is already up.
ERROR: Unable to start up the managed server forms_server1
Server specific logs are located at $EBS_DOMAIN_HOME/servers/forms_server1/logs
05/13/24-20:56:26 :: admanagedsrvctl.sh: exiting with status 1

Java error exists in Forms log file $EBS_DOMAIN_HOME/servers/forms_server1/logs/forms_server1.out

<May 13, 2024 8:56:25 PM EDT> <Emergency> <Store> <BEA-280060> <The persistent store "_WLS_forms_server1" encountered a fatal error, and it must be shut down: weblogic.store.PersistentStoreFatalException: [Store:280020]There was an error while reading from the log file
weblogic.store.PersistentStoreFatalException: [Store:280020]There was an error while reading from the log file
        at weblogic.store.io.file.FileStoreIO.open(FileStoreIO.java:128)
        at weblogic.store.internal.PersistentStoreImpl.recoverStoreConnections(PersistentStoreImpl.java:435)
        at weblogic.store.internal.PersistentStoreImpl.open(PersistentStoreImpl.java:423)
        at weblogic.store.admin.AdminHandler.activate(AdminHandler.java:126)
        at weblogic.store.admin.FileAdminHandler.activate(FileAdminHandler.java:207)
        Truncated.
Caused By: java.io.EOFException: premature EOF: expected=512, actual=126
        at weblogic.store.io.file.StoreFile.readBulk(StoreFile.java:316)
        at weblogic.store.io.file.Heap.readStoreFile(Heap.java:1142)
        at weblogic.store.io.file.Heap.getNextRecoveryFile(Heap.java:1226)
        at weblogic.store.io.file.Heap.open(Heap.java:373)
        at weblogic.store.io.file.FileStoreIO.open(FileStoreIO.java:117)
        Truncated.

Seems WebLogic failed to open a file, but the log did not say which file. I knew that Linux Admins just did server maintenance and rebooted server after they applied monthly patches and Security updates on OS level. That was the only change in the application environment recently.

After searching around, I found the Java errors match the description in Oracle Doc ID 3017110.1 ( Managed Forms Server Fails To Start - Displaying Message: FAILED_NOT_RESTARTABLE - ERROR: <BEA-280061> The persistent store "_WLS_forms_server1" could not be deployed: weblogic.store.PersistentStoreFatalException [Store:280020] ). 

The document points out the problem is caused by CrowdStrike, which locks a Forms file in $EBS_DOMAIN_HOME/servers/forms_server#/data/store/default.

CrowdStrike is installed in /opt/CrowdStrike. It is owned by root, and it is running constantly on the Linux server.
$ ps -ef | grep falcon-sensor
root      1081  1079  0 May13 ?        00:22:23 falcon-sensor

The problem can be fixed temporarily by a workaround:

1. Delete/rename below .DAT file (I guess CrowdStrike does not like the file name and so locks it)
$ cd $EBS_DOMAIN_HOME/servers/forms_server1/data/store/default
$ ls -altr
total 1028
drwxr-xr-x 4 user group      40 Sep 13  2023 ..
-rw-r--r-- 1   user group  1049088 May 13 20:51 _WLS_FORMS_SERVER1000000.DAT
drwxr-xr-x 2 user group     42 May 13 20:56 .
$ rm _WLS_FORMS_SERVER1000000.DAT

2. Re-start services cleanly by
$ADMIN_SCRIPTS_HOME/adstrtal.sh
Or
$ADMIN_SCRIPTS_HOME/admanagedsrvctl.sh start forms_server1

The permanent fix is that the Secure team rolls back the CrowdStrike change (and applies it again until CrowdStrike fixes the problem), because its new update touches an Oracle Forms data file wrongly during its scan. 

Saturday, April 20, 2024

Scripts for start & stop EBS services

When server reboots for maintenance or unexpected downtime, we want it to bring EBS services down and up automatically. Sometimes, we also want to schedule an EBS downtime by cron job. I wrote two shell scripts by calling EBS Admin scripts for automation. They also generate log files for tracing the scripts' last run.

Assume that a solid $HOME/.profile for setting up R12.2 environment variables and a file $HOME/xxx_scripts/.EBSpassenv holding key passwords exist on the server. 
$ more .EBSpassenv
export APPS_PWD=apps#@PWD
export SYSTEM_PWD=system%_PWD
export WLS_ADMIN=wls$%^PWD

root can create a short script in directory /etc/init.d (or /etc/rc.d/init.d) which will be called during server reboot to execute both auto_stopall.sh for automatic shutdown and anto_startall.sh for automatic startup.
 
============ script auto_startall.sh ============
# Start all EBS services 
DT=date +"%h %d, %y %H:%M"
RUNLOG="$HOME/xxx_scripts/reboot_scripts/reboot_start.log"
RUNLOG_ERR="$HOME/xxx_scripts/reboot_scripts/reboot_start_Error.log"
if [ -f $RUNLOG ]; then
mv $RUNLOG ${RUNLOG}_old
fi
if [ -f $RUNLOG_ERR ]; then
mv $RUNLOG_ERR ${RUNLOG_ERR}_old
fi
exec 1>$RUNLOG
exec 2>$RUNLOG_ERR
sleep 2
echo "Running at $DT"
. $HOME/.profile
. $HOME/xxx_scripts/.EBSpassenv
ps -ef | grep $LOGNAME           # check current status of EBS services

# for R12.1
# $ADMIN_SCRIPTS_HOME/adstrtal.sh apps/$APPS_PWD@$TWO_TASK

# for R12.2
if [ $isMaster == "enabled" ]; then      ## $isMaster is defined in .profile
{ echo apps ; echo $AAPS_PWD ; echo $WLS_ADMIN ; } | $ADMIN_SCRIPTS_HOME/adstrtal.sh @ -mode=allnodes -nopromptmsg
else
{ echo apps ; echo $AAPS_PWD ; echo $WLS_ADMIN ; } | $ADMIN_SCRIPTS_HOME/adstrtal.sh @ -msimode -nopromptmsg
fi
echo 'sleep 10 seconds'
sleep 10
exit 0
============= end ============
NOTES for concurrent (CM) server/node:
1. Even WLS AdminServer are not started and running on Primary node, adstrtal.sh will fully start concurrent managers on a CM node (where only s_batch_status is "enabled" in CONTEXT_FILE).
2. If CM node crashed and services were not stopped gracefully, adstrtal.sh may not be able to start concurrent managers on CM node next time. Instead,  all FNDLUBR processes may start and run on the Primary node (where even  s_batch_status is "disabled"). If that happens, you have to stop services on all nodes, and then run adstrtal.sh to start concurrent services on CM node first to correct the issue. " adcmctl.sh start " by its own will not do much in R12.2.
3. If adstpall.sh has troubles in stopping CM processes, run " adcmctl.sh stop " may help. 

========== script auto_stopall.sh =========
# Stop all EBS services. It may take 10 minutes for all apps processes shutdown.
DT=date +"%h %d, %y %H:%M"
RUNLOG="$HOME/xxx_scripts/reboot_scripts/reboot_stop.log"
RUNLOG_ERR="$HOME/xxx_scripts/reboot_scripts/reboot_stop_Error.log"
if [ -f $RUNLOG ]; then
mv $RUNLOG ${RUNLOG}_old
fi
if [ -f $RUNLOG_ERR ]; then
mv $RUNLOG_ERR ${RUNLOG_ERR}_old
fi
exec 1>$RUNLOG
exec 2>$RUNLOG_ERR
echo "Running at $DT"
. $HOME/.profile
. $HOME/xxx_scripts/.EBSpassenv
ps -ef | grep $LOGNAME
echo "shutting down ..."
# for R12.1
# $ADMIN_SCRIPTS_HOME/adstpall.sh apps/$APPS_PWD
{ echo apps ; echo $APPS_PWD ; echo $WLS_PWD ; } | $ADMIN_SCRIPTS_HOME/adstpall.sh @ -nopromptmsg
echo 'sleep 20 seconds'
sleep 20
PNUM=ps -ef | grep $LOGNAME | egrep -i 'FNDLIB|FNDSM' | wc -l 
if [ $PNUM -gt 1 ]; then
echo 'sleep 90 seconds more...'
sleep 90
fi
# only check upper case and assume $TWO_TASK is in the $ORACLE_HOME path
PNUM=ps -ef | grep -w $LOGNAME | egrep 'FNDLIB|FNDSM|'$TWO_TASK | wc -l 
if [ $PNUM -gt 1 ]; then
echo 'sleep 30 seconds'
sleep 30
fi
PNUM=ps -ef | grep -w $LOGNAME | egrep 'FNDLIB|FNDSM|'$TWO_TASK | wc -l 
if [ $PNUM -gt 1 ]; then
echo 'sleep 30 seconds more ...'
sleep 30
fi
PNUM=ps -ef | grep -w $LOGNAME | egrep 'FNDLIB|FNDSM|'$TWO_TASK | wc -l 
if [ $PNUM -gt 1 ]; then
echo 'sleep 15 seconds more ...'
sleep 15
fi
ps -ef | grep $LOGNAME
exit 0
============= end ============

Saturday, April 6, 2024

Use .profile in Linux to customize the shell prompt

When you have many EBS instances in a multi-nodes environment, it will be very useful to let the Linux prompt display current user ID, server name and the path location. A custom .profile saved under $HOME works for me very well. Its colors tell if you are in a Admin node or not, and if you are in a production environment or not (assume the last character of production server's name is "p").

For a Linux account, environment variable $HOME is defined by file /etc/passwd. But, if the account was created by AD (Active Directory), the default value of $HOME is defined in "Home Directory" section of AD. The value of $HOME could be set by "export" in file ~applMgr/.profile, in case some confusion.

Our EBS applMgr accounts use Korn shell which uses two startup files under $HOME, the .profile and the .kshrc. During a session start, .profile is first read once, then .kshrc (if it exists) is read by each new ksh. e.g. :

$ echo $SHELL
/bin/ksh
$ echo $0
-ksh
$ which ksh
/usr/bin/ksh
$ more .kshrc
alias ftp="print 'Reminder: Use sftp instead of \\\ftp'"
echo "This is .kshrc"
$ ksh
This is .kshrc
$ ftp
Reminder: Use sftp instead of \ftp

============= $HOME/.profile =============
PATH=/bin:/usr/bin:/usr/local/bin
export PATH
MANPATH=/usr/share/man:/usr/local/share/man
export MANPATH      # for man manual 
EDITOR=/bin/vi
export EDITOR
# ENV=$HOME/.kshrc
# export ENV
. /u02/app/EBSPROD/EBSapps.env RUN     # R12.2 env file
. /u02/app/xxx_scripts/.EBSpassenv              # password file (custom)
isMaster="no"
if [ ! -z $APPS_VERSION ] && [ ${APPS_VERSION:0:4} == "12.2" ]
then
s_status=cat $CONTEXT_FILE | grep -i s_adminserverstatus
isMaster="${s_status:60:7}"
fi
if [ $isMaster == "enabled" ]   # on admin/primary node
then
if [ echo -n ${HOSTNAME%%.*} | tail -c -1 != "p" ]   
             # last character of server name is not "p" => non-production server
then
PS1=$'
\e[0;31m$USER@${HOSTNAME%%.}[$TWO_TASK]\e[m$PWD
-->$ '  
else       # on production server: Red, and Green color on PWD
PS1=$'
\e[0;31m$USER@${HOSTNAME%%.}[$TWO_TASK]\e[m\E[32m$PWD \E[0m
-->$ '
fi
else                                          # on other node(s)
if [ echo -n ${HOSTNAME%%.*} | tail -c -1 != "p" ]   
            # on non-production server
then
PS1='
$USER@${HOSTNAME%%.}[$TWO_TASK]$PWD
-->$ '
else      # on production server
PS1=$'
$USER@${HOSTNAME%%.}[$TWO_TASK]\E[32m$PWD \E[0m
-->$ '
fi
fi

alias rm='rm -i'
stty erase ^?
umask u=rwx,g=rwx,o=rx
================ end =================

On an Admin node in production env, the prompt looks like this:
applMgr@server_1p[EBSPROD]/u02/app
-->$

applMgr@server_1p[EBSPROD]/u02/app
-->$ echo $USER
applMgr
applMgr@server_1p[EBSPROD]/u02/app
-->$ echo $TWO_TASK
EBSPROD
applMgr@server_1p[EBSPROD]/u02/app
-->$ cd $TWO_TASK
applMgr@server_1p[EBSPROD]/u02/app/EBSPROD
-->$ echo $HOME
/u02/app
applMgr@server_1p[EBSPROD]/u02/app/EBSPROD
-->$ ls
EBSapps.env   fs1   fs2   fs_ne

Wednesday, March 6, 2024

script to check if a password is expiring

The environment variable $HOME for a Linux account is defined by file /etc/passwd if the account was not created in AD (Active Directory). Each account has an entry line in file /etc/passwd. For example, I can get my account's password expiration date by: 

$ echo $HOME
/u02/app
$ whoami
applmgr
$ grep applmgr /etc/passwd
applmgr:x:50378:102:Oracle EBS ID - J Y:/u02/app:/bin/ksh
$ expstr=$( chage -l $(whoami) | grep "^Password expires" | awk -F: '{ print $(NF) }' | sed -e 's/^ *//g; s/ *$//g;' )
$ echo "password for account `whoami` will expire on $expstr"
password for account applmgr will expire on Jul 30, 2025

But, if the account was created by Windows AD (Active Directory), the variable $HOME is defined in AD by "Home Directory" (Note: an entry in .profile or such could change $HOME to a different path immediately after login). ADHelp search page may show info:
    Unix Account
Home Directory:   /users/applmgr
Login Shell:          /bin/ksh

In that case, "chage" will give a different result:
$ echo $HOME
/users/applmgr
$ expstr=$( chage -l $(whoami) | grep "^Password expires" | awk -F: '{ print $(NF) }' | sed -e 's/^ *//g; s/ *$//g;' )
chage: user 'applmgr' does not exist in /etc/passwd

For an important account created in Linux (vs. an AD account), I wrote a script to email warning out before its password expires. It can be run by a cron job, such as
30 12 * * * /path/to/xxxx_scripts/checkPWDexpire.sh 2>&1

============= script checkPWDexpire.sh =============
let secs_per_day=60*60*24
nowtime=$( date +%s )
expstr=$( chage -l $(whoami) | grep "^Password expires" | awk -F: '{ print $(NF) }' | sed -e 's/^ *//g; s/ *$//g;' )
echo "DEBUG: expstr is $expstr"
if [ "$expstr" == "never" ]; then
echo "Password never expires.";
exit 0;
fi
exptime=$( date --date "$expstr" +%s )
if [ "$exptime" -lt 1 ];        then
echo "Something is wrong.";
exit 255;   # Or, email a message out
fi
if [ "$exptime" -lt "$nowtime" ]; then
echo "Password already expired.";
exit 1;      # Or, email a message out
fi
secs_til_exp=$(expr $exptime - $nowtime)
days_til_exp=$(expr $secs_til_exp / $secs_per_day)
echo "Password expires in $days_til_exp days."
if [ "$days_til_exp" -lt 6 ]; then
# send email out
echo "Please reset password manually and update 3rd party environments." | mailx -s "`whoami` on `uname -n` will expire in $days_til_exp days" me@email.com
# or 
# mailx -s "`whoami` on `uname -n` will expire in $days_til_exp days" -a aFile.log me@email.com < aFile.log
else
echo "All is fine.";
exit ;
fi
============== end =====================

"chage" Linux command:
If OS user applmgr is granted sudo, it can act as root to check another account's status or change password status.

$ sudo su -
[sudo] password for applmgr:
Last login: Mon Mar 28 03:22:57 EDT xxxx
Hostname:  server_name.domain.com
OS:  Red Hat Enterprise Linux release 8.10 (Ootpa)
Arch:  x86_64

[root@server_name ~]# chage -E -1 batch_mgr   # -1 <== number
Notes: passing the number -1 to Expire Date (-E) only never expires the account, but not unexpire the password.   

[root@server_name ~]# chage -l batch_mgr       # -l <== --list
Last password change                                 : Feb 14, 2023
Password expires                                        : May 15, 2023
Password inactive                                       : Jun 14, 2023
Account expires                                          : never
Minimum number of days between password change      : 7
Maximum number of days between password change     : 90
Number of days of warning before password expires       : 7

[root@server_name ~]# chage -M -1 batch_mgr  

Notes: passing the number -1 as MAX DAYS (-M) will remove checking a password validity, which turns off the various password aging properties. Now batch_mgr can use its existing password to login.

[root@server_name ~]# chage -l batch_mgr
Last password change                              : Feb 14, 2023
Password expires                                      : never
Password inactive                                     : never     <= never be deactivated due to inactivity
Account expires                                        : never
Minimum number of days between password change      : 7
Maximum number of days between password change     : -1
Number of days of warning before password expires       : 7

[root@server_name ~]# chage -l applmgr      # b/c applmgr was originally created in AD 
chage: user 'applmgr' does not exist in /etc/passwd

To change a user's password as root:
[root@server_name ~]# passwd batch_mgr
Enter new UNIX password:
... ...

Saturday, February 17, 2024

Shell script for renewing ssl certificate

My post Re-new R12.2 ssl certificate has details on how to renew a certificate. A shell script helps a lot when there are many EBS instances waiting for renewal. I wrote below script which takes only one minute to renew the cert on each node after the certificate is renewed on Venafi website and downloaded/copied to Linux server. 

As of today, we still have difficulties using .yaml file to extract certificate from Venafi server to Linux server automatically. We tried to set up a "push" way on Venafi website to do the automation. But if the password is changed on the Linux account, the push will fail. 

============= Script renew_cert.sh ============
# Script for renewing ssl certificate after new cert file is saved to Linux server

walletpwd='putPWDhere'
# walletpwd='tttest'
walletloc=$HOME/xxx/Certs_Renew   # path where the Venafi cert file is saved
walletname='ewallet.p12'                # Must name Venafi cert file to this name
certname='cwallet.sso'

echo "cert at: $walletloc"
echo "cert name: $walletname"
echo $walletpwd

cd $walletloc

errorC=`env| grep RUN_BASE | wc -l`
if [ $errorC -lt 1 ]; then
  echo "No R12.2 environment"
  exit 1
  # . $HOME/EBSQA/EBSapps.env RUN
fi

alias orapki=$FMW_HOME/oracle_common/bin/orapki

orapki wallet display -wallet $walletloc/$walletname -pwd $walletpwd > viewCert.log
errorC=`egrep -i 'PKI-' viewCert.log | wc -l`
echo "Error: $errorC"

if [ $errorC -gt 0 ]; then
   echo "The password is incorrect or the Venafi cert file is incorrect."
   exit 2
fi

DT=`date +"%h_%d_%y_%H%M"`
if [ -f $certname ]; then
   mv $certname ${certname}_${DT}
fi

orapki wallet create -wallet $walletloc/$walletname -pwd $walletpwd -auto_login

if [ ! -f $certname ]; then
   echo "Failure in getting new cert file. Exiting."
   exit 3
fi

echo " "
echo "Copy cert file to directories ..."

cd $NE_BASE/inst/$CONTEXT_NAME/certs    # save a copy in this folder
if [ -d Apache ]; then
mv Apache Apache_${DT}
fi
mkdir Apache
cd Apache
pwd
cp -p $walletloc/$walletname ${walletname}
cp -p $walletloc/$certname ${certname}

iName=$(tr < $CONTEXT_FILE '<>' '  ' | awk '/"s_ohs_instance"/ {print $(NF-1)}' )
SUBiName=${iName%?????}
cd $FMW_HOME/webtier/instances/$iName/config/OPMN/opmn/wallet
pwd
if [ -f $certname ]; then
   mv $certname ${certname}_${DT}
fi
cp -p $walletloc/$certname ${certname}

cd $FMW_HOME/webtier/instances/$iName/config/OHS/$SUBiName/keystores/default
pwd
if [ -f $certname ]; then
   mv $certname ${certname}_${DT}
fi
cp -p $walletloc/$certname ${certname}

cd $FMW_HOME/webtier/instances/$iName/config/OHS/$SUBiName/proxy-wallet
pwd
if [ -f $certname ]; then
   mv $certname ${certname}_${DT}
fi
cp -p $walletloc/$certname ${certname}

echo " "
echo "Recycle Apache service..."
cd $ADMIN_SCRIPTS_HOME
./adopmnctl.sh stop
sleep 10
./adopmnctl.sh status

./adapcctl.sh start
./adopmnctl.sh status

echo "Paths for log files:"
echo $FMW_HOME/webtier/instances/$iName/diagnostics/logs/OHS/$SUBiName
echo $FMW_HOME/webtier/instances/$iName/diagnostics/logs/OPMN/opmn
cd
============ End ==========

Run the script to renew certificate on each node:
$ ./renew_cert.sh
cert at: $HOME/temp/Certs_Renew
cert name: ewallet.p12
putPWDhere
Error: 0
Oracle PKI Tool : Version 11.1.1.9.0
Copyright (c) 2004, 2015, Oracle and/or its affiliates. All rights reserved.

Copy cert file to directories ...

/u04/app/EBSQA/fs_ne/inst/EBSQA_nodeName/certs/Apache
$FMW_HOME/webtier/instances/EBS_web_EBSQA_OHS1/config/OPMN/opmn/wallet
$FMW_HOME/webtier/instances/EBS_web_EBSQA_OHS1/config/OHS/EBS_web_EBSQA/keystores/default
$FMW_HOME/webtier/instances/EBS_web_EBSQA_OHS1/config/OHS/EBS_web_EBSQA/proxy-wallet

Recycle Apache service ...

You are running adopmnctl.sh version 120.0.12020000.2

Stopping Oracle Process Manager (OPMN)  and the managed processes ...
opmnctl stopall: stopping opmn and all managed processes...

adopmnctl.sh: exiting with status 0

adopmnctl.sh: check the logfile $LOG_HOME/appl/admin/log/adopmnctl.txt for more information ...  

You are running adopmnctl.sh version 120.0.12020000.2

Checking status of OPMN managed processes...
opmnctl status: opmn is not running.

adopmnctl.sh: exiting with status 0

adopmnctl.sh: check the logfile $LOG_HOME/appl/admin/log/adopmnctl.txt for more information ...  

You are running adapcctl.sh version 120.0.12020000.6

Starting OPMN managed Oracle HTTP Server (OHS) instance ...

adapcctl.sh: exiting with status 0

adapcctl.sh: check the logfile $LOG_HOME/appl/admin/log/adapcctl.txt for more information ...  

You are running adopmnctl.sh version 120.0.12020000.2

Checking status of OPMN managed processes...

Processes in Instance: EBS_web_ARQA_OHS1
--------------------------------+--------------------+---------+---------
ias-component                    | process-type       |     pid    | status  
--------------------------------+--------------------+---------+---------
EBS_web_EBSQA             | OHS                   |   14542 | Alive   

adopmnctl.sh: exiting with status 0

adopmnctl.sh: check the logfile $LOG_HOME/appl/admin/log/adopmnctl.txt for more information ...  

Paths for log files:
$FMW_HOME/webtier/instances/EBS_web_EBSQA_OHS1/diagnostics/logs/OHS/EBS_web_EBSQA
$FMW_HOME/webtier/instances/EBS_web_EBSQA_OHS1/diagnostics/logs/OPMN/opmn

Check files in folder $HOME/xxx/Certs_Renew:
$ ls 
renew_cert.sh
ewallet.p12
ewallet.p12.lck
cwallet.sso.lck
viewCert.log
cwallet.sso

NOTES: there is a cert file in $EBS_DOMAIN_HOME/opmn/EBS_web_EBSQA_OHS1/wallet and $EBS_DOMAIN_HOME/opmn/EBS_web_EBSQA_OHS1/EBS_web/wallet. But I do not know what uses them.