Tuesday, June 28, 2022

Abort an ADOP patching cycle

While I was running through an adop cycle ahead of the database 19c upgrade, the PREPARE phase failed:
$ adop phase=prepare
... ...
Initializing.
    Run Edition context  : $CONTEXT_FILE
    Patch edition context: ... ...
    Patch file system free space: 50.28 GB

Validating system setup.
    Node registry is valid.
    Log: $APPLRGF/TXK/verifyssh.log
    Output: $APPLRGF/TXK/out.xml
    Remote execution is operational.
... ...
Running prepare phase on node(s): ... ...
    [UNEXPECTED]adop phase=prepare on all the nodes returned failure.
    [UNEXPECTED]Unable to continue processing.

The log file does not give any specific reason. The SQL statement logged just before the error suggests that it failed while checking the status of a CUSTOM TOP.

$ vi txkADOPPreparePhaseSynchronize.log
... ...
==========================
Inside getPatchStatus()...
==========================
SQL Command: SELECT status||',' FROM ad_adop_session_patches 
WHERE  --... ...
bug_number = 'ADSPLICE_docXX'

patch_status = Y
Updated patch_status = Y
EXIT STATUS: 1

=============================
Inside evalADPATCHStatus()...
=============================
message_status: ERROR
Adsplice action did not go through successfully.
*******FATAL ERROR*******
PROGRAM : ($AD_TOP/patch/115/bin/txkADOPPreparePhaseSynchronize.pl)
TIME    : Wed Jul 20 14:00:36 2022
FUNCTION: main::execADSPLICE [ Level 1 ]
ERRORMSG: Adsplice action did not go through successfully.
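
Since the useful lines are buried in a long log, a small filter can pull them out. This is only a sketch: the function name show_fatal is my own, and the marker strings are the ones seen in the excerpt above, so other adop logs may need different patterns.

```shell
#!/bin/sh
# Print the lines that point at a failure in an adop log such as
# txkADOPPreparePhaseSynchronize.log. The markers are taken from the
# excerpt above; they are not an official list of adop error strings.
show_fatal() {
    logfile=$1
    # -n prefixes each match with its line number, so the spot can be
    # opened directly in an editor afterwards.
    grep -n -E 'FATAL ERROR|^FUNCTION:|^ERRORMSG:|^EXIT STATUS: [1-9]' "$logfile"
}
```

For example, "show_fatal txkADOPPreparePhaseSynchronize.log" lists the non-zero exit status together with the FATAL ERROR, FUNCTION and ERRORMSG lines.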

I ran the statement below to get an idea of what the problem could be, then tried to de-register the suspect CUSTOM TOP docXX (I did not know what was actually wrong with it):
SQL> select * FROM ad_adop_session_patches where bug_number like 'ADSPLICE%'
order by adop_session_id desc, end_date desc;

$ perl $AD_TOP/bin/adDeregisterCustomProd.pl

Enter the APPS username: apps
Enter the APPS password:
Enter the Custom Application to De-Register: docXX

Enter the Application Id to De-Register: 20268
Script adjava oracle.apps.ad.splice.adCustProdSplicer mode=uninstall options=inputpair logfile=customderegister.log
.. ...
Performing Validations for De-Register

AD Custom Product Splicer error:
Patching cycle in progress - run this utility from patch file system
You can only run it from run file system when not patching

AD Custom Product Splicer is completed with errors.
Please see the below log file for more details.

I believe I also tried to run "adop phase=fs_clone" at this stage; it did nothing and failed very quickly. I did not want to de-register a CUSTOM TOP from the patch file system, and had no confidence in doing so. So, to get through the ADOP patch cycle, the only choice was to abort the patch session first, in the three steps below:

$ adop phase=abort
.. ...
Validating system setup.
    Node registry is valid.
    Log: $APPLRGF/TXK/verifyssh.log
    Output: $APPLRGF/TXK/out.xml
    Remote execution is operational.

Checking for existing adop sessions.
    Continuing with existing session [Session ID: 7].
        Session Id            :   7
        Prepare phase status  :  FAILED
        Apply phase status    :    NOT COMPLETED
        Cutover  phase status :  NOT COMPLETED
        Abort phase status    :    NOT COMPLETED
        Session status        :       FAILED
The above session will be aborted. Do you want to continue [Y/N]? Y
===================================================================
ADOP (C.Delta.12)
Session ID: 7
Node: node1Name
Phase: abort
Log: $NE_BASE/EBSapps/log/adop/7/20220X21_170727/adop.log
=====================================================================

Verifying existence of context files in database.

Checking if adop can continue with available nodes in the configuration.
    Log: $NE_BASE/EBSapps/log/adop/7/20220721_170727/abort/node1Name
        txkADOPEvalSrvStatus.pl returned SUCCESS

Creating list of nodes list where abort needs to be run.
    The abort phase needs to be run on node: node1Name
    The abort phase needs to be run on node: node2Name
    The abort phase needs to be run on node: node3Name

Running abort phase on node(s): [node1Name,node2Name and node3Name].
    Output: $NE_BASE/EBSapps/log/adop/7/20220721_170727/abort/remote_execution_result_level1.xml
    Log: $NE_BASE/EBSapps/log/adop/7/20220721_170727/abort/node1Name
        txkADOPEvalSrvStatus.pl returned SUCCESS

Generating node-specific status report.
    Output: $NE_BASE/EBSapps/log/adop/7/20220721_170727/adzdnodestat.out

Summary report for current adop session:
    Node node1Name: Completed successfully
       - Abort status:      Completed successfully
    Node node2Name: Completed successfully
       - Abort status:      Completed successfully
    Node node3Name: Completed successfully
       - Abort status:      Completed successfully
    For more details, run the command: adop -status -detail
adop exiting with status = 0 (Success)

$ adop phase=cleanup cleanup_mode=full
... ...
The cleanup phase completed successfully.
adop exiting with status = 0 (Success)

$ adop -status

Enter the APPS password:
Connected.
==============================================================
ADOP (C.Delta.12)
Session Id: 7
Command: status
Output: $NE_BASE/EBSapps/log/adop/7/20220X21_172115/adzdshowstatus.out
===============================================================
Node Name       Node Type  Phase           Status                            Started              Finished             Elapsed
--------------- ---------- --------------- ----------------------------------- ------------------- -------------------- ------------
node1Name       master  PREPARE     SESSION ABORTED 2022/0X/20 13:38:49  2022/0X/21 12:54:19  23:15:30
                                       APPLY               SESSION ABORTED
                                       FINALIZE         SESSION ABORTED
                                       CUTOVER        SESSION ABORTED
                                       CLEANUP        COMPLETED       2022/0X/21 17:13:58  2022/0X/21 17:14:48  0:00:50
node2Name       slave    PREPARE      SESSION ABORTED 2022/0X/20 13:38:49  2022/0X/21 12:55:01  23:16:12
                                       APPLY              SESSION ABORTED
                                       FINALIZE        SESSION ABORTED
                                       CUTOVER       SESSION ABORTED
                                       CLEANUP       COMPLETED       2022/0X/21 17:13:58  2022/0X/21 17:14:48  0:00:50
node3Name       slave    PREPARE      SESSION ABORTED 2022/0X/20 13:38:49  2022/0X/21 12:55:00  23:16:11
                                       APPLY              SESSION ABORTED
                                       FINALIZE        SESSION ABORTED
                                       CUTOVER       SESSION ABORTED
                                       CLEANUP       COMPLETED       2022/0X/21 17:13:58  2022/0X/21 17:14:48  0:00:50
File System Synchronization Type: Full

INFORMATION: Patching cycle aborted, so fs_clone will run automatically on node2Name,node3Name nodes in prepare phase of next patching cycle.
adop exiting with status = 0 (Success)

$ adop phase=fs_clone
... ...
The fs_clone phase completed successfully.
adop exiting with status = 0 (Success)

When I ran "adop -status" again, it did not mention fs_clone and showed the same status as before.
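
The three recovery steps (abort, full cleanup, fs_clone) could be wrapped so that a failure in one stops the rest. This is a rough sketch rather than anything I ran during this incident: the function name and the ADOP_CMD variable are hypothetical, and a real adop run still prompts for passwords and asks for confirmation before aborting.

```shell
#!/bin/sh
# Run the three recovery steps in order and stop at the first failure.
# ADOP_CMD is a hypothetical override so the sequence can be previewed
# with ADOP_CMD=echo; by default the real adop is called.
ADOP_CMD=${ADOP_CMD:-adop}

abort_patch_cycle() {
    "$ADOP_CMD" phase=abort || return 1
    "$ADOP_CMD" phase=cleanup cleanup_mode=full || return 1
    "$ADOP_CMD" phase=fs_clone || return 1
}
```

Setting ADOP_CMD=echo and calling abort_patch_cycle only prints the three argument lists, which is a safe way to check the order before using it for real.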

Then I stopped the Apps services and de-registered the custom top on all nodes. This time it worked:
$ ./adstpall.sh apps/appsPWD
$ perl $AD_TOP/bin/adDeregisterCustomProd.pl   (all three nodes)

$ adstrtal.sh apps/appsPWD

After that, the patch cycle ran through successfully. On the current run file system:
$ adop phase=prepare
$ adop phase=actualize_all
$ adop phase=finalize finalize_mode=full
$ adop phase=cutover
On the new run file system:
$ adop phase=cleanup cleanup_mode=full 

To re-create the CUSTOM TOP (on all three nodes; first make sure the three config files in $APPL_TOP/admin are good on every node):

$ ./adstpall.sh apps/appsPWD
$ adsplice 


In another case, the CUTOVER phase failed with the error below. I had no idea how to fix it, so I ran the ABORT phase to get out of it.
$NE_BASE/EBSapps/log/adop/5/20210228_114950/cutover/validate/node1Name/ADOPValidations_detailed.log: 
-------------------------------------------------------------------------------------------------------------------
Lines #(51-55):
Checking the value of s_active_webport...

ERROR: The value of s_active_webport are different on RUN & PATCH Context files.
The Value present in RUN Context file = 4484
The Value present in PATCH Context file = 8041

I think that if I had run "adop phase=fs_clone" BEFORE the PREPARE phase, it might have caught or avoided the issue, and I would not have had to run ABORT (which is time-consuming).
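
A mismatch like the s_active_webport one could also be checked by hand before starting a cycle. The sketch below is an assumption-heavy illustration: the function names are mine, the two context file paths must be supplied by the caller, and the sed pattern assumes the context file keeps each variable on one line in the form <tag oa_var="name">value</tag>.

```shell
#!/bin/sh
# Extract the value of one AutoConfig variable from a context file.
# Assumes a one-element-per-line layout, for example:
#   <activewebport oa_var="s_active_webport">4484</activewebport>
ctx_value() {
    sed -n "s/.*oa_var=\"$1\"[^>]*>\([^<]*\)<.*/\1/p" "$2" | head -n 1
}

# Compare one variable between the run and patch context files.
# Returns non-zero when the two values differ.
compare_ctx_var() {
    var=$1
    run_ctx=$2
    patch_ctx=$3
    run_val=$(ctx_value "$var" "$run_ctx")
    patch_val=$(ctx_value "$var" "$patch_ctx")
    if [ "$run_val" = "$patch_val" ]; then
        echo "$var matches: $run_val"
    else
        echo "$var differs: run=$run_val patch=$patch_val"
        return 1
    fi
}
```

It would be called as "compare_ctx_var s_active_webport <run context file> <patch context file>". This only reports the difference; it does not say which value is the right one.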
