Monday, June 20, 2022

ADOP hits synchronization error after patching

After applying patches separately to each node in downtime mode, both "adop phase=fs_clone" and "adop phase=prepare" failed with the same error:
  Checking for existing adop sessions.
    No pending session exists.
    Starting new adop session.
    [UNEXPECTED]The following nodes are not synchronized: node2Name, node3Name
    You must synchronize the nodes before continuing
    [UNEXPECTED]Unrecoverable error occurred. Exiting current adop session.

ADOP exits quickly and does not write any error to the log file.

The query below showed that PATCHRUN_ID was wrong (-1) on two nodes for one patch:
SQL> select bug_number, patchrun_id, node_name, adop_session_id 
from  AD_ADOP_SESSION_PATCHES 
where bug_number in ('33207251'); 

BUG_NUMBER  PATCHRUN_ID  NODE_NAME    ADOP_SESSION_ID
----------  -----------  -----------  ---------------
33207251             -1  node2Name                  4
33207251          29420  primaryName                4
33207251             -1  node3Name                  4
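
With more nodes or more patches, picking out the bad rows by eye gets tedious. Below is a minimal sketch, assuming the query output has been saved to a text file (or is piped in) with the same four columns in the same order. The function name unsynced_nodes is hypothetical and not part of adop; it only filters text and does not touch the database.

```shell
#!/bin/sh
# Hypothetical helper (not an Oracle utility): print the node names whose
# PATCHRUN_ID is -1 in saved output of the AD_ADOP_SESSION_PATCHES query.
# Assumed column order: BUG_NUMBER PATCHRUN_ID NODE_NAME ADOP_SESSION_ID.
# Reads the files given as arguments, or stdin when no file is given.
unsynced_nodes() {
    # Keep only data rows (first field all digits), so the header and the
    # dashed separator line are skipped; then test the second field.
    awk 'NF >= 4 && $1 ~ /^[0-9]+$/ && $2 == "-1" { print $3 }' "$@"
}
```

For the output shown above, this would list node2Name and node3Name, the same two nodes named in the adop error message.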

It seems patch 33207251 was included in, and applied with, the January 2022 CPU patch 33487428, but was then applied again, and something went wrong during that second apply. Sometimes the keystore file for Java signing was wrong on a node, and that made the patching fail.

The fix was to re-apply patch 33207251 on node2Name and node3Name separately:

$ adop phase=apply apply_mode=downtime patches=33207251 allnodes=no action=nodb restart=yes options=forceapply,nodatabaseportion
... ...
Validating credentials.
Initializing.
    Run Edition context  : /.../xxx.xml
    Patch edition context: /../xxx.xml

Warning: Ignoring 'abandon' parameter as no failed previous patching cycle was found.
Warning: Ignoring 'restart' parameter as no failed previous patching cycle was found.
    Patch file system free space: 43.73 GB

Validating system setup.
    Node registry is valid.

Checking for existing adop sessions.
    Application tier services are down.
    Continuing with the existing session [Session ID: 6].
... ...
Applying patch 32501487.
    Log: $NE_BASE/EBSapps/log/adop/... /33207251/log/u33207251.log
... ...
The apply phase completed successfully.
adop exiting with status = 0 (Success)

$ egrep -i 'error|fail|ora-' u33207251.log
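
The egrep check above can be wrapped in a small function so that a script can act on the result. This is only a sketch: scan_patch_log is a hypothetical name, and the pattern is the same loose one used above, so it can also flag harmless lines that merely contain the word "error" or "fail".

```shell
#!/bin/sh
# Hypothetical helper (not part of adop): scan a patch log for suspicious lines.
# Usage: scan_patch_log FILE
# Prints matching lines with their line numbers.
# Returns 0 if nothing matched, 1 if any line matched, 2 if the file is unreadable.
scan_patch_log() {
    log=$1
    if [ ! -r "$log" ]; then
        echo "cannot read log: $log" >&2
        return 2
    fi
    # Same case-insensitive pattern as the egrep command above.
    if grep -n -i -E 'error|fail|ora-' "$log"; then
        return 1
    fi
    return 0
}
```

For example, "scan_patch_log u33207251.log && echo clean" prints "clean" only when no line matched.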

The re-apply updated table AD_ADOP_SESSION_PATCHES, and "adop phase=fs_clone" then completed successfully.

FS_CLONE option: force=yes/no [default: no]
       Use force=yes to restart a previous failed fs_clone command from the beginning.  
       By default fs_clone will restart where it left off.

Before applying the patch again, I tried to update table AD_ADOP_SESSION_PATCHES manually with a SQL statement, but that did not get "adop phase=fs_clone" past the error.

NOTES: 
If necessary, fs_clone can be run separately on each node:
$ adop phase=fs_clone allnodes=no action=nodb force=yes

But if it fails with the error below, you have to run "adop phase=fs_clone" from the primary node:
[UNEXPECTED]The admin server for the patch file system is not running.        
Start the patch file system admin server from the admin node and then rerun fs_clone.

After "adop phase=fs_clone" completes successfully, "adopscanlog -latest=yes" may still show various errors, such as ORA- errors.
