08 November 2017

ORA-15137 cluster in rolling patch

This happened while applying PSU 26610308 to GI. The opatch script rolled back two patches on one node of the cluster, but not on the other. The DBA who was applying the patch didn't notice this, because the patch run still reported "Success" at the end.

The next day, when we tried to add a diskgroup, we got ORA-15137.

Symptoms

ORA-15137 when running Grid configuration commands. For example:

   CREATE DISKGROUP ... etc...
   ORA-15018: diskgroup cannot be created 
   ORA-15137: cluster in rolling patch 

When I checked the Grid patch level on both nodes, the values were different. Because of that, GI was still in rolling patch mode.
   [racnode1:~] crsctl query crs softwarepatch 
   Oracle Clusterware patch level on node racnode1 is [2100979899]. 

   [racnode2:~] crsctl query crs softwarepatch 
   Oracle Clusterware patch level on node racnode2 is [1930497511]. 

While listing patches on both nodes, I noticed two extra patches on the first node.
    $ORACLE_HOME/bin/kfod op=patches 
   List of Patches
   ===============
   19769480
   20299023
   20831110
   21359755
   21436941
   21359761  <== These two patches were not on node2
   21359758  <== These two patches were not on node2
   21948354
   22291127
   23054246
   24006101
   24732082
   25755742
   25869830
   26609783
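
If you save the kfod listing from each node to a file, the comparison can be done mechanically instead of by eye. This is only a rough sketch: the function names and the idea of capturing the listings into files are my own assumptions, not part of any Oracle tooling.

```shell
#!/bin/sh
# Hypothetical helpers for comparing two saved kfod patch listings.
# Each input file is assumed to hold the raw listing from one node.

# Keep only lines that consist of a numeric patch ID (drops the header lines).
patch_ids() {
    sed -n 's/^[[:space:]]*\([0-9][0-9]*\)[[:space:]]*$/\1/p' "$1"
}

# Print patch IDs that are in listing $1 but missing from listing $2.
extra_patches() {
    ids_tmp=$(mktemp) || return 1
    patch_ids "$2" > "$ids_tmp"
    # -v: invert, -x: whole-line match, -F: fixed strings, -f: patterns from file
    patch_ids "$1" | grep -vxF -f "$ids_tmp"
    rm -f "$ids_tmp"
}
```

For example, with the listings saved as node1.txt and node2.txt (names made up here), `extra_patches node1.txt node2.txt` prints the patches that exist only on the first node.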

This meant we needed to roll back those two patches. When I tried to do this with the normal opatch rollback option, GI said that the patches were 'inactive' and couldn't be rolled back. So, we needed a way to force a rollback.

Solution

I'm publishing this because Google wasn't helping with this error. It took me a while before I stumbled on patchgen and we realised we could use it for this. I'm hoping this helps out other DBAs who need to force a patch rollback.

Log in to the node with the "extra" patches (racnode1 for us)
Stop CRS
cd GI_HOME/bin 
./patchgen commit -rb 21359761 
./patchgen commit -rb 21359758 
rootcrs.pl -patch 
Verify with kfod that the patch lists are now the same on both nodes.
Restart CRS

Lesson learned

We added a check of the softwarepatch level on every node to our patching process, so this kind of situation won't happen again.
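
A check like that could look roughly like the following. It is a sketch, not our actual script: the function names are invented, and it assumes the output of `crsctl query crs softwarepatch` from every node has already been collected and is fed in on stdin.

```shell
#!/bin/sh
# Pull the bracketed patch level out of each
# "Oracle Clusterware patch level on node X is [N]." line read from stdin.
patch_levels() {
    sed -n 's/.*patch level on node .* is \[\([0-9][0-9]*\)\].*/\1/p'
}

# Succeed only if every node in the collected output reports the same level.
check_patch_levels() {
    distinct=$(patch_levels | sort -u | wc -l | tr -d ' ')
    if [ "$distinct" -eq 1 ]; then
        echo "OK: all nodes report the same patch level"
    elif [ "$distinct" -eq 0 ]; then
        echo "ERROR: no patch level found in input" >&2
        return 1
    else
        echo "MISMATCH: $distinct different patch levels found" >&2
        return 1
    fi
}
```

Fed the two outputs shown earlier (2100979899 and 1930497511), the check returns a non-zero status, which is the signal to stop and investigate before calling the patch run a success.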
