Tuesday, December 11, 2018

Disable pstack Called From Diagsnap After Applying PSU/RU released between October 2017 and July 2018 to Grid Infrastructure (GI) Home on 12.1.0.2 and 12.2. (Doc ID 2422509.1)

Description

Troubleshooting node reboots and evictions within Grid Infrastructure (GI) is often difficult because network and OS level resource information is lacking. To address this, the diagsnap feature was developed and integrated with Grid Infrastructure. Diagsnap is triggered to collect network and OS level resource information when a node is about to be evicted or when Grid Infrastructure is about to crash.
  
The diagsnap feature is enabled automatically starting from 12.1.0.2 Oct2017 PSU and 12.2.0.1 Oct2017 RU. For more information about the diagsnap feature, refer to Document 2345654.1, "What is diagsnap resource in 12c GI and above?"

Occurrence

In certain situations diagsnap executes pstack (and pfiles on Solaris) against critical daemons such as ocssd.bin and gipcd.bin.
Although this is very infrequent, running pstack and pfiles against ocssd.bin can suspend the ocssd.bin daemon long enough to cause node reboots and evictions. For this reason Oracle has decided to ask customers to disable diagsnap functionality until the proper fixes are provided in a PSU and/or RU. Once the fixes are applied, diagsnap will not call pstack (or pfiles on Solaris).

Symptoms

Node reboots and evictions occur after applying the 12.1.0.2 Oct2017 PSU (or later) or the 12.2.0.1 Oct2017 RU (or later), but before the 12.1.0.2 Oct2018 PSU or 12.2.0.1 Oct2018 RU, to the Grid Infrastructure (GI) Home.
The problem is fixed in 12.1.0.2 Oct2018 PSU and 12.2.0.1 Oct2018 RU.


Workaround

Either apply the patch or disable osysmond from issuing pstack (and diagsnap from issuing pfiles on Solaris).

For non-Solaris environments:

1.  Apply the latest PSU or RU, or the patch for Bug:28266751. The fix stops osysmond from issuing pstack.

The fix for bug 28266751 is included in the 12.1.0.2 Oct 2018 PSU and 12.2.0.1 Oct 2018 RU, so the strong recommendation is to apply the 12.1.0.2 Oct 2018 PSU or 12.2.0.1 Oct 2018 RU, or later. Refer to Document 756671.1, "Master Note for Database Proactive Patch Program", for the patch number of the latest 12.1.0.2 PSU and 12.2.0.1 RU.

OR
2.  Disable osysmond from issuing pstack:
As the root user, run the following:
crsctl stop res ora.crf -init
Set PSTACK=DISABLE in $GRID_HOME/crf/admin/crf<HOSTNAME>.ora
crsctl start res ora.crf -init
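
The three steps in option 2 can be scripted. The following is a minimal sketch, not an Oracle-supplied script. It assumes that the crf<HOSTNAME>.ora file holds one KEY=VALUE setting per line and that <HOSTNAME> is the short host name returned by `hostname -s`; verify both on your system before using it. The function names are hypothetical.

```shell
#!/bin/sh
# Sketch only: set PSTACK=DISABLE in a crf<HOSTNAME>.ora style file.
# Assumption: the file contains one KEY=VALUE setting per line.
set_pstack_disable() {
    conf="$1"
    [ -f "$conf" ] || { echo "missing file: $conf" >&2; return 1; }
    cp -p "$conf" "$conf.bak" || return 1   # keep a backup before editing
    # Rewrite an existing PSTACK line, or append one if none is present.
    awk '
        /^PSTACK=/ { print "PSTACK=DISABLE"; found = 1; next }
        { print }
        END { if (!found) print "PSTACK=DISABLE" }
    ' "$conf.bak" > "$conf"
}

# Wrapper for the whole workaround; run as root on each affected node.
# Assumption: GRID_HOME is set and <HOSTNAME> is the short host name.
disable_osysmond_pstack() {
    crf_file="$GRID_HOME/crf/admin/crf$(hostname -s).ora"
    crsctl stop res ora.crf -init || return 1
    set_pstack_disable "$crf_file"
    rc=$?
    # Restart ora.crf even if the edit failed, so monitoring resumes.
    crsctl start res ora.crf -init || return 1
    return $rc
}
```

After loading the functions as root, call disable_osysmond_pstack on each node. The original file is kept next to the edited one with a .bak suffix.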
 

Patches

The following bugs were opened to remove the pstack and pfiles calls from diagsnap:
Bug:28266751 - REMOVE PSTACK FOR CSS AND GIPC IN DIAGSNAP
Bug:26943660 - DIAGSNAP.PL SHOULDN'T RUN PFILES ON CRSD.BIN
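
To check whether a GI Home already carries one of these fixes, the bug number can be searched for in the OPatch inventory listing. The helper below is a sketch under assumptions: the function name has_bug_fix is hypothetical, and it only looks for the bug number as a whole word in whatever text it reads from standard input.

```shell
#!/bin/sh
# Sketch only: report whether a bug number appears in inventory text on stdin.
has_bug_fix() {
    bug="$1"
    # -w matches the number as a whole word, so 28266751 does not match 128266751.
    if grep -qw "$bug"; then
        echo "fix for bug $bug found"
    else
        echo "fix for bug $bug NOT found"
        return 1
    fi
}

# Example, run as the Grid Infrastructure home owner
# (the OPatch location under GRID_HOME is an assumption):
#   "$GRID_HOME/OPatch/opatch" lsinventory -bugs_fixed | has_bug_fix 28266751
```

A "NOT found" result means only that the number is absent from the text supplied. Confirm the installed PSU or RU level before drawing conclusions.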
