ORA-29701: unable to connect to Cluster Synchronization Service
Purpose
This note explains the issues that arise when Oracle Clusterware's network socket files are deleted or have the wrong ownership.
Oracle Clusterware (CRS or Grid Infrastructure) network socket files are located in /tmp/.oracle, /usr/tmp/.oracle or /var/tmp/.oracle. To keep the clusterware healthy, do not touch them manually unless instructed to do so by Oracle Support.
A typical listing of '/var/tmp/.oracle' shows a number of such files:
# ls -l
srwxrwxrwx 1 oracle dba 0 Sep 6 10:50 s#9862.2
srwxrwxrwx 1 oracle dba 0 Sep 15 11:35 sAracnode1_crs_evm
srwxrwxrwx 1 root root 0 Sep 15 11:35 sracnode1DBG_CRSD
srwxrwxrwx 1 oracle dba 0 Sep 15 11:34 sracnode1DBG_CSSD
srwxrwxrwx 1 oracle dba 0 Sep 15 11:35 sracnode1DBG_EVMD
srwxrwxrwx 1 oracle dba 0 Sep 15 11:35 sCracnode1_crs_evm
srwxrwxrwx 1 root root 0 Sep 15 11:35 sCRSD_UI_SOCKET
srwxrwxrwx 1 oracle dba 0 Sep 15 11:35 sEXTPROC
srwxrwxrwx 1 oracle dba 0 Sep 15 11:34 sOCSSD_LL_racnode1_crs
srwxrwxrwx 1 oracle dba 0 Sep 15 11:34 sOracle_CSS_LclLstnr_crs_1
srwxrwxrwx 1 root root 0 Sep 15 11:35 sora_crsqs
srwxrwxrwx 1 root root 0 Sep 15 11:35 sprocr_local_conn_0_PROC
srwxrwxrwx 1 oracle dba 0 Sep 15 11:35 sSYSTEM.evm.acceptor.auth
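To review the socket files and their owners across all three possible locations, something like the following read-only sketch may help. The function name list_oracle_sockets is hypothetical and not part of any Oracle tooling; it relies only on the standard find and ls utilities and changes nothing.

```shell
#!/bin/sh
# Hypothetical helper: list Unix domain sockets and their owners.
# Read-only; it does not modify or remove anything.
list_oracle_sockets() {
    # With no arguments, fall back to the three locations named in this note.
    if [ "$#" -eq 0 ]; then
        set -- /tmp/.oracle /usr/tmp/.oracle /var/tmp/.oracle
    fi
    for dir in "$@"; do
        # Skip locations that do not exist on this platform.
        [ -d "$dir" ] || continue
        # -type s matches socket files only; ls -l shows owner and group.
        find "$dir" -type s -exec ls -l {} +
    done
}
```

Calling list_oracle_sockets without arguments checks the default locations; passing one or more directories restricts the listing to those.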
When a file is deleted on Unix, it becomes "invisible" at the filesystem level; however, any process that had the file open when it was deleted can still use it.
Attempts to open a "deleted" file for reading fail with ENOENT (errno 2, "No such file or directory"), while opening a file with the same name for writing creates a new (different) file.
Therefore, only processes that attempt to open the socket file during the initial handshake fail with ORA-29701, while existing processes are unaffected.
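The behaviour described above can be reproduced with an ordinary scratch file. This is only an analogy, assuming a regular file created with mktemp in place of a real socket file; it touches nothing under .oracle. File descriptor 3 plays the role of the process that already had the file open.

```shell
#!/bin/sh
# Demonstration with a scratch file (an analogy, not a real Clusterware socket).
demo=$(mktemp)                 # scratch file standing in for a socket file
echo "still here" > "$demo"
exec 3< "$demo"                # an "existing process" holds the file open on fd 3
rm -f "$demo"                  # remove the name from the filesystem

# A new open by name now fails (ENOENT: No such file or directory).
if cat "$demo" 2>/dev/null; then
    new_open=succeeded
else
    new_open=failed
fi

# The descriptor opened before the deletion still reads the original content.
read -r old_content <&3
exec 3<&-                      # close fd 3; only now is the file data released

echo "new open by name: $new_open"
echo "read via existing descriptor: $old_content"
```

The new open by name fails while the pre-existing descriptor still returns the original content, which mirrors why established connections keep working and new ones report ORA-29701.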
Solution
The only way to re-create these special files is a restart (instance, listener, CRS). In a RAC environment this requires a shutdown and restart of the entire CRS stack.
As these special files are required to communicate with the various CRS daemons, it will most likely not be possible to stop (and restart) the CRS stack using the following commands as user root, but it does no harm to try:
# $ORA_CRS_HOME/bin/crsctl stop crs
# $ORA_CRS_HOME/bin/crsctl start crs
If the above fails to successfully stop the CRS stack, a system reboot will be inevitable.
As for deleting files from temporary directories via a cron job (or otherwise):
The directory '/var/tmp/.oracle' (on some platforms /tmp/.oracle) should be excluded from such jobs or tasks. The files in this directory occupy only a few bytes and generally do not need to be cleaned up.
Criteria for files that a cleanup job may remove:
1. Size. The file must be big enough, e.g. anything bigger than 5MB.
2. Date. The file must be old enough, e.g. only files that have not been accessed or modified for more than 30 days.
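A cleanup job along these lines could be sketched as follows. The function name find_cleanup_candidates is hypothetical, and the thresholds simply mirror the examples above (more than 5MB, untouched for more than 30 days). The sketch only prints candidates and deletes nothing, and it prunes any directory named .oracle so the socket files are never considered.

```shell
#!/bin/sh
# Hypothetical helper: print cleanup candidates, never descending into .oracle.
# It only lists files; deletion is deliberately left out of this sketch.
find_cleanup_candidates() {
    target=${1:-/var/tmp}      # directory to scan (assumed default)
    # -prune stops find from entering any directory named .oracle.
    # -size +10240 counts 512-byte blocks, i.e. larger than 5 MB.
    # -atime +30 -mtime +30: neither accessed nor modified for over 30 days.
    find "$target" \
        -type d -name .oracle -prune -o \
        -type f -size +10240 -atime +30 -mtime +30 -print
}
```

The -type f test already skips socket files, but the explicit -prune keeps the exclusion visible and also protects any regular files that may live under .oracle. Review the printed list before wiring any deletion into a cron job.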