The Grid Infrastructure installation went smoothly until we tried to run root.sh on the second node. The script failed with the following error:
    CRS-4402: The CSS daemon was started in exclusive mode but found an active CSS daemon on node <nodename>, number 1, and is terminating
    An active cluster was found during exclusive startup, restarting to join the cluster
    Failed to start Oracle Clusterware stack
    Failed to start Cluster Synchorinisation Service in clustered mode at /u01/app/22.214.171.124/grid/crs/install/crsconfig_lib.pm line 1016
    /u01/app/126.96.36.199/grid/perl/bin/perl -I/u01/app/188.8.131.52/grid/perl/lib -I/u01/app/184.108.40.206/grid/crs/install /u01/app/220.127.116.11/grid/crs/install/rootcrs.pl execution failed
We decided to try installing Oracle 18.104.22.168 and then upgrading to Oracle 22.214.171.124. This process requires the latest 126.96.36.199 Grid Infrastructure PSU to prevent a failure when the rootupgrade.sh script is executed on the first node. Once this patch had been applied, rootupgrade.sh succeeded on the first node, but failed on the second node with the error described above.
This issue is described in MOS Note 1212703.1 "Grid Infrastructure install or upgrade may fail due to Multicasting".
If multicasting is not enabled on the private network, then root.sh will be successful on the first node, but will fail on the second and subsequent nodes when attempting to start CSSD. This affects both installations and upgrades.
Multicasting is required to enable the new HAIP interconnect feature. If multicast is not enabled, the node will not be able to join the cluster.
According to the note, the only solution is to enable multicasting on the private network (interconnect). This could be difficult on a production system, particularly for an out-of-hours upgrade where the relevant network specialists may not be available to modify the switch configurations.
However, we did some research and it appears that multicasting is already enabled by default in OEL5U5: each network interface listed by ifconfig already had MULTICAST enabled. Past experience tells us, though, that just because something is configured at operating system level, we cannot assume it is configured at switch level - remember jumbo frames?
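The operating-system side of this check can be scripted. The following is a minimal sketch, assuming a Linux host that exposes interface flags under /sys/class/net and has Python 3 available (neither is stated above); the function names are hypothetical. It tests the IFF_MULTICAST bit (0x1000) that ifconfig displays as MULTICAST. As noted above, a positive result says nothing about the switch configuration.

```python
from pathlib import Path

# Linux interface flag bit reported by ifconfig as MULTICAST
IFF_MULTICAST = 0x1000


def has_multicast_flag(flags_text: str) -> bool:
    """Return True if a sysfs flags value (e.g. '0x1003') has IFF_MULTICAST set."""
    return bool(int(flags_text.strip(), 16) & IFF_MULTICAST)


def interface_multicast(iface: str, sysfs_root: str = "/sys/class/net") -> bool:
    """Read the flags file for one interface and test the multicast bit.

    The sysfs location is an assumption about the host; adjust if needed.
    """
    flags_file = Path(sysfs_root) / iface / "flags"
    return has_multicast_flag(flags_file.read_text())
```

For example, interface_multicast("bond1") would report whether the bonded private interface used later in this post has the flag set at operating system level.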
Since we originally discovered this problem, Oracle have released a utility to test the availability of multicast addresses. The utility is called mcasttest and can be downloaded from MOS Note 1212703.1 "Grid Infrastructure install or upgrade may fail due to Multicasting".
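To illustrate what a multicast reachability test involves in general, here is a rough sketch using plain UDP sockets. It is not the implementation of mcasttest, which we have not inspected. The function names are hypothetical, and the group address, port and local IP address must be supplied by the caller. One node would open a receiver and wait; the other would send a probe to the same group and port. If the switch does not forward multicast, the receiver times out.

```python
import socket
import struct


def membership_request(group: str, local_ip: str) -> bytes:
    """Build the 8-byte ip_mreq structure: group address + local interface address."""
    return struct.pack("4s4s", socket.inet_aton(group), socket.inet_aton(local_ip))


def open_receiver(group: str, port: int, local_ip: str) -> socket.socket:
    """Join the multicast group on the interface that owns local_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    sock.bind(("", port))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_ADD_MEMBERSHIP,
                    membership_request(group, local_ip))
    return sock


def wait_for_probe(sock: socket.socket, timeout: float = 5.0):
    """Return (payload, sender) if a datagram arrives in time, else None."""
    sock.settimeout(timeout)
    try:
        return sock.recvfrom(1024)
    except socket.timeout:
        return None


def send_probe(group: str, port: int, local_ip: str,
               payload: bytes = b"mcast-probe") -> None:
    """Send one datagram to the group via the interface that owns local_ip."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM, socket.IPPROTO_UDP)
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF,
                    socket.inet_aton(local_ip))
    sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)  # stay on the local segment
    sock.sendto(payload, (group, port))
    sock.close()
```

A time-to-live of 1 keeps the probe on the local network segment, which matches the case of a private interconnect on a single VLAN; this too is an assumption about the environment.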
In the environment discussed here the mcasttest utility returned the following output:
    $ ./mcasttest.pl -n server23,server24 -i bond1
    ########### Setup for node server23 ##########
    Checking node access 'server23'
    Checking node login 'server23'
    Checking/Creating Directory /tmp/mcasttest for binary on node 'server23'
    Distributing mcast2 binary to node 'server23'
    ########### Setup for node server24 ##########
    Checking node access 'server24'
    Checking node login 'server24'
    Checking/Creating Directory /tmp/mcasttest for binary on node 'server24'
    Distributing mcast2 binary to node 'server24'
    ########### testing Multicast on all nodes ##########
    Test for Multicast address 188.8.131.52
    Nov 19 11:29:11 | Multicast Failed for bond1 using address 184.108.40.206:42000
    Test for Multicast address 220.127.116.11
    Nov 19 11:29:12 | Multicast Succeeded for bond1 using address 18.104.22.168:42001
The mcasttest utility first tests 22.214.171.124, which is the default address, and then repeats the test for 126.96.36.199. If the first test fails but the second succeeds, as in the example above, Oracle recommends installing the patch for Bug 9974223 - "Grid Infrastructure needs multicast communication on 188.8.131.52 address working" on each node in the cluster after installing the Oracle binaries, but before running root.sh or rootupgrade.sh.
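The decision rule described above (first address fails, second succeeds, so apply the patch) can be expressed as a small script over the captured mcasttest output. This is a sketch only: the regular expression is inferred from the two result lines shown in this post rather than from any documented output format, and the function names are hypothetical.

```python
import re

# Matches result lines such as:
#   "... | Multicast Failed for bond1 using address <address>:<port>"
RESULT_RE = re.compile(
    r"Multicast (Succeeded|Failed) for (\S+) using address (\S+):(\d+)"
)


def parse_results(output: str):
    """Return [(address, succeeded), ...] in the order the tests were run."""
    return [(m.group(3), m.group(1) == "Succeeded")
            for m in RESULT_RE.finditer(output)]


def patch_recommended(output: str) -> bool:
    """True when the first (default) address fails but the second succeeds."""
    results = parse_results(output)
    if len(results) < 2:
        return False  # not enough information to decide
    return (not results[0][1]) and results[1][1]
```

If both tests fail, the function returns False because the patch would not help; in that case multicasting needs to be enabled on the private network.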
I have subsequently installed Oracle 184.108.40.206 Grid Infrastructure at another site without any issues. The second site is also a 2-node Linux x86-64 cluster, this time running Red Hat Enterprise Linux 5 Update 4, and both the public and private networks are bonded. In this case Oracle 220.127.116.11 installed at the first attempt.
In Oracle 18.104.22.168 and above this issue has been resolved by Oracle. The installation process now attempts to detect whether multicasting is enabled; if available then multicasting is used; if not available then the installer reverts to the non-multicasting algorithm used in Oracle 22.214.171.124.