Sun Cluster 3.x Quorum Issue
Peter van der Weerd
Clustering software usually consists of a collection
of scripts and binaries that unconfigure an interface, bring down an
application, unmount some file systems, give away a group of disks, and
reverse this procedure on some other machine. This goes for all Unix
cluster solutions. Of course, the products differ at several levels.
Different vendors use different storage products and software to
manage devices, have their own ideas about establishing and maintaining
membership between the clustered machines, and so on.
So, if Unix clustering is so straightforward and
such common practice, why am I writing an article on it? Good question. In this
article, I will not list the pros and cons of every cluster
product. Instead, I will describe one con of one Unix cluster product, elaborate on
it a bit, and come up with a script that could help. Specifically, I will
cover the quorum issue in Sun Microsystems' Sun Cluster 3.x product.
Even though Sun, to my mind, has one of the
most advanced cluster solutions in the field, there is a drawback. This
drawback is the quorum device issue, or to be a little more exact, the
ignored issue of losing the disk that is your quorum device.
Quorum Device
You may already know about the quorum device issue,
but just to make sure, here's a short recap. Unix clusters all have
some sort of heartbeat protocol that uses either a dedicated network or all
available networks to communicate and establish "membership".
Membership means that both nodes or, in a cluster of more than two nodes, all
nodes have to be aware of each other at all times.
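To make the idea of membership a little more concrete, here is a minimal
sketch in Python. It is not Sun Cluster's actual heartbeat protocol; the
class name, the node names, and the three-second timeout are all
hypothetical and chosen only for illustration. The sketch assumes that a
node counts as a member as long as its last heartbeat arrived within the
timeout.

```python
import time

# Hypothetical timeout; a real cluster product uses its own tunables.
HEARTBEAT_TIMEOUT = 3.0  # seconds


class MembershipMonitor:
    """Tracks which nodes have sent a heartbeat recently (illustrative only)."""

    def __init__(self, nodes, timeout=HEARTBEAT_TIMEOUT, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        now = clock()
        # Assume every configured node is alive when monitoring starts.
        self.last_seen = {node: now for node in nodes}

    def record_heartbeat(self, node):
        """Note that a heartbeat from `node` has just arrived."""
        self.last_seen[node] = self.clock()

    def members(self):
        """Return the nodes whose last heartbeat is within the timeout."""
        now = self.clock()
        return sorted(
            node for node, seen in self.last_seen.items()
            if now - seen <= self.timeout
        )

    def membership_intact(self):
        """True only if every configured node is still a member."""
        return len(self.members()) == len(self.last_seen)
```

When membership_intact() turns false, the surviving nodes know only that
they have stopped hearing from a peer, not whether that peer is actually
down. That uncertainty is where the quorum device comes in.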