Solaris™ Failover -- Keeping Connected at All Times
Brian Gollsneider and Arthur Messenger
Success in today's high-tech world demands high-availability systems. Achieving
five or six nines of availability requires high-end hardware, a stable operating
system, and a stable connection to the network. In Solaris™ 8, Sun introduced
IP Network Multipathing. This capability allows administrators to create a hot
standby for a network interface card (NIC) or to configure several active NICs
on a machine in a multipath group to back up each other. The hot standby can
take over for a failed primary card in as little as 100 ms. In this article,
we show how to configure a system for failover, then describe the network
impact of multipath groups, how well ordinary network applications tolerate the
resulting timeouts, and how the system logs and reports the relevant events.
We assume a working, networked Solaris 8 system with a second network card
available. Ethereal was used to monitor and record the network activity.
Background
Network failover is the ability to recover from a problem on one network path
by switching to another. The failure can be the network card itself dying,
the network cable being cut or disconnected, or some other equivalent event.
Note that we forced network failures by physically disconnecting the network
cable at the appropriate time. Sun's IP Network Multipathing has three main
parts: failure detection, repair detection, and outbound load spreading.
Failure detection is sensing when a link is no longer usable. Repair detection
is the reverse: determining when a failed link is usable again.
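To make the distinction concrete, here is a minimal Python sketch of how a monitor might turn a stream of probe results into failure and repair events. It is an illustration only, not a description of how Solaris implements multipathing: the LinkMonitor class, the record method, and both threshold values are hypothetical names and numbers chosen for the example. The idea shown is simply that a link is declared failed after several consecutive missed probes and declared repaired after several consecutive successful ones.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkMonitor:
    """Tracks one link's state from a stream of probe results (illustrative only)."""
    fail_threshold: int = 5     # hypothetical: consecutive misses before "failed"
    repair_threshold: int = 10  # hypothetical: consecutive replies before "repaired"
    up: bool = True             # current belief about the link
    _streak: int = 0            # consecutive results that contradict the current state

    def record(self, probe_ok: bool) -> Optional[str]:
        """Feed one probe result; return "failed" or "repaired" on a state change."""
        if probe_ok == self.up:
            # The result agrees with the current state, so any streak is broken.
            self._streak = 0
            return None
        self._streak += 1
        limit = self.repair_threshold if probe_ok else self.fail_threshold
        if self._streak < limit:
            return None
        # Enough consecutive contradicting results: flip the state.
        self.up = probe_ok
        self._streak = 0
        return "repaired" if probe_ok else "failed"


if __name__ == "__main__":
    monitor = LinkMonitor(fail_threshold=3, repair_threshold=2)
    # Simulate pulling the cable (False results) and plugging it back in (True).
    for result in [True, False, False, False, True, True]:
        event = monitor.record(result)
        if event is not None:
            print(event)
```

Requiring a run of consecutive results in each direction keeps a single lost or stray probe from triggering a failover or a premature failback; the price is a detection delay proportional to the threshold.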