Monitoring and Managing Linux Software RAID
Ryan Matteson
Systems administrators managing a data center face numerous challenges in
achieving required availability and uptime. Two of the main challenges are shrinking
budgets (for hardware, software, and staffing) and short deadlines in which
to deliver solutions. The Linux community has developed kernel support for software
RAID (Redundant Array of Inexpensive Disks) to help meet those challenges. Software
RAID, properly implemented, can eliminate system downtime caused by disk drive
errors. The source code for the Linux kernel, the RAID modules, and the raidtools
package is available at minimal cost under the GNU General Public License. The interface
is well documented and comprehensible to a moderately experienced Linux systems
administrator.
In this article, I'll provide an overview of the software RAID implementation
in the Linux 2.4.X kernel. I will describe the creation and activation of software
RAID devices as well as the management of active RAID devices. Finally, I will
discuss some procedures for recovering from a failed disk unit.
Introduction to RAID
RAID is a set of algorithms for writing data blocks to disk devices. Each
RAID mode, or level, specifies the layout of data blocks on multiple disks.
Each RAID mode enhances one or more aspects of data management: redundancy
or reliability, read or write performance, or logical unit capacity. Simple
RAID modes are named with a single integer: RAID 0, RAID 1, or RAID 5. Complex
RAID modes that combine multiple simple modes are named with a combined name:
RAID 0+1, RAID 1+0.
RAID 0 is used to enhance the read/write performance of large data sets, and
to increase logical unit capacity beyond the limits of a single disk device.
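As a rough illustration of how striping achieves both goals, the following Python sketch maps a logical block number to a member disk and a block position on that disk. It assumes equal-sized member disks and a fixed chunk size; the function names and parameters (raid0_locate, raid0_capacity, chunk_blocks) are hypothetical and chosen for this example only, and the kernel's actual layout code is more involved.

```python
def raid0_locate(logical_block, num_disks, chunk_blocks):
    """Map a logical block to (disk index, block on that disk).

    Assumes a simple round-robin striping layout over equal-sized disks:
    consecutive chunks go to consecutive disks, wrapping around.
    """
    # Which chunk the block falls in, and where inside that chunk.
    chunk_index, offset_in_chunk = divmod(logical_block, chunk_blocks)
    # Chunks are dealt out to the disks in turn; one full round is a stripe.
    stripe, disk = divmod(chunk_index, num_disks)
    block_on_disk = stripe * chunk_blocks + offset_in_chunk
    return disk, block_on_disk


def raid0_capacity(disk_blocks, num_disks):
    """Usable capacity in blocks: RAID 0 stores no redundancy data."""
    return disk_blocks * num_disks


if __name__ == "__main__":
    # Three disks, four blocks per chunk: show where the first blocks land.
    for block in range(0, 16, 4):
        print(block, raid0_locate(block, num_disks=3, chunk_blocks=4))
```

Because neighboring chunks sit on different disks, a large sequential transfer is spread across all members at once, and the usable capacity is the sum of the member disks. The same layout also means that losing any one disk loses part of every large file.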