Software RAID in SUSE Linux

Preamble and Disclaimer

Firstly, this article assumes that you know what you are doing. This isn't a general RAID tutorial or glossary. If you want to learn about the fundamentals of software RAID, have a look at the Linux Software-RAID HOWTO.

Also, if you end up hosing your system by acting upon anything you read here, then that is your tough luck. I accept no responsibility for any damage you may cause to yourself, your computer, your company's servers, your cat, etc.

You should also be conversant with partitioning terminology and concepts, as this article assumes you already understand them!

More than anything, this article is just an HTML-ised version of the notes I made whilst conducting my tests. It is not intended to be a complete guide to any of the technologies and concepts discussed herein.

Introduction

This article discusses my experiences configuring software RAID on two SUSE Linux systems.

For my tests, I decided to use VMware rather than experimenting with my live systems.

The first test case was configured as a SUSE 8.0 Professional system, with two 10GB SCSI disks. My intention with this system was to let the initial YaST installation handle setting up RAID1 (mirroring) automatically. The system would therefore run as a two-disk mirrored set.

The second test case was configured as a SUSE 8.2 Professional system, with six 10GB SCSI disks. With this system, I put the entire system onto the first SCSI disk, and then manually configured RAID5 across the remaining 5 disks, to be used as a file storage area, mounted into the root filesystem at /files.

One of the nice things about recent SUSE (and most other modern Linux distributions) is that RAID support is compiled into the kernel by default.

One late night, lots of caffeine, and a great deal of patience later, I had the two systems up and running.

Read on for some tips, tricks and instructions on configuring the two systems. The fact that VMware was used is neither here nor there really, and the techniques should be the same on "real" systems.

RAID1 (Mirroring) Under SUSE 8.0 Professional

Note: This section is not a full discussion and tutorial of YaST! Also, the procedure outlined here will not give bootable redundancy: if the disk containing /boot fails, the system will be unbootable. You could, however, use a boot disk and access the good disk in order to rebuild the system. This is a limitation of booting from software RAID devices with LILO on older Linux distributions; try to set this up with YaST, and it'll warn you that it "may" not be bootable. I know that Red Hat Enterprise Linux 4 with GRUB will boot successfully from a software RAID1 /boot partition. In any case, I always use hardware RAID on Linux systems, as disks are cheap these days and hardware RAID remains transparent to the OS. That said, this procedure gives a feel for software mirroring a Linux system, and on this test box I only wanted data redundancy, so that I could easily mount one disk and recover my files if the other failed.

Bootable software RAID1 is tricky, but doable, under Solaris 10 for Intel, and I have written a separate article that describes the process.

As discussed, this system uses two 10GB SCSI disks. I planned the partitions to be set up as follows.

Partition  Size   Mount (/dev/sda)  Mount (/dev/sdb)  Type      RAID?
1          2GB    /                 /                 Primary   Y
2          1GB    /var              /var              Primary   Y
3          2GB    /usr              /usr              Primary   Y
4          N/A    N/A               N/A               Extended  N/A
5          512MB  swap              swap              Logical   N
6          400MB  /tmp              /tmp              Logical   Y
7          112MB  /boot             /spare            Logical   N
8          4GB    /home             /home             Logical   Y

Several factors influenced my partitioning scheme above. The first is that SUSE 8.0 uses LILO as its boot loader by default. LILO can sometimes have problems booting from RAID, so I decided to have a standard ext3 /boot partition that is not part of the mirror. Seeing as /boot isn't going to change much (essentially only when a new kernel is installed), it can be backed up manually to tape or to /spare. The /spare partition is also ext3 and is not mirrored; it exists only so that both disks in the mirrored set have an identical layout.
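Because /boot sits outside the mirror, a copy on /spare is the main safeguard if the first disk dies. A minimal sketch of that manual backup, assuming GNU tar with gzip support; the function name backup_boot and the timestamped archive name are my own choices, not anything SUSE ships:

```shell
#!/bin/sh
# Sketch (hypothetical helper): archive the contents of the unmirrored /boot
# partition onto the /spare partition on the second disk. Source and
# destination are parameters so it can be tried on scratch directories first.
backup_boot() {
    src=${1:-/boot}      # directory to back up
    dest=${2:-/spare}    # directory that receives the archive
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/boot-$stamp.tar.gz"
    # -C switches into $src so the archive stores relative paths
    tar -czf "$archive" -C "$src" . || return 1
    echo "$archive"      # print the archive path for the caller
}
```

Run as root, backup_boot /boot /spare after each kernel update; restoring is the reverse, extracting the archive with tar -xzf into the rebuilt /boot partition.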

As you'll no doubt be aware, we'll be setting up the first three partitions as primary partitions, the fourth as an extended partition and the rest as logical partitions. These are basic partitioning concepts and I'll say no more. Google for information if you're unsure.

I did not include the two swap partitions in the mirror. I have read various papers, some advocating using RAID for swap, others advising against it, so I decided to just leave the partitions as standard swap.

Everything else would be mirrored. I know this partitioning scheme may not be to everyone's liking (more space on /tmp would be handy), but for the test scenario, it is more than sufficient.

In the YaST partitioner, each partition is initially marked as unformatted, with partition type 0xFD (Linux RAID). The installer will let us configure the real filesystem type, and format the mirror, later in the procedure.
After initial partitioning of the two SCSI drives, you will have a partitioning scheme similar to the table above. Review the partition layout, and then click RAID | Create RAID to launch the RAID Wizard and create the first mirror.

Each "pair" of partitions (for example, /dev/sda1 and /dev/sdb1) will be within its own mirror, therefore /dev/md0 will be the mirrored / filesystem, /dev/md1 will be /var, and so on.

We now create the mirrors, specifying mount points and filesystem types during this step. The YaST installer takes care of all this, and despite some strange ways of going about things, the installation is relatively painless. For example, even though we specified a RAID Type of RAID1 previously, the wizard "appears" to show RAID0, with the drop-down box greyed out. However, after installation everything is RAID1 as expected. A strange quirk of the installer.

You will need to run the RAID wizard for each mirror you wish to create. On our test system, we create five mirrors, /dev/md0 through /dev/md4.

Once all of the mirrors are in place, YaST shows the complete partition layout. For my tests, I used a mixture of ext3 and ReiserFS for the filesystem types.

After installing and configuring the rest of the system, we can log in and see if our RAID1 setup is working as expected.

Firstly, check /proc/mdstat to see if the RAID mirror is operating correctly:

$ cat /proc/mdstat
Personalities : [raid1] 
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sda1[0]
      2104384 blocks [2/2] [UU]
      
md1 : active raid1 sdb2[1] sda2[0]
      1052160 blocks [2/2] [UU]
      
md2 : active raid1 sdb3[1] sda3[0]
      2104448 blocks [2/2] [UU]
      
md3 : active raid1 sdb6[1] sda6[0]
      409536 blocks [2/2] [UU]
      
md4 : active raid1 sdb8[1] sda8[0]
      4160704 blocks [2/2] [UU]
      
unused devices: <none>

Yes, despite what YaST was showing us, RAID1 is in effect as requested! Each mirror reports [2/2] [UU], meaning both of its members are present and up, so all mirrors appear to be working as expected.
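Rather than reading /proc/mdstat by eye, the [UU] field can be checked automatically: an underscore in place of a U marks a missing or failed member. A rough sketch using awk, assuming the /proc/mdstat layout shown above (other kernel versions format it slightly differently); check_mdstat is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch (hypothetical helper): report md arrays whose member-status field
# (e.g. [UU]) contains an underscore, which marks a missing or failed member.
# Returns 0 if every array is complete, 1 if any array is degraded.
check_mdstat() {
    awk '
        /^md[0-9]+ :/ { dev = $1 }                  # remember the current array
        /blocks/ && match($0, /\[[U_]+]/) {         # status field such as [UU] or [U_]
            status = substr($0, RSTART, RLENGTH)
            if (index(status, "_")) { print dev " degraded " status; bad = 1 }
        }
        END { exit bad }
    ' "${1:-/proc/mdstat}"
}
```

Called with no argument it reads /proc/mdstat; it prints nothing and returns 0 when every array is complete, which makes it easy to run from cron.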

Let's have a look at /etc/raidtab, as prepared by YaST:

# autogenerated /etc/raidtab by YaST2 

raiddev /dev/md0
   raid-level        1
   nr-raid-disks     2
   nr-spare-disks    0
   persistent-superblock 1
   chunk-size        4
   device   /dev/sda1
   raid-disk 0
   device   /dev/sdb1
   raid-disk 1

raiddev /dev/md1
   raid-level        1
   nr-raid-disks     2
   nr-spare-disks    0
   persistent-superblock 1
   chunk-size        4
   device   /dev/sda2
   raid-disk 0
   device   /dev/sdb2
   raid-disk 1

raiddev /dev/md2
   raid-level        1
   nr-raid-disks     2
   nr-spare-disks    0
   persistent-superblock 1
   chunk-size        4
   device   /dev/sda3
   raid-disk 0
   device   /dev/sdb3
   raid-disk 1

raiddev /dev/md3
   raid-level        1
   nr-raid-disks     2
   nr-spare-disks    0
   persistent-superblock 1
   chunk-size        4
   device   /dev/sda6
   raid-disk 0
   device   /dev/sdb6
   raid-disk 1

raiddev /dev/md4
   raid-level        1
   nr-raid-disks     2
   nr-spare-disks    0
   persistent-superblock 1
   chunk-size        4
   device   /dev/sda8
   raid-disk 0
   device   /dev/sdb8
   raid-disk 1

Obviously, chunk sizes can be changed during the installation, as can the persistent-superblock option (although the chunk size has little practical effect on a RAID1 mirror, since no data is striped).
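The five stanzas YaST wrote differ only in the md number and the partition number. If you ever need to recreate /etc/raidtab by hand, a small generator like the following would reproduce them; it's a sketch that assumes the sdaN/sdbN pairing used on this box, and the function names are hypothetical:

```shell
#!/bin/sh
# Sketch (hypothetical helpers): print raidtab stanzas in the same layout
# YaST wrote above. Assumes partition N on /dev/sda mirrors N on /dev/sdb.
raid1_stanza() {
    n=$1    # md device number
    p=$2    # partition number on both disks
    printf 'raiddev /dev/md%s\n' "$n"
    printf '   raid-level        1\n'
    printf '   nr-raid-disks     2\n'
    printf '   nr-spare-disks    0\n'
    printf '   persistent-superblock 1\n'
    printf '   chunk-size        4\n'
    printf '   device   /dev/sda%s\n   raid-disk 0\n' "$p"
    printf '   device   /dev/sdb%s\n   raid-disk 1\n\n' "$p"
}

# Partitions mirrored on the test box: md0..md4 map to partitions 1 2 3 6 8
raid1_raidtab() {
    md=0
    for part in 1 2 3 6 8; do
        raid1_stanza "$md" "$part"
        md=$((md + 1))
    done
}
```

Redirect the output of raid1_raidtab to a scratch file and diff it against /etc/raidtab before replacing anything.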

Let's make sure all of our filesystems are in place:

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              2.0G  920M 1000M  48% /
/dev/sda7             114M  8.7M   99M   8% /boot
/dev/md4              4.0G   33M  3.9G   1% /home
/dev/sdb7             114M  4.1M  104M   4% /spare
/dev/md3              387M  9.6M  357M   3% /tmp
/dev/md2              2.0G  1.2G  891M  57% /usr
/dev/md1             1011M   88M  872M  10% /var
shmfs                 124M     0  124M   0% /dev/shm

And that concludes the configuration of RAID1 mirroring on SUSE 8.0.

RAID5 Under SUSE 8.2 Professional

In this test, the entire root (/) filesystem was simply thrown onto /dev/sda1. Not very practical in a production system, but I needed to get a base system configured quickly so that I could configure /dev/sdb1 through /dev/sdf1 as a RAID5 array, which is what this is all about!

After logging into the system (as root, of course), it was easy to get RAID5 up and running.

Firstly, I created the /etc/raidtab file as follows:

# Test RAID5 Setup on suseraid5 (192.168.0.254)
# 5 disks total: 4 disks used (10Gb each), with 1 x 10Gb disk spare

raiddev			/dev/md0
raid-level		5
persistent-superblock	1
chunk-size		32
parity-algorithm	left-symmetric

nr-raid-disks		4
nr-spare-disks		1

device			/dev/sdb1
raid-disk		0

device			/dev/sdc1
raid-disk		1

device			/dev/sdd1
raid-disk		2

device			/dev/sde1
raid-disk		3

device			/dev/sdf1
spare-disk		0

As you can see, the array consists of five 10GB SCSI disks: four active members and one spare, which remains in the "pool" in case a drive fails.
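With one disk's worth of space used for parity and the fifth disk held back as a spare, usable capacity is (active disks - 1) * member size. A quick shell check of that arithmetic, using the per-member size of 10482368 blocks that /proc/mdstat reports during the resync later in this section; raid5_capacity is just an illustrative name:

```shell
#!/bin/sh
# Sketch (hypothetical helper): usable RAID5 capacity. One member's worth of
# space holds parity; hot spares add nothing until they replace a failed disk.
# Arguments: number of active disks (nr-raid-disks), size of one member in 1K blocks.
raid5_capacity() {
    disks=$1 member_kb=$2
    echo $(( (disks - 1) * member_kb ))
}
```

raid5_capacity 4 10482368 gives 31447104, which matches the block count in the first /proc/mdstat listing for /dev/md0 below.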

After creating /etc/raidtab, verify that no RAID arrays are running yet by checking /proc/mdstat:

# cat /proc/mdstat
Personalities : 
read_ahead not set
unused devices: <none>

As you can see, no arrays are active yet.

Use fdisk to create a partition (/dev/sdb1) spanning the first disk that will be in your array. Remember to set the partition type to "fd" (Linux raid autodetect). The fdisk session follows:

# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.


The number of cylinders for this disk is set to 1305.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): p

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System

Command (m for help): m
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1305, default 1): 
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-1305, default 1305): 
Using default value 1305

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1      1305  10482381   fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.

Repeat the fdisk procedure for each disk in the array (including the spare).
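Because all five disks are the same size, the partition table created on /dev/sdb can be copied to the others instead of repeating the fdisk dialogue four times: sfdisk's dump format (sfdisk -d) can be piped straight back into sfdisk. The sketch below only prints those commands unless DRY_RUN=0 is set, since a wrong device name would wipe the wrong disk. The clone_partitions name and DRY_RUN variable are my own, and sfdisk's dump format has changed between util-linux releases, so check the output on your version:

```shell
#!/bin/sh
# Sketch (hypothetical helper): copy the partition table of the first array
# disk to the remaining disks. DRY_RUN=1 (the default) only prints the
# commands; DRY_RUN=0 runs them and overwrites the targets' partition tables.
clone_partitions() {
    src=$1; shift
    for dst in "$@"; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            echo "sfdisk -d $src | sfdisk $dst"
        else
            sfdisk -d "$src" | sfdisk "$dst" || return 1
        fi
    done
}
```

clone_partitions /dev/sdb /dev/sdc /dev/sdd /dev/sde /dev/sdf lists the four commands; rerun it with DRY_RUN=0 once they look right.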

Next, with /etc/raidtab set up (and double-checked for syntax), run mkraid:

# mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sdb1, 10482381kB, raid superblock at 10482304kB
disk 1: /dev/sdc1, 10482381kB, raid superblock at 10482304kB
disk 2: /dev/sdd1, 10482381kB, raid superblock at 10482304kB
disk 3: /dev/sde1, 10482381kB, raid superblock at 10482304kB
disk 4: /dev/sdf1, 10482381kB, raid superblock at 10482304kB

If you see something along the lines of the above, all is going well.

If you check /proc/mdstat now, you'll already see the array resyncing:

# cat /proc/mdstat
Personalities : [raid5] 
read_ahead 1024 sectors
md0 : active raid5 sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      31447104 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]
      [=>...................]  resync =  7.8% (821944/10482368) finish=5.2min speed=30442K/sec
unused devices: <none>
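The progress figure on the resync line can be pulled out for scripts or a quick status check. A sketch with awk, assuming the resync (or recovery) line format shown above; resync_progress is a hypothetical name:

```shell
#!/bin/sh
# Sketch (hypothetical helper): print "<array> <percent>" for each md array
# that is resyncing or recovering, taken from lines such as
#   [=>....]  resync =  7.8% (821944/10482368) finish=5.2min speed=30442K/sec
resync_progress() {
    awk '
        /^md[0-9]+ :/ { dev = $1 }              # remember the current array
        /(resync|recovery) *=/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /%$/) print dev, $i    # the field ending in % is the progress
        }
    ' "${1:-/proc/mdstat}"
}
```

For the listing above it would print md0 7.8%.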

The array is usable while it resyncs, so (remembering that it has no filesystem yet) run mke2fs, or the filesystem creation command of your choice, to create the filesystem now.

# mke2fs /dev/md0

mke2fs 1.28 (31-Aug-2002)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
3932160 inodes, 7861728 blocks
393086 blocks (5.00%) reserved for the super user
First data block=0
240 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks: 
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
4096000

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 26 mounts or
180 days, whichever comes first.  Use tune2fs -c or -i to override.

Of course, you can add your usual parameters to mke2fs, but we'll just keep it simple for our RAID tests.
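One parameter worth considering on a striped array is the ext2 stride: the number of filesystem blocks per RAID chunk, which lets mke2fs avoid placing its block bitmaps on the same disk. With the 32KB chunk-size from /etc/raidtab and 4KB blocks, that's 32 / 4 = 8. A sketch of the calculation; ext2_stride is an illustrative name, and the option spelling is an assumption to check against your mke2fs man page, as e2fsprogs of this era accepted -R stride=N while later releases use -E stride=N:

```shell
#!/bin/sh
# Sketch (hypothetical helper): ext2/ext3 stride hint = filesystem blocks per
# RAID chunk. Both arguments are in KB.
ext2_stride() {
    chunk_kb=$1 block_kb=$2
    echo $(( chunk_kb / block_kb ))
}

# For this array: chunk-size 32 (from /etc/raidtab) and 4KB filesystem blocks.
# This only prints the suggested command; it does not run mke2fs.
stride=$(ext2_stride 32 4)
echo "mke2fs -b 4096 -R stride=$stride /dev/md0"
```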

Now we can just sit back and monitor /proc/mdstat while the array resyncs. Once the resync has finished, the progress indicator disappears from the /proc/mdstat output:

# cat /proc/mdstat
Personalities : [raid5] 
read_ahead 1024 sectors
md0 : active raid5 sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
      31446912 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]
      
unused devices: <none>
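Instead of re-running cat by hand, a small loop can wait for the resync to finish before, say, copying data onto the array. A sketch that assumes the words resync or recovery only appear in /proc/mdstat while an array is rebuilding; wait_for_resync and its default polling interval are my own choices:

```shell
#!/bin/sh
# Sketch (hypothetical helper): block until no array in the given mdstat file
# is resyncing or recovering, polling every INTERVAL seconds (default 60).
wait_for_resync() {
    file=${1:-/proc/mdstat}
    interval=${2:-60}
    while grep -Eq 'resync|recovery' "$file"; do
        sleep "$interval"
    done
}
```

wait_for_resync && echo "md0 is in sync" returns as soon as the progress line has gone.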

To check that every member is up, run lsraid:

# lsraid -A -p
[dev   9,   0] /dev/md0         94113DE8.00ED4B7F.C37A5040.9A1F194B online
[dev   8,  17] /dev/sdb1        94113DE8.00ED4B7F.C37A5040.9A1F194B good
[dev   8,  33] /dev/sdc1        94113DE8.00ED4B7F.C37A5040.9A1F194B good
[dev   8,  49] /dev/sdd1        94113DE8.00ED4B7F.C37A5040.9A1F194B good
[dev   8,  65] /dev/sde1        94113DE8.00ED4B7F.C37A5040.9A1F194B good
[dev   8,  81] /dev/sdf1        94113DE8.00ED4B7F.C37A5040.9A1F194B spare
# lsraid -D -a /dev/md0
[dev 8, 17] /dev/sdb1:
md device= [dev 9, 0] /dev/md0
md uuid= 94113DE8.00ED4B7F.C37A5040.9A1F194B
state= good

[dev 8, 33] /dev/sdc1:
md device= [dev 9, 0] /dev/md0
md uuid= 94113DE8.00ED4B7F.C37A5040.9A1F194B
state= good

[dev 8, 49] /dev/sdd1:
md device= [dev 9, 0] /dev/md0
md uuid= 94113DE8.00ED4B7F.C37A5040.9A1F194B
state= good

[dev 8, 65] /dev/sde1:
md device= [dev 9, 0] /dev/md0
md uuid= 94113DE8.00ED4B7F.C37A5040.9A1F194B
state= good

[dev 8, 81] /dev/sdf1:
md device= [dev 9, 0] /dev/md0
md uuid= 94113DE8.00ED4B7F.C37A5040.9A1F194B
state= spare

Now we can mount the array, and add an entry to /etc/fstab so that it is mounted at every boot.

# mkdir /files
# mount /dev/md0 /files
# cd /files
  ..all is well..
# vi /etc/fstab
  ..edit fstab..
# grep md0 /etc/fstab
/dev/md0	/files		ext2	   defaults	1 2
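If you'd rather not edit /etc/fstab in vi, the entry can be appended from the shell. A sketch that only adds the line when no /dev/md0 entry exists, so running it twice is harmless; add_md0_fstab is a hypothetical name, and it's worth trying it on a copy of /etc/fstab first:

```shell
#!/bin/sh
# Sketch (hypothetical helper): append the /files entry for /dev/md0 to an
# fstab-format file unless a /dev/md0 line is already there.
add_md0_fstab() {
    fstab=${1:-/etc/fstab}
    if grep -q '^/dev/md0[[:space:]]' "$fstab"; then
        echo "/dev/md0 already in $fstab" >&2
        return 0
    fi
    # device, mount point, type, options, dump, fsck pass
    printf '/dev/md0\t/files\text2\tdefaults\t1 2\n' >> "$fstab"
}
```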

A quick df shows that all is in order:

# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.4G  3.5G  5.5G  39% /
/dev/md0               30G   20K   29G   1% /files
shmfs                 125M     0  125M   0% /dev/shm

As we can see, we have our /dev/md0 RAID5 array mounted at /files!

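Now reboot. Because the member partitions are type fd and the array uses a persistent superblock, the kernel should autodetect and start /dev/md0 at boot. If the array comes up, we can go to bed!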

Conclusion

This has been a whirlwind tour of implementing software RAID under SUSE Linux. The YaST tool is very easy to use in order to configure RAID at install time. At times, however, I found it a little clumsy, and not too intuitive. If you just want to add an array to an existing system, I would recommend setting this up manually via the command line, as I found it to be quicker than finding my way around YaST.

I haven't covered the basics of RAID, partitioning, YaST, etc. here, as there are many articles and tutorials describing these concepts and tools throughout the internet, so Google for it!
