Firstly, this article assumes that you know what you are doing. This isn't a general RAID tutorial or glossary. If you want to learn about the fundamentals of software RAID, have a look at the HOWTO.
Also, if you end up hosing your system by acting on anything you read here, that is your tough luck. I accept no responsibility for any damage you may cause to yourself, your computer, your company's servers, your cat, etc.
You should also be conversant with partitioning terminology and concepts, as these are assumed to be understood!
More than anything, this article is just an HTML-ised version of the notes I made whilst conducting my tests. It is not intended to be a complete guide to any of the technologies and concepts discussed herein.
This article discusses my experiences configuring software RAID on two SUSE Linux systems.
For my tests, I decided to use VMware rather than experimenting with my live systems.
The first test case was configured as a SUSE 8.0 Professional system, with two 10GB SCSI disks. My intention with this system was to let the initial YaST installation handle setting up RAID1 (mirroring) automatically. The system would therefore run as a two-disk mirrored set.
The second test case was configured as a SUSE 8.2 Professional system, with six 10GB SCSI disks. With this system, I put the entire system onto the first SCSI disk, and then manually configured RAID5 across the remaining 5 disks, to be used as a file storage area, mounted into the root filesystem at /files.
One of the nice things about recent SUSE (and most other modern Linux distributions) is that RAID support is compiled into the kernel by default.
One late night, lots of caffeine, and a great deal of patience later, I had the two systems up and running.
Read on for some tips, tricks and instructions on configuring the two systems. The fact that VMware was used is neither here nor there, and the techniques should be the same on "real" systems.
Note: This section is not a full discussion of, or tutorial on, YaST! Also, the procedure outlined here will not give bootable redundancy: if the disk containing /boot fails, the system will be unbootable. You could, however, use a boot disk and access the good disk in order to rebuild the system. This is a limitation of booting from software RAID devices with LILO on earlier Linux distributions. Try to set this up with YaST, and it'll warn you that the system "may" not be bootable. I know that Red Hat Enterprise Linux 4 with GRUB will boot successfully from a software RAID1 /boot partition.
In any case, I always use hardware RAID on Linux systems, as disks are cheap these days and hardware RAID remains transparent to the OS. That said, this procedure gives a feel for software mirroring on a Linux system, and on this test box I only wanted data redundancy, so that I could easily mount the surviving disk if the other failed and recover my files.
Bootable software RAID1 is tricky, but doable, under Solaris 10 for Intel and I have written an article here that describes the process.
As discussed, this system uses two 10GB SCSI disks. I planned the partitions to be set up as follows.
Partition  Size    Mount Point  Mount Point  Type      RAID?
                   (/dev/sda)   (/dev/sdb)
1          2GB     /            /            Primary   Y
2          1GB     /var         /var         Primary   Y
3          2GB     /usr         /usr         Primary   Y
4          NA      NA           NA           Extended  NA
5          512MB   swap         swap         Logical   N
6          400MB   /tmp         /tmp         Logical   Y
7          112MB   /boot        /spare       Logical   N
8          4GB     /home        /home        Logical   Y
Several factors influenced my partitioning scheme above. The first is that SUSE 8.0 uses LILO as its boot loader by default. LILO can sometimes have problems booting from RAID, so I decided to have a standard ext3 /boot partition that is not part of the mirror. Seeing as a /boot partition isn't going to change much (only when a kernel is compiled, etc), this can be backed up manually to tape or to /spare. The /spare partition again is ext3 and is not mirrored. This is to keep both disks in the mirrored set identical in terms of layout.
As you'll no doubt be aware, we'll be setting up the first three partitions as primary partitions, the fourth as an extended partition and the rest as logical partitions. These are basic partitioning concepts and I'll say no more. Google for information if you're unsure.
I did not include the two swap partitions in the mirror. I have read various papers, some advocating RAID for swap and others advising against it, so I decided to just leave the partitions as standard swap.
Everything else would be mirrored. I know this partitioning scheme may not be to everyone's liking (more space on /tmp would be handy), but for the test scenario, it is more than sufficient.
Each "pair" of partitions (i.e. /dev/sda1 and /dev/sdb1) will be within its own mirror, therefore /dev/md0 will be the mirrored / filesystem, /dev/md1 will be /var, and so on.
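The pairing rule can be sketched in a few lines of Python. This is only an illustration of the numbering described above, assuming the md devices are numbered in partition order; the names PARTITIONS and mirror_map are made up for this sketch and do not belong to any RAID tool.

```python
# Partition layout taken from the table above: (number, mount point, mirrored?)
# The extended partition (4) is omitted because it holds no filesystem itself.
PARTITIONS = [
    (1, "/", True),
    (2, "/var", True),
    (3, "/usr", True),
    (5, "swap", False),
    (6, "/tmp", True),
    (7, "/boot", False),   # /spare on the second disk, also not mirrored
    (8, "/home", True),
]


def mirror_map(partitions, disks=("sda", "sdb")):
    """Assign /dev/mdN to each mirrored partition pair, in partition order."""
    mapping = {}
    index = 0
    for number, mount, mirrored in partitions:
        if not mirrored:
            continue  # swap and /boot (/spare) stay as plain partitions
        members = tuple(f"/dev/{disk}{number}" for disk in disks)
        mapping[f"/dev/md{index}"] = (mount, members)
        index += 1
    return mapping


for md, (mount, members) in mirror_map(PARTITIONS).items():
    print(md, mount, " + ".join(members))
```

Because partitions 5 and 7 are skipped, the md numbers do not line up with the partition numbers: /tmp on partition 6 ends up on /dev/md3, and /home on partition 8 on /dev/md4.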
After installation and configuration of the rest of the system, we can now log in and see if our RAID1 setup is working as expected.
Firstly, check /proc/mdstat to see if the RAID mirror is operating correctly:
$ cat /proc/mdstat
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sda1[0]
2104384 blocks [2/2] [UU]
md1 : active raid1 sdb2[1] sda2[0]
1052160 blocks [2/2] [UU]
md2 : active raid1 sdb3[1] sda3[0]
2104448 blocks [2/2] [UU]
md3 : active raid1 sdb6[1] sda6[0]
409536 blocks [2/2] [UU]
md4 : active raid1 sdb8[1] sda8[0]
4160704 blocks [2/2] [UU]
unused devices: <none>
Yes, despite what YaST was telling us, RAID1 is in effect as requested! All mirrors appear to be working as expected.
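If you would rather not eyeball the output, the same check can be scripted. The following is a minimal sketch, assuming the /proc/mdstat layout shown above, where [2/2] gives the configured and active member counts and each U marks a working member (a failed member shows up as _). The function name degraded_arrays is hypothetical.

```python
import re


def degraded_arrays(mdstat_text):
    """Return the md devices whose status line shows a missing member."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Matches status fields such as "[2/2] [UU]" or "[2/1] [U_]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            configured = int(status.group(1))
            active = int(status.group(2))
            flags = status.group(3)
            if active < configured or "_" in flags:
                degraded.append(current)
            current = None
    return degraded


# Sample input: md0 is copied from the output above, md1 is an
# invented example of a mirror that has lost its second member.
SAMPLE = """Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 sdb1[1] sda1[0]
2104384 blocks [2/2] [UU]
md1 : active raid1 sda2[0]
1052160 blocks [2/1] [U_]
unused devices: <none>
"""

print(degraded_arrays(SAMPLE))
```

On a live system you would pass in the contents of /proc/mdstat instead of the sample string; an empty list means every array reports all of its members.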
Let's have a look at /etc/raidtab, as prepared by YaST:
# autogenerated /etc/raidtab by YaST2

raiddev /dev/md0
    raid-level 1
    nr-raid-disks 2
    nr-spare-disks 0
    persistent-superblock 1
    chunk-size 4
    device /dev/sda1
    raid-disk 0
    device /dev/sdb1
    raid-disk 1

raiddev /dev/md1
    raid-level 1
    nr-raid-disks 2
    nr-spare-disks 0
    persistent-superblock 1
    chunk-size 4
    device /dev/sda2
    raid-disk 0
    device /dev/sdb2
    raid-disk 1

raiddev /dev/md2
    raid-level 1
    nr-raid-disks 2
    nr-spare-disks 0
    persistent-superblock 1
    chunk-size 4
    device /dev/sda3
    raid-disk 0
    device /dev/sdb3
    raid-disk 1

raiddev /dev/md3
    raid-level 1
    nr-raid-disks 2
    nr-spare-disks 0
    persistent-superblock 1
    chunk-size 4
    device /dev/sda6
    raid-disk 0
    device /dev/sdb6
    raid-disk 1

raiddev /dev/md4
    raid-level 1
    nr-raid-disks 2
    nr-spare-disks 0
    persistent-superblock 1
    chunk-size 4
    device /dev/sda8
    raid-disk 0
    device /dev/sdb8
    raid-disk 1
Obviously, the chunk size can be changed during the installation, as can the persistent-superblock option.
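Since every mirror stanza in the file follows the same pattern, the structure is easy to express in code. The sketch below builds one RAID1 stanza using only the keys that appear in the YaST-generated file; the function name raid1_stanza and its parameters are assumptions made for illustration, not part of YaST or raidtools.

```python
def raid1_stanza(md_device, members, chunk_size=4):
    """Build the raidtab text for one mirror from its member partitions."""
    lines = [
        f"raiddev {md_device}",
        "    raid-level 1",
        f"    nr-raid-disks {len(members)}",
        "    nr-spare-disks 0",
        "    persistent-superblock 1",
        f"    chunk-size {chunk_size}",
    ]
    # Each member is listed as a device line followed by its raid-disk index
    for index, device in enumerate(members):
        lines.append(f"    device {device}")
        lines.append(f"    raid-disk {index}")
    return "\n".join(lines)


print(raid1_stanza("/dev/md0", ["/dev/sda1", "/dev/sdb1"]))
```

Note that each device line must be followed directly by its raid-disk line, which is why the two are appended as a pair.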
Let's make sure all of our filesystems are in place:
$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/md0              2.0G  920M 1000M  48% /
/dev/sda7             114M  8.7M   99M   8% /boot
/dev/md4              4.0G   33M  3.9G   1% /home
/dev/sdb7             114M  4.1M  104M   4% /spare
/dev/md3              387M  9.6M  357M   3% /tmp
/dev/md2              2.0G  1.2G  891M  57% /usr
/dev/md1             1011M   88M  872M  10% /var
shmfs                 124M     0  124M   0% /dev/shm
And that concludes the configuration of RAID1 mirroring on SUSE 8.0.
The entire root (/) filesystem was, in this test, all thrown onto /dev/sda1. Not very practical in a production system, but I needed to get a base system configured quickly so that I could configure /dev/sdb1 - /dev/sdf1 as a RAID5 array - which is what this is all about!
After logging into the system (as root, of course), it was easy to get RAID5 up and running.
Firstly, I created the /etc/raidtab file as follows:
# Test RAID5 Setup on suseraid5 (192.168.0.254)
# 5 disks total: 4 disks used (10Gb each), with 1 x 10Gb disk spare

raiddev /dev/md0
    raid-level 5
    persistent-superblock 1
    chunk-size 32
    parity-algorithm left-symmetric
    nr-raid-disks 4
    nr-spare-disks 1
    device /dev/sdb1
    raid-disk 0
    device /dev/sdc1
    raid-disk 1
    device /dev/sdd1
    raid-disk 2
    device /dev/sde1
    raid-disk 3
    device /dev/sdf1
    spare-disk 0
As you can see, the array was to consist of five 10GB SCSI disks. Four are active members, and the fifth is a spare that remains in the "pool" in case of a drive failure.
After creating /etc/raidtab, verify that RAID isn't running by checking /proc/mdstat:
# cat /proc/mdstat
Personalities :
read_ahead not set
unused devices: <none>
As you can see, no RAID to be seen.
Use fdisk to create a partition (/dev/sdb1) on the first disk that will be in your array. Remember to set the partition type (system ID) to "fd", Linux raid autodetect. The fdisk session follows.
# fdisk /dev/sdb
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel. Changes will remain in memory only,
until you decide to write them. After that, of course, the previous
content won't be recoverable.

The number of cylinders for this disk is set to 1305.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
   (e.g., DOS FDISK, OS/2 FDISK)
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)

Command (m for help): p

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System

Command (m for help): m
Command action
   a   toggle a bootable flag
   b   edit bsd disklabel
   c   toggle the dos compatibility flag
   d   delete a partition
   l   list known partition types
   m   print this menu
   n   add a new partition
   o   create a new empty DOS partition table
   p   print the partition table
   q   quit without saving changes
   s   create a new empty Sun disklabel
   t   change a partition's system id
   u   change display/entry units
   v   verify the partition table
   w   write table to disk and exit
   x   extra functionality (experts only)

Command (m for help): n
Command action
   e   extended
   p   primary partition (1-4)
p
Partition number (1-4): 1
First cylinder (1-1305, default 1):
Using default value 1
Last cylinder or +size or +sizeM or +sizeK (1-1305, default 1305):
Using default value 1305

Command (m for help): t
Selected partition 1
Hex code (type L to list codes): fd
Changed system type of partition 1 to fd (Linux raid autodetect)

Command (m for help): p

Disk /dev/sdb: 10.7 GB, 10737418240 bytes
255 heads, 63 sectors/track, 1305 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/sdb1             1      1305  10482381   fd  Linux raid autodetect

Command (m for help): w
The partition table has been altered!

Calling ioctl() to re-read partition table.
Syncing disks.
Repeat the fdisk procedure for each disk in the array (including the spare).
Next, with /etc/raidtab set up (and double-checked for syntax), run mkraid.
# mkraid /dev/md0
handling MD device /dev/md0
analyzing super-block
disk 0: /dev/sdb1, 10482381kB, raid superblock at 10482304kB
disk 1: /dev/sdc1, 10482381kB, raid superblock at 10482304kB
disk 2: /dev/sdd1, 10482381kB, raid superblock at 10482304kB
disk 3: /dev/sde1, 10482381kB, raid superblock at 10482304kB
disk 4: /dev/sdf1, 10482381kB, raid superblock at 10482304kB
If you see something along the lines of the above, all is going well.
If you check /proc/mdstat straight away, you'll see the array performing its initial resync:
# cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
31447104 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]
[=>...................] resync = 7.8% (821944/10482368) finish=5.2min speed=30442K/sec
unused devices: <none>
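The finish estimate on the resync line can be reproduced from the other numbers on it. This is a rough sketch, assuming the two figures in parentheses are 1K blocks done and 1K blocks in total, and that speed is in K per second; the function name resync_eta_minutes is hypothetical.

```python
import re


def resync_eta_minutes(line):
    """Estimate the minutes remaining from a /proc/mdstat resync line."""
    match = re.search(r"\((\d+)/(\d+)\).*?speed=(\d+)K/sec", line)
    if match is None:
        return None  # not a resync progress line
    done, total, speed = (int(group) for group in match.groups())
    if speed == 0:
        return None  # avoid dividing by zero on a stalled resync
    remaining_kb = total - done
    return remaining_kb / speed / 60.0


line = ("[=>...................] resync = 7.8% (821944/10482368) "
        "finish=5.2min speed=30442K/sec")
print(f"{resync_eta_minutes(line):.1f} minutes remaining")
```

For the line above this gives a little under 5.3 minutes, which is in line with the finish=5.2min that the kernel reported; the small difference is presumably down to how the kernel rounds its own figures.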
While it's rebuilding (remember it has no filesystem at the moment), run mke2fs (or the filesystem creation command of your choice) to create the filesystem.
# mke2fs /dev/md0
mke2fs 1.28 (31-Aug-2002)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
3932160 inodes, 7861728 blocks
393086 blocks (5.00%) reserved for the super user
First data block=0
240 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
        4096000

Writing inode tables: done
Writing superblocks and filesystem accounting information: done

This filesystem will be automatically checked every 26 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
Of course, you can add your usual parameters to mke2fs, but we'll just keep it simple for our RAID tests.
Now, we can just sit back and monitor /proc/mdstat to see the array rebuilding. Once the array is built, the progress indicator will disappear from the /proc/mdstat output, e.g.
# cat /proc/mdstat
Personalities : [raid5]
read_ahead 1024 sectors
md0 : active raid5 sdf1[4] sde1[3] sdd1[2] sdc1[1] sdb1[0]
31446912 blocks level 5, 32k chunk, algorithm 2 [4/4] [UUUU]
unused devices: <none>
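The block count in this output can be checked by hand. RAID5 gives up one disk's worth of space to parity, so four active disks provide three disks' worth of storage; the spare contributes nothing until it is needed. The sketch below assumes the usable size of each member is the superblock offset that mkraid reported (10482304kB); the function name raid5_usable_kb is made up for the example.

```python
def raid5_usable_kb(active_disks, per_disk_kb):
    """Usable RAID5 capacity: one member's worth of space goes to parity."""
    if active_disks < 3:
        # RAID5 is normally built from at least three active disks
        raise ValueError("expected at least three active disks")
    return (active_disks - 1) * per_disk_kb


usable = raid5_usable_kb(active_disks=4, per_disk_kb=10482304)
print(usable, "1K blocks")
print(round(usable / 1024 ** 2, 2), "GiB")
```

The result, 31446912 blocks, matches the figure in /proc/mdstat above, and it also agrees with the 7861728 blocks of 4096 bytes that mke2fs reported.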
To check everything is up, run lsraid...
# lsraid -A -p
[dev 9, 0] /dev/md0 94113DE8.00ED4B7F.C37A5040.9A1F194B online
[dev 8, 17] /dev/sdb1 94113DE8.00ED4B7F.C37A5040.9A1F194B good
[dev 8, 33] /dev/sdc1 94113DE8.00ED4B7F.C37A5040.9A1F194B good
[dev 8, 49] /dev/sdd1 94113DE8.00ED4B7F.C37A5040.9A1F194B good
[dev 8, 65] /dev/sde1 94113DE8.00ED4B7F.C37A5040.9A1F194B good
[dev 8, 81] /dev/sdf1 94113DE8.00ED4B7F.C37A5040.9A1F194B spare

# lsraid -D -a /dev/md0
[dev 8, 17] /dev/sdb1:
    md device= [dev 9, 0] /dev/md0
    md uuid= 94113DE8.00ED4B7F.C37A5040.9A1F194B
    state= good

[dev 8, 33] /dev/sdc1:
    md device= [dev 9, 0] /dev/md0
    md uuid= 94113DE8.00ED4B7F.C37A5040.9A1F194B
    state= good

[dev 8, 49] /dev/sdd1:
    md device= [dev 9, 0] /dev/md0
    md uuid= 94113DE8.00ED4B7F.C37A5040.9A1F194B
    state= good

[dev 8, 65] /dev/sde1:
    md device= [dev 9, 0] /dev/md0
    md uuid= 94113DE8.00ED4B7F.C37A5040.9A1F194B
    state= good

[dev 8, 81] /dev/sdf1:
    md device= [dev 9, 0] /dev/md0
    md uuid= 94113DE8.00ED4B7F.C37A5040.9A1F194B
    state= spare
Now, we can mount the array, and add an entry to /etc/fstab to make the mount persistent.
# mkdir /files
# mount /dev/md0 /files
# cd /files
..all is well..
# vi /etc/fstab
..edit fstab..
# grep md0 /etc/fstab
/dev/md0 /files ext2 defaults 1 2
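The six fields of that fstab entry are, in order: device, mount point, filesystem type, mount options, dump flag and fsck pass number. As a small sketch of how the entry breaks down, the helper below looks up a device in fstab text and labels the fields; the name find_fstab_entry and the field labels are my own choices for the example.

```python
FIELDS = ("device", "mount_point", "fs_type", "options", "dump", "fsck_pass")


def find_fstab_entry(fstab_text, device):
    """Return the fstab fields for a device as a dict, or None if absent."""
    for line in fstab_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # skip blank lines and comments
        parts = stripped.split()
        if len(parts) < 4 or parts[0] != device:
            continue
        # The last two fields are optional and default to 0
        parts = parts[:6] + ["0"] * (6 - len(parts))
        return dict(zip(FIELDS, parts))
    return None


entry = find_fstab_entry("/dev/md0 /files ext2 defaults 1 2\n", "/dev/md0")
print(entry)
```

The type is ext2 here because mke2fs was run without any journalling option; the final 2 tells fsck to check the array after the root filesystem.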
A quick df shows that all is in order:
# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             9.4G  3.5G  5.5G  39% /
/dev/md0               30G   20K   29G   1% /files
shmfs                 125M     0  125M   0% /dev/shm
As we can see, we have our /dev/md0 RAID5 array mounted at /files!
Now, reboot, and if the array comes up, we can go to bed!
This has been a whirlwind tour of implementing software RAID under SUSE Linux. The YaST tool is very easy to use in order to configure RAID at install time. At times, however, I found it a little clumsy, and not too intuitive. If you just want to add an array to an existing system, I would recommend setting this up manually via the command line, as I found it to be quicker than finding my way around YaST.
I haven't covered the basics of RAID, partitioning, YaST, etc. here, as there are many articles and tutorials describing these concepts and tools throughout the internet - so Google for it!