RAID1 Under Solaris 10 for Intel Using Solaris Volume Manager

Introduction

Setting up RAID1 using the Solaris Volume Manager (the artist formerly known as DiskSuite) is usually a fairly painless affair. Using it to mirror the root slice (and any other slice that cannot be unmounted, such as /usr) can be a little bit more work on SPARC. Attempt to mirror / on an Intel box, using Solaris 10, and prepare for some major headaches. Even once everything is set up, if a disk fails the system will go into a reboot loop with a kernel panic. This behaviour is being addressed by Sun, and a workaround has been posted. However, the workaround doesn't seem to work on the Intel flavour. Anyway, to cut a long story short, I've found a workaround for the workaround, and have replaced a "failed" disk and recovered the system. And you'll be pleased to know that I've detailed the whole procedure below, to save you the hours of work that this theoretically simple task has consumed for me. Hopefully Sun will get this nonsense fixed, and soon.

These notes are pretty terse, and assume that you know what you're doing.

Initial Installation

The first thing to note is how Solaris normally installs itself on Intel boxes. An x86 fdisk partition is created, which is something that SVM doesn't know how to handle. The net result here is that you'll never be able to fully mirror the system. On most boxes, this x86 partition isn't required. You can just configure the system at initial installation with a single 100% Solaris fdisk partition on the disk containing the root slice, and all will be well. See this blog entry for the grimy details on Solaris and fdisk on Intel boxes.

So the first step is to back up all your data if you have an x86 fdisk partition, and prepare yourself for a system re-installation.

During the installation, ensure that you only create one fdisk partition, and that it's a 100% Solaris fdisk partition. I've also created a fairly small s3 slice which I'll be using to hold the metadbs. 30 MB is more than ample for this slice. I should also say that the disk is the master device on the first IDE bus (I like to refer to this as IDE 0:0). And our final system will just contain two IDE disks - in a RAID1 configuration.

Post installation

Once the installation has successfully completed, power the system down (init 5). If your second disk isn't already physically installed, connect it now as the master device on the second IDE bus (IDE 1:0). Boot the system into single user mode (type "b -rs" at the boot prompt). Once the system comes up, check whether the disk has been detected (it should have been, as we specified -r at the boot prompt).

    
# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0d0 <DEFAULT cyl 44381 alt 2 hd 15 sec 63>
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@0,0
       1. c1d0 <DEFAULT cyl 44382 alt 2 hd 15 sec 63>
          /pci@0,0/pci-ide@7,1/ide@1/cmdk@0,0
Specify disk (enter its number): ^C
    
    

If it hasn't you'll see this...

    
# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0d0 <DEFAULT cyl 44381 alt 2 hd 15 sec 63>
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@0,0
Specify disk (enter its number): ^C
    
    

Use devfsadm to rebuild /dev and /devices and the OS should then see the disk...

    
# devfsadm
# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0d0 <DEFAULT cyl 44381 alt 2 hd 15 sec 63>
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@0,0
       1. c1d0 <DEFAULT cyl 44382 alt 2 hd 15 sec 63>
          /pci@0,0/pci-ide@7,1/ide@1/cmdk@0,0
Specify disk (enter its number): ^C
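
If you'd rather script that check than eyeball it, a small filter along these lines can test whether a given disk shows up in the listing. This is only a sketch: disk_listed is a made-up helper name, it assumes the listing keeps the "N. cXdY <...>" layout shown above, and the usage line assumes format prints its list and exits when its input is /dev/null.

```shell
# disk_listed DISK
# Reads a format(1M) disk listing on stdin and succeeds if DISK is in it.
# (Hypothetical helper; assumes the "N. cXdY <...>" layout shown above.)
disk_listed() {
    grep "^ *[0-9][0-9]*\. $1 " >/dev/null
}

# Intended use on the live system:
#   format </dev/null | disk_listed c1d0 || devfsadm
```

The trailing space in the pattern keeps a search for c1d0 from matching a longer name that merely starts with it.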
    
    

Next, use fdisk to create your 100% Solaris partition on the secondary disk

    
# fdisk -b /usr/lib/fs/ufs/mboot /dev/rdsk/c1d0p0
No fdisk table exists. The default partition for the disk is:

  a 100% "SOLARIS System" partition

Type "y" to accept the default partition,  otherwise type "n" to edit the
 partition table.
y
    
    

Now we can use prtvtoc and fmthard to create an identical partition table on the secondary disk to the first...

    
# prtvtoc /dev/rdsk/c0d0s2 | fmthard -s - /dev/rdsk/c1d0s2
fmthard:  New volume table of contents now in place.
    
    

Use the format command to check that both partition tables are identical.
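
For a scriptable version of the same check, you could save the prtvtoc output of each disk and compare the slice lines. The helper below is a sketch under one assumption: that the lines prtvtoc prefixes with "*" are comments (device name, geometry) which may legitimately differ between the two disks. The name same_vtoc and the /tmp file names are my own invention.

```shell
# same_vtoc FILE1 FILE2
# Succeeds if two saved prtvtoc listings describe the same slices.
# Comment lines (starting with "*") are skipped, as they carry the device
# name and geometry and so are allowed to differ between disks.
same_vtoc() {
    a=`grep -v '^\*' "$1"`
    b=`grep -v '^\*' "$2"`
    [ "$a" = "$b" ]
}

# Intended use (file names are arbitrary):
#   prtvtoc /dev/rdsk/c0d0s2 >/tmp/vtoc.c0d0
#   prtvtoc /dev/rdsk/c1d0s2 >/tmp/vtoc.c1d0
#   same_vtoc /tmp/vtoc.c0d0 /tmp/vtoc.c1d0 && echo "slices match"
```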

Now install the partition boot file and the bootblock on the secondary disk

    
# installboot /usr/platform/`uname -i`/lib/fs/ufs/pboot \
> /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c1d0s2
    
    

Configure SVM

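Create the state database replicas on our dedicated s3 slices. We create four here - two on each disk. If one disk fails, exactly half of the replicas survive: enough for SVM to keep a running system up, although it wants a majority (half plus one) before it will boot unattended into multiuser mode, which is why the recovery procedure below deletes the dead disk's replicas.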

    
# metadb -a -f -c 2 c0d0s3 c1d0s3
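
The replica arithmetic is easy to get wrong, so here is a tiny sketch of it. It is pure arithmetic and does not talk to SVM; replica_state is a hypothetical helper, and the rule it encodes is my reading of the SVM documentation: a running system survives with at least half of its replicas, while an unattended boot into multiuser mode needs more than half.

```shell
# replica_state TOTAL ALIVE
# Prints where ALIVE surviving replicas out of TOTAL leave you:
#   majority   - more than half survive
#   half       - exactly half survive
#   below-half - fewer than half survive
# (Hypothetical helper, pure arithmetic; it does not query metadb.)
replica_state() {
    total=$1
    alive=$2
    doubled=`expr "$alive" \* 2`
    if [ "$doubled" -gt "$total" ]; then
        echo majority
    elif [ "$doubled" -eq "$total" ]; then
        echo half
    else
        echo below-half
    fi
}
```

For the layout here, "replica_state 4 2" is the situation after losing one disk, and "replica_state 2 2" is the situation once the dead disk's two replicas have been deleted.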
   
    

Now, set up the root, swap and home concats (submirrors) on the first disk.

    
# metainit -f d10 1 1 c0d0s0
d10: Concat/Stripe is setup
# metainit -f d11 1 1 c0d0s1
d11: Concat/Stripe is setup
# metainit -f d17 1 1 c0d0s7
d17: Concat/Stripe is setup
   
    

And the second disk...

    
# metainit -f d20 1 1 c1d0s0
d20: Concat/Stripe is setup
# metainit -f d21 1 1 c1d0s1
d21: Concat/Stripe is setup
# metainit -f d27 1 1 c1d0s7
d27: Concat/Stripe is setup
   
    

Now we can create the actual mirrors using the submirrors from the first drive.

    
# metainit d0 -m d10
d0: Mirror is setup
# metainit d1 -m d11
d1: Mirror is setup
# metainit d7 -m d17
d7: Mirror is setup
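
The naming scheme used above is: slice N on the first disk becomes submirror d1N, slice N on the second disk becomes d2N, and the mirror itself is dN. If you have more slices to mirror, a throwaway generator like this sketch prints the matching commands for review. It only prints them, it runs nothing; svm_plan is a name I made up, and the disk names are hard-wired to this article's c0d0 and c1d0.

```shell
# svm_plan SLICE...
# Prints (does not run) the metainit commands for each slice number given,
# using this article's naming: d1N on c0d0, d2N on c1d0, mirror dN.
svm_plan() {
    for s in "$@"; do
        echo "metainit -f d1$s 1 1 c0d0s$s"
        echo "metainit -f d2$s 1 1 c1d0s$s"
        echo "metainit d$s -m d1$s"
    done
}
```

Running "svm_plan 0 1 7" prints the same nine commands as above, grouped per slice rather than per disk.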
   
    

Then set the system to boot from the mirror with the metaroot command

    
# metaroot d0
   
    

This will change the line in /etc/vfstab for the root slice, and also add a line to /etc/system for rootdev. We need to edit /etc/vfstab and modify the entries for the swap slice and the home slice. See the before and after output below (only relevant lines shown). We also add the nologging option to the root slice - we'll need this for recovery and it's a workaround posted by Sun.

    
#Before...
/dev/dsk/c0d0s1 -       	   -         	  swap    -       no      -
/dev/md/dsk/d0  /dev/md/rdsk/d0   /               ufs     1       no      -
/dev/dsk/c0d0s7 /dev/rdsk/c0d0s7  /export/home    ufs     2       yes     -

#After...
/dev/md/dsk/d1  -       	  -               swap    -       no      -
/dev/md/dsk/d0  /dev/md/rdsk/d0   /       	  ufs     1       no      nologging
/dev/md/dsk/d7  /dev/md/rdsk/d7   /export/home    ufs     2       yes     -
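
If you have several slices to convert, the device-name part of that edit is mechanical enough to preview with sed. This is a sketch, not a replacement for reading the file: vfstab_to_md is a hypothetical helper, it assumes every c0d0 slice listed in vfstab is mirrored as dN, and it leaves the nologging option for you to add by hand.

```shell
# vfstab_to_md
# Filter: rewrites /dev/dsk/c0d0sN and /dev/rdsk/c0d0sN to the matching
# /dev/md/dsk/dN and /dev/md/rdsk/dN. Assumes every c0d0 slice listed in
# vfstab is mirrored as dN. It does not add nologging; do that by hand.
vfstab_to_md() {
    sed -e 's|/dev/dsk/c0d0s\([0-9]\)|/dev/md/dsk/d\1|' \
        -e 's|/dev/rdsk/c0d0s\([0-9]\)|/dev/md/rdsk/d\1|'
}

# Intended use: preview the change, then edit /etc/vfstab once it looks right.
#   vfstab_to_md </etc/vfstab | diff /etc/vfstab -
```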
   
    

Execute lockfs, then reboot the system

    
# lockfs -fa
# init 6
   
    

Once the system is up (you can let it come up into runlevel 3 now - the system will just be a bit slow during the mirror synch), attach the submirrors from the second drive.

    
# metattach d0 d20
d0: submirror d20 is attached
# metattach d1 d21
d1: submirror d21 is attached
# metattach d7 d27
d7: submirror d27 is attached
   
    

Now you can use the metastat command to monitor the synch operation

    
# metastat | grep "%"
    Resync in progress: 1 % done
    Resync in progress: 78 % done
    Resync in progress: 7 % done
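
If you don't fancy re-running that by hand, a polling loop can wait for the resync lines to disappear. A minimal sketch, assuming metastat keeps the "Resync in progress" wording shown above; resync_running is a made-up helper name and the 60 second interval is arbitrary.

```shell
# resync_running
# Reads metastat output on stdin; succeeds while any mirror still reports
# "Resync in progress" (the wording shown in the output above).
resync_running() {
    grep 'Resync in progress' >/dev/null
}

# Intended use: poll once a minute until the resync lines disappear.
#   while metastat | resync_running; do sleep 60; done
#   echo "resync finished"
```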
   
    

It's also a good idea to set up the alternate boot path (altbootpath) eeprom variable now, so that the system will attempt to boot from this path automatically if it cannot boot from bootpath.

    
# ls -l /dev/dsk/c1d0s0    # output split for clarity
lrwxrwxrwx   1 root     root          50 Oct 18 23:44 /dev/dsk/c1d0s0 ->
                                  ../../devices/pci@0,0/pci-ide@7,1/ide@1/cmdk@0,0:a
# eeprom altbootpath='/pci@0,0/pci-ide@7,1/ide@1/cmdk@0,0:a'
# eeprom altbootpath
altbootpath=/pci@0,0/pci-ide@7,1/ide@1/cmdk@0,0:a
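
To avoid copying the path by hand, the part of the symlink target after "../../devices" can be cut out with sed. A small sketch; phys_path is a hypothetical helper and assumes the symlink points at ../../devices/... as in the listing above.

```shell
# phys_path
# Reads one `ls -l` line for a /dev/dsk symlink on stdin and prints the
# physical device path, i.e. everything after "../../devices".
phys_path() {
    sed -n 's|.*-> *\.\./\.\./devices\(/.*\)$|\1|p'
}

# Intended use:
#   eeprom altbootpath="`ls -l /dev/dsk/c1d0s0 | phys_path`"
```

Because of sed -n, a line that doesn't look like such a symlink produces no output at all, so check the result with "eeprom altbootpath" afterwards.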
   
    

Once the synch is complete (keep an eye on it using metastat) the RAID1 is set up. If a disk fails, however, we've got a bit of work ahead of us to get the system recovered.

Disk Failure Simulation and Recovery

Let's simulate a disk failure. Power the system off

    
# init 5
   
    

Physically remove the disk (I went with c0d0 - the master on the first bus).

Attempt to boot the system. You can either interrupt the boot and use DCA to select the other disk, or just let the boot fail, and altbootpath will be used automatically. You'll watch the system appear to start booting, then if you've got lightning vision you'll see a kernel panic, and a cyclic reboot will ensue. The beauty of VMware and paused virtual machines allows us to view this message in all its sinister glory....

    
WARNING: md: d10: (Unavailable) needs maintenance
WARNING: Error writing ufs log state
WARNING: ufs log for / changed state to Error
WARNING: Please umount(1M) / and run fsck(1M)

panic[cpu0]/thread=fec1be20: mod_hold_stub: Couldn't load stub module misc/strplumb

fec25dd0 genunix:mod_hold_stub+139 (fec042f0, 1, fe8cad)
fec25dec unix:stubs_common_code+9 ()
   
    

The fix for this, once discovered, is pretty straightforward. The first part is the workaround posted by Sun which fixes the problem on SPARC systems - that is to disable logging for the root volume (we did this earlier, remember?). However, on Intel systems, this appears to do nothing to solve the issue. A little more work is required...

Power down the system

Insert the Solaris installation CD (disc 1) and boot the system. When the installation prompt appears, type "b -s" to boot from the CD-ROM in single user mode.

Once the system is running, mount the "good" root slice - note: although the device should be c1d0s0, when booting in CD-ROM mode it appears as c0d0s0. Confused yet?

    
# mount /dev/dsk/c0d0s0 /mnt
   
    

Sanitise the terminal for the impending vi command...

    
# TERM=sun-cmd; export TERM
   
    

Backup the files we are about to edit...

    
# cp -p /mnt/etc/system /mnt/etc/system.orig
# cp -p /mnt/etc/vfstab /mnt/etc/vfstab.orig
   
    

Comment out the rootdev entry in /etc/system (use * to comment)

    
# vi /mnt/etc/system
Before...
rootdev:/pseudo/md@0:0,0,blk
After...
* rootdev:/pseudo/md@0:0,0,blk
   
    

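Modify /etc/vfstab: change all md entries to standard /dev/{r}dsk/blah entries, and make sure nologging is in the mount options for the root slice. Ignore the fact that in CD-ROM mode we see c1d0 as c0d0 - the entries must name c1d0, which is what the disk will be called once the system boots from it normally...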

    
# vi /mnt/etc/vfstab
Before...
/dev/md/dsk/d1  -       	  -               swap    -       no      -
/dev/md/dsk/d0  /dev/md/rdsk/d0   /       	  ufs     1       no      nologging
/dev/md/dsk/d7  /dev/md/rdsk/d7   /export/home    ufs     2       yes     -
After...
/dev/dsk/c1d0s1 -       	   -         	  swap    -       no      -
/dev/dsk/c1d0s0 /dev/rdsk/c1d0s0  /               ufs     1       no      nologging
/dev/dsk/c1d0s7 /dev/rdsk/c1d0s7  /export/home    ufs     2       yes     -
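
Both edits are simple substitutions, so if vi is misbehaving in the CD-ROM environment they can be done as filters instead. This is a sketch under the assumptions of this article (mirror dN sits on slice N, and the .orig backups made above exist); disable_rootdev and vfstab_to_raw are names I've invented.

```shell
# disable_rootdev
# Filter for /etc/system: comments out the rootdev line with "*".
disable_rootdev() {
    sed 's|^rootdev:|* rootdev:|'
}

# vfstab_to_raw DISK
# Filter for /etc/vfstab: rewrites /dev/md/dsk/dN and /dev/md/rdsk/dN to
# slice N of DISK. Assumes the dN <-> sN naming used in this article.
vfstab_to_raw() {
    sed -e "s|/dev/md/dsk/d\([0-9]\)|/dev/dsk/$1s\1|" \
        -e "s|/dev/md/rdsk/d\([0-9]\)|/dev/rdsk/$1s\1|"
}

# Intended use, reading from the backups taken above:
#   disable_rootdev </mnt/etc/system.orig >/mnt/etc/system
#   vfstab_to_raw c1d0 </mnt/etc/vfstab.orig >/mnt/etc/vfstab
```

Note that the disk argument is c1d0, the name the disk has on a normal boot, not the c0d0 it shows up as under the CD-ROM.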
   
    

Unmount the disk and reboot, removing the installation CD at the appropriate point.

    
# umount /mnt
# init 6
   
    

The system will now boot off of altbootpath (it's a good idea to interrupt the boot here and specify "b -s" after it has automatically selected altbootpath - or use DCA, then enter "b -s" at the boot interpreter prompt). It'll complain about some of the submirrors being missing, and might even tell you to fsck some slices - but the system boots. And that is half the battle over.

Once the system is up in single user mode, the system is running with the slices as standard UFS filesystems - outside of SVM control. We can get the system back under SVM control very easily.

First, remove the state database replicas from the non-existent failed disk. Make sure you delete the correct replicas!

    
# metadb -f -d c0d0s3
   
    

Then, restore the original /etc/system and /etc/vfstab files that we backed up in CD-ROM mode above.

    
# cp -p /etc/system.orig /etc/system
# cp -p /etc/vfstab.orig /etc/vfstab
   
    

Reboot, and the system will come up in multiuser mode running off c1d0 (the second disk), albeit with a few errors and complaints about submirrors being missing - which is to be expected.

    
# init 6
   
    

metastat will, of course, show that the whole setup needs maintenance. But we are now running off of the good disk and the system is usable, albeit with no redundancy.

We are now at the point we should have been at if SVM worked properly for Solaris 10 and RAID1 root slices. This is, as I've said, being worked on by Sun - although browsing forums I see a lot of people experiencing the same problems and not a lot being done to fix it. The "nologging" workaround seems to be the only hope there is - and again, as I've said, this doesn't solve the problem on an Intel box. Anyway, hopefully this article will help a few people out with that.

Now, to replace that failed disk.

Power off the system

    
# init 5
   
    

Replace failed disk (I used a clean disk, placed as the master drive on the first IDE bus).

Power the system up into single user mode ("b -s"). Chances are the disk will be seen by format, as we've not done a devfsadm -C since the disk failed. You can run devfsadm to make sure all is well, though.

    
# devfsadm
   
    

Check the disk is present and correct

    
# format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c0d0 <DEFAULT cyl 44382 alt 2 hd 15 sec 63>
          /pci@0,0/pci-ide@7,1/ide@0/cmdk@0,0
       1. c1d0 <DEFAULT cyl 44381 alt 2 hd 15 sec 63>
          /pci@0,0/pci-ide@7,1/ide@1/cmdk@0,0
Specify disk (enter its number): ^C
   
    

Use fdisk to create our Solaris partition on the replacement disk

    
# fdisk -b /usr/lib/fs/ufs/mboot /dev/rdsk/c0d0p0
No fdisk table exists. The default partition for the disk is:

  a 100% "SOLARIS System" partition

Type "y" to accept the default partition,  otherwise type "n" to edit the
 partition table.
y
   
    

Copy the partition information from c1d0 to c0d0

    
# prtvtoc /dev/rdsk/c1d0s2 | fmthard -s - /dev/rdsk/c0d0s2
   
    

Use the format command to verify that both disks are identically partitioned.

Install the partition boot file and the bootblock on the replacement disk.

    
# installboot /usr/platform/`uname -i`/lib/fs/ufs/pboot \
> /usr/platform/`uname -i`/lib/fs/ufs/bootblk /dev/rdsk/c0d0s2
   
    

Add the state database replicas to the s3 slice on the replacement disk.

    
# metadb -a -c 2 c0d0s3
   
    

Use metareplace -e to re-enable the failed components of our mirrors. The slices on the replacement disk have the same names as the ones that failed, so SVM simply resyncs them from the good submirrors.

    
# metareplace -e d0 /dev/dsk/c0d0s0
d0: device c0d0s0 is enabled
# metareplace -e d1 /dev/dsk/c0d0s1
d1: device c0d0s1 is enabled
# metareplace -e d7 /dev/dsk/c0d0s7
d7: device c0d0s7 is enabled
   
    

We're done! Use metastat to monitor the synch. Once that's complete, reboot the system to make sure everything comes up correctly

    
# init 6
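
Before that reboot (and again after it), a quick pass over the metastat output can confirm nothing is still resyncing or flagged for maintenance. This is only a rough sketch: svm_healthy is a made-up helper, and it assumes that trouble always shows up as either the word "maintenance" or the phrase "Resync in progress" somewhere in the output, so treat a pass as a hint rather than proof.

```shell
# svm_healthy
# Reads metastat output on stdin. Fails if any line mentions maintenance
# or a resync still in progress; succeeds otherwise.
# (Hypothetical helper; a crude text match, not a real SVM status query.)
svm_healthy() {
    out=`cat`
    if echo "$out" | grep -i 'maintenance' >/dev/null; then return 1; fi
    if echo "$out" | grep 'Resync in progress' >/dev/null; then return 1; fi
    return 0
}

# Intended use:
#   metastat | svm_healthy && echo "mirrors look healthy"
```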
   
    

As we haven't changed the bootpath eeprom variable, the system should now boot off of c0d0 (as per our *original* setup), and the system will come back up as it was before we started.

Job done. Phew!

Cheers
Kevin Waldron
kevin@zazzybob.com

Disclaimer! - This article is provided for guidance only, and does not replace the relevant official documentation and manuals. I will not be held liable for any hosed systems and/or data.
