
Replacing a failed drive in zfs on Solaris

I have a NAS box at home running OpenSolaris (currently build snv_104). It has 2x80GB drives (mirrored) as the system area and 10x500GB drives (using raidz2) as the data store, both using ZFS.

I've recently installed all the 500GB drives in caddies so they are easily removable/replaceable and I wanted to see what happened when I pulled a drive from the live system. The short answer: not a lot! The data zpool was marked as degraded, and the system kept running quite normally. Adding the drive back took some additional Solaris shenanigans though.

Here's what I did:

This is what the zpool looked like before removing the drive:

# zpool status space
  pool: space
 state: ONLINE
 scrub: scrub completed after 0h35m with 0 errors on Tue Jan  6 13:08:24 2009
config:

        NAME        STATE     READ WRITE CKSUM
        space       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t4d0  ONLINE       0     0     0

errors: No known data errors

After pulling the drive, it looked like this:

# zpool status space
  pool: space
 state: DEGRADED
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: scrub completed after 0h35m with 0 errors on Tue Jan  6 13:08:24 2009
config:

        NAME        STATE     READ WRITE CKSUM
        space       DEGRADED     0     0     0
          raidz2    DEGRADED     0     0     0
            c0t0d0  ONLINE       0     0     0
            c0t1d0  ONLINE       0     0     0
            c0t2d0  ONLINE       0     0     0
            c0t3d0  ONLINE       0     0     0
            c0t4d0  ONLINE       0     0     0
            c1t0d0  ONLINE       0     0     0
            c1t1d0  ONLINE       0     0     0
            c1t2d0  ONLINE       0     0     0
            c1t3d0  ONLINE       0     0     0
            c1t4d0  REMOVED      0    77     0

errors: No known data errors
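
If the drive really had failed rather than just being pulled, the action text above points at the fix: put a new disk in and tell ZFS about it with 'zpool replace'. A minimal sketch (the c1t5d0 in the second form is a made-up name for a replacement that shows up under a different device name):

# zpool replace space c1t4d0
# zpool replace space c1t4d0 c1t5d0

The first form replaces the disk in place once the new one is in the same slot; the second is for a replacement on a different port. Either way ZFS resilvers the raidz2 onto the new disk. Since my drive hadn't actually failed, none of that was needed here.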

When I pushed the drive back in, nothing changed, i.e. the drive was not immediately made available.

I used cfgadm to investigate:

# cfgadm -l
Ap_Id                          Type         Receptacle   Occupant     Condition
c4                             scsi-bus     connected    unconfigured unknown
sata0/0::dsk/c3t0d0            disk         connected    configured   ok
sata0/1::dsk/c3t1d0            disk         connected    configured   ok
sata1/0::dsk/c0t0d0            disk         connected    configured   ok
sata1/1::dsk/c0t1d0            disk         connected    configured   ok
sata1/2::dsk/c0t2d0            disk         connected    configured   ok
sata1/3::dsk/c0t3d0            disk         connected    configured   ok
sata1/4::dsk/c0t4d0            disk         connected    configured   ok
sata1/5                        sata-port    empty        unconfigured ok
sata1/6                        sata-port    empty        unconfigured ok
sata1/7                        sata-port    empty        unconfigured ok
sata2/0::dsk/c1t0d0            disk         connected    configured   ok
sata2/1::dsk/c1t1d0            disk         connected    configured   ok
sata2/2::dsk/c1t2d0            disk         connected    configured   ok
sata2/3::dsk/c1t3d0            disk         connected    configured   ok
sata2/4                        disk         connected    unconfigured unknown
sata2/5                        sata-port    empty        unconfigured ok
sata2/6                        sata-port    empty        unconfigured ok
sata2/7                        sata-port    empty        unconfigured ok
usb0/1                         unknown      empty        unconfigured ok
usb0/2                         usb-hub      connected    configured   ok
usb0/2.1                       usb-device   connected    configured   ok
usb0/2.2                       unknown      empty        unconfigured ok
usb0/2.3                       unknown      empty        unconfigured ok
usb0/2.4                       unknown      empty        unconfigured ok
usb1/1                         unknown      empty        unconfigured ok
usb1/2                         unknown      empty        unconfigured ok
usb2/1                         unknown      empty        unconfigured ok
usb2/2                         unknown      empty        unconfigured ok
usb3/1                         unknown      empty        unconfigured ok
usb3/2                         unknown      empty        unconfigured ok
usb4/1                         unknown      empty        unconfigured ok
usb4/2                         unknown      empty        unconfigured ok
usb4/3                         unknown      empty        unconfigured ok
usb4/4                         unknown      empty        unconfigured ok
usb4/5                         unknown      empty        unconfigured ok
usb4/6                         unknown      empty        unconfigured ok
usb4/7                         unknown      empty        unconfigured ok
usb4/8                         unknown      empty        unconfigured ok

I could see that the disk at sata2/4 was present but not configured.

I used this command to configure it:

# cfgadm -c configure sata2/4
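
If ZFS hadn't picked the disk up by itself after this, the usual nudge would be 'zpool online', followed by 'zpool clear' to reset the error counters that the status output mentions. A quick sketch, which as it turned out I didn't need:

# zpool online space c1t4d0
# zpool clear space c1t4d0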

The drive was then automatically added back to the zpool and resilvering took place:

# zpool status space
  pool: space
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
        attempt was made to correct the error.  Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
        using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://www.sun.com/msg/ZFS-8000-9P
 scrub: resilver completed after 0h0m with 0 errors on Tue Jan  6 14:15:39 2009
config:

        NAME        STATE     READ WRITE CKSUM
        space       ONLINE       0     0     0
          raidz2    ONLINE       0     0     0
            c0t0d0  ONLINE       0     0     0  9K resilvered
            c0t1d0  ONLINE       0     0     0  8.50K resilvered
            c0t2d0  ONLINE       0     0     0  8K resilvered
            c0t3d0  ONLINE       0     0     0  7K resilvered
            c0t4d0  ONLINE       0     0     0  8K resilvered
            c1t0d0  ONLINE       0     0     0  7.50K resilvered
            c1t1d0  ONLINE       0     0     0  9K resilvered
            c1t2d0  ONLINE       0     0     0  8.50K resilvered
            c1t3d0  ONLINE       0     0     0  9K resilvered
            c1t4d0  ONLINE       0    77     0  13K resilvered

errors: No known data errors
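
The pool is back to ONLINE, but note that the write error count of 77 is still recorded against c1t4d0. Since the device doesn't actually need replacing, the advice in the status message applies: clear the errors with 'zpool clear', and optionally kick off a scrub to confirm everything is healthy. A minimal follow-up:

# zpool clear space c1t4d0
# zpool scrub space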
