Yet another disk failure - FreeBSD to the rescue!
Last week, I suffered yet another disk failure in my NAS. I'm running 10x500GB disks on an OpenSolaris box in a raidz2 configuration. This allows me to survive the failure of two disks without losing any data. I'd already had a disk fail and was awaiting a replacement from WD when I suffered another complete failure and, at the same time, a third disk effectively failed as it started throwing up read errors on several sectors. With three disks out of action, one more than raidz2 can tolerate, my 4TB zpool disappeared, "losing" all data stored on it. Bugger!
However, the third disk was not totally unusable; I was able to read from most of the disk. What I needed to do was copy as much as possible from the "bad" disk onto a new disk, put it back in the zpool and cross my fingers.
FreeBSD has a tool called recoverdisk which can copy data from one disk to another, block-by-block, re-trying any failed blocks.
I downloaded FreeBSD 7.1 and burned it to DVD, unplugged all data drives from my NAS except the "bad" one and a new replacement. I plugged both of the drives directly into the motherboard, rather than the Supermicro SATA cards.
I booted from the FreeBSD DVD, dropped to a fixit shell prompt, and ran recoverdisk:
recoverdisk -w worklist /dev/ad8 /dev/ad10
After some time, recoverdisk "finished", in that it got to the end of the disk, but remained running, cycling over the 12 bad blocks it was unable to copy. I killed it with Ctrl-C.
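To illustrate the copy-and-retry idea, here is a rough sketch in Python. It is not how recoverdisk is actually implemented; the function name, the read/write callbacks, the block size and the limit on passes are all assumptions made for illustration. Unlike what I saw with recoverdisk, which kept cycling over the bad blocks until I killed it, the sketch gives up after a fixed number of passes and reports what it could not read.

```python
def copy_with_retries(read_block, write_block, total_size,
                      block_size=65536, max_passes=3):
    """Copy total_size bytes block by block, retrying failed blocks.

    read_block(offset, length) must return bytes or raise OSError.
    write_block(offset, data) stores the data at the same offset.
    Returns the (offset, length) ranges that could never be read.
    """
    # Work list of every block still to be copied.
    pending = [(offset, min(block_size, total_size - offset))
               for offset in range(0, total_size, block_size)]

    for _ in range(max_passes):
        failed = []
        for offset, length in pending:
            try:
                data = read_block(offset, length)
            except OSError:
                # Leave this block for a later pass and carry on.
                failed.append((offset, length))
                continue
            write_block(offset, data)
        pending = failed
        if not pending:
            break

    return pending
```

Blocks that never read successfully are simply left untouched on the destination, so the copy is only as good as the redundancy elsewhere in the pool that can fill those gaps.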
I plugged all my data drives back in (including the newly copied drive, and a replacement for one of the totally dead drives) and booted the NAS. Once it had booted, I exported and imported the zpool, and ... bingo! It was recovered!
Finally, I replaced one of the failed drives with the new replacement and the zpool began resilvering. After a couple of hours, it completed successfully with a warning that there were unrecoverable errors in one file - an Ubuntu ISO, which was easily replaceable.
Many thanks to Mark P. for his kind help and encouragement.