Friday, September 23, 2011

Replace a failing drive in a ZFS zpool

I had a hard drive start throwing sense key errors on one of our big Sun servers at work today. On top of the problems from the recent major power outage in Southern California, this was one more thing to deal with. The good news is that the drive was part of a ZFS zpool, and after brushing up on the commands I needed, I was able to replace the drive in a matter of minutes with no downtime for any of the users on the system. We have about 1,000 ZFS file systems on this particular zpool, most of them very active user home directories. The server is also a print server for about 1,000 users.

ZFS is amazing.

If you need to replace a failing drive in a ZFS zpool, seeing the steps I took may help you. Follow along.

The sense key errors showed up in the messages file.
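
For readers who want to look for the same kind of errors, here is a minimal sketch. It assumes the system log lives at /var/adm/messages (the usual Solaris location, not stated in this post) and that the errors contain the words "sense key". The function name sense_key_errors is made up for illustration.

```shell
# Assumption: Solaris keeps the system log at /var/adm/messages.
# Override MESSAGES_FILE if your system logs somewhere else.
MESSAGES_FILE="${MESSAGES_FILE:-/var/adm/messages}"

# Print every line in the given file that mentions a sense key,
# ignoring case. (Hypothetical helper name.)
sense_key_errors() {
    grep -i 'sense key' "$1"
}

# Only touch the real log when it exists and is readable.
if [ -r "$MESSAGES_FILE" ]; then
    sense_key_errors "$MESSAGES_FILE" | tail -n 20
fi
```

The surrounding log lines normally name the device that reported the error, so it is worth reading a few lines of context around each match.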


Running the zpool status command shows only a single read error, on the c1t5d0 device. That is enough for me to swap out the drive.
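
On a pool with many disks it can help to filter the status output down to the devices that have logged errors. The following is a rough sketch, not the exact commands used here. It assumes the pool is named zdata (as later in this post) and that the config section of zpool status has the usual NAME, STATE, READ, WRITE, CKSUM columns. The helper name devices_with_errors is hypothetical.

```shell
# Assumption: the pool name is zdata, as used later in this post.
POOL="${POOL:-zdata}"

# Read `zpool status` output on stdin and print each device row whose
# READ, WRITE, or CKSUM counter is not 0. Rows are only considered
# between the "NAME STATE ..." header and the "errors:" line.
devices_with_errors() {
    awk '
        $1 == "NAME" && $2 == "STATE" { in_config = 1; next }
        /^errors:/                    { in_config = 0 }
        in_config && NF >= 5 && ($3 != "0" || $4 != "0" || $5 != "0") {
            print $1, $3, $4, $5
        }
    '
}

# Only call zpool when it is actually installed.
if command -v zpool >/dev/null 2>&1; then
    zpool status "$POOL" | devices_with_errors
fi
```

Each printed line is the device name followed by its read, write, and checksum error counts.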


The Sun server has many drives in it, and I want to make sure I pull the correct one, so I use the luxadm command to flash the amber light next to the drive. These are Fibre Channel drives, which is why luxadm is used. Next I tell zpool to take the drive offline so I can hot swap it out of the system, and I run the zpool status command to verify the drive is offline. Now I need to run to the data center, find the flashing light on the disk drive cage, and swap the drive out with a new one.
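
A sketch of this step might look like the following. The pool name zdata and the device c1t5d0 come from this post. The luxadm subcommand (led_blink) and the /dev/rdsk/...s2 path are assumptions, since the exact invocation is not shown here; check the luxadm man page for your hardware. The function only prints the commands so they can be reviewed before anything is run.

```shell
# Pool and device names are taken from this post.
POOL="${POOL:-zdata}"
DISK="${DISK:-c1t5d0}"

# Print the commands for review instead of running them.
# Assumption: luxadm led_blink with a raw device path is the right
# way to flash the LED on this enclosure; verify before using it.
offline_steps() {
    echo "luxadm led_blink /dev/rdsk/${DISK}s2"
    echo "zpool offline ${POOL} ${DISK}"
    echo "zpool status ${POOL}"
}

offline_steps
```

Once the commands look right, they can be copied and run one at a time as root, checking the zpool status output before pulling the drive.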


Once the drive is replaced, I run the devfsadm -C command to have the OS rescan its devices so the new drive is usable by zpool. Then I tell zpool to replace the failing drive with the new one. Since this was a direct replacement in the same slot, the command was very simple: "zpool replace zdata c1t5d0", where "zdata" is the name of the zpool. Running the zpool status command again shows that it has already begun to rebuild the new drive back into the zpool.
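
Put together, the replacement step can be sketched as a small script. The three commands are the ones named above. The run wrapper and the DRY_RUN variable are hypothetical additions so the script prints what it would do by default rather than changing the pool.

```shell
# Pool and device names are taken from this post.
POOL="${POOL:-zdata}"
DISK="${DISK:-c1t5d0}"
# Hypothetical safety switch: 1 prints the commands, 0 executes them.
DRY_RUN="${DRY_RUN:-1}"

# Echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

replace_steps() {
    run devfsadm -C                    # refresh the device links
    run zpool replace "$POOL" "$DISK"  # same-slot replacement
    run zpool status "$POOL"           # confirm the rebuild has started
}

replace_steps
```

Setting DRY_RUN=0 runs the commands for real, which needs root on the server and should only be done after the new drive is physically in the slot.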


With just a few simple commands I was able to hot swap and replace a failing disk drive in a very busy Sun server with no downtime at all. ZFS is an amazing piece of technology.

I hope this post might be helpful for you if you need to replace a failing hard drive using ZFS.


2 comments:

  1. Just wanted to say thank you for this post. How'd you get it to say "Yes Master?" at the prompt?

    1. Hi John, I hope the post was helpful to you.
      You can set the UNIX prompt to anything you like. I just happened to choose Yes Master since that is what the first UNIX system I ever used was set to, and I thought it was cool.

      Search Google for changing the command prompt on UNIX and you will find many solutions.
