cluster hard drive

13 Dec

TECH LINE: Disk Signature Disaster
Saving yourself some pain when replacing failed cluster disks.

By Chris Wolf

Chris: We recently had a hard disk fail on our Windows 2003
cluster and it was an absolute nightmare. I replaced the failed
disk and could not get the cluster server to recognize the new
disk in order to restore the missing disk files from backup. I
assigned the same drive letter to the new disk, but each time I
tried to bring the disk resource online, it failed.

Since we were in a pinch, we decided to scrap the cluster and
start again from scratch. After rebuilding the cluster, we were
able to restore files to the two cluster virtual servers from
backup. I’m sure there has to be an easier way to recover a
failed cluster. What else could I have done?
— James

After talking with James, I learned that his cluster ran two
virtual servers: a virtual file server and a virtual print
server. Each virtual server resided in its own group on the
cluster. With this relatively simple setup, rebuilding his
cluster did not take long. Still, the point of having a cluster
is high availability, so taking down the entire cluster is never
the best option. James ran into this problem because of how
Microsoft Cluster Service (MSCS) treats disk signatures.

MSCS identifies each physical disk resource by the disk
signature that’s written to the physical disk when the disk is
initialized by a Windows OS. If you replace a physical disk
within the cluster, the Cluster service will see the original
disk as failed and will not even see the new disk. For the new
disk to be seen as the original disk, the signature recorded for
the original disk in the cluster configuration must match the
new disk’s signature. While there are a few tools that can do
this, by far the easiest way to associate the new disk with the
failed disk is to run the Server Cluster Recovery Utility.
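
To make the idea of a disk signature concrete: on an MBR-partitioned
disk, the signature is a 4-byte little-endian value stored at offset
0x1B8 of the first sector. The Python sketch below pulls that value
out of a raw sector buffer and compares it with a signature recorded
elsewhere. It is an illustration only, not part of MSCS or the
Resource Kit; the function names and the sample signature values are
hypothetical.

```python
import struct

MBR_SIGNATURE_OFFSET = 0x1B8  # 4-byte disk signature in sector 0 of an MBR disk
SECTOR_SIZE = 512


def read_disk_signature(sector0: bytes) -> int:
    """Return the 32-bit disk signature from the first sector of an MBR disk."""
    if len(sector0) < SECTOR_SIZE:
        raise ValueError("need the full 512-byte first sector")
    # The signature is stored little-endian.
    (signature,) = struct.unpack_from("<I", sector0, MBR_SIGNATURE_OFFSET)
    return signature


def signatures_match(recorded: int, sector0: bytes) -> bool:
    """True if the recorded signature equals the one found on the disk."""
    return recorded == read_disk_signature(sector0)


def make_sector(signature: int) -> bytes:
    """Build a blank sector carrying the given signature (demonstration only)."""
    sector = bytearray(SECTOR_SIZE)
    struct.pack_into("<I", sector, MBR_SIGNATURE_OFFSET, signature)
    return bytes(sector)


# Hypothetical values: the configuration still remembers the failed disk's
# signature, while the replacement disk was initialized with a different one.
recorded_signature = 0x1A2B3C4D
replacement_disk = make_sector(0x5E6F7081)

print(f"{read_disk_signature(replacement_disk):08X}")
print(signatures_match(recorded_signature, replacement_disk))
```

The mismatch reported on the last line is the situation James was in:
the drive letter was the same, but the recorded signature was not.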

The Server Cluster Recovery Utility is included in the Windows
Server 2003 Resource Kit and can be downloaded from Microsoft at
http://www.microsoft.com/downloads . This tool is especially
useful when replacing a shared cluster disk or in a disaster
recovery scenario when a cluster is being rebuilt using new
physical disk resources. Oftentimes, after a cluster quorum is
restored, physical disk resources still cannot come online.
That’s because the disk signatures stored in the cluster
configuration do not match the signatures of the new disks. In
these instances, the Server Cluster Recovery Utility can be used
to return the disks to a usable state.

To use the Server Cluster Recovery Utility, first install the
replacement disk and use Disk Management to initialize and
format the new disk as NTFS. Then go to Cluster Administrator
and create a new resource for the newly added physical disk.
Here are the steps:

1. In Cluster Administrator, right-click the Resources
container, select New, and then click Resource.
2. In the New Resource dialog box, enter a name for the
new resource, select “Physical Disk” as the resource
type, and then select the group with which to
associate the resource.
3. Select the possible owners for the disk (the same as
for the original disk) and click Next.
4. In the Dependencies dialog box, click Next.
5. The newly added disk should be displayed in the Disk
drop-down menu. Select the disk and click Finish.

With the newly installed disk associated with the cluster, you can
now use it to replace the failed disk resource. To do this, first
ensure that the Windows Server 2003 Resource Kit Tools are
installed on the node where you plan to perform the procedure,
and then follow these steps:

1. Run clusterrecovery.exe to open the Server Cluster
Recovery Utility.
2. Once the tool opens, enter the name of your cluster
in the Cluster Name field. Then select the “Replace
a physical disk resource” radio button and
click Next.
3. Select the original (failed) disk in the “Old
physical disk resource” drop-down menu and then
select the new physical disk from the “New physical
disk resource” drop-down menu. Then click Replace.
4. Next you are given a friendly reminder from the
Server Cluster Recovery Utility to delete the
original disk resource and then change the drive
letter of the new disk resource so that it matches
the drive letter assigned to the original (failed)
disk. Click OK.
5. Click Exit to close the Server Cluster
Recovery Utility.
6. In Cluster Administrator, locate the failed disk
resource. The failed disk resource will be easy to
spot in Cluster Administrator because it will have
the word “(lost)” next to its name. Right-click on
the lost resource and select Delete. When prompted
to confirm, click Yes.
7. Use Disk Management to change the drive letter
associated with the new disk.
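
Step 1 of the procedure above assumes clusterrecovery.exe can be
found on the node. As a small pre-flight sketch in Python (not
something the Resource Kit provides), the check below looks for the
executable on the search path. The function name find_tool is made
up for illustration, and the tool may well be installed in a folder
that is not on the path, so a miss here is only a hint.

```python
import shutil
from typing import Optional


def find_tool(name: str = "clusterrecovery.exe",
              search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path to the tool if it is on the search path, else None."""
    return shutil.which(name, path=search_path)


location = find_tool()
if location is None:
    print("clusterrecovery.exe not found on the path; "
          "check that the Resource Kit Tools are installed")
else:
    print(f"found {location}")
```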

At this point, you can bring the virtual server resources back
online and restore the original virtual server data from backup.

Note that some resources may fail to come online. For example, a
File Share resource will fail if the original folder that the
resource is associated with is not present. After the backup is
restored, you will be able to bring all resources in the group
(virtual server) online. Also keep in mind that, depending on how
your enterprise backup software is configured, you’ll most
likely need to reinstall your backup agent software on the
virtual server in order to perform the restore.
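
Since a File Share resource depends on its folder being present, it
can help to check which folders are still missing before trying to
bring the group online. The Python sketch below does that for a
hand-maintained mapping of share names to folder paths. The share
names and paths are hypothetical, and the mapping is an assumption:
nothing here reads the cluster configuration itself.

```python
import os
import tempfile
from typing import Dict, List


def missing_share_folders(shares: Dict[str, str]) -> List[str]:
    """Return the names of shares whose backing folder does not exist."""
    return sorted(name for name, folder in shares.items()
                  if not os.path.isdir(folder))


# Demonstration: a temporary directory stands in for the restored disk.
with tempfile.TemporaryDirectory() as disk_root:
    os.mkdir(os.path.join(disk_root, "Public"))
    shares = {
        "Public": os.path.join(disk_root, "Public"),    # already restored
        "Finance": os.path.join(disk_root, "Finance"),  # not restored yet
    }
    still_missing = missing_share_folders(shares)
    print(still_missing)
```

Any share the function lists would be expected to fail until its
folder has been restored from backup.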

Before the days of the Server Cluster Recovery Utility, cluster
disk recovery was fraught with pain. As soon as I heard of a
problem, my mind would instantly fill with the burnt-tooth
smell that serves as an ominous sign at most dentists’ offices.
Now when I hear of a cluster disk failure, I just smile from
ear to ear. This could mean that either I’m comforted by the
ease of the Server Cluster Recovery Utility, or that my sanity
is starting to return!