Recovering a faulty hard disk drive in Linux – PT1

Today, I’ve been working on a server that doesn’t want to mount one of the partitions that has been formatted with reiserfs.

Using the fsck command doesn’t work because there are bad blocks on the drive, resulting in fsck vehemently refusing to cooperate:

Cannot read the block (41713664): (Input/output error).

To get fsck to play with such a faulty device, you need to build a list of badblocks first, so that fsck can deal with them. If the disk is REALLY bad, then this will do no good at all – just dd_rescue that bad boy!

It’s never a bad idea to back up before doing something like this anyway, so if you can, use dd_rescue to clone the partition onto another disk. The likelihood is that you don’t have another disk and that’s why you’re reading this. If so, read on.
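For anyone who does have a spare disk, here is a minimal sketch of the cloning step. dd_rescue takes the source first and the destination second. The helper name clone_command and the target /dev/hdb1 are made up for illustration, and the function only prints the command so that nothing gets overwritten by accident.

```shell
#!/bin/sh
# clone_command is a hypothetical helper: it prints the dd_rescue
# command instead of running it, so you can read it before committing.
clone_command() {
    src=$1
    dst=$2
    if [ -z "$src" ] || [ -z "$dst" ]; then
        echo "usage: clone_command SOURCE DEST" >&2
        return 1
    fi
    # Cloning a device onto itself is never what you want.
    if [ "$src" = "$dst" ]; then
        echo "source and destination must differ" >&2
        return 1
    fi
    printf 'dd_rescue %s %s\n' "$src" "$dst"
}
```

Calling clone_command /dev/hda1 /dev/hdb1 prints the command to run as root. Double-check the destination, because everything on it will be replaced.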

First of all, we need to be able to tell fsck which blocks on the disk are bad. To compile that list, you’ll need the badblocks tool, and it has to count in the same block size as the filesystem. badblocks defaults to 1024-byte blocks, whereas reiserfs normally uses 4096. To be doubly sure, run the debugreiserfs command against the partition, and the output will show you the blocksize:

Blocksize: 4096

Then run
badblocks -b 4096 /dev/hda1 >badblocks.txt
replacing 4096 with your block size. Obviously, change hda1 for the partition that is causing you trouble.
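The block size lookup and the scan can be glued together in a small script. This is only a sketch: the helper names blocksize_from_report and badblocks_command are invented here, and it assumes the debugreiserfs report contains a line of the form "Blocksize: 4096" as shown above. It uses the -o option of badblocks to write the list to a file rather than shell redirection.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any package.

# Read a debugreiserfs-style report on stdin and print the number
# that follows "Blocksize:" (first match only).
blocksize_from_report() {
    awk 'tolower($1) == "blocksize:" { print $2; exit }'
}

# Print (not run) the badblocks command for a device and block size.
# The third argument is the output file; it defaults to badblocks.txt.
badblocks_command() {
    device=$1
    blocksize=$2
    outfile=${3:-badblocks.txt}
    printf 'badblocks -b %s -o %s %s\n' "$blocksize" "$outfile" "$device"
}
```

A possible way to use it: bs=$(debugreiserfs /dev/hda1 | blocksize_from_report), then badblocks_command /dev/hda1 "$bs" to see the command before running it as root.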

This may take some time. Grab a book, make a drink, have a snack.

Once the scan has finished, enter less badblocks.txt to make sure that there is something in the file. You should see a series of numbers, one bad block per line. If that looks right, press ‘q’ to return to the shell.
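If you would rather not eyeball the file, the same sanity check can be scripted. The sketch below assumes the list holds one block number per line and nothing else. check_badblocks_list is a made-up name, not a real tool.

```shell
#!/bin/sh
# check_badblocks_list is a hypothetical helper.
# Prints a short verdict and returns:
#   0 if every line is a plain block number
#   1 if the file is missing or empty
#   2 if any line is not purely digits
check_badblocks_list() {
    list=$1
    if [ ! -s "$list" ]; then
        echo "empty"
        return 1
    fi
    # grep -v selects lines that are NOT a plain number.
    if grep -qv '^[0-9][0-9]*$' "$list"; then
        echo "malformed"
        return 2
    fi
    count=$(grep -c '^[0-9][0-9]*$' "$list")
    echo "$count bad blocks listed"
}
```

Run it as check_badblocks_list badblocks.txt. An "empty" verdict means the scan wrote nothing to the file.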

Now to run fsck. We will need to account for the bad blocks, so you will need to include that badblocks file that you have just created:

fsck.reiserfs --fix-fixable -B badblocks.txt /dev/hda1

Once again, change the device from hda1 to what you are trying to recover. As long as the drive isn’t completely nerfed, you might be able to run through the commands and recover the partition.

If not, then I’ll have another article very soon.