tl;dr: recovering data from hard drives can be confusing if RAID is involved.
A family member asked if I could get the files off their old computer. I opened it up and there were two hard drives! There were also two optical drives, a 3.5-inch floppy drive, and a graphics card with DVI and S-Video. The case did not have a brand name and I had a feeling it was a custom build. The sticker on the front said Pentium III, and the one on the side said Windows XP. I would have liked to try booting the computer, but there wasn’t a monitor or anything on hand, so I took out the two 80GB IDE hard drives and brought them home. I also ordered an appropriate USB adaptor and a flash drive to put the files on.
The plan was to connect the drives and have them mount on my Mac as if they were Windows-formatted external drives. This isn’t what happened. Neither Mac OS nor Windows (via Parallels) could read either disk, though both helpfully offered to format them. My first thought was that somehow the disks or the filesystems were damaged. I created full-disk images of both disks using the dd program, then ran TestDisk against each. It reported that there were potentially partitions on the images, but some were larger than the disk itself, and none could be recovered.
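The "partition larger than the disk" symptom can be checked by hand. Here is a minimal sketch that reads the four primary entries of a classic MBR partition table and flags any that run past the end of the image. It assumes 512-byte sectors and an MBR-style table, neither of which is confirmed for these particular disks, and the function names are made up for illustration. It is not how TestDisk itself works.

```python
import struct

SECTOR = 512  # assumption: 512-byte sectors, typical for IDE drives of this era


def parse_mbr(mbr):
    """Return the non-empty primary partition entries found in an MBR sector."""
    if len(mbr) < SECTOR or mbr[510:512] != b"\x55\xaa":
        return []  # boot signature missing, so this is not a valid MBR
    parts = []
    for i in range(4):
        entry = mbr[446 + 16 * i: 446 + 16 * (i + 1)]
        ptype = entry[4]  # partition type byte, e.g. 0x07 for NTFS
        # Bytes 8-15 of each entry: starting LBA and sector count, little-endian.
        start_lba, num_sectors = struct.unpack_from("<II", entry, 8)
        if ptype != 0 and num_sectors > 0:
            parts.append({
                "type": ptype,
                "start": start_lba * SECTOR,
                "size": num_sectors * SECTOR,
            })
    return parts


def read_mbr_partitions(image_path):
    """Parse the partition table from the first sector of a disk image file."""
    with open(image_path, "rb") as f:
        return parse_mbr(f.read(SECTOR))


def overflows(parts, disk_size):
    """Return the partitions that extend past the end of a disk of disk_size bytes."""
    return [p for p in parts if p["start"] + p["size"] > disk_size]
```

A partition table written for a 160GB array but found on an 80GB disk would produce exactly this kind of overflow, which in hindsight is a hint that the disk is only part of a larger volume.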
TestDisk comes with a tool called PhotoRec, which scans the entire partition or disk for anything that looks like a file. I ran this on both images, and it produced over 100,000 “files”. The files represented a broad range of data: code, legal disclaimers, images, video clips, and programs (my antivirus was blowing up!). None were very large, and most seemed incomplete or corrupted. They were more like shreds of files, as if I were looking inside a paper shredder. I also noticed that similar-looking file-shreds appeared on both disks, which didn’t agree with my assumption that the disks were used independently for separate things.
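To give a feel for what "looks like a file" means, here is a toy sketch of signature scanning: it searches a buffer for the three bytes that begin a JPEG. This is a simplified illustration under that one assumption, not PhotoRec's actual implementation, and a real tool would also read the image in chunks and work out where each file ends.

```python
JPEG_SOI = b"\xff\xd8\xff"  # the bytes at the start of a JPEG file


def find_signatures(data, signature=JPEG_SOI):
    """Return every offset in `data` where `signature` begins."""
    offsets = []
    pos = data.find(signature)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(signature, pos + 1)
    return offsets
```

On a striped disk, a scan like this can find the start of a file easily enough, but the bytes that follow soon jump to data belonging to a different stripe, which would explain why the recovered files looked like shreds.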
I had some prior exposure to RAID (Redundant Array of Independent/Inexpensive Disks), having installed a RAID-1 array on the NAS unit in my closet. At some point it dawned on me that these two disks were potentially a RAID-0 “stripe set” array. One hint, aside from what I could see in the data, was that the two hard drives were the same model, a condition that is recommended for RAID to work properly. While RAID-1 mirrors the disks to create one fault-tolerant array the same size as a single disk, RAID-0 interleaves the disks to make one larger array with no redundancy. Both schemes promise some improvement in read speed, while RAID-0 also improves write speed. There are other RAID configurations if you have more disks, but these are the two options if you have only two. Another possibility was that the two disks were combined head-to-tail, an arrangement known as a “Spanned Volume” (Windows name), a “Concatenated Disk Set” (Mac name), or “JBOD” (Just a Bunch Of Disks; the industry loves acronyms). This possibility wasn’t entirely off the table, but based on how the data was shredded and distributed across the disks, it was looking more like a RAID striped set.
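The difference between striping and concatenation comes down to how a byte offset in the combined volume maps onto the physical disks. The sketch below shows both mappings. The function names and the stripe size used in the example are assumptions for illustration, not values taken from these disks.

```python
def raid0_locate(offset, stripe_size, num_disks=2):
    """Map a logical byte offset to (disk index, offset on that disk) for RAID-0."""
    stripe_index, within = divmod(offset, stripe_size)
    # Stripes are dealt out round-robin: stripe 0 to disk 0, stripe 1 to disk 1, ...
    row, disk = divmod(stripe_index, num_disks)
    return disk, row * stripe_size + within


def jbod_locate(offset, disk_sizes):
    """Map a logical byte offset to (disk index, offset) for concatenated disks."""
    for disk, size in enumerate(disk_sizes):
        if offset < size:
            return disk, offset
        offset -= size  # skip past this disk and keep looking
    raise ValueError("offset beyond end of volume")
```

With a hypothetical 64 KiB stripe, `raid0_locate(65536, 65536)` lands at the start of the second disk, so any file larger than one stripe alternates between the two drives. With concatenation, files would sit intact on one disk or the other, apart from any file that happens to straddle the boundary between them.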
I started to include “RAID” in my Google queries. There are a number of commercial products offering RAID support or recovery, as well as some open-source tools for Linux. I was also looking into how a basic Windows PC might get set up to use RAID, which is more commonly seen in servers. It sounds like many motherboard BIOS implementations support RAID, sometimes after a firmware upgrade. Windows now has built-in support, but I don’t think XP did. Either way, it’s software-based RAID, meaning the striping logic is handled by the CPU, potentially competing with whatever else the processor is working on. A server or custom PC can also use a hardware disk array controller. These are dedicated cards that connect directly to the hard drives and often (but not always) implement RAID on their own.
The tool that finally helped me was “ReclaiMe Free RAID Recovery”. At first I was sure it was a scam, but it turned out to do just what I needed. I ran this program on my Windows VM and pointed it to the images of the disks. It examined the images and eventually concluded that they did in fact form an array. It gave me the essential details of the array, namely disk order and block size, and provided an option to save the array as a single image. The resulting disk image was 160GB, as expected. I double-clicked the image and right away it mounted 4 partitions on my Mac. Two were FAT format and two were NTFS, and they had names like PHOTOS and GAMES. I copied the contents of each partition to separate folders on the flash drive using cp.
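Once the disk order and block size are known, rebuilding a RAID-0 image is conceptually simple: take one block from each disk in turn and write them out in order. The sketch below shows the idea. It assumes the array data starts at offset 0 on each image with no controller metadata in the way, which may not hold for every setup, and it is not a description of how ReclaiMe works internally.

```python
def destripe(disk_paths, out_path, block_size):
    """Interleave disk images, in the given order, into a single RAID-0 image."""
    disks = [open(p, "rb") for p in disk_paths]
    try:
        with open(out_path, "wb") as out:
            while True:
                # Read one block from each disk, in array order.
                blocks = [d.read(block_size) for d in disks]
                if not any(blocks):
                    break  # every image is exhausted
                for block in blocks:
                    out.write(block)
    finally:
        for d in disks:
            d.close()
```

A call might look like `destripe(["disk0.img", "disk1.img"], "array.img", 64 * 1024)`, where the file names and the 64 KiB block size are placeholders. Getting the order or block size wrong still produces a file of the right length, just with the contents scrambled, which is why a tool that detects those parameters is so useful.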
After a fair amount of head scratching and a number of painfully slow progress bars, I am happy to say the project was successful. It gave me plenty to ponder for a while, and seemed like a good topic for a blog post, so… ta da!
I could probably write another article about the things I tried that didn’t work and the things I only considered trying. One tool that helped me check the health of the disks was HDDScan. This gave me SMART reports, which were hard to decipher, but generally indicated that the drives were OK. I also tried converting the raw disk images to a Parallels-compatible format using qemu-img, though this route was never fruitful.