
Preventing Catastrophic Data Loss

By dbott | October 11, 2009

It is not a question of IF… it’s a question of WHEN.  It happens when someone’s laptop is stolen.  It happens when fire rips through a business.  And it happens when large, professional data storage companies drop the ball:

The big story today is about Microsoft subsidiary Danger losing all T-Mobile Sidekick customer data from their servers. Danger is the company noted for the T-Mobile Sidekick, the revolution in cloud mobile, and most memorably, almost everybody living in 90210 having to get new phone numbers because of Paris Hilton. Valued T-Mobile Sidekick customers received a notice today from the company updating them on the “data disruption” problem. The good news is that data is no longer being disrupted. The bad news is that there is no data left to be disrupted.

… as Sidekick users found out today, and ironically, as 7,500 users of online backup provider Carbonite found out after the company lost their backups (Carbonite can take some comfort in that they now rank very well for ‘data loss’ in search engines because of the incident. What do they say about bad publicity?). In the Danger case, it appears from initial speculation that the data was lost because they attempted to upgrade a storage array without backing it up first. Here is a case of smart and rational people who do this for a living at one of the best companies in the world, and they didn’t even bother making a backup…

When it happens to you, will your data be safe?

There are a variety of reasons that people buy NAS devices, such as the ReadyNAS.  Many families have multiple computers and are looking for a way to provide easy, centralized access to their data.  Other folks have suffered lost data due to hard drive failures and want an easy way to keep their data backed up.

Many NAS devices incorporate RAID (redundant array of inexpensive/independent disks) into their design.  RAID is designed to allow users to recover from a single disk failure by spreading the data, together with redundant information (a mirror copy or parity), across multiple disks.  In the event that a single disk fails, the RAID array can rebuild the missing data, for example by performing a parity calculation, and the user can still have uninterrupted access to their data.  When they replace the failed disk with a new one, the RAID array will automatically rebuild itself and continue to offer a protected data volume.
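To make the parity calculation concrete, here is a minimal Python sketch of XOR parity, the idea behind RAID 5-style arrays. It is only an illustration: the block contents and the `xor_blocks` helper are made up for this example, and a real array (including whatever a given NAS uses internally) works on raw disk blocks with a rotating parity layout.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks, as if each lived on its own disk (hypothetical contents).
data = [b"fami", b"ly p", b"hoto"]

# The parity block is stored on a fourth disk.
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild it from the survivors plus parity.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
```

Because XOR-ing a value with itself cancels out, combining the surviving blocks with the parity block yields the missing block. The same arithmetic shows the limit: lose two disks at once and there is nothing left to reconstruct from.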


Misconception 1:

Many people think that storing their data on a RAID-protected NAS is an adequate backup.  In fact, it is not.  Many people use their NAS as a central server and store the only copy of their data right on the NAS.  Things like fire, flood, theft, accidental or intentional deletions, user error or multiple hard drive failures can all lead to catastrophic data loss. If the only copy of your data is on the NAS, it is not a backup!

Misconception 2:

Many people use client-based software to back up the data on their PC to the NAS.  This is a good thing because now they can maintain multiple copies of their data.  The most current copy is on their PC and, if they’re using versioning software, they can maintain a number of versions of their files on the NAS.

NTI Shadow with Versioning


The only problem with this method is that all of your data is stored at the same location.  A fire, flood, theft or other natural disaster will take your PC, NAS and whatever data resides on them.  If all of your data is stored at the same location, it is not a secure backup.

Misconception 3:

The third misconception is that you can use one of the RAID drives as a backup by pulling it out and storing it off-site (kind of like a backup tape).  This is definitely not a recommended method of maintaining a backup of your data. corndog from sums it up perfectly:

…using RAID-1 hot-pull as a method of backup is insane. I’ve posted previously about this. There are a few reasons why you should never ever do this. First – RAID is designed to be a “first level” method of data protection. That means it is supposed to protect the live data on your system. You still need to do backups, but RAID is supposed to protect against most problems, so you only need to go to your backups in the case of a real catastrophe. RAID protects the live running data – there is no guarantee about disks that are pulled. By design, pulled disks are crap. RAID makes no effort to ensure that the data on those disks is meaningful in any way.

Second, when you pull a disk, there are multiple levels of commit that you have just broken. What guarantee do you have that the NAS wasn’t right in the middle of saving a file right when you pulled the disk? Just because you made sure all your clients were not writing anything doesn’t mean the NAS wasn’t clearing a cache or something – you simply do not know. When you do not know, you should EXPECT corruption.

Third, your file system is not cleanly dismounted. Think about it. Say for instance you want to keep a safe copy of your Windows data somewhere, and you are planning to pull the disk that is in your Windows machine. Wouldn’t you make sure you did a clean shutdown and the drive is dismounted properly before pulling out the disk? You wouldn’t just pull the power plug and kill the machine, then pull the disk right? That’s exactly what you are doing to the contents of the disk when you hot-pull a RAID1 drive.

Fourth, think of what RAID1 is all about – protection against drive failure. Now, think about how hard your drives are working during a typical day. Failure is most likely when the drive is most busy. If each day you hot-pull that disk, not only have you just walked away from redundancy until your new replacement disk is all mirrored again, you’ve also just put your drives to work – HEAVY work – remirroring. Guaranteed your disks are working waaaay harder during the re-mirror process than at any other time during the day. So just when they are most likely to fail (when they are working the hardest) you have also intentionally removed the RAID protection at this same time too – that’s just plain CRAZY!

Fifth, when you hot-pull a drive, it is powering off right in the middle of running – it has not been safetied, and it is MOVING (i.e. sliding out of your NAS). No matter how carefully and how smoothly you pull that drive, compared to the fine tolerances of space between the heads and the platter, you are giving it a massively rocky ride. Are you sure you aren’t causing the heads to crash onto the platter while you are doing this? Where have you ever read that it is safe for the disk to hot-pull a good drive? Hot-pull is meant for drives that are already dead. Plain and simple.

Bottom line – when RAID was designed, it was never intended as a method of hot-pull-backup. For the reasons outlined above, it should never be done. Think about it this way: Backups are important. They protect your precious data. Backup solutions (such as backup software or synchronizing tools like rsync) are designed specifically for doing exactly this job – backing up your important stuff. If your data is really so important, why would you rely on a method that was never intended to even work, to protect it, when there are other, much better methods that were actually designed to do this job properly? For example, why would you try to put new tires on your car while driving down the road, when there are perfectly good garages you could pull into, and they can do it right?

I regularly see people on many different forums complaining about trying to use RAID1-hot-pull as a backup method and they always seem to have problems – makes me shake my head in disbelief every time. They mustn’t really care about their data, or they would use a proper backup method.

Proper Backups

A proper backup consists of multiple copies of data stored in different locations.  If the only backup of your NAS sits on the shelf beside your NAS and there’s a fire or theft, chances are your backup is going to be lost as well.  At my place of employment, I back up over a dozen servers to a primary NAS at regular intervals throughout the day.  Every night, I sync the data on the primary NAS to 2 separate NASes located at our remote locations.  In the event of a major catastrophe, I have multiple backups maintained at multiple locations.

I’m not suggesting that this is an affordable or practical solution for the home or SOHO user.  However, a couple of large-capacity external USB drives can be used to provide rotating backups that are kept off-site.  There are also services provided specifically for the ReadyNAS, like ReadyNAS Vault, that offer a secure, remote facility to store your data.
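A rotation only helps if you stick to it, so it can be worth letting a script decide which drive is due. The sketch below assumes a weekly swap and uses made-up drive labels; `drive_for` is a hypothetical helper, not part of any backup product. It counts whole weeks from a fixed starting point rather than using calendar week numbers, so the rotation does not repeat a drive at a year boundary.

```python
from datetime import date

def drive_for(day, drives):
    """Pick the drive to use for the week (Monday to Sunday) containing `day`."""
    # date.toordinal() counts days from 0001-01-01, which was a Monday,
    # so (ordinal - 1) // 7 is a running count of Monday-start weeks.
    week_index = (day.toordinal() - 1) // 7
    return drives[week_index % len(drives)]

# Hypothetical labels for two external USB drives.
drives = ["USB-A", "USB-B"]
this_week = drive_for(date.today(), drives)
```

The drive that is not in use this week is the one that should be sitting off-site.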

There are data restoration companies like DriveSavers who can recover data from failed or damaged drives for the price of a new car — give or take a few thousand dollars.  They can’t recover data from a stolen device, though.

For a few hundred dollars, you can make sure that your data is safe.  Just be sure to regularly back up your data, verify that your backup procedure works and that you can restore your data, and then keep the backup in a different location.
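One simple way to verify a backup is to compare checksums of every file in the source against the copy. The following Python sketch does that with SHA-256; the function names and the example paths are assumptions for illustration, and it is no substitute for an actual test restore.

```python
import hashlib
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_backup(source_dir, backup_dir):
    """Return relative paths that are missing from, or differ in, the backup."""
    source_dir, backup_dir = Path(source_dir), Path(backup_dir)
    problems = []
    for src in sorted(source_dir.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(source_dir)
        dst = backup_dir / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(str(rel))
    return problems

# Example call with hypothetical paths:
# problems = verify_backup("/home/me/documents", "/mnt/usb-backup/documents")
```

An empty list means every source file has an identical copy in the backup. Files that exist only in the backup (for example, older versions) are deliberately ignored.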


2 Responses to “Preventing Catastrophic Data Loss”

  1. dbott Says:
    October 20th, 2009 at 11:20 am

    Another article on CNN:

  2. datajunkie Says:
    November 9th, 2009 at 6:58 pm

    I wish I had read this 6 months ago. I had a Mac, a PC and a Windows server. All data on the Mac and PC was synced to my server, and the server was backed up to a Drobo NAS, so I thought I was safe. But as you have pointed out, I was not. I found this out the hard way when I had a break-in and everything was stolen: data, including backups, all gone.

    I lost around 18GB of data. Worse still, I ended up losing about 20% of my client base as a result, but my business just survived with the remaining 80%. It cost me a lot of money, I can tell you.

    I had to find a better way of backing up. After purchasing new hardware, pretty much the same as the old, I needed a solution that would back up my data locally and also back it up offsite.

    After a chance meeting I was introduced to this solution, which now automatically backs up my data to my NAS drive and also to 3 offsite data centres, so hopefully I will not lose data again. I have done a few test restores. Recovering data via the internet is not the quickest, but at least I can recover.

    This was a situation I have heard about but would not have thought it would happen to me, but it did.

    My parting message is get a backup onsite or offsite as soon as possible.