RAID5: More Harm Than Good?

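I came across a document arguing that once one disk in a RAID5 array has failed, the risk of data loss is N times that of a single disk, where N is the number of remaining disks. For example, take a RAID5 array of 4 disks: if one disk fails, the remaining 3 disks effectively run as RAID0, with data spread across all 3, so the probability of failure is 3 times that of a single disk. Even RAID5 + hot spare is not safe, because rebuilding onto the hot spare is very slow, and the heavy reads and writes during the rebuild increase the risk that one of the remaining disks fails. The conclusion is that if your data matters, do not use RAID5. RAID6 is recommended instead, because it can handle two disks failing at the same time.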

Readers can judge this argument for themselves. The original text follows:

RAID5 consolidates the storage space of several HDDs with some redundancy: it can survive the failure of a single disk while the data remains accessible.

Because RAID5 is available in server hardware and on some RAID controllers, some may think it is an attractive storage option. We strongly disagree.

Like printer cartridges, HDDs have a limited lifetime. Sooner or later an HDD will die, and it is difficult to predict accurately when that will happen. (We've seen cases where the warranty had to be exercised for a $1000 HDD within weeks of purchase.) Regardless of how long HDDs may last, their failure is just a matter of time. Knowing this, it is only natural to seek a solution that can protect data from inevitable HDD failure.

The main point of any redundancy system is to protect against expected malfunctions while never endangering data more than it would be at risk on non-redundant storage. RAID5 goes against this principle: in the expected situation where one HDD has failed, RAID5 is N times (where N is the number of remaining disks in the array) more vulnerable to data loss than a single (standalone) non-redundant HDD.

RAID5 redundancy is fake because RAID5 is extremely vulnerable during HDD failures. Consider a RAID5 array of 4 HDDs: data and parity are striped across all 4 disks, with one disk's worth of capacity used for parity to ensure redundancy. However, when one of the disks fails, RAID5 effectively works as RAID0 across the remaining 3 disks, and the likelihood of catastrophic failure is about 3 times higher than if the data were kept on a standalone HDD. Obviously, the more HDDs are consolidated into RAID5, the greater the risk. Over time, risk tends to become certainty.
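The "N times" figure can be checked with a short calculation. The sketch below is only an illustration, not part of the original article: it assumes every disk fails independently with the same probability over the period of interest, and the function name `degraded_loss_probability` and the 1% failure probability are hypothetical. Under that assumption the exact risk is 1 - (1 - p)^N, which is close to N × p when p is small.

```python
def degraded_loss_probability(p_disk: float, remaining_disks: int) -> float:
    """Probability that at least one of the remaining disks fails.

    Assumes independent failures with the same per-disk probability
    p_disk over the period of interest (a simplifying assumption).
    """
    if not 0.0 <= p_disk <= 1.0:
        raise ValueError("p_disk must be between 0 and 1")
    if remaining_disks < 1:
        raise ValueError("remaining_disks must be at least 1")
    # The array survives only if every remaining disk survives.
    return 1.0 - (1.0 - p_disk) ** remaining_disks


if __name__ == "__main__":
    p = 0.01  # hypothetical per-disk failure probability, for illustration only
    single = degraded_loss_probability(p, 1)
    degraded = degraded_loss_probability(p, 3)  # 4-disk RAID5 with one disk lost
    print(f"standalone disk:       {single:.6f}")
    print(f"degraded 4-disk RAID5: {degraded:.6f}")
    print(f"ratio:                 {degraded / single:.3f}")
```

With a small per-disk probability the ratio comes out just under 3, which matches the article's claim for a 4-disk array. For larger probabilities the ratio falls further below N, so "N times" is best read as an approximation.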

Even if RAID5 is implemented with a hot spare (an unused HDD for immediate automatic recovery), the odds are still against you.

Recovery may take hours, especially when the array is in use, and the stress from heavy I/O during regeneration may contribute to the failure of the remaining disk(s). The likelihood of failure is even greater if the failed disk came from the same batch as the remaining ones.
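To get a feel for how long the array stays exposed, one can estimate the rebuild window and the chance of a second failure inside it. This is a rough sketch under stated assumptions, none of which come from the article: a constant sustained rebuild rate, a constant hazard rate derived from an annual failure rate, and a hypothetical `stress_factor` that scales the hazard to stand in for rebuild I/O stress or same-batch disks. All names and numbers are illustrative.

```python
import math


def rebuild_hours(disk_capacity_gb: float, rebuild_rate_mb_s: float) -> float:
    """Hours needed to rewrite one whole disk at a sustained rate."""
    if disk_capacity_gb <= 0 or rebuild_rate_mb_s <= 0:
        raise ValueError("capacity and rate must be positive")
    seconds = disk_capacity_gb * 1000.0 / rebuild_rate_mb_s  # 1 GB = 1000 MB
    return seconds / 3600.0


def failure_probability_during(
    hours: float,
    annual_failure_rate: float,
    remaining_disks: int,
    stress_factor: float = 1.0,
) -> float:
    """Chance that any remaining disk fails within the given window.

    Uses a constant-hazard (exponential) model; stress_factor is a
    hypothetical multiplier for the extra load during a rebuild.
    """
    if not 0.0 <= annual_failure_rate < 1.0:
        raise ValueError("annual_failure_rate must be in [0, 1)")
    hazard_per_hour = -math.log(1.0 - annual_failure_rate) / 8760.0
    exposure = hazard_per_hour * stress_factor * remaining_disks * hours
    return 1.0 - math.exp(-exposure)


if __name__ == "__main__":
    window = rebuild_hours(disk_capacity_gb=2000, rebuild_rate_mb_s=50)
    risk = failure_probability_during(window, 0.05, remaining_disks=3)
    stressed = failure_probability_during(window, 0.05, 3, stress_factor=10.0)
    print(f"rebuild window: {window:.1f} h")
    print(f"second-failure risk: {risk:.6f} (stressed: {stressed:.6f})")
```

The model ignores unrecoverable read errors hit during the rebuild, which can also abort recovery, so it is best treated as a lower bound on the real exposure.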

A hot spare helps, but it is not a solution to the lack of reliability in RAID5.

Consider another problem: if you have a working RAID5 array with aging disks, can you take the risk of replacing HDDs proactively? Since it cannot be done safely, the maintenance effort will be greater and therefore more expensive, as it will require backups and planning for possible downtime.

It's been said that RAID5 is not a substitute for backup.
True, but downtime is costly and backups take time to restore.

With 3 HDDs, RAID5 may be a feasible alternative to RAID0. Using more than 3 disks in RAID5 is a bad idea.

Conclusion:

The conclusion is simple: if redundancy or safety of your data matters, never ever use RAID5.

When HDD failures are expected, RAID6 is an adequate solution for redundancy: RAID6 can survive two simultaneous disk failures. When one HDD fails, RAID6 behaves like RAID5, still protecting your data with enough redundancy to recover from another failure.
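The difference between the two levels after a first failure can be put into numbers with a binomial model. The sketch below is an illustration under the same simplifying assumption as before (independent failures, identical per-disk probability `p`); the function names and the 1% figure are hypothetical. A degraded RAID5 array loses data on any further failure, while a degraded RAID6 array loses data only if two or more of the remaining disks fail.

```python
from math import comb


def prob_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n independent disks fail."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


def degraded_raid5_loss(p: float, total_disks: int) -> float:
    """Data-loss probability for RAID5 after one disk has already failed."""
    if total_disks < 3:
        raise ValueError("RAID5 needs at least 3 disks")
    return prob_at_least(1, total_disks - 1, p)  # any further failure is fatal


def degraded_raid6_loss(p: float, total_disks: int) -> float:
    """Data-loss probability for RAID6 after one disk has already failed."""
    if total_disks < 4:
        raise ValueError("RAID6 needs at least 4 disks")
    return prob_at_least(2, total_disks - 1, p)  # two further failures are fatal


if __name__ == "__main__":
    p = 0.01  # hypothetical per-disk failure probability, for illustration only
    for disks in (4, 6, 8):
        r5 = degraded_raid5_loss(p, disks)
        r6 = degraded_raid6_loss(p, disks)
        print(f"{disks} disks: RAID5 {r5:.6f}  RAID6 {r6:.6f}")
```

Under these assumptions the degraded RAID6 risk is roughly two orders of magnitude lower than the degraded RAID5 risk for the same disk count, at the cost of one more disk's worth of capacity spent on parity.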
