By Martin Petersen
January 3, 2008 02:00 PM EST
On the storage side, many installations employ RAID (Redundant Array of Inexpensive Disks) technology to protect against disk failure. In the case of hardware RAID, the array firmware will often use advanced checksumming techniques and media scrubbing to detect and potentially correct errors. The disk drives themselves also feature sophisticated error correction mechanisms, and storage protocols such as Fibre Channel and iSCSI feature a Cyclic Redundancy Check (CRC) that guards against data corruption on the wire.
At the top of the I/O stack, modern filesystems such as Oracle's btrfs use checksumming techniques on both data and filesystem metadata. This allows the filesystem code to detect data that has gone bad either on disk or in transit. The filesystem can then take corrective action, fail the I/O request, or notify the user.
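To make the idea concrete, here is a minimal sketch of checksum-on-write, verify-on-read in Python. It is an illustration only and not how btrfs is implemented: the `ChecksummedStore` class and its `corrupt` helper are hypothetical names invented for this example, and the standard library's `zlib.crc32` stands in for whatever checksum a real filesystem would use. In this sketch, the store fails the read when the checksum no longer matches, which is one of the possible responses described above.

```python
import zlib


class ChecksummedStore:
    """Toy in-memory block store (hypothetical, for illustration only).

    Each block is saved together with a CRC-32 of its contents, and the
    checksum is recomputed and compared on every read.
    """

    def __init__(self):
        # Maps block number -> (data, checksum recorded at write time).
        self.blocks = {}

    def write(self, lba, data):
        data = bytes(data)
        self.blocks[lba] = (data, zlib.crc32(data))

    def read(self, lba):
        data, stored = self.blocks[lba]
        # A mismatch means the data changed after the checksum was taken.
        if zlib.crc32(data) != stored:
            raise IOError(f"checksum mismatch on block {lba}")
        return data

    def corrupt(self, lba, bit=0):
        """Flip one bit of the stored data, leaving the checksum untouched.

        This simulates silent corruption on disk or in transit.
        """
        data, stored = self.blocks[lba]
        damaged = bytearray(data)
        damaged[bit // 8] ^= 1 << (bit % 8)
        self.blocks[lba] = (bytes(damaged), stored)
```

A caller that writes a block, calls `corrupt()` on it, and then reads it back gets an `IOError` instead of silently receiving bad data. Note that the check only covers what happens after the checksum is computed; data damaged before it reaches `write()` would be checksummed and stored as if it were good.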
A common trait in most of the existing protective measures is that they work in their own isolated domains or at best between two adjacent nodes in the I/O path. There has been no common method for ensuring true end-to-end data integrity…until now. Before describing this new technology in detail, let’s look at how data corruption is handled by currently shipping products.