Our system provides fault tolerance by constant monitoring, replicating crucial data, and fast and automatic recovery. Chunk Replication allows us to tolerate chunk server failures. The frequency of these failures motivated a novel online repair mechanism that regularly and transparently repairs the damage and compensates for lost replicas as soon as possible. Additionally, we use check summing to detect data corruption at the disco IDE subsystem level, which becomes all too common given the number of disks in the system.