In a computer operating system, the file system is the part that writes data to disk and tracks
where the data is stored. If the computer crashes while it's writing data, the file system's
records can become corrupt. Hours of work could be lost, or programs could stop working
properly.
At the ACM Symposium on Operating Systems Principles in October, MIT researchers will
present the first file system that is mathematically guaranteed not to lose track of data during
crashes. Although the file system is slow by today's standards, the techniques the researchers
used to verify its performance can be extended to more sophisticated designs. Ultimately,
formal verification could make it much easier to develop reliable, efficient file systems.
"What many people worry about is building these file systems to be reliable, both when they're
operating normally but also in the case of crashes, power failure, software bugs, hardware
errors, what have you," says Nickolai Zeldovich, an associate professor of computer science and
engineering and one of three MIT computer-science professors on the new paper. "Making sure
that the file system can recover from a crash at any point is tricky because there are so many
different places that you could crash. You literally have to consider every instruction or every
disk operation and think, 'Well, what if I crash now? What now? What now?' And so empirically,
people have found lots of bugs in file systems that have to do with crash recovery, and they keep
finding them, even in very well tested file systems, because it's just so hard to do."