The Build Log
Micro-optimizations like the ones above often matter less than structural optimizations, where you change the algorithm or approach. This was the case with Ninja’s build log.
One part of the Linux kernel build system tracks the commands used to generate outputs. Consider a motivating example: you compile an input foo.c into an output foo.o, and then change the build file such that foo.o should be rebuilt with different compilation flags. For the build system to know that it needs to rebuild the output, it must do one of two things. It can note that foo.o depends on the build files themselves, which, depending on the organization of the project, might mean that any change to the build files causes the entire project to rebuild. Or it can record the command used to generate each output and compare the recorded commands against the current ones on every build.
The kernel (and consequently the Chrome Makefiles and Ninja) takes the latter approach. While building, Ninja writes out a build log that records the full commands used to generate each output.9 Then for each subsequent build, Ninja loads the previous build log and compares the new build’s commands to the build log’s commands to detect changes. Loading the log, like loading build files or canonicalizing paths, was another hot spot in profiles.
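The command-comparison scheme described above can be sketched in a few lines of Python. This is an illustration only, not Ninja's code: the tab-separated log format and the names load_log and needs_rebuild are assumptions made for the example.

```python
# Hypothetical sketch of a build log that stores full commands.
# The "output<TAB>command" line format is an assumption, not Ninja's format.

def load_log(text):
    """Parse 'output<TAB>command' lines into a dict; later entries win."""
    log = {}
    for line in text.splitlines():
        if not line:
            continue
        output, _, command = line.partition("\t")
        log[output] = command
    return log


def needs_rebuild(log, output, command):
    """An output is out of date if it was never logged or its command changed."""
    return log.get(output) != command


previous = load_log("foo.o\tcc -O0 -c foo.c -o foo.o\n")
print(needs_rebuild(previous, "foo.o", "cc -O0 -c foo.c -o foo.o"))  # False
print(needs_rebuild(previous, "foo.o", "cc -O2 -c foo.c -o foo.o"))  # True
print(needs_rebuild(previous, "bar.o", "cc -O2 -c bar.c -o bar.o"))  # True
```

Because every logged line carries a complete command, the cost of loading such a log grows with the total length of all commands in the project.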
After making a few smaller optimizations, Nico Weber, a prolific contributor to Ninja, implemented a new format for the build log. Rather than recording commands, which are frequently very long and take a lot of time to parse, Ninja instead records a hash of the command. In subsequent builds, Ninja compares the hash of the command that is about to be run to the logged hash. If the two hashes differ, the output is out of date. This approach was very successful. Using hashes reduced the size of the build log dramatically (from 200 MB to less than 2 MB on Mac OS X) and made it over 20 times faster to load.
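A minimal sketch of the hashed variant follows, again in Python and again hypothetical. The text does not say which hash function Ninja uses, so a 64-bit truncated BLAKE2b from Python's hashlib stands in here; the names command_hash, record, and needs_rebuild are invented for the example.

```python
import hashlib

# Hypothetical sketch: log a fixed-size hash of each command instead of
# the command itself. The choice of BLAKE2b is a stand-in, not Ninja's hash.

def command_hash(command):
    """Return a 64-bit hash of the command as 16 hex characters."""
    return hashlib.blake2b(command.encode("utf-8"), digest_size=8).hexdigest()


def record(log, output, command):
    """Store only the hash, so entry size is independent of command length."""
    log[output] = command_hash(command)


def needs_rebuild(log, output, command):
    """Out of date if never logged or the command's hash differs."""
    return log.get(output) != command_hash(command)


log = {}
include_flags = " ".join(f"-I include/dir{i}" for i in range(200))
long_command = f"cc {include_flags} -c foo.c -o foo.o"
record(log, "foo.o", long_command)

print(len(log["foo.o"]))                                   # 16
print(needs_rebuild(log, "foo.o", long_command))           # False
print(needs_rebuild(log, "foo.o", long_command + " -O2"))  # True
```

The trade-off in a scheme like this is that two different commands could in principle hash to the same value, in which case a needed rebuild would be missed; with a 64-bit hash that is unlikely enough to be an acceptable price for the much smaller log.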