Tuesday, May 3, 2011

File up-to-date checks

What determines whether a file is up-to-date in a build?
Quite simply, it is up-to-date when its inputs have not changed since the previous build. If a file's inputs have changed, that file is said to be dirty.

For example, with a C++ object file foo.obj generated from foo.cpp, the following are its inputs:
  1. Contents of foo.cpp.
  2. Contents of the header files it included during the last compilation.
  3. Command-line parameters (aka command options) to the compiler.
  4. The compiler.
  5. Environment variables that influence the compiler.
To get this right, each object file needs an accompanying dependency file that records all of this information as of the last successful compilation.
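
Here is a minimal Python sketch of such a dependency record. The function names, the JSON layout and the use of MD5 content digests are my own assumptions for illustration, not the on-disk format of any real build system. The record covers all five inputs listed above, and the object file counts as dirty when a freshly computed record differs from the one saved after the last successful compile.

```python
import hashlib
import json
import os


def file_digest(path):
    """MD5 of a file's contents (hypothetical helper)."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def make_record(source, headers, options, compiler, env_vars):
    """Snapshot of every input that can affect one object file.

    headers: the header list recorded by the previous compile.
    env_vars: names of environment variables assumed to influence the
    compiler (e.g. INCLUDE for MSVC); which ones matter is tool-specific.
    """
    return {
        "source": file_digest(source),
        "headers": {h: file_digest(h) for h in sorted(headers)},
        "options": list(options),
        "compiler": file_digest(compiler),
        "env": {name: os.environ.get(name) for name in sorted(env_vars)},
    }


def is_dirty(record_path, current):
    """Dirty if no record was saved yet or any recorded input changed."""
    try:
        with open(record_path) as f:
            saved = json.load(f)
    except FileNotFoundError:
        return True
    return saved != current


def save_record(record_path, record):
    """Call only after the compile has succeeded."""
    with open(record_path, "w") as f:
        json.dump(record, f)
```
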
Some build systems (like make and MSBuild) use heuristics like "if any input's modification time is later than any output's, consider the output dirty". This works in common cases, but fails when a source-control system sets a file's modification time back to a date earlier than the output's.
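
A sketch of that timestamp heuristic in Python, to make the failure mode concrete. The function name is hypothetical; comparing the newest input against the oldest output is equivalent to "any input newer than any output".

```python
import os


def mtime_dirty(inputs, outputs):
    """make-style heuristic: dirty if an output is missing, or if any
    input is newer than the oldest output."""
    if not all(os.path.exists(p) for p in outputs):
        return True
    newest_input = max(os.path.getmtime(p) for p in inputs)
    oldest_output = min(os.path.getmtime(p) for p in outputs)
    return newest_input > oldest_output
```

If a checkout rewrites foo.cpp with different content and also stamps it with a time older than foo.obj, mtime_dirty reports foo.obj as up-to-date even though its source changed.
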
Ultimately, choosing what information to use when deciding whether a file has changed is a policy decision, best made by the user. It is clearly a policy, since there is a continuum of trade-offs between performance and accuracy, with no single "right" answer.

SCons generalizes this policy with the Decider function. It comes with built-in support for:
  1. MD5 signature: very accurate, but requires reading every byte of every input.
  2. Timestamp checking: less accurate, since merely touching a file triggers a rebuild of its dependent targets, but the check has low overhead (fstat is fast).
QRBuild has taken a page from the SCons book, even going so far as to call its interface IFileDecider. By default, a date+size decider is used; an MD5 decider is also available.
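
The decider idea can be sketched as a pluggable strategy. The FileDecider protocol and the class names below are hypothetical Python stand-ins for the concept, not the actual SCons or QRBuild API. Each decider reduces a file to a short signature, and a file counts as changed when its signature differs from the one stored at the last build.

```python
import hashlib
import os
from typing import Protocol


class FileDecider(Protocol):
    def signature(self, path: str) -> str: ...


class DateSizeDecider:
    """Cheap: one stat call per file, never reads the contents."""

    def signature(self, path: str) -> str:
        st = os.stat(path)
        return f"{st.st_mtime_ns}:{st.st_size}"


class Md5Decider:
    """Accurate: reads every byte and ignores timestamps entirely."""

    def signature(self, path: str) -> str:
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                h.update(chunk)
        return h.hexdigest()


def changed(decider: FileDecider, path: str, stored: dict) -> bool:
    """True if path's signature differs from the one recorded last build."""
    return stored.get(path) != decider.signature(path)
```

Because DateSizeDecider compares against the stored signature rather than against the output's timestamp, a modification time moved backwards still counts as a change. Touching a file, on the other hand, fools DateSizeDecider into a rebuild but not Md5Decider.
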

Notice that I didn't mention changes to the list of header files. Inputs #1 and #2 are sufficient to catch those, because the only way to change the list of headers is to modify either the source file or one of the headers included in the last compilation. Many build systems exploit this property successfully, not least make combined with gcc.
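
With gcc, the header list from the last compile typically comes from the -MD option (or -MMD, which omits system headers), which writes a make-format dependency file as a side effect of compiling. A minimal Python parser for such a file, assuming a single target, backslash-newline continuations and no spaces or colons inside paths (real files can contain escaped spaces and Windows drive letters):

```python
def parse_make_deps(text):
    """Return the prerequisites listed in a make-style .d file.

    Assumes one rule ("target: prereq ...") with backslash-newline
    continuations and no spaces or colons inside paths.
    """
    joined = text.replace("\\\n", " ")
    _target, _, prereqs = joined.partition(":")
    return prereqs.split()


def implicit_inputs(dep_text, source):
    """Headers from the last compile: the prerequisites minus the source."""
    return [p for p in parse_make_deps(dep_text) if p != source]
```

On the next build only foo.cpp and the headers returned here need checking. If an edit to foo.cpp adds a new #include, foo.cpp itself has changed, so it is recompiled and the new list is recorded.
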

QRBuild handles these checks in the engine, by requiring every Translation to implement functions that return its explicit inputs and outputs, its implicit inputs, and its canonicalized "cacheable" translation parameters. Splitting responsibilities this way and providing the check as a standard feature frees individual Translation writers from repeating this work in every Translation class.
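
A Python sketch of that split. The Translation base class and method names below are hypothetical and only mirror the idea; QRBuild's real interface is not shown here. Each translation only reports what it reads, writes and depends on, while a single engine function turns that into an up-to-date decision. file_sig stands for any decider-style function mapping a path to a signature.

```python
import hashlib
import json
import os
from abc import ABC, abstractmethod


class Translation(ABC):
    """Hypothetical build-step interface; method names are illustrative."""

    @abstractmethod
    def explicit_inputs(self) -> list: ...

    @abstractmethod
    def explicit_outputs(self) -> list: ...

    @abstractmethod
    def implicit_inputs(self) -> list:
        """Inputs discovered by the last run, e.g. included headers."""

    @abstractmethod
    def cacheable_params(self) -> dict:
        """Every parameter that affects the outputs, in canonical form."""


def translation_signature(t, file_sig):
    """One digest over all inputs and params, computed by the engine."""
    inputs = sorted(set(t.explicit_inputs()) | set(t.implicit_inputs()))
    payload = {
        "inputs": {p: file_sig(p) for p in inputs},
        "params": t.cacheable_params(),
    }
    blob = json.dumps(payload, sort_keys=True).encode()
    return hashlib.md5(blob).hexdigest()


def needs_run(t, file_sig, stored_signature, exists=os.path.exists):
    """Generic engine check: rerun if an output is missing, or if any
    input or parameter differs from the last successful run."""
    if not all(exists(p) for p in t.explicit_outputs()):
        return True
    return translation_signature(t, file_sig) != stored_signature
```
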

Frequently, you will find build systems that don't take compiler options into account. The result is frustrating: for example, you might change a compiler flag in a simple Makefile and run make, only to find that nothing was recompiled! Custom build steps in Visual Studio projects often suffer from the same issue.

Some authors work around this by adding the Makefile or build scripts to each target's dependency list. While this is technically correct, it is far from optimal. Since a single build script usually controls many independent targets, changing the command options of one target causes all the other independent targets to rebuild as well.
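
The finer-grained alternative is to record a signature of each target's own options rather than of the whole script. A minimal Python sketch, with hypothetical function names and targets given as a plain name-to-options mapping:

```python
import hashlib
import json


def options_signature(options):
    """Digest of one target's command options. List order is kept,
    since the order of compiler flags can matter."""
    return hashlib.md5(json.dumps(list(options)).encode()).hexdigest()


def dirty_targets(targets, stored):
    """targets: {name: options}; stored: {name: signature from last build}.

    Only targets whose own options changed are returned, so editing one
    target's flags in a shared build script leaves the others alone.
    """
    return sorted(
        name for name, opts in targets.items()
        if stored.get(name) != options_signature(opts)
    )
```

For example, if a.obj, b.obj and c.obj all build with /O2 and only b.obj gains /DDEBUG, dirty_targets returns just ["b.obj"], whereas depending on the Makefile itself would rebuild all three.
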

Lastly, it is very rare to find the build tools themselves in the dependency list. To some degree this matters less, since the choice of build tool rarely changes. But it's easy to add at least the required executable to each target's dependency list as basic insurance. The QRBuild MsvcCompile translation class handles this by adding the VcBinDir and toolchain to the cacheable translation parameters.
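
A Python sketch of folding the tool into each target's parameters. The tool_dir and toolchain keys loosely mirror the VcBinDir and toolchain parameters mentioned above; tool_md5 and all function names are hypothetical additions. Hashing a large compiler on every build has a cost, so a date+size signature of the executable would be a cheaper variant.

```python
import hashlib
import os


def tool_params(exe_path, toolchain):
    """Cacheable parameters describing the build tool itself.

    Recording where the tool lives and a digest of the executable means
    an upgraded or swapped compiler changes every affected target's
    parameters, and therefore its signature.
    """
    h = hashlib.md5()
    with open(exe_path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return {
        "tool_dir": os.path.dirname(os.path.abspath(exe_path)),
        "toolchain": toolchain,
        "tool_md5": h.hexdigest(),
    }


def compile_params(options, exe_path, toolchain):
    """Per-target params: the target's own options plus the tool identity."""
    params = {"options": list(options)}
    params.update(tool_params(exe_path, toolchain))
    return params
```
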
