Sunday, May 1, 2011

File-level dependencies == parallelism

I mentioned in a previous post that file-level dependency ordering is what defines a build system.
Let's see why with a counter-example: MSBuild.

MSBuild works by running a list of Targets in dependency order. Each Target executes one or more Tasks in sequence. So overall, all Task executions are completely serialized within a single project's execution. A fancy C++ project build will go through phases like this:
  1. Generate source file gen1.cpp from gen1.pl
  2. Generate source file gen2.cpp from gen2.py
  3. Compile all C++ sources. (gen1.cpp, gen2.cpp, a.cpp, b.cpp, c.cpp)
  4. Link object files.
  5. Copy linker output to final destination.
According to MSBuild's programming model, each step must fully complete before the next step may begin. In this case, the opportunity to run the code-generation steps #1 and #2 in parallel is lost. Additionally, #1 and #2 could run in parallel with the compilation of a.cpp, b.cpp, and c.cpp; this opportunity is lost as well.
With max parallelization of 3 set on the CL task, you might see this play out:

Time      0        1        2      3         4
Proc #2   .        .        c.cpp  .         .
Proc #1   .        .        b.cpp  gen2.cpp  .
Proc #0   gen1.pl  gen2.py  a.cpp  gen1.cpp  prog.exe
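
To make the phase model concrete, here is a minimal Python sketch. It is not MSBuild code: the `phases` list and the `phased_schedule` function are names made up for illustration, every step is assumed to take one time unit, and the final copy step is left out, as it is in the table above.

```python
# Hypothetical model of a phase-serialized build (not MSBuild's real API).
# Each phase is (steps, width): the steps to run and how many may run at once.
phases = [
    (["gen1.pl"], 1),                                         # generate gen1.cpp
    (["gen2.py"], 1),                                         # generate gen2.cpp
    (["a.cpp", "b.cpp", "c.cpp", "gen1.cpp", "gen2.cpp"], 3), # compile, 3 at a time
    (["prog.exe"], 1),                                        # link
]

def phased_schedule(phases):
    """Return a list of time slots; each slot lists the steps running in it."""
    slots = []
    for steps, width in phases:
        # A phase must drain completely before the next phase may start.
        for i in range(0, len(steps), width):
            slots.append(steps[i:i + width])
    return slots

slots = phased_schedule(phases)
for time, running in enumerate(slots):
    print(time, running)
```

Under these assumptions the build needs five time slots, and two of the three processors sit idle while the generators run.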

A build system that supports file-level dependencies overcomes this kind of wasteful serialization. The code-generation can occur while independent compilation occurs. For example, a Makefile can achieve the following:

Time      0        1         2         3         4
Proc #2   a.cpp    gen1.cpp  .         .         .
Proc #1   gen2.py  c.cpp     .         .         .
Proc #0   gen1.pl  b.cpp     gen2.cpp  prog.exe  .
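
The same idea can be sketched in Python. This is only an illustration of scheduling from a file-level dependency graph, not how make is implemented: the `deps` dictionary and the `schedule` function are hypothetical, every step is assumed to take one time unit, and ready steps are picked greedily in the order they are listed.

```python
# Hypothetical dependency graph: each step maps to the steps it must wait for.
# "gen1.pl" stands for running the generator; "gen1.cpp" for compiling its output.
deps = {
    "gen1.pl": [],
    "gen2.py": [],
    "a.cpp": [],
    "b.cpp": [],
    "c.cpp": [],
    "gen1.cpp": ["gen1.pl"],
    "gen2.cpp": ["gen2.py"],
    "prog.exe": ["a.cpp", "b.cpp", "c.cpp", "gen1.cpp", "gen2.cpp"],
}

def schedule(deps, procs):
    """Greedy unit-time scheduling: each slot runs up to `procs` ready steps."""
    done, slots = set(), []
    pending = list(deps)  # dictionary order doubles as the priority order
    while pending:
        ready = [s for s in pending if all(d in done for d in deps[s])]
        if not ready:
            raise ValueError("dependency cycle detected")
        running = ready[:procs]
        slots.append(running)
        done.update(running)
        pending = [s for s in pending if s not in done]
    return slots

slots = schedule(deps, procs=3)
for time, running in enumerate(slots):
    print(time, running)
```

With three processors this sketch fills four time slots instead of five, because the generators share slot 0 with an ordinary compile rather than each holding a phase to itself.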

Of course, both of these examples are over-simplified: build steps are quantized into equal chunks of time. In a real build, step times vary, and a file-level scheduler can start each step as soon as its inputs exist and a processor is free, so independent work overlaps and the processors stay busy under varying conditions.
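
Here is a rough Python sketch of what happens once step times vary. Everything in it is an assumption for illustration: the durations are made up, `makespan` is a hypothetical greedy simulator rather than any real build tool's algorithm, and the phased build is modeled by adding barrier edges to the same graph.

```python
import heapq

# Hypothetical file-level dependency graph and made-up step durations.
deps = {
    "gen1.pl": [], "gen2.py": [], "a.cpp": [], "b.cpp": [], "c.cpp": [],
    "gen1.cpp": ["gen1.pl"],
    "gen2.cpp": ["gen2.py"],
    "prog.exe": ["a.cpp", "b.cpp", "c.cpp", "gen1.cpp", "gen2.cpp"],
}
durations = {"gen1.pl": 2, "gen2.py": 1, "a.cpp": 3, "b.cpp": 2,
             "c.cpp": 4, "gen1.cpp": 2, "gen2.cpp": 1, "prog.exe": 1}

def makespan(deps, durations, procs):
    """Simulate a greedy build: start any ready step whenever a processor is free."""
    done, started = set(), set()
    running = []  # min-heap of (finish_time, step)
    now = 0
    while len(done) < len(deps):
        for step in deps:
            if len(running) >= procs:
                break
            if step not in started and all(d in done for d in deps[step]):
                heapq.heappush(running, (now + durations[step], step))
                started.add(step)
        if not running:
            raise ValueError("dependency cycle detected")
        # Advance the clock to the next step that finishes.
        now, finished = heapq.heappop(running)
        done.add(finished)
    return now

# Phased model: the same graph plus barriers, so gen2.py waits for gen1.pl
# and every compile waits for both generators.
phased = dict(deps)
phased["gen2.py"] = ["gen1.pl"]
for src in ["a.cpp", "b.cpp", "c.cpp", "gen1.cpp", "gen2.cpp"]:
    phased[src] = ["gen1.pl", "gen2.py"]

file_level_time = makespan(deps, durations, procs=3)
phased_time = makespan(phased, durations, procs=3)
print("file-level:", file_level_time, "phased:", phased_time)
```

With these made-up durations the file-level graph finishes in 7 time units and the phased graph in 8. A greedy scheduler like this one is not guaranteed to find the best possible ordering, but it never waits on a barrier that the files themselves do not require.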
