Saturday, April 30, 2011

What is a Build?

What is it about build software that separates it from other software?

Builds generally convert source code into binaries.

That's a pretty loose definition, and it could cover all kinds of things.
Here are some condensed examples.

Example #1 : batch file (or any shell script)

    gcc -c main.c -o main.o
    gcc -c util.c -o util.o
    ld main.o util.o -o program.exe

Example #2 : MSBuild

    <Project DefaultTargets="Build" [snip]>
      <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Defaults.props" />
      <PropertyGroup>
        <ConfigurationType>Application</ConfigurationType>
        <OutputPath>program.exe</OutputPath>
      </PropertyGroup>
      <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
      <ItemGroup>
        <ClCompile Include="main.c" />
        <ClCompile Include="util.c" />
      </ItemGroup>
      <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
    </Project>

Example #3 : Makefile

    program.exe : main.o util.o
    	ld main.o util.o -o program.exe
    main.o : main.c
    	gcc -c main.c -o main.o
    util.o : util.c
    	gcc -c util.c -o util.o

Example #4 : Scons

    src_files = ['main.c', 'util.c']
    Program('program.exe', src_files)

Example #5 : QRBuild

    [PrimaryProject(typeof(MainVariant))]
    public class Bar : CppProject
    {
        protected override void AddToGraph()
        {
            Compile(@"main.c");
            Compile(@"util.c");
            var ld = Link(@"program.exe");
            DefaultTargets.Targets.Add(ld.Params.OutputFilePath);
        }
    }

These scripts all accomplish the same goal of generating program.exe from source code. However, each one accomplishes it in a very different way.

The batch file always executes all three steps.

The MSBuild script is based on a dependency graph of "Targets". An MSBuild Target is basically just a sequence of actions.

  1. The project is instructed to execute the "Build" target first.
  2. The target named "ClCompile" (hidden in the .targets file) is named as a dependency of "Build". The "Link" target is also named as a dependency of "Build" and it depends on the "ClCompile" target. MSBuild topologically sorts the Targets and then executes them serially.
  3. The "ClCompile" target is executed first. This target processes the Item group called "ClCompile" using the "CL" task. (There is no special relationship between the target name and the Item group name.) As mentioned above, Target execution is serial. Therefore, all CL task invocations occur within the ClCompile target invocation, before any other targets run.
  4. The "Link" target is now run, and program.exe is created.
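
The target ordering described in the steps above can be sketched in a few lines of Python. This is only an illustration of topological sorting, not MSBuild's actual implementation; the `target_deps` table is a hypothetical, heavily simplified stand-in for what the .targets files define.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical, simplified target graph: each target maps to the set of
# targets it depends on. The real .targets files define many more.
target_deps = {
    "Build": {"ClCompile", "Link"},
    "Link": {"ClCompile"},
    "ClCompile": set(),
}

def execution_order(deps):
    """Return the targets in an order where dependencies come first."""
    return list(TopologicalSorter(deps).static_order())

order = execution_order(target_deps)
print(order)  # ['ClCompile', 'Link', 'Build']
```

Because the three targets form a chain, there is only one valid order here. The targets then run one after another in that order.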

The Makefile specifies a set of file-level dependencies. In make, a file is also termed a "target".

  1. When the user invokes 'make', the first defined target (program.exe) is the default.
  2. Make examines the makefile to determine what needs to be built (in what order) to create program.exe. In the example, it creates a graph where main.o and util.o must be created first, then program.exe will be linked using the object files as input.
  3. Make executes the build graph in dependency order.
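
To make those steps concrete, here is a minimal Python sketch of how a tool could walk file-level rules to arrive at a command order. It is not how make is actually implemented; the `rules` dictionary and the `plan` function are hypothetical names used only for illustration, and the sketch does no cycle detection or up-to-date checking.

```python
# Hypothetical in-memory model of the Makefile above:
# target -> (prerequisites, command)
rules = {
    "program.exe": (["main.o", "util.o"], "ld main.o util.o -o program.exe"),
    "main.o": (["main.c"], "gcc -c main.c -o main.o"),
    "util.o": (["util.c"], "gcc -c util.c -o util.o"),
}

def plan(target, rules, done=None, commands=None):
    """Post-order walk: a target's command is emitted after its prerequisites."""
    done = set() if done is None else done
    commands = [] if commands is None else commands
    if target in done:
        return commands
    done.add(target)
    # Plain source files such as main.c have no rule and no command.
    prereqs, command = rules.get(target, ([], None))
    for prereq in prereqs:
        plan(prereq, rules, done, commands)
    if command is not None:
        commands.append(command)
    return commands

default_target = next(iter(rules))  # the first rule defined, as in make
for command in plan(default_target, rules):
    print(command)
```

Running it prints the two gcc commands followed by the ld command, which is the dependency order described above.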

The Scons script instantiates a Program Builder. The Program is defined to have file-level dependencies on the source files.

Finally, the QRBuild script defines file-level dependencies. The real work is hidden behind the Compile and Link functions, where compilation and linking "Translations" are being added to a BuildGraph.


What happens if we run the same build script twice in a row?

The batch file re-runs, and takes the full amount of time to run all the steps sequentially. All other builds will report "up-to-date".


Builds only do work when files need to be built again.


The batch file can't really be considered a build. All the rest are capable of checking file dates. If an output is older than an input, then the output must be generated again. In MSBuild, such checks require the use of "Inputs" and "Outputs" attributes in the Target definition. In the absence of those attributes, the Task is responsible for doing the checks. In make, Scons, and QRBuild, the same file-level dependency knowledge used to determine the build graph is also used for up-to-date checks.
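
The file-date rule can be written down directly. The following is a minimal sketch of that one rule, assuming plain modification-time comparison; `is_up_to_date` is a hypothetical helper name, and real tools layer more on top (missing inputs, implicit dependencies, and so on).

```python
import os

def is_up_to_date(output, inputs):
    """True if output exists and is at least as new as every input."""
    if not os.path.exists(output):
        return False  # nothing built yet, so work is required
    output_time = os.path.getmtime(output)
    # An input newer than the output means the output is stale.
    return all(os.path.getmtime(path) <= output_time for path in inputs)
```

A build step for main.o would then run only when `is_up_to_date("main.o", ["main.c"])` returns False.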

What happens if we need to generate one of the source files?

Now things are getting interesting! Let's revise all these build scripts and see what needs to be done.

Example #1 : batch file (or any shell script)

    perl generate_main.pl > main.c
    gcc -c main.c -o main.o
    gcc -c util.c -o util.o
    ld main.o util.o -o program.exe

Example #2 : MSBuild

    <Project DefaultTargets="Build" [snip]>
      <Import Project="$(VCTargetsPath)\Microsoft.Cpp.Defaults.props" />
      <PropertyGroup>
        <ConfigurationType>Application</ConfigurationType>
        <OutputPath>program.exe</OutputPath>
      </PropertyGroup>
      <Import Project="$(VCTargetsPath)\Microsoft.Cpp.props" />
      <ItemGroup>
        <ClCompile Include="main.c" />
        <ClCompile Include="util.c" />
      </ItemGroup>
      <Target Name="GenerateMain"
              Inputs="generate_main.pl" Outputs="main.c"
              BeforeTargets="ClCompile" >
        <Exec Command="perl generate_main.pl > main.c" />
      </Target>
      <Import Project="$(VCTargetsPath)\Microsoft.Cpp.targets" />
    </Project>

Example #3 : Makefile

    program.exe : main.o util.o
    	ld main.o util.o -o program.exe
    main.o : main.c
    	gcc -c main.c -o main.o
    util.o : util.c
    	gcc -c util.c -o util.o
    main.c : generate_main.pl
    	perl generate_main.pl > main.c

Example #4 : Scons

    src_files = ['main.c', 'util.c']
    Program('program.exe', src_files)
    env = Environment()
    bld = Builder(action = 'perl $SOURCE > $TARGET')
    env.Append(BUILDERS = {'GenerateMain' : bld})
    # Builder calls take the target first, then the source.
    env.GenerateMain('main.c', 'generate_main.pl')

Example #5 : QRBuild

    [PrimaryProject(typeof(MainVariant))]
    public class Bar : CppProject
    {
        protected override void AddToGraph()
        {
            new GenerateMain(this.BuildGraph, "generate_main.pl", "main.c", this.OutDir);
            Compile(@"main.c");
            Compile(@"util.c");
            var ld = Link(@"program.exe");
            DefaultTargets.Targets.Add(ld.Params.OutputFilePath);
        }
    }
    public sealed class GenerateMain : BuildTranslation
    {
        public GenerateMain(BuildGraph buildGraph, string input, string output, string buildFileDir)
            : base(buildGraph)
        {
            m_input = QRPath.GetCanonical(input);
            m_output = QRPath.GetCanonical(output);
            m_buildFileDir = QRPath.GetCanonical(buildFileDir);
        }

        public override bool Execute()
        {
            QRDirectory.EnsureDirectoryExistsForFile(m_output);
            string cmdline = String.Format("perl {0} > {1}", m_input, m_output);
            // glossing over details here
            Util.ShellExec(cmdline);
            return true; // assumes success; real code should check the result
        }

        public override string BuildFileBaseName
        {
            get { return Path.Combine(m_buildFileDir, Path.GetFileName(m_output)); }
        }

        public override string GetCacheableTranslationParameters()
        {
            return m_input + " " + m_output; // in real life, use a StringBuilder
        }

        protected override void ComputeExplicitIO(HashSet<string> inputs, HashSet<string> outputs)
        {
            inputs.Add(m_input);
            outputs.Add(m_output);
        }

        private string m_input;
        private string m_output;
        private string m_buildFileDir;
    }


Notice that some systems require the human to maintain more dependency information.
With the shell script, you must define a global order. Adding one additional command requires understanding the entire build script, to avoid misplacing it. With the MSBuild script, you also must define a global Targets order. It is marginally better than a batch file in this regard.
The Makefile fully liberates us from the global ordering problem. Adding lines to define how main.c is generated is completely sufficient to "wire it" into the build. No additional inter-dependencies must be described by hand.
Scons and QRBuild provide the same liberation as the Makefile in this example.
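
Here is a small Python sketch of why the position of the new rule does not matter once ordering is derived from file-level dependencies. It models only the dependency part of the revised Makefile; `file_deps` and `build_order` are hypothetical names, and no real build tool is claimed to work exactly this way.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# File-level dependencies from the revised Makefile: output -> inputs.
# The main.c rule is listed last, but its position is irrelevant.
file_deps = {
    "program.exe": ["main.o", "util.o"],
    "main.o": ["main.c"],
    "util.o": ["util.c"],
    "main.c": ["generate_main.pl"],
}

def build_order(deps):
    """Return the files that have a rule, with prerequisites first."""
    ordered = TopologicalSorter(deps).static_order()
    return [name for name in ordered if name in deps]

order = build_order(file_deps)

# Declaring the same rules in reverse still yields a valid order.
reordered = build_order(dict(reversed(list(file_deps.items()))))

for result in (order, reordered):
    assert result.index("main.c") < result.index("main.o")
    assert result.index("main.o") < result.index("program.exe")
```

In both cases main.c is generated before main.o is compiled, without anyone stating a global order by hand.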

Therefore, I postulate the following:
To be called a build system, the software must perform automatic file-level dependency ordering.

This definition excludes shell scripts and MSBuild from being defined as build systems, but still leaves a healthy margin for plenty of other software to qualify.

Introduction

What is this QRBuild blog? It is named after my own build system which can be found here : https://github.com/fifoforlifo/QRBuild . But it's also a place to discuss good and bad things about build systems in general.

Let's face it. Most developers don't want to think about builds, or source organization, or source-control issues on a day-to-day basis. That's not to say that they don't care; they do care deeply in most cases. But when you're on a deadline, you want those things to Just Work (TM), so you can focus on getting the real stuff done. All too often, they get in the way instead.

I'm an ordinary developer who writes system software and debugger tools in my day job.

I have always complained about build systems. But until I wrote my own, I didn't feel like I had the right to complain about existing software. I told myself, "hey, maybe there are some deep unsolvable mysteries here, that a mere bystander could never understand."

Instead, I found that there are some real solid principles that should be applied in any build framework and day-to-day build scripts. And yes, there are the occasional unsolvable problems too. I have found that in the field, it is best to avoid recreating an unsolvable problem, and there are good strategies for doing that too!