How to Build Binary Package
I've been looking into how easy it is to confirm that a binary package corresponds to a source package. It turns out that it is not easy at all.
So I've written down my findings in this blog entry. I think that the topic of reproducible builds is one that is of fundamental importance to the free software and larger community; the trustworthiness of binaries based on source code is a topic quite neglected. We know about tivoization and the reality that code can be open yet unchangeable. What is not appreciated in sufficient measure is that parties can, quite unchecked, distribute binaries that do not correspond to the alleged source code.
Trust is good, but especially in a post-Snowden world, control is better. Can a person rely on binaries or should we all compile from source?
I hope to raise awareness about the need for a reproducible way to create binaries from source code. Free software means users have the four essential freedoms. Freedom 1 is the freedom to study how the program works and change it so it does your computing as you wish. It also means that the software does not do what you do not want it to do. Instead of having to trust the supplier of the software, you can check that the software works as advertised and does not contain e.g. …
Access to the source code is a precondition for this freedom. Many software packages are distributed in binary form and come with a license that makes the right to the source code explicit. For an executable work, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the executable.
A license that promises access to the source code is one thing, but an interesting question is: does the source code that is provided really correspond to the executable? The straightforward way to find this out is to compile the code and check that the result is the same. Unfortunately, the result of compiling the source code depends on many things besides the source code and build scripts, such as which compiler was used.
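The check described above, compiling and confirming that the result is the same, comes down to a bit-for-bit comparison of two files. A minimal Python sketch of that comparison follows; the function names `sha256_of` and `same_build` are made up for this illustration and are not part of any distribution's tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_build(published: Path, rebuilt: Path) -> bool:
    """True only if both files are bit-for-bit identical."""
    return sha256_of(published) == sha256_of(rebuilt)
```

A single differing byte, such as an embedded timestamp, is enough to make `same_build` return `False`, which is why the rest of this test looks at where the differences come from.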
No free software license requires that this information is made available, and so it would seem to be a challenge to confirm whether the given source code corresponds to the executable. Another service that most distributions provide is compiling software into executables and shipping those in convenient packages.
Most distributions ship two types of packages: source packages and binary packages. A distribution is a complete system that includes all the tools to compile source code. Those tools go beyond the tools that are used in the build scripts from the upstream developer. Distributions contain tools to create binary packages from source packages. Does this mean that it is less of a challenge to confirm whether the source code corresponds to the executable?
Doing the test
I have built a binary package from a source package for a number of distributions (Debian, Fedora, and OpenSUSE) and compared the self-built binary package with the one published by the distribution.
All tests were run on fresh, minimal installs of the latest version of each distribution, using the tools that are recommended by the distributions. To keep the complexity low, one simple package was chosen: tar. Will the self-built package be exactly the same, totally different or only slightly different?
Debian
Debian was installed from a downloaded netinstall image. The system was installed on a VirtualBox machine. The version of tar that comes with Debian is 1. According to the instructions, compiling the tar package from source is as simple as running apt-get -b source tar. This results in a deb file. The name of the file is the same as the name of the binary package published by Debian, but the size of the file is different from the size of the published package. Running the command again in a different empty directory gives yet another size for the deb file.
The command apt-get -b source tar is clearly not deterministic. To investigate what the differences between the packages are, they are unpacked. The manual file is the easiest to investigate.
It turns out that it has a header with the date and time at which it was created. The tar executable differs as well, in the section that holds the build id. This section can be set with the --build-id argument of ld, which defaults to taking the sha1 sum of the linked object files. The build id is thus derived from the object files. In the Debian build, the object files are created with debug information, which is later removed from the executable by stripping.
The debug information contains the build path, and it is this build path which is the reason for the different build id. If tar is compiled repeatedly in the same directory, the binary will be identical. A tar executable compiled in a different directory will have a different build id. Apart from these two differences, there is another common difference from the published binary package.
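Why the build path changes the build id can be illustrated with a toy model. The sketch below is an assumption-laden simplification: `toy_build_id` is a made-up function, the byte strings are stand-ins rather than real object code, and a real linker hashes actual object file contents rather than this concatenation. It only shows that a hash over data containing the build path changes when the path changes.

```python
import hashlib


def toy_build_id(object_code: bytes, build_path: str) -> str:
    """Hash object code together with the build path stored in its debug info.

    Toy model only: a real linker hashes the actual object file contents.
    """
    # Illustrative stand-in for the compilation directory kept in debug info.
    debug_info = b"comp_dir=" + build_path.encode()
    return hashlib.sha1(object_code + debug_info).hexdigest()


code = b"\x55\x48\x89\xe5"  # stand-in bytes, not real tar object code
same_dir_a = toy_build_id(code, "/build/tar")
same_dir_b = toy_build_id(code, "/build/tar")
other_dir = toy_build_id(code, "/tmp/elsewhere/tar")

print(same_dir_a == same_dir_b)  # same directory, same id
print(same_dir_a == other_dir)   # different directory, different id
```

In this model the id survives stripping because it was computed before the debug information was removed, which matches the behaviour described above.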
A deb archive is an ar archive that contains two tar archives. The ar archive and the two tar archives contain timestamps. If a build is to be repeatable, the time that is stored should be taken from the provided files and not from the computer clock.
The timestamps, user and group and file mode information can be left out of archives. The binary package that was built from a Debian source package was not identical to the published binary package, but the differences are limited to timestamps and the build id in the executables. Unless the function of the executable relies on this build-id, the self-built tar executable functions in the same way as the published version.
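The remark about archive metadata can be sketched with Python's standard `tarfile` module. This is a minimal illustration, not what the Debian tools do: the helper name `build_archive` is hypothetical, and it assumes that a timestamp derived from the source files is passed in as `mtime` instead of being read from the clock.

```python
import io
import tarfile


def build_archive(files: dict[str, bytes], mtime: int) -> bytes:
    """Create an uncompressed tar archive with normalised metadata."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name in sorted(files):  # fixed member order
            info = tarfile.TarInfo(name)
            info.size = len(files[name])
            info.mtime = mtime            # taken from the input, not the clock
            info.uid = info.gid = 0       # no builder-specific owner ids
            info.uname = info.gname = ""  # no builder-specific owner names
            info.mode = 0o644             # fixed file mode
            archive.addfile(info, io.BytesIO(files[name]))
    return buffer.getvalue()


first = build_archive({"b.txt": b"2", "a.txt": b"1"}, mtime=0)
second = build_archive({"a.txt": b"1", "b.txt": b"2"}, mtime=0)
print(first == second)  # identical bytes regardless of when or by whom it is built
```

Note that compressing such an archive can reintroduce a timestamp, since the gzip format has its own modification time field in its header.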
Fedora
Fedora 18 was installed from a net install. The option 'minimal system' was chosen as the software selection option. This creates a system with packages that take up MB. The tar binary and source RPMs were downloaded from the Fedora repository, and a binary RPM was built from the source RPM. Unlike in the Debian case, the man page does not differ. This is because the man file is taken from the source package: Fedora has modified the man page and ships the generated version in the source rpm.
The man pages for tar and gtar are the same file. The info files give a large diff. This is due to the presence of a timestamp and a lot of generated cross-references. The executables are also very different.
The self-compiled tar is 8 bytes larger. The build id is different and there are differences scattered throughout the file. Many of these are just single bytes and are probably different offsets to functions. This idea is consistent with the difference in the output of readelf -a tar. All the function names are there in the same order, but many numbers are different.
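To get an overview of differences that are scattered through a file, the differing byte ranges can be listed. The following is a rough sketch with a made-up helper name, `differing_ranges`; it compares bytes at equal offsets only, so it is not how readelf or any distribution tool works.

```python
def differing_ranges(a: bytes, b: bytes) -> list[tuple[int, int]]:
    """Return (start, end) offsets, end exclusive, where a and b differ.

    Bytes are compared at equal offsets over the common length; a length
    difference is reported as one final range.
    """
    ranges = []
    start = None
    common = min(len(a), len(b))
    for offset in range(common):
        if a[offset] != b[offset]:
            if start is None:
                start = offset  # a new run of differing bytes begins
        elif start is not None:
            ranges.append((start, offset))  # the run ended just before here
            start = None
    if start is not None:
        ranges.append((start, common))
    if len(a) != len(b):
        ranges.append((common, max(len(a), len(b))))
    return ranges


print(differing_ranges(b"abcdef", b"abXdeY"))  # [(2, 3), (5, 6)]
```

A comparison at equal offsets becomes misleading once bytes are inserted, because everything after the insertion shifts. For files of different sizes, such as the tar executable that is 8 bytes larger, a structure-aware comparison like the readelf output mentioned above is more informative.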
Just like ar and tar files, rpm files contain timestamps, which can be seen with rpm -qvlp on the tar package file. The timestamps of the compiled files have the time of the build as their timestamp. The Fedora package showed more differences with the published package than the Debian package did and, unlike the Debian case, not all of the differences could be explained.
The executable built from the published sources is so different from the published executable that it is not easy to know if it will function the same way.
OpenSUSE
For OpenSUSE, only two files differed: the man file and the tar binary. The man files differed, as in the deb file, due to their timestamp.
The tar binary contained a surprise: the debug information was not stripped. Stripping the file completely reduced the difference in size to 48 bytes. The build id was different and the published version contained a. Apart from the header and the last 2k bytes the files were identical.
Conclusion
A cherished characteristic of computers is their deterministic behaviour: the same input gives the same output. This makes it possible, in theory, to build binary packages from source packages that are bit for bit identical to the published binary packages.
In practice, however, building a binary package results in a different file each time. This is mostly due to timestamps stored in the builds. Other differences may be due to any number of differences in the build environment. If these can be eliminated, the builds will be more predictable. Binary packages would need to contain a description of the environment in which they were built. Compiling software is resource intensive and it is valuable to have someone compile software for you. Unless it is possible to verify that compiled software corresponds to the source code it claims to correspond to, one has to trust the service that compiles the software.
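What a description of the build environment might contain can be sketched as a small record. The field selection below is purely an assumption for illustration and the function name `describe_build_environment` is hypothetical; a real description would also have to cover the compiler, linker and library versions, which this sketch does not collect.

```python
import json
import os
import platform


def describe_build_environment() -> dict:
    """Collect a few facts about the machine and directory doing the build."""
    return {
        "machine": platform.machine(),        # CPU architecture
        "system": platform.system(),          # operating system name
        "release": platform.release(),        # kernel or OS release
        "build_path": os.getcwd(),            # the path that ends up in debug info
        "lang": os.environ.get("LANG", ""),   # locale can affect generated files
        "tz": os.environ.get("TZ", ""),       # time zone can affect timestamps
    }


print(json.dumps(describe_build_environment(), indent=2, sort_keys=True))
```

The build path is included because it was the cause of the differing build ids in the Debian test.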
Based on a test with a simple package, tar, there is hope that with relatively minor changes to the build tools it is possible to make bit perfect builds.
One advantage the Open Build Service has is that it documents the build environment and allows you to recreate it relatively precisely (not exactly, though). OBS handles building the debuginfo packages, so I'm guessing this is why rpmbuild didn't strip the binary.
On slashdot, an anonymous coward also noted that osc is the tool of choice and even claimed to have gotten identical builds. I have not yet verified that, but it sounds great. The mentioned package, build-compare, has some scripts that are meant to compare packages whilst ignoring variable parts of the build.
It's not mandatory or anything, of course. It builds using a chroot, I believe, so it does indeed probably lead to very close or identical results. It is of course limited to building for the architecture of the system it is on; OBS does not have that problem, as it uses clean VMs each time it builds.
That makes builds even more reliable and easy to perfectly reproduce.