Packaging Native Binaries

In the realm of software development, managing libraries in precompiled binary format is non-trivial. Java jars and .NET nupkgs have a fair bit of flexibility and customizability, and with flexibility comes complexity. However, the complexity surrounding these intermediate binaries pales in comparison to that surrounding precompiled native binaries. This complexity is so overwhelming that, historically, packaging native binaries was considered impossible, or at least impractical, by most. This post highlights a few of the dimensions of complexity that exist when packaging native binaries, and explains the modern and innovative approach used by Conan.io to address them. While this post focuses on C++ as a use case, much of it applies to other native languages as well, including Rust, Go, D, and so on.

State the Problem

In short, the current paradigm surrounding virtually all native language development is that developers must compile all libraries from source themselves, on the machine on which they are developing. Often they must use multiple machines, and often they must compile these libraries multiple times per machine for a variety of reasons. This whole situation is a problem. Considering the relatively high value and cost of developer time, it is truly remarkable that this paradigm exists as it does. Thus, one of the primary value propositions of a native binary package manager such as Conan is to reduce the number of redundant compilations of native binaries in the world each day. Of course, this would have been achieved many years ago if it were simple, so let's get to the challenges.

Challenge - 1 Library : 1 Package : N Binaries

The practical difference between packaging libraries from intermediate languages and those from native languages is N. One of the key benefits of intermediate languages is that N = 1 (usually). The same .jar file can be used on Linux, Windows, and Mac. This is not the case with native libraries. After compilation, all native binaries end up with a specific “Application Binary Interface” (or ABI). You don’t need to fully understand ABIs for the purposes of this article, only that any two binaries are generally only compatible if they were compiled under the same “compilation conditions.” Relevant compilation conditions include some obvious factors such as Operating System and Architecture, as well as less obvious factors such as Compiler, Compiler Version, and a LONG list of compile-time options. Thus, when it comes to packaging a library in native binary format: N = A x B x C x D x ..., where each factor is the number of values one compilation condition can take. In other words, N can be an arbitrarily large number of binaries (potentially 100+). This is a challenge both in terms of storage space and in terms of the complexity of managing so many different binaries during development. It may seem overwhelming, but do not be discouraged.
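To make the multiplication concrete, here is a small Python sketch that enumerates every combination of a handful of compilation conditions. The condition names and values are made up for illustration and are nowhere near a complete list of what affects ABI compatibility.

```python
from itertools import product

# Hypothetical compilation conditions for one library. The names and values
# are illustrative assumptions, not an authoritative or complete list.
conditions = {
    "os": ["Linux", "Windows", "Macos"],
    "arch": ["x86", "x86_64"],
    "compiler": ["gcc-6", "gcc-7", "clang-5"],
    "build_type": ["Debug", "Release"],
    "linkage": ["static", "shared"],
}


def enumerate_variants(conditions):
    """Return one dict per unique combination of compilation conditions."""
    names = list(conditions)
    return [dict(zip(names, combo)) for combo in product(*conditions.values())]


variants = enumerate_variants(conditions)
print(len(variants))  # 3 * 2 * 3 * 2 * 2 = 72
```

In practice not every combination is valid (a given compiler may not exist on every operating system), so the product is an upper bound. Even so, five modest factors already yield dozens of candidate binaries for a single library version.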

"The journey of a thousand miles starts with one step." - Lao Tzu

Start Small, Start Local

Of course, we’re talking about developers, most of whom value their own time and have not been sitting on their hands. Developers love to engineer their own solutions to such problems. As a result, there have been countless personal and organizational strategies invented and re-invented to mitigate the redundant compilation problem. All such solutions basically boil down to some form of what we’ll call a “local binary cache”.

Developers often precompile multiple variants of the libraries they need using the settings they know they will use frequently. They then store them in local directories, using some naming convention as an index for distinguishing between unique binaries for each library. Finally, they write build scripts which take some of the “compilation conditions” as arguments, use them to locate the appropriate binary in the cache, and return the relevant paths to the build system. With this approach, they get the benefit of being able to use the same set of precompiled binaries across multiple projects. The amazing thing about this strategy is that it’s one of those rare cases where the simplest idea turns out to be the best idea.
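A homegrown cache of this kind might look something like the following Python sketch. The directory layout, the function names `cache_dir` and `locate`, and the choice of four conditions are all assumptions made for illustration; real-world scripts vary widely.

```python
from pathlib import Path


def cache_dir(root, name, version, os_name, arch, compiler, build_type):
    """Build the cache path for one binary variant from its compilation conditions."""
    # The naming convention acts as the index: one directory per variant.
    variant = f"{os_name}-{arch}-{compiler}-{build_type}"
    return Path(root) / name / version / variant


def locate(root, name, version, **conditions):
    """Return include and lib paths for a cached variant, or None if it is not cached."""
    base = cache_dir(root, name, version, **conditions)
    if not base.is_dir():
        return None  # the caller has to compile this variant and add it to the cache
    return {"include": base / "include", "lib": base / "lib"}
```

A build script would call something like `locate("bincache", "zlib", "1.2.11", os_name="Linux", arch="x86_64", compiler="gcc-7", build_type="Release")` and hand the returned paths to the build system. The weakness of such a scheme is that the naming convention has to anticipate every condition that matters up front.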

Native Binary Cache - A Simple Convention

The notion of a local binary cache now seems an obvious strategy for managing multiple precompiled variants of a library. And in fact, it is this simple concept that is at the heart of the entire Conan package management platform. It also turns out that there were already several mature implementations of local binary caching within the ecosystems of interpreted languages. Conan simply needed to borrow some of the existing strategy from the best of these implementations, and devise the additional logic needed to deal with the various complexities of native binaries which we discussed earlier. This additional logic includes a directory structure which can handle N binaries per package and an extensible metadata strategy which can define all the factors relevant to ABI compatibility. Crucially, it also includes the logic needed to lazily download and cache the binaries of a package individually, as needed, minimizing the amount of local storage used on a development machine which references a given package as a dependency.
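The two ideas in this paragraph, metadata that identifies a binary and lazy per-binary download, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hashing scheme, the directory layout, and the `download` callback are hypothetical and are not claimed to match Conan's actual implementation.

```python
import hashlib
import json
from pathlib import Path


def binary_id(settings):
    """Derive a stable ID from the ABI-relevant settings, independent of key order."""
    canonical = json.dumps(settings, sort_keys=True)
    return hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def ensure_binary(cache_root, package, settings, download):
    """Return the local directory for one binary, fetching it only if it is missing."""
    target = Path(cache_root) / package / binary_id(settings)
    if not target.exists():
        target.mkdir(parents=True)
        # Hypothetical callback that retrieves exactly one binary variant.
        download(package, settings, target)
    return target
```

Because the ID is derived from an open-ended dictionary of settings rather than a fixed naming convention, new ABI-relevant factors can be added without changing the directory structure. And because only the requested ID is fetched, a machine that needs one variant never stores the other N - 1.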

A Generalized Solution to a Widespread Problem

When looked at objectively, we can see that there is nothing magical about the design and layout of the Conan repository and package structures. For Java packages, many different tools have sprung up around the Maven repository and package conventions, including Gradle, Buildr, Ivy, SBT, and others. Likewise, the ecosystem of tools for native development can easily begin extending support for the Conan conventions. As there is a great deal of innovation and competition occurring right now in the world of C/C++ packaging (as well as in Go and Rust), it seems valuable to point out that the ecosystem around the Maven structures has been healthy and successful, largely because the developers of tools adopted the existing Maven conventions rather than trying to define new ones. For example, the build system Gradle took a radically different approach from the Maven build system, and Bintray created an alternate central Maven repository called JCenter.

Again, all of the above represents a healthy level of competition, thanks to tool developers choosing to build in interoperability with the dominant package and repository format in the ecosystem. Currently, Conan is the only package manager which defines effective package and repository conventions featuring cross-platform native binary comprehension, with user-definable ABI profiles for each binary in each package. There are obviously many other package managers which DO handle precompiled native binaries, but they are tailored towards other use cases. Conan is simply the first and only one which is tailored toward the cross-platform native development use case and workflow, and it has taken its developers several years to reach this point.

Possible Futures for C++

Hopefully we’ve helped to share a basic understanding of the challenges facing native binary packaging, and the Conan conventions which effectively address them. At present, Conan is being adopted in the C++ community by a small but growing number of developers, with a growing central repository of native binary packages. It has already begun to deliver on the original value proposition of reducing redundant compilations each day. If these trends continue and Conan is adopted into a significant number of Open Source projects, the value delivered by its simple conventions will grow exponentially. Hopefully, it can also give rise to a new class of tools in the C++ ecosystem, focused on packaging, and based on these simple conventions as de facto standards.

Possible Alternate Realities

While this is a bit of an aside, it would be remiss not to mention it. If we look outside the realm of C++, we can see that the package management systems for the more modern native languages of Go, Rust, and D are entirely focused on sharing sources and build instructions. None even attempts to address the redundant compilation problem. Thus, there could be a potential future (however unlikely) in which Conan’s native binary packaging and repository conventions were leveraged by Go, Rust, and D tooling to cache binaries. Despite the strong political rhetoric that would undoubtedly cloud any discussion on such a topic, there is no technical barrier to such integration, and that simple fact could shine through. In fact, for posterity it should be mentioned that Conan has provided a simple demonstration of its ability to package binaries for Go in the docs, should anyone ever ask the question: Conan for Go



© 2017. All rights reserved.
