Introduction to Package Management

Package management is absolutely everywhere!

But don’t just trust me, listen to what master Yoda has to say about package management:

“My ally is the package, and a powerful ally it is. Maintainers create it, distributions make it grow. Its dependencies surround us, bind us. Luminous beings are we, not this crude tarball extracting and source code compiling. You must feel the package management flow around you. Here, between you, me, the x-server, the window manager, yes, even between the microphone and the webcam.”

– A slightly altered quote from Yoda in Episode V: The Empire Strikes Back

Ok, ok, I admit I changed a few words here and there, but still. If Yoda had been a master of GNU/Linux-based distributions, he very likely would have said something along those lines about package management. From my own experience, I can tell you that package management is often the sole reason somebody moves away from their current distribution: they need software that is not, or not yet, available in the distribution’s repositories.

So what is package management, what makes it so important and how does it work? In this post, I will give you a comparatively short introduction to package management, without the need for any prior knowledge about software distribution. In addition to that you will learn the most important things about how package management works and in what forms it can be found in FOSS systems.

Let’s dive in!

What is package management?

Since you may have a different understanding of package management depending on your background in computer science, it is a good idea to start with a definition [1]:

A package-management system (package manager) is a software tool, or a set of software tools that is used to automatically manage the installation, upgradation, configuration and removal of software packages (…).

To see why such a tool is useful, consider what happens without one. If you want to have a specific piece of software installed on your system, you first have to get the correct source files. The developers of a software project often use the tar command to combine all the files needed for their software in a single archive file, called a tarball. The tarball can then be downloaded via FTP or HTTP, or sent to users via email. Finally, you have the files! But remember, these are source files, not conveniently pre-compiled for you. Thus you have to compile them yourself, and this task alone can be quite difficult.

However, once you have installed the software, you have already lost control over it. Since the files are spread across your entire system and nothing keeps track of their locations, upgrading or cleanly removing the software becomes very difficult. Additionally, the tarball usually does not contain everything the software needs to run, because its dependencies are missing. So, if the software you downloaded depends on any other software, you have to install each dependency manually and hope it does not conflict with any previously installed software [1].
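To make the tarball workflow a bit more tangible, here is a minimal sketch in Python that packs a handful of files into an in-memory tarball and lists its contents again. The file names and the helper functions `build_tarball` and `list_tarball` are made up for illustration. Note that the archive only bundles files: it records nothing about dependencies or about where the files end up after installation.

```python
import io
import tarfile


def build_tarball(files: dict[str, bytes]) -> bytes:
    """Pack a {path: content} mapping into a gzip-compressed tarball."""
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:gz") as archive:
        for path, content in files.items():
            info = tarfile.TarInfo(name=path)
            info.size = len(content)  # tar needs the size up front
            archive.addfile(info, io.BytesIO(content))
    return buffer.getvalue()


def list_tarball(data: bytes) -> list[str]:
    """Return the member paths stored in a gzip-compressed tarball."""
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as archive:
        return archive.getnames()


if __name__ == "__main__":
    # Hypothetical project layout, for illustration only.
    tarball = build_tarball({
        "mousecolor-1.0/README": b"Build instructions go here.\n",
        "mousecolor-1.0/main.c": b"int main(void) { return 0; }\n",
    })
    print(list_tarball(tarball))
```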

What is a package?

In short, a package is just a set of coherent functionality. Imagine the following: you are trying to change the color of your fancy new gaming mouse. Since someone else has already written a small Python script that does exactly what you want, you want to use it. How do you get that script installed on your system, and at the same time make sure that all the dependencies it requires in order to run properly are installed as well? Without a package, you would have to do these things manually.

The problem with dependencies

Note that the software you want to use and its dependencies are usually separate projects developed independently of each other. This means that while Software A might work with version 1 of Software B, it might not work with version 1.2 or version 2.0 of the same software. Sooner or later, you may find yourself in a place called dependency hell.
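The version problem can be sketched in a few lines of Python. This is a simplified illustration that assumes purely numeric, dot-separated version strings and a half-open range of supported versions; the function names are hypothetical, and real distributions use richer version schemes and comparison rules.

```python
def parse_version(text: str) -> tuple[int, ...]:
    """Turn a version string like '1.2' into a comparable tuple (1, 2)."""
    return tuple(int(part) for part in text.split("."))


def is_compatible(installed: str, minimum: str, below: str) -> bool:
    """Check whether minimum <= installed < below."""
    return parse_version(minimum) <= parse_version(installed) < parse_version(below)


if __name__ == "__main__":
    # Assumption: Software A supports Software B from version 1 up to,
    # but not including, version 1.2.
    for candidate in ("1", "1.1", "1.2", "2.0"):
        print(candidate, is_compatible(candidate, minimum="1", below="1.2"))
```

Comparing tuples of integers instead of raw strings avoids the classic mistake of sorting "1.10" before "1.9".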

You see, managing software without package managers was a cumbersome, error-prone and manual process. On the other hand, from an architectural point of view, you also want to have specific functionalities bundled into packages for reuse. This way, packages provide modular software components which can be assembled to provide a user with the desired functionalities [2] [3].

The need for package management in software engineering

Historically speaking, companies used to offer their software products or components as commercial-off-the-shelf (COTS). However, this changed with the widespread adoption of Free and Open Source Software (FOSS), which opened up the centralized and closed way in which software was developed. Nowadays, component-based software development has become common practice. This has made the problem of creating unusable or corrupted systems more prominent. Maintaining a component-based system is a complex and difficult task because of the many relationships, implicit or explicit, shared among components.

These relationships between components are called dependencies. In Figure 1, you can take a look at a visualization of the dependencies of gcc, a popular C compiler.

Now, you might be tempted to think that you are able to manage these dependencies manually, and that the graph above does not look as bad as my earlier description of life without package managers suggested. Let me show you another graph, this time showing the dependencies of vim, a very popular editor.

**Figure 2**: A visualization of the dependencies of `vim`, a popular text editor. For a detailed description of what kind of relationship the different line types and arrow colors represent, take a look at the debtree man page.

It gets even worse. A typical GNU/Linux-based distribution is made up of several thousand packages, each of which usually has dependency relationships of its own. Luckily for us, distributions provide these packages through repositories, often in different package releases. We can rely on the distributor to collect packages from upstream sources and to approve them, where approved means compiled, integration tested and dependency managed before we install them on our systems. The success of FOSS distributions is partly attributed to the availability of large collections of software packages [4].
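Dependency relationships like the ones visualized above form a directed graph, and one thing a package manager has to derive from it is an order in which dependencies get installed before the packages that need them. The following is a minimal sketch of that idea using a topological sort from Python's standard library (Python 3.9 or newer). The package names are invented, and real package managers solve a considerably harder problem that also involves versions and conflicts.

```python
from graphlib import CycleError, TopologicalSorter


def install_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Return an order in which every package comes after its dependencies.

    `dependencies` maps a package name to the set of packages it needs.
    Raises graphlib.CycleError if the dependencies form a cycle.
    """
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    # Hypothetical packages, for illustration only.
    graph = {
        "editor": {"libgui", "libc"},
        "libgui": {"libc"},
        "libc": set(),
    }
    print(install_order(graph))  # ['libc', 'libgui', 'editor']

    try:
        install_order({"a": {"b"}, "b": {"a"}})
    except CycleError:
        print("circular dependency detected")
```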

Different options of package managers

So far, we have learned why we need package managers and that all of us have most likely already been in contact with at least one package manager. Let’s now turn things from abstract to concrete and take a look at some of the package managers available to us. Which ones are available depends on the distribution we are using.

Often, package managers are classified into frontend and backend managers. For example, the Advanced Packaging Tool (APT) acts as the frontend for dpkg, the low-level packaging tool of Debian-based distributions. This way, APT can be more user friendly by providing command line tools for searching, managing and querying information about packages, and it can resolve dependencies automatically. Its access to the package sources, the repositories, can be configured. A repository is essentially just a directory listing, consisting of software packages with an index file. Many package managers also provide their functionalities in a GUI. Synaptic, a GTK-based graphical front end for APT, is a popular example of a graphical package manager. In Debian, there are currently over 62,000 packages available to you [5].
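To illustrate what such an index file can look like from a frontend's point of view, here is a small sketch that parses a text index into a dictionary and looks up the direct dependencies of a package. The stanza layout is modeled on Debian-style control fields (`Package`, `Version`, `Depends`), but the sample data is invented and the parser is deliberately simplified: it ignores multi-line fields, version constraints and alternatives.

```python
SAMPLE_INDEX = """\
Package: editor
Version: 2.1
Depends: libgui, libc

Package: libgui
Version: 1.4
Depends: libc

Package: libc
Version: 2.31
"""


def parse_index(text: str) -> dict[str, dict[str, str]]:
    """Parse blank-line-separated 'Key: value' stanzas, keyed by package name."""
    packages = {}
    for stanza in text.strip().split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
        packages[fields["Package"]] = fields
    return packages


def direct_dependencies(index: dict[str, dict[str, str]], name: str) -> list[str]:
    """Return the names listed in the Depends field of the given package."""
    depends = index[name].get("Depends", "")
    return [item.strip() for item in depends.split(",") if item.strip()]


if __name__ == "__main__":
    index = parse_index(SAMPLE_INDEX)
    print(direct_dependencies(index, "editor"))  # ['libgui', 'libc']
```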

Structure of a package

A package manager may use its own file extension like .deb or .rpm. These are usually archives with three main components embedded into them. Figure 3 provides an overview of the content of a package [6].

**Figure 3**: A package consists of three parts if we abstract over format-specific details [6]

The first part, the set of files (1), is common in software packaging solutions. It represents the filesystem encoding of what the package is delivering: executable binaries, data, documentation and so on. Configuration files (1.1) are meant to be locally customized, with or without the help of the respective package manager. They comprise a subset of the shipped files, namely those that affect the runtime behavior of the package. Overwriting them is dangerous because they may contain local changes which would be lost when updating the package [5].
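One way to avoid silently losing local changes is to compare checksums before touching a configuration file. The sketch below shows that idea under my own simplified decision rules; the function names and the three outcomes ("keep", "replace", "ask") are assumptions for illustration, not the behavior of any specific package manager.

```python
import hashlib


def digest(content: bytes) -> str:
    """Return a hex checksum used to detect whether content has changed."""
    return hashlib.sha256(content).hexdigest()


def decide_config_action(shipped_old: bytes, on_disk: bytes, shipped_new: bytes) -> str:
    """Decide what to do with a configuration file during an upgrade.

    shipped_old: the file as delivered by the currently installed version
    on_disk:     the file as it exists on the system right now
    shipped_new: the file as delivered by the new package version
    """
    locally_modified = digest(on_disk) != digest(shipped_old)
    maintainer_changed = digest(shipped_new) != digest(shipped_old)

    if not maintainer_changed:
        return "keep"      # nothing new to deliver, leave the file alone
    if not locally_modified:
        return "replace"   # no local edits, so updating is safe
    return "ask"           # both sides changed, a human has to decide


if __name__ == "__main__":
    old = b"color=red\n"
    print(decide_config_action(old, b"color=blue\n", b"color=red\nspeed=2\n"))  # ask
```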

The meta-information (2) varies and is dependent on the distribution the package is used for. Usually, a unique ID, a version number, information about the maintainer of the package as well as a package description are listed. Most notably, distributions use meta-information to declare inter-package relationships (2.1). This way, a package can clearly communicate that it may not be installed at the same time as a certain other package, or that it depends on the presence of others [5] [6].
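As a rough sketch, the meta-information described above could be modeled as a small record with a check that reports missing dependencies and conflicts against the set of installed packages. The class, field and package names are hypothetical, and version constraints are left out to keep the example short.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PackageInfo:
    name: str
    version: str
    maintainer: str
    description: str
    depends: frozenset[str] = frozenset()    # packages that must be present
    conflicts: frozenset[str] = frozenset()  # packages that must be absent


def check_installable(candidate: PackageInfo, installed: set[str]) -> list[str]:
    """Return a list of problems; an empty list means the package can be installed."""
    problems = []
    for name in sorted(candidate.depends - installed):
        problems.append(f"missing dependency: {name}")
    for name in sorted(candidate.conflicts & installed):
        problems.append(f"conflicts with installed package: {name}")
    return problems


if __name__ == "__main__":
    # Invented example data.
    mail_server = PackageInfo(
        name="mail-server-a",
        version="3.2",
        maintainer="Jane Doe <jane@example.org>",
        description="An example mail server",
        depends=frozenset({"libc", "libssl"}),
        conflicts=frozenset({"mail-server-b"}),
    )
    for problem in check_installable(mail_server, {"libc", "mail-server-b"}):
        print(problem)
```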

Lastly, packages contain a set of executable configuration scripts (3). This way, package maintainers can attach actions to hooks executed by the installer. Usually, these actions are implemented as POSIX shell scripts.
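The hook mechanism can be sketched as an installer that calls optional actions before and after unpacking the files. In this illustration, plain Python callables stand in for the shell scripts, and the hook names `preinst` and `postinst` are borrowed from Debian's maintainer script names; the function `install_with_hooks` itself is hypothetical.

```python
from typing import Callable, Optional

Action = Callable[[], None]


def install_with_hooks(unpack: Action, hooks: Optional[dict[str, Action]] = None) -> list[str]:
    """Run the optional 'preinst' hook, unpack the files, then run 'postinst'.

    Returns the names of the steps that were performed, in order.
    """
    hooks = hooks or {}
    performed = []

    pre = hooks.get("preinst")
    if pre is not None:
        pre()
        performed.append("preinst")

    unpack()
    performed.append("unpack")

    post = hooks.get("postinst")
    if post is not None:
        post()
        performed.append("postinst")

    return performed


if __name__ == "__main__":
    steps = install_with_hooks(
        unpack=lambda: print("copying files"),
        hooks={"postinst": lambda: print("restarting service")},
    )
    print(steps)  # ['unpack', 'postinst']
```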

Now that we have understood what package management is and what a package is, it becomes very clear what a distribution actually is: A collection of coherently maintained packages [6].

Conclusion

We started with my claim that package management is everywhere and that it surrounds us almost every time we interact with a system. It has become clear now how important package management actually is. It is our best defense against unusable or corrupted systems and helps us automate dependency resolution.

Acknowledgments

The author thanks Frank Blendinger for his feedback and for proofreading this article.


References

[1] Varun A, Rahul M Patil, Pratibha Kantanavar, Shobha G (2019) A comparative Study of various Linux Package-Management Systems. In: Advances in Computational Sciences and Technology, Volume 12, Number 1, pp. 37-44.

[2] Decan A, Mens T, Grosjean P (2019) An empirical comparison of dependency network evolution in seven software packaging ecosystems. In: Empirical Software Engineering, Issue 1.

[3] Pietro Abate, Roberto Di Cosmo, Ralf Treinen, Stefano Zacchiroli (2013) A modular package manager architecture. In: Information and Software Technology, Volume 55, Issue 2, pp. 459-474.

[4] Fabio Mancinelli, Jaap Boender, Roberto Di Cosmo, Jérôme Vouillon, Berke Durak, et al. (2006) Managing the Complexity of Large Free and Open Source Package-Based Software Distributions. In: 21st IEEE/ACM International Conference on Automated Software Engineering, Sep 2006, Tokyo, Japan.

[5] Aoki, Osamu (2021) Debian Reference, Chapter 2. Debian package management. Retrieved from https://www.debian.org/doc/manuals/debian-reference/ch02.en.html on March 30, 2021.

[6] Roberto Di Cosmo, Stefano Zacchiroli, Paulo Trezentos (2008) Package upgrades in FOSS distributions: details and challenges. In: Proceedings of the 1st International Workshop on Hot Topics in Software Upgrades (HotSWUp ’08). Association for Computing Machinery, New York, NY, USA, Article 7, pp. 1–5.