Packaging system effort

Simon 'corecode' Schubert corecode at fs.ei.tum.de
Thu Feb 26 04:42:32 PST 2004


Hey people,

As some of you might know, I've written up my thoughts about what a
proper packaging system should provide etc. It has become a bit
longish, but I just had to add everything in my mind (at least
concerning packaging); otherwise, I feared, people would start
bikeshedding about things I had thought about but didn't write down.

This text is not about implementation, not even a little bit (if there
is some, ignore it). This is intentional. I think we first need to come
to a conclusion about *what* we want and after that start thinking about
*how* we will implement it.

Uhm, there was some talk about a June release, no? :) So the timeline is
pretty narrow (if we want to have our own packaging system ready by
then). I propose some weeks (1-2) for discussion of these points
here; in this time we need to come to a conclusion about what we want
and what of that needs to be in the first release. After that, some
weeks for implementation proposals and discussion (proposals should be
written down completely and then submitted to the list so that we can
all discuss various complete concepts rather than fragments
which could be implemented). After that, start hacking [yes, me too] :)

Now here it comes. Oh yea, current version will be available at 
<http://chlamydia.fs.ei.tum.de/~corecode/packaging.txt> and Justin will 
put up the text (after it has been polished) on the blog and the main 
page too I guess.

Thanks for taking the time to read and comment(!)...

cheers
  simon

Thoughts about a packaging system
---------------------------------
$Revision: 1.2 $
$Date: 2004/02/25 15:02:27 $

A package building and installation system (referred to as the packaging
system from here on) should provide several features which come to mind after
some time of using other (partly incapable) systems and thinking about
usability.

This is what I have been doing for the last few months, yet I never wrote my
thoughts down. Now I'll try to write down the mess in my head :)

My current knowledge of packaging systems is limited to Debian's dpkg,
FreeBSD's ports, NetBSD's pkgsrc and Gentoo's portage. As such, I'll only
refer to these systems for comparison with the One True System (OTS) I'll
describe here.

The packaging system should be OS agnostic in almost all parts, and the package
descriptions in large parts. This is a requirement if the system wants to be
the OTS, or - at least - provide its functionality to a wider range of OSes
(see pkgsrc/zoularis, portage to some extent).

The packaging system must provide the necessary infrastructure to hold
descriptions for multiple versions of one ``program'' without lots of overhead
and with easy usage (read: an easy, sane choice of versions). This
functionality is needed for both specialized deployment/installation strategies
and multiple OS/arch support as described below. Portage provides such a
feature, ports doesn't really (ok, there are -devel and versioned directories,
but I see this more as a band-aid).

Multiple architectures must be supported. This means build quirks for special
architectures and the need for multiple versions of one program, as e.g. i386
might be supported best whereas the same version might only be unstable on
amd64 or won't build there at all.

It is highly desirable to be able to install multiple versions of one program
at the same time. Besides means to enable this in the filesystem (symlinks,
variable symlinks, VFS voodoo, etc) - which might not be available on all
target platforms - this also raises some more questions concerning the logic
of newly installed packages. Imagine two perl versions installed: 5.6 and 5.8;
which version should the newly installed spamassassin depend upon?

This brings us to another point: clean build environments, i.e. environments
which only contain the build and/or (to be discussed) runtime requirements
(note that dependencies are the opposite - a misnomer in ports/pkgsrc). They
guarantee that various configure scripts (or whatever) don't suck in
optionally supported components and create unregistered requirements. This
also needs special filesystem voodoo; VFS might be a nice thing to use, and
pkgsrc does this via buildlink's symlink system.

As the system should be easily usable and not just academic, easy tools are
strongly needed. This includes tools for all sorts of maintenance: from
updating descriptions, through searching, to upgrading installed packages. It
is most desirable to also provide graphical tools (ncurses, X11, web), or at
least provide infrastructure so that third parties can easily develop such.

The system should be able to track various kinds of requirements and act
differently upon them: build time requirements can easily be garbage collected
because they are not needed once packages are built; runtime requirements
(e.g. shared libraries) might not be in use any more and could thus be cleaned
up too (compare portage's world file).
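
A minimal sketch of such a garbage collection, in Python and for illustration
only. The data layout and all package names are assumptions made up here; the
"wanted" set merely plays the role of a list of explicitly requested packages.

```python
# Hypothetical sketch: which installed packages are no longer needed?
# Package names and the data layout are made up for illustration.

def orphans(installed, runtime_requires, wanted):
    """Return installed packages not reachable from the explicitly wanted set.

    installed:        set of installed package names
    runtime_requires: dict mapping a package to the packages it needs at runtime
    wanted:           set of packages the user explicitly asked for
    """
    keep = set()
    stack = [pkg for pkg in wanted if pkg in installed]
    while stack:
        pkg = stack.pop()
        if pkg in keep:
            continue
        keep.add(pkg)
        # Follow runtime requirements only; build time ones are not kept alive.
        stack.extend(runtime_requires.get(pkg, ()))
    return installed - keep


installed = {"irssi", "glib", "perl", "automake", "libfoo"}
runtime_requires = {"irssi": ["glib", "perl"]}
wanted = {"irssi"}

removable = orphans(installed, runtime_requires, wanted)
print(sorted(removable))  # ['automake', 'libfoo']
```

Here automake stands for a build time requirement that is left over, and
libfoo for a library whose last user has been removed.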

Concerning shared libraries/runtime requirements: when runtime or build time
(harder case) requirements are updated, it is not always necessary to update
their dependants too. For example, this could be the case for security fixes
in shared libraries; if the shared object major version changes or a dependant
is linked statically against the library, this - of course - doesn't apply.
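
The rule above can be sketched in a few lines of Python. The function name and
its parameters are made up for illustration; a real system would have to read
the major version and the linkage information from its package registry.

```python
# Hypothetical sketch of the rebuild rule for an updated shared library.

def dependants_need_rebuild(old_major, new_major, linked_statically):
    """True if a package using the library must be rebuilt after its update."""
    if linked_statically:
        # The old library code is baked into the dependant's binary.
        return True
    # Dynamically linked: only a changed major version breaks the dependant.
    return old_major != new_major


print(dependants_need_rebuild(4, 4, linked_statically=False))  # False
print(dependants_need_rebuild(4, 5, linked_statically=False))  # True
print(dependants_need_rebuild(4, 4, linked_statically=True))   # True
```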

Of course the system must provide an advanced requirement and collision system
which also provides room for meta requirements (MTA, web server, whatever;
compare to dpkg and portage). This also means the ability to fuzzily specify
version numbers (>=2.0, everything but 1.4) and - where applicable - package
flags (see below).
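
To make the fuzzy version specification concrete, here is a small Python
sketch. The constraint syntax (">=2.0", "!=1.4") and all function names are
assumptions for illustration, and only plain dotted numeric versions are
handled.

```python
import operator
import re

# Hypothetical constraint syntax: an operator followed by a dotted version.
_CONSTRAINT = re.compile(r"^\s*(>=|<=|!=|==|>|<)\s*([0-9]+(?:\.[0-9]+)*)\s*$")

_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "!=": operator.ne,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """Turn '2.0.1' into (2, 0, 1)."""
    return tuple(int(part) for part in text.split("."))


def _pad(a, b):
    """Pad the shorter tuple with zeros so that 2 and 2.0 compare equal."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def satisfies(version, constraints):
    """True if version meets every constraint in the list."""
    have = parse_version(version)
    for constraint in constraints:
        match = _CONSTRAINT.match(constraint)
        if match is None:
            raise ValueError("cannot parse constraint: %r" % (constraint,))
        op, wanted = match.groups()
        left, right = _pad(have, parse_version(wanted))
        if not _OPS[op](left, right):
            return False
    return True


print(satisfies("2.1", [">=2.0"]))           # True
print(satisfies("1.4", [">=1.0", "!=1.4"]))  # False
```

Real version strings (release candidates, patch levels, epochs) need more
thought than this; the sketch only shows the matching logic.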

It is also desirable that the system can dynamically include additional
optional requirements if the host system provides them (e.g. optional GNOME,
IPv6 etc), either automatically or semi-automatically. This choice could
additionally be handled with package flag settings as described next.

A very strong must-have is a unified package flags system. Ports provides
package flags (e.g. USE_LDAP), but these are not unified and are per port only.
Specifying them in make.conf helps a bit, but lacking a global registry this
can be painful. Portage provides a better way with its use flags, but flags
that apply to one package only are not handled correctly. There needs to be a
(small and thoughtfully selected) global flags registry which contains more
than yes/no: on a server system, I most certainly don't want any X11 stuff
being sucked in when installing a package, so "never ever" is a needed state.
Sometimes I might not want X11 support if it is optional, but don't care when
a package requiring X11 installs it too; this corresponds to a "better not"
state. And, of course, there also is an "if it can support it, use it" state.
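
The three states could be modelled like this. It is a Python sketch under my
own assumptions: the names Pref and decide and the 'none'/'optional'/'required'
classification are invented for illustration.

```python
from enum import Enum


class Pref(Enum):
    """Hypothetical global preference for one flag, e.g. X11."""
    NEVER = "never ever"
    AVOID = "better not"
    USE = "if it can support it, use it"


def decide(pref, package_support):
    """Decide how one package is treated with respect to one flag.

    package_support is 'none', 'optional' or 'required' for the feature.
    Returns 'on', 'off', or 'refuse' (the package must not be installed).
    """
    if package_support == "none":
        return "off"
    if package_support == "required":
        # "never ever" blocks the package, "better not" tolerates it.
        return "refuse" if pref is Pref.NEVER else "on"
    # Optional support is only switched on if the user actively wants it.
    return "on" if pref is Pref.USE else "off"


print(decide(Pref.NEVER, "required"))  # refuse
print(decide(Pref.AVOID, "required"))  # on
print(decide(Pref.AVOID, "optional"))  # off
```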

Packages themselves need the ability to use package local flags too. This
might be the case for e.g. subversion ("I don't want the DAV server stuff") or
PHP ("Well, support this and that..."). Ports supports this but it's not
unified and not easy to use. Not everybody wants to read through a Makefile to
find out all the flags that can be used, nor can this be used for recursive
requirements. A unified system is needed which allows the user to customize
the packages in an easy (graphical, for example) and unobtrusive (all
questions asked before unattended installation) way. Nevertheless, some users
don't want these choices, they "just want" some package, so there need to be
sane defaults which will be used if the user chooses not to answer any
questions at all.

Package flags and binary packages don't really mix (compare portage: no binary
packages at all), so another feature is needed: package flavors (compare
OpenBSD's ports, I heard). A package flavor is a predefined, sane set of
package flags which can be automatically built into a binary package. This
also gives the user some flexibility without the need to cope with all
possible flags.

All these flags of course need to be recorded for all installed packages so
that the recorded preference is used in upgrades. If package flags/flavors
changed their meaning or got added/deleted, it might be desirable for the user
to get asked to review the settings; if flags/flavors didn't change, the
system should be able to use the old recorded settings.

The system should also be able to support split packages: some packages
(especially X11) are so big that it's highly desirable to split them. Ports
does this by creating several "independent" packages which just happen to use
the same source code. OpenBSD's ports natively produce several binary packages
for one port, as I heard. The way this is implemented needs to be the subject
of further discussion.

Debian's way of providing -dev packages which have been split off is always a
highly controversial point of discussion. This is why I want to comment on it
here. For a source based system such as ports, having header files available
is unavoidable, and even when using binary packages the bloat due to header
files and static libs is small. Still there may be cases where every file must
be considered, so having the possibility to prune development files if
feasible might be a nice add-on. The same goes for foreign gettext language
files etc. This could all be implemented via global package flags. What I'm
opposing is the creation or use of additional packages for -dev headers/libs.
The number of distinct packages should be kept to a minimum.

It is desirable to have a way to import an individually defined set of
packages for easy deployment of multiple systems.

The system must support both building from source and installing from
precompiled binary packages equally well, and be able to use building from
source as the fallback method if binary packages are not available for an
individual configuration. Furthermore it should be easy to build binary
packages for installation on another system.
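
How flavors and the source fallback could play together, as a Python sketch.
The flavor names, the flags and the function plan_install are all made up for
illustration; the only point is "use a prebuilt flavor if it matches exactly
what the user asked for, otherwise build from source".

```python
# Hypothetical sketch: choose a binary flavor or fall back to a source build.

def plan_install(wanted_flags, flavors):
    """Return ('binary', flavor_name) or ('source', wanted_flags).

    wanted_flags: frozenset of flags the user asked for
    flavors:      dict mapping a flavor name to the frozenset of flags
                  its binary package was built with
    """
    for name, built_with in sorted(flavors.items()):
        if built_with == wanted_flags:
            return ("binary", name)
    # No prebuilt flavor matches this individual configuration.
    return ("source", wanted_flags)


flavors = {
    "default": frozenset({"ssl"}),
    "minimal": frozenset(),
}

print(plan_install(frozenset({"ssl"}), flavors))    # ('binary', 'default')
print(plan_install(frozenset({"ldap"}), flavors)[0])  # source
```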

As a direct conclusion the system must have strong binary package distribution
support. In the past a lot of people demanded a streaming binary format to
have the ability to install packages straight whilst downloading without
having to wait for the whole package. This needs to be discussed further, as
installing while downloading is even less atomic than installing after
download, which can lead to other major problems.

A nice feature might be the availability of relative (binary) patchsets
between certain (individually selected) versions to reduce consumed bandwidth
and installation time. For binary patching systems see the bsdiff effort.

If possible, a nice addition would be the optional integration of installation
and build management of the base system. Together with an advanced and easy
binary update system this would lead to a unified system update mechanism - at
the cost of losing the clear border between system and third party products
(as is the situation with ports at the moment). Package flags could easily be
used as a way to customize the base system, as we currently use -DNO_CRYPTO
etc. The advantage for the user is clear: system and third party products
appear as the same category; the OS isn't just kernel + some userland; it
appears as everything provided via the packaging system (see the linux world).

The system must in any case provide different update strategies which need to
be selectable both globally and per package. This means: on a critical
production server, I don't want to upgrade my software (base system and third
party products such as apache) unless there is a security problem (which might
even be classified into local/remote root/DoS) or I need new features only
provided by a newer version. I'll call this the very conservative way of
updating. Other users might upgrade every now and then to a new version which
has been tested and tagged as stable (this can be different for various
architectures or OSes). Some power users might upgrade to every newly
released version because they don't care about minor instabilities. These
update strategies must not only be selectable for all available packages as a
whole; the user needs to have the ability to individually specify them for
single packages or groups thereof. Ports doesn't provide this - versions are
dictated by committers; portage provides this feature to some small extent
(accept keywords). Debian runs stable/testing/unstable versions.
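
A Python sketch of how these strategies might select an upgrade candidate. The
strategy names, the 'level' and 'security_fix' fields and the version numbers
are assumptions for illustration. "Needing a new feature" is left to the admin
and not modelled here.

```python
# Hypothetical sketch of update strategies, selectable globally and per package.

def strategy_for(package, global_strategy, overrides):
    """A per package setting wins over the global one."""
    return overrides.get(package, global_strategy)


def pick_upgrade(installed, candidates, strategy):
    """Return the version to upgrade to, or None to stay on the current one.

    installed:  currently installed version as a tuple, e.g. (1, 3, 27)
    candidates: list of dicts with 'version', 'level' and 'security_fix'
    strategy:   'conservative', 'stable' or 'bleeding'
    """
    newer = [c for c in candidates if c["version"] > installed]
    if strategy == "conservative":
        # Only stable versions that fix a security problem are interesting.
        newer = [c for c in newer
                 if c["level"] == "stable" and c["security_fix"]]
    elif strategy == "stable":
        newer = [c for c in newer if c["level"] == "stable"]
    elif strategy != "bleeding":
        raise ValueError("unknown strategy: %r" % (strategy,))
    if not newer:
        return None
    return max(newer, key=lambda c: c["version"])["version"]


candidates = [
    {"version": (1, 3, 28), "level": "stable", "security_fix": True},
    {"version": (1, 3, 29), "level": "stable", "security_fix": False},
    {"version": (2, 0, 48), "level": "unstable", "security_fix": False},
]
overrides = {"apache": "conservative"}

strategy = strategy_for("apache", "stable", overrides)
print(pick_upgrade((1, 3, 27), candidates, strategy))    # (1, 3, 28)
print(pick_upgrade((1, 3, 27), candidates, "stable"))    # (1, 3, 29)
print(pick_upgrade((1, 3, 27), candidates, "bleeding"))  # (2, 0, 48)
```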

For this to work properly, packages need to carry information about
vulnerabilities, new features etc. so that the admin can choose whether an
upgrade is needed or not. This shouldn't be a whole changelog, just a summary
of the most interesting changes.

The use of cryptographic signatures is a hard requirement. This must be
implemented for package descriptions and for binary packages. An MD5/SHA1
checksum is not a cryptographic signature! This could mean an openssl
requirement for the packaging system itself or the need to implement some
cryptographic functions. The distribution and extent of default trust of a
certificate needs to be discussed in this context too.

Another important aspect is a powerful build system. It should be possible for
multiple packages to be built at the same time and get synchronized for
installation etc. It's just a PITA if you're compiling KDE or OpenOffice and
can't build/install a small package like mpg123 or irssi because this might
damage the package db. Existing systems handle such cases nicely most of the
time, but that's just luck. Pkgsrc implements locking as far as I remember.
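
One way to synchronize concurrent installs is an advisory lock around every
change of the package db. The following Python sketch uses flock(2) via the
fcntl module and therefore assumes a Unix-like host. The file names and the
one-line-per-package "db" are invented for illustration and say nothing about
how pkgsrc or anybody else does it.

```python
import fcntl
import os
from contextlib import contextmanager


@contextmanager
def package_db_lock(lock_path):
    """Hold an exclusive advisory lock for the duration of the with block."""
    fd = os.open(lock_path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)  # blocks until other holders release
        yield
    finally:
        fcntl.flock(fd, fcntl.LOCK_UN)
        os.close(fd)


def register(db_path, lock_path, name):
    """Append one package to a (hypothetical) flat file package db."""
    with package_db_lock(lock_path):
        with open(db_path, "a") as db:
            db.write(name + "\n")
```

The long KDE build and the short irssi build can then run in parallel; only
the brief registration step is serialized.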

Another very nice-to-have is native distributed build support. This is very
much needed if one needs to install customized packages on a slow machine
(firewall/NAT etc) or does binary package building for distribution. Portage
provides this kind of service via distcc and it just plainly rocks. You can
even build OpenOffice in reasonable time with 10 boxen compiling :) Another
possibility is the use of distributed pmake.

Display of the build progress is a nice add-on for users for sure. This can be
implemented both in a macroscopic way (x of y packages built) and a
microscopic one (anybody wanna hack make for SIGINFO?).

Speaking of compilation for slow boxes: cross compilation comes to mind. Is
this needed when distcc support exists? Discussion point here.

As times get harder and it's common that the source/configure of major
software gets compromised, the system should include the possibility
(hopefully as default) to build packages either as non-root or in a
chroot/jail (who needs network access for builds anyways?). This - of course -
needs VFS magic or something else to map requirements into the chroot.

It should be possible to build and install packages as an unprivileged user.
Sometimes local security policy or laziness of an admin demands the
installation of a package into the user's home dir. Native support for this in
the packaging system would be nice. This doesn't mean that binary packages
need to be relocatable into home dirs, but the system would need to provide an
alternative (user home) location for the package registry.

An essential duty of a packaging system is the tracking of installed files. It
must be an easy task to remove a package and thus all its installed files from
the system. The system needs to provide collision management (same file
installed by several packages, VFS voodoo?) and configuration file awareness
(see below). Compare with portage (automatic list building) and ports (ugly
manually generated plists).
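
File tracking with collision detection can be sketched as a simple ownership
map. This is Python for illustration only; the class FileRegistry and its
methods are made up, and a real registry would of course live on disk.

```python
# Hypothetical sketch of a file registry with collision detection.

class FileRegistry:
    """Maps every installed path to the package that owns it."""

    def __init__(self):
        self.owner = {}  # path -> package name

    def collisions(self, package, paths):
        """Paths that are already owned by a different package."""
        return {p: self.owner[p] for p in paths
                if p in self.owner and self.owner[p] != package}

    def install(self, package, paths):
        clash = self.collisions(package, paths)
        if clash:
            raise ValueError("file collision: %r" % (clash,))
        for p in paths:
            self.owner[p] = package

    def remove(self, package):
        """Forget the package and return the files that should be deleted."""
        gone = sorted(p for p, o in self.owner.items() if o == package)
        for p in gone:
            del self.owner[p]
        return gone


reg = FileRegistry()
reg.install("mpg123", ["/usr/local/bin/mpg123"])
print(reg.collisions("irssi", ["/usr/local/bin/mpg123"]))
print(reg.remove("mpg123"))  # ['/usr/local/bin/mpg123']
```

The list of paths would be recorded automatically at install time rather than
maintained by hand.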

All config files that might potentially be modified by users (read: all) need
to be treated in a special way: they may not be overwritten, yet new versions
shouldn't be discarded. There must be an easy way for the user to merge their
own changes and upstream changes. If the config didn't change since the last
modification, the system should be intelligent enough to suppress needless
merge actions. On temporary deinstall of a package, existing config files
shouldn't be removed, but on the user's request the system should be able to
purge remaining config files. Compare with ports' .sample files and portage's
config file protection system (path bound, fails e.g. on TeX stuff).
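
Suppressing needless merges boils down to comparing three checksums per config
file. Here is a Python sketch under my own assumptions: the system records a
digest of each config file as shipped, and the names digest and config_action
and the returned action strings are invented for illustration.

```python
import hashlib


def digest(data):
    """Checksum of a config file's content (bytes)."""
    return hashlib.sha256(data).hexdigest()


def config_action(recorded, on_disk, incoming):
    """Decide what to do with one config file during an upgrade.

    recorded: digest of the file as shipped by the installed version
    on_disk:  digest of the file as it is on disk now
    incoming: digest of the file shipped by the new version
    """
    if incoming == recorded or incoming == on_disk:
        return "keep"     # upstream has nothing new for this file
    if on_disk == recorded:
        return "replace"  # the user never touched it, no merge needed
    return "merge"        # both sides changed, the user has to look at it


shipped = digest(b"port = 25\n")
edited = digest(b"port = 2525\n")
new = digest(b"port = 25\ntls = yes\n")

print(config_action(shipped, shipped, new))     # replace
print(config_action(shipped, edited, shipped))  # keep
print(config_action(shipped, edited, new))      # merge
```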

The packaging system descriptions shouldn't consume too much space in general
and inodes in particular. It's just horrible to have a myriad of small files
and directories in your /usr (or whatever) wasting a big deal of inodes. This
goes for end users. Package maintainers could have a different view of the
descriptions which could be collapsed into fewer files later. A possible
approach could be one description file per available package and version plus
approximately one patch file for each version (vs. patch-per-file in ports).

This leads to patches. As the system should aim to be OS agnostic in most
parts, this also goes for patches. These should be specially crafted so that
they at best don't interfere with the build process on other platforms/OSes.
This means extensive use of #if defined(__MyOS__) etc.

This portability is the key to close communication and development with the
upstream authors. It should be policy that patches are written as cleanly as
possible and are always submitted upstream. The packaging system might
provide help in or even enforce this process. Having patches go in upstream
reduces the number of files needed, enhances overall acceptance of the
packaging system and also provides people not using the packaging system with
features and fixes.

The system should provide support for bug tracking so that users can easily
check for known bugs and report new ones or add followup information for
existing ones. As the bug tracking system should be closely integrated with
the system, bugs need to be associated with packages or specific package
versions. This helps maintainers and committers to follow user input better
than the GNATS/CVS decoupling ports is currently using.

It goes without saying that the packaging system implementation must only have
low/moderate requirements for needed tools and processing power. This means
the system should be buildable with only POSIX tools and a moderately new
C/C++/ObjC compiler. If a scripting language is being used it should be one of
the really popular ones: sh, perl, python, tcl.

The system should be able to bootstrap itself. This means it shouldn't depend
on the system tools being included with the host OS. If the system undergoes
changes the tools need to change too. As seen with ports several times, having
the tools in the base OS only complicates stuff and leads to legacy issues
(e.g. the tbz/tgz issue). Pkgsrc provides this in a nice way. Bootstrapping
also means registering/tracking installation of the package system (and its
requirements). This seems like a chicken-and-egg problem but it can and
should be solved.

Package descriptions must be easy to generate and easy to use. This could
lead to different views of the packages - one maintainer side and one consumer
side - which can be converted into each other. For e.g. ports a big show
stopper concerning format conversion (remember pkg_info) and fast processing
(see INDEX generation) is the fact that Makefiles are indeed interpreted and
not just parsed. This means slowdown in processing and also problems in
automatic conversions - you never know how creative a maintainer was in
(ab-)using make. This leads me to one conclusion: don't use an interpreted
file format. Use a standardized description format which only needs to get
parsed. This is much faster and more portable if multiple programs intend to
work with the descriptions. Not having a Turing-complete language to use when
writing a description might at first need some change in thinking (when moving
from ports or portage) and will involve the need to write more text/data (like
no more using ${variable:C/pattern/replacements/}) but will help overall
cleanness.
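
To illustrate "parsed, not interpreted": the Python sketch below reads a purely
declarative description. The "key: value" format, the field names and the
package are invented for this example and are not a format proposal. The point
is only that any tool can read such a file without running make or a shell.

```python
# Hypothetical description format: one 'key: value' pair per line.

SAMPLE = """\
# made-up description, not a proposed format
name: example-tool
version: 1.2
requires: libfoo >=2.0
requires: libbar !=1.4
flag: ipv6
"""


def parse_description(text):
    """Parse 'key: value' lines; repeated keys collect into a list."""
    fields = {}
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition(":")
        if not sep or not key.strip():
            raise ValueError("line %d: expected 'key: value'" % lineno)
        fields.setdefault(key.strip(), []).append(value.strip())
    return fields


description = parse_description(SAMPLE)
print(description["name"])      # ['example-tool']
print(description["requires"])  # ['libfoo >=2.0', 'libbar !=1.4']
```

There are no variables, conditionals or substitutions to evaluate, so a
converter or an index generator sees exactly what the maintainer wrote.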

To prevent the need of writing common things all over again and thus the
possibility of inconsistencies, the system needs to provide infrastructure to
group common settings as templates. I'll call them package classes for now.
This is similar to portage's eclasses and ports' special .mk files. Such
classes shouldn't be created when only few consumers exist, as too many
existing classes destroy cleanness and transparency. It must be possible for a
package to use more than one class at the same time.
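
A Python sketch of a package using more than one class. The class names, the
setting names and the merge rule (later layers override scalar settings, list
settings are accumulated) are assumptions made for illustration only.

```python
# Hypothetical sketch: merge the settings of several package classes.

def resolve(package, classes):
    """Merge settings from all classes a package uses, then its own settings.

    Later layers override earlier ones for scalar settings; list settings
    (e.g. build requirements) are concatenated without duplicates.
    """
    layers = [classes[name] for name in package.get("uses", [])]
    layers.append({k: v for k, v in package.items() if k != "uses"})
    merged = {}
    for layer in layers:
        for key, value in layer.items():
            if isinstance(value, list):
                current = merged.setdefault(key, [])
                for item in value:
                    if item not in current:
                        current.append(item)
            else:
                merged[key] = value
    return merged


classes = {
    "gnu-configure": {"build": "configure", "build_requires": ["make"]},
    "perl-module": {"build_requires": ["perl"], "runtime_requires": ["perl"]},
}
package = {
    "name": "example",
    "uses": ["gnu-configure", "perl-module"],
    "build_requires": ["make", "libfoo"],
}

resolved = resolve(package, classes)
print(resolved["build_requires"])  # ['make', 'perl', 'libfoo']
```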

Basic principle: The final authority is always the user - but in most cases
she shouldn't have to decide.

... still to come:
{}
vim:tw=80:

--
/"\   http://corecode.ath.cx/#donate
\ /
 \     ASCII Ribbon Campaign
/ \  Against HTML Mail and News