head 1.1; branch 1.1.1; access ; symbols MAXIMUM_RPM_1_0:1.1.1.1 VENDOR:1.1.1; locks ; strict; comment @# @; 1.1 date 2001.08.28.12.07.08; author rse; state Exp; branches 1.1.1.1; next ; 1.1.1.1 date 2001.08.28.12.07.08; author rse; state Exp; branches ; next ; desc @@ 1.1 log @Initial revision @ text @
To answer that question, let's go back to the basics for a moment. Computers process information. In order for this to happen, there are some prerequisites:
Unless these three things come together, very little is going to happen, information processing-wise. But each of these items has its own requirements that need to be satisfied before things can get exciting.
Take the computer, for example. While it needs things like electricity and a cool, dry place to operate, it also needs access to the other two items -- information and programs -- in order to do its thing. The way to get information and programs into a computer is to place them in the computer's mass storage. These days, mass storage invariably means a disk drive. Putting information and programs on the disk drive means that they are stored as files. So much for the computer's part in this.
OK, let's look at the information. Does information have any particular needs? Well, it needs sufficient space on the disk drive, but more importantly, it needs to be in the proper format for the program that will be processing it. That's it for information.
Finally, we have the program. What does it need? Like the information, it needs sufficient disk space on the disk drive. But there are many other things that it may need:
As you can imagine, this can get pretty complicated. It's not so bad once everything is set up properly, but how do things get set up properly in the first place? There are two possibilities:
If it seems like the first choice isn't so bad, consider how many files you'll need to keep track of. On a typical Linux system, it's not unusual to have over 20,000 different files. That's a lot of documentation reading, file copying, and configuring! And what happens when you want a newer version of a program? More of the same!
Some people think the second alternative is easier. RPM was made for them.
When you consider that computers are very good at keeping track of large amounts of data, the idea of giving your computer the job of riding herd over 20,000 files seems like a good one. And that's exactly what package management software does. But what is a ``package''?
A package in the computer sense is very similar to a package in the physical sense. Both are methods of keeping related objects together in the same place. Both need to be opened before the contents can be used. Both can have a ``packing slip'' taped to the side, identifying the contents.
Normally, package management systems take all the various files containing programs, data, documentation, and configuration information, and place them in one specially formatted file -- a package file. In the case of RPM, the package file is sometimes called a ``package'', a ``.rpm file'', or even an ``RPM''. All mean the same thing -- a package containing software meant to be installed using RPM.
What types of software are normally found in a package? There are no hard and fast rules, but normally a package's contents consist of one of the following types of software:
One of the most obvious benefits of having a package is that the package is one easily manageable chunk. If you move it from one place to another, there's no risk of any part getting left behind. But although this is the most obvious advantage, it's not the biggest one.
The biggest advantage is that the package can contain the knowledge about what it takes to install itself on your computer. And if the package contains the steps required to install itself, the package can also contain the steps required to uninstall itself. What used to be a painful manual process is now a straightforward procedure. What used to be a mass of 20,000 files becomes a couple hundred packages.
A couple hundred? Even though the use of packages has decreased the complexity of managing a system by an order of magnitude, it hasn't yet gotten to the level of being a ``no-brainer''. It's still necessary to keep track of what packages are installed on your system. And if there are some packages that require other packages in order to install or operate correctly, these should be tracked as well.
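To make the bookkeeping concrete, here is a minimal sketch of tracking which installed packages are missing something they require. It is only an illustration: the function name, the package names, and the dictionary layout are assumptions made for this example, not RPM's actual data structures.

```python
def missing_requirements(installed, requires):
    """Map each installed package to the required packages that are absent.

    installed -- set of installed package names (hypothetical)
    requires  -- dict: package name -> list of package names it needs
    """
    installed = set(installed)
    missing = {}
    for pkg in installed:
        # Anything required but not in the installed set is a problem.
        absent = sorted(set(requires.get(pkg, ())) - installed)
        if absent:
            missing[pkg] = absent
    return missing


# Hypothetical system: "editor" needs two libraries, only one is installed.
installed = {"editor", "libgui"}
requires = {"editor": ["libgui", "libspell"]}
print(missing_requirements(installed, requires))
```

Even a toy like this shows why the job is better left to the computer: the check must be repeated every time a package is added or removed.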
If you start looking at a computer system as a collection of packages, you'll find that a distinct set of operations will take place on those packages time and time again:
With this much activity going on, it's easy to lose track of things. What types of package information should be available to keep you informed?
Just as there are certain operations that are performed on packages, there are also certain types of information that will make it easier to make sense of the packages installed on your system:
Well, all that sounds great -- easy install, upgrade, and deletion of packages; getting package information presented several different ways; making sure packages are installed correctly; and even tracking changes to config files. But how do you do it?
As mentioned above, the obvious answer is to let the computer do it. Many groups have tried to create package management software. There are two basic approaches:
Each approach has its good and bad points. In the first method, it's easy to install new packages, somewhat difficult to remove old ones, and almost impossible to obtain any meaningful information about installed packages.
The second method makes it easy to obtain information about installed packages, and fairly easy to install and remove packages. The main problem using this method is that there may not be a well-defined way to execute any commands required during the installation or removal process.
In practice, no package management system uses one approach or the other -- all are a mixture of the two. The exact mix and design goals will dictate how well a particular package management system meets the needs of the people using it. At the time Red Hat Software started work on their Linux distribution, there were a number of package management systems in use, each with a different approach to making package management easier.
Since this is a book on the Red Hat Package Manager, a good way to see what RPM is all about is to look at the package management software that preceded RPM.
RPP was used in the first Red Hat Linux distributions. Many of RPP's features would be recognizable to anyone who has worked with RPM. Some of these innovative features are:
While RPP possessed several of the features that were important enough to continue on as parts of RPM today, it had some weaknesses, too:
Even with these problems, RPP was one of the things that made the first Red Hat Linux distributions unique. Its ability to simplify the process of installing software was a real boon to many of Red Hat's customers, particularly those with little experience in Linux.
While Red Hat Software was busy with RPP, another group of Linux devotees was hard at work on its own package management system. Known as PMS, its development, led by Rik Faith, attacked the problem of package management from a slightly different viewpoint.
Like RPP, PMS was used to package a Linux distribution. This distribution was known as the BOGUS distribution, and all the software in it was built from original unmodified sources. Any changes that were required were patched in during the process of building the software. This is the concept of ``pristine sources'' and is PMS's most important contribution to RPM. The importance of pristine sources cannot be overstated. It allows the packager to quickly release new versions of software, and to immediately see what changes were made to the software.
The chief disadvantages of PMS were weak querying ability, no package verification, no multiple architecture support, and poor database design.
Later, Rik Faith and Doug Hoffman, working under contract for Red Hat Software, produced PM. The design combined all the important features of RPP and PMS, including one-command installation and uninstallation, scripts run before and after installation and uninstallation, package verification, advanced querying, and pristine sources. However, it retained RPP's and PMS's chief disadvantages: weak database design and no support for multiple architectures.
PM was very close to a viable package management system, but it wasn't quite ready for prime time. It was never used in a commercially available product.
With two major forays into package management behind them, Marc Ewing and Erik Troan went to work on a third attempt. This one would be called the Red Hat Package Manager, or RPM.
Although it built on the experiences of PM, PMS, and RPP, RPM was quite different under the hood. RPM version 1 was written in the Perl programming language for fast development, and its creation focused on addressing the flaws of its ancestors. In some cases, the flaws were eliminated, while in others, the problems remained.
Some of the successes of RPM version 1 were:
But RPM version 1 wasn't perfect. There were a number of flaws, some of them major:
Even though their Linux distribution was a success, and RPM was much of the reason for it, Marc and Erik knew that some changes were going to be necessary to carry RPM to the next level.
Looking back on their experiences with RPM version 1, Marc and Erik made a major change to RPM's design: They rewrote it entirely in C. This did wonderful things to RPM's speed and size. Querying the database was quicker now, and there was no need to have Perl around just to do package management.
In addition, the database format was redesigned to improve both performance and reliability. Displaying package information can take as little as a tenth of the time spent in RPM version 1, for example.
Realizing RPM's potential in the non-Linux arena, they also created rpmlib, a library of RPM routines that allow the use of RPM functionality in other programs. RPM's ability to function on more than one architecture was also enhanced. Finally, the package file format was made more extensible, clearing the way for future enhancements to RPM.
So is RPM perfect? No program can ever reach perfection, and RPM is no exception. But as a package manager that can run on several different types of systems, RPM has a lot to offer, and it will only get better. Let's take a look at the design criteria that drove the development of RPM.
The design goals of RPM could best be summed up with the phrase ``something for everyone''. While the main reason for the existence of RPM was to make it easier for Red Hat Software to build the several hundred packages that comprised their Linux distribution, it was not the only reason RPM was created. Let's take a look at the various requirements the Red Hat team used in their design of RPM:
As we've seen earlier in this chapter, the act of installing a package can involve many complex steps. Entrusting these steps to a person who may not have the necessary experience is a strategy for failure. So the goal for RPM was to make it as easy as possible for anyone to install packages. The same holds true for removing packages. It is a complex and error-prone operation, and one that RPM should handle for the user.
The other side of this issue is that RPM should give the package builder almost total control in terms of how the package is installed. The reason for this is simple: if the package builders do their homework, their package should install and uninstall properly.
Because software problems are a fact of life, the ability to verify the proper installation of a package is vital. If done properly, it should be possible to catch a variety of problems, including things such as missing or modified files.
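One way to picture verification is to record a checksum for every file at install time and compare it against what is on the disk later. The sketch below does just that. It is an illustration under stated assumptions, not a description of how RPM itself verifies packages: the function names, the manifest layout, and the choice of SHA-256 are all made up for this example.

```python
import hashlib
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def record_manifest(paths):
    """Record a digest per file at install time (hypothetical manifest)."""
    return {str(p): file_digest(p) for p in paths}


def verify(manifest):
    """Compare files on disk against the recorded manifest.

    Returns a dict mapping each problem path to "missing" or "modified".
    An empty dict means every file still matches.
    """
    problems = {}
    for path, expected in manifest.items():
        p = Path(path)
        if not p.is_file():
            problems[path] = "missing"
        elif file_digest(p) != expected:
            problems[path] = "modified"
    return problems
```

A real package manager would normally record more than a checksum (ownership and permissions, for instance), but the principle is the same: keep a record of what was installed, and compare.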
While we're dedicating an entire book to package management, in reality it should be a small portion of the package builder's job. Why? They've got better things to do! If they are the people who are actually creating the software to be packaged, that's where they should be spending the majority of their time.
Even if the package builder isn't actually writing software, they still have better things to do than worry about building packages. For instance, they may be responsible for building many packages. Less time spent building each individual package means more packages can be built.
Delving a bit more into the package builder's world, it was deemed important that RPM start with the original, unmodified source code. Why is this so important?
Using the original sources makes it possible to separate the changes required to build the package from any changes implemented to fix bugs, add new features, or anything else. This is a good thing for package builders, since many of them are not the original authors of the programs they package.
This separation makes it easy, months down the road, to know exactly what changes were made in order to get the package to build. This is important when a new version of the packaged software becomes available. Many times it's only necessary to apply the original ``package building'' changes to the newer software. At worst, the changes provide a starting point to determine what sorts of things might need to be changed in the new version.
One of the tougher things for a package builder to do is to take a program, make it run on more than one type of computer, and distribute packages for each. Because RPM makes it easy to take a program's original source code, add the changes necessary to get it to build, and produce a package for each architecture in one step, it can be pretty handy.
With all the magical things we've claimed that package management software in general (and RPM in particular) can do, you'd think there was a tiny computer guru bundled in every package. However, the reality is not that magical. Here's a quick overview of the more important parts of an RPM package.
Every package built for RPM has to have a specific set of information that uniquely identifies it. We call this information a package label. Here are two sample package labels:
While these labels look like they have very little in common, in fact they both follow RPM's package labelling convention. There are three different components in every package label. Let's look at each one in order:
Every package label begins with the name of the software. The name may be derived from the name of the application packaged, or it may be a name describing a group of related programs bundled together by the package builder. The software names in the packages listed above are: nls and perl. As you can see, the software name is separated from the rest of the package label by a dash.
Next in the package label is an identifier that describes the version of the software being packaged. If the package builder bundled a number of related programs together, the software version is probably a number of their own choosing. However, if the package consists of one major application, the software version normally comes directly from the application's developer. The actual version specification is quite flexible, as can be seen in the examples above. The versions shown are: 1.0 and 5.001m. A dash separates the software version from the remainder of the package label.
The package release is the most unambiguous part of a package label. It is a number chosen by the package builder. It reflects the number of times the package has been rebuilt using the same version of the software. Normally, the rebuilds are due to bugs uncovered after the package has been in use for a while. By tradition, the package release starts at 1. The package releases in the example above are: 1 and 4.
Package labels are used internally by RPM. For example, if you ask RPM to list every installed package, it will respond with a list of package labels. When a package file is created, part of the filename consists of the package label. There is no technical requirement for this, but it does make it easier to keep track of things.
However, a package file may be renamed, and the new filename won't confuse RPM in the least. That's because the package label is contained within the file. For a fairly technical view of the inside of a package file, refer to Appendix [*].
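The three-part label convention described above is simple enough to take apart mechanically. The following sketch splits a label into its name, version, and release. The function name is hypothetical, and the two sample labels are assembled from the names, versions, and releases quoted in the text. Splitting from the right is a deliberate choice, on the assumption that a software name may itself contain dashes while the version and release do not.

```python
def parse_label(label):
    """Split a package label into (name, version, release).

    The last two dash-separated fields are taken as the version and
    the release; everything before them is the software name.
    """
    parts = label.rsplit("-", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"not a valid package label: {label!r}")
    name, version, release = parts
    return name, version, release


for label in ("nls-1.0-1", "perl-5.001m-4"):
    name, version, release = parse_label(label)
    print(f"{label}: name={name} version={version} release={release}")
```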
Some of the information contained in a package is general in nature. This information includes such items as:
Each package also contains information about every file contained in the package. The information includes:
To summarize, a package management system uses the computer to keep track of all the various bits and pieces that comprise an application or an entire operating system. Most package management systems use a specially formatted file to keep everything together in a single, easily manageable entity, or package. Additionally, package management systems tend to provide one or more of the following functions:
RPM has been designed with Red Hat Software's past package management experiences in mind. PM and RPP provided most of these functions with varying degrees of success. Marc Ewing and Erik Troan have worked hard to make RPM better than its predecessors in every way. Now it's time to see how they did, and learn how to use RPM!