This You Gotta Understand
You're using cLIeNUX, which is a Linux/GNU/unix, and is thus well within
the broader definitions of unix, and there are several important
differences between unices and other systems that you REALLY REALLY REALLY
want to know about. Some of this stuff may seem mundane, but that's
because unix
has made it so. Unix has a reputation for requiring a wizard. This has a
great deal of truth to it, but not because these ideas themselves are so
difficult. They aren't. What's difficult is how it all fits together.
Hopefully this document helps with that, and hopefully cLIeNUX will help
you put it all together like you want it.
True multi-user
Unix allows numerous users to use the same system, the same CPU,
disks, and so on, without posing any risk to each other's data. How a
particular system is configured may compromise that, but it is normally
possible to allow arbitrary users access to a box and not have chaos. This
is because unix requires rather sophisticated hardware that keeps each
user utterly sequestered in their own virtual machine. This is why real
unix didn't happen on PCs 'til the 386. User access is controlled by
login, getty and friends. Once in, a user is controlled
by the fact that all processes, i.e. running programs, are subject to the
controls on the user that started them.
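You can see and adjust that control on any file you own. A minimal
sketch, assuming the standard names ls and chmod, with a hypothetical
file called "notes":

ls -l notes         # shows the owner, group, and permission bits
chmod go-rwx notes  # now only the owner can read or write "notes"

Any other user now gets "Permission denied" from the kernel, no matter
what program they try to open "notes" with.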
A file is a named sequence of arbitrary bytes
This seems mundane, but many other ways of dealing with data have
fallen by the wayside, and continue to do so. This concept keeps
everything simple, and is very important to the next item. The "file", a
named arbitrary thing, is the basic object of unix, and that's very
high-level and general.
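To see the arbitrariness for yourself, put some non-text bytes in a file
and dump them back. A sketch, assuming the standard names printf and od:

printf 'ab\0\n' > blob  # four bytes, one of them a zero byte
od -c blob              # dump the file back, byte by byte

There's no record structure and no file "type"; the meaning of the bytes
is entirely up to the programs that read and write them.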
Everything is a file
A user's access to system devices, and pseudo-devices like
/dev/random, network sockets, pipe special files, and regular files is all
in one namespace, under one consistent input-output protocol.
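The payoff is that the same tools work on all of them. A sketch, assuming
the standard name dd and the usual /dev entries:

dd if=/dev/random of=4bytes bs=1 count=4  # read a device like a file
get 4bytes > /dev/tty                     # write a file to a device

"get" neither knows nor cares that /dev/tty is a terminal rather than a
disk file.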
Everything is in one namespace under /
This leads to some long filenames, but the consistency of
/dir/dir/dir/file naming is easily a net win versus device:dir/dir/file
naming. This is particularly true when "/file" may be a device, a socket,
a pipe, and so on. The ability to have everything in one name hierarchy is
due to the concept of mounting a filesystem, and the mount
command. In other words, you don't have to name files by what device they
reside on, as in DOS. This extends to network filesystems, such that you
don't have to care what machine a file is on, you just have to be able to
get around a directory structure. One directory structure.
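mount is what splices a device's filesystem into the one tree. A sketch,
assuming the standard mount and umount names; the device name is
hypothetical:

mount /dev/fd0 /mnt  # the floppy's files now appear under /mnt
ls /mnt
umount /mnt          # and now they don't

While it's mounted, the floppy's files are named /mnt/whatever, and no
"A:" ever enters into it.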
All processes' inputs and outputs are redirectable
The shell's redirection operators illustrate this.
If you do
get file
in cLIeNUX, "file" is "concatenated to the terminal". That's
because the default standard output of a command is the terminal it was
started from. If you then do
get * > ../bundle
for example, the shell will expand "*" to "every file in the
current directory", concatenate them all into one stream of bytes, and
output it to "bundle" in the parent directory. This is a lot of activity
for the two keystrokes "*" and ">". Also, "get" doesn't know anything at
all about this. This is done by the calling process, the shell in this
case, and the kernel.
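You can verify that the shell, not "get", does the expanding by quoting
the "*" so the shell leaves it alone:

get '*'  # get now looks for a file literally named "*"

Unless you really have a file named "*", get complains that no such file
exists. Same command, but with the shell's help switched off.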
The ">" part is the point here. The inputs and outputs of all
commands are all redirectable by the process invoking them. Any command.
If you have a reason to redirect the inputs and outputs of Netscape or an
X server or something, you can do so. It's just another unix tool. When
a process starts, it normally already has 3 open file descriptors,
inherited from its parent. Whether a command uses them or not, they all
have "stdin", "stdout", and "stderr". Those are just
defaults. A command can open other files too, and if the caller knows
about them, they also can be redirected. This can be confusing, because
commands vary by which standard streams they use, but it's really very
simple overall, and means a
unix user space is one big collection of interlocking parts, like plumbing
or something.
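Standard input and standard error are just as redirectable as standard
output. A sketch of the Bourne-shell notation, with hypothetical file
names, assuming the standard name wc:

get < letter                 # feed "letter" to get's standard input
get letter > copy 2> errors  # stdout to "copy", stderr to "errors"
get letter 2>&1 | wc         # fold stderr into stdout, pipe both to wc

The digits are the file descriptor numbers: 0 is stdin, 1 is stdout, and
2 is stderr.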
children don't modify parents
All kinds of states of things get inherited from each other in unix,
but it's all one-way. Children get their open files, current directory,
everything from their parents. The parents get one integer back from the
child saying "I'm child #BLA". That keeps everything in order, keeps
users secure from each other, and so on. This is the reasoning behind
the behavior of the shell when invoking other shells, scripts, and
commands. The Bourne-style shell "." source operator means "run this
script as yourself, not as a child".
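Current directory shows the difference plainly. Say you have a one-line
script called "hop" (name hypothetical) containing

cd /tmp

and the Bourne-style shell is reachable as sh. Then "sh hop" leaves your
shell's current directory alone, because the cd happened in a child shell
that then exited, while ". hop" actually moves you to /tmp, because the
script ran in your own shell.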
All user-space processes are descendants of init, which is process ID 1.
When a process becomes orphaned, init adopts it. Processes and
sub-processes thus form an inheritance tree rooted at init analogous to
the directory tree structure rooted at "/". The potential massiveness of
unix is kept in order this way, and makes a lot of sense once you see the
regularity in it. At any point in the process tree, the same parent-child
relationships hold, just as the directory structure is the same from any
particular sub-directory.
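You can look at the tree directly. Assuming a System V style ps is
available under that name:

ps -ef  # every process, with its PID and its parent's PID (PPID)

Follow the PPID column upward from any process and you end at init, PID
1, just as following ".." upward from any directory ends at "/".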
Implement mechanism, not policy
Unix tools and the documentation for them generally go to great pains
to not assume what you intend to do with them. This is bad for user
friendliness, but it is a net win, again, for power, maintainability and
interoperability. Very simply, unix doesn't tell you what to do. I
personally like that kind of attitude from a machine.
Regular expressions are an example of this. Why harass an innocent user with
a phrase like "regular expressions" for a "match pattern"? BeCAUSE, a regular
expression is a formally defined type of matching pattern that is known to
be very general, and continues to be wildly useful. Other less well-defined
"wildcards" either won't be as powerful, or will take MORE explaining for the
same usefulness.
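For instance, with the standard grep, the same machinery that finds a
literal word also finds patterns no simple wildcard can express; the file
name here is hypothetical:

grep 'unix' notes            # lines containing "unix"
grep '^[0-9][0-9]*$' notes   # lines consisting entirely of one number

One mechanism, explained once, covers both jobs, and far stranger ones.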
Use the source
AT&T wasn't allowed to sell computer software back when UNIX(TM) was
developed, so they gave it away. Lots of colleges had all the source.
This meant that UNIX was, and the various unices remain, hotbeds of
development.
OK, you have things to do, and being a C weenie isn't one of them. Fine.
Having source is a fundamental advantage you want to have, even if you
don't ever want to see it on your screen. Source is the ultimate
documentation. It *is* the machine, not a description of it. That can be
priceless. Also, there's lots of info in sourcecode you can use without
comprehending every single glyph. If that wasn't the case computer
programming would have ceased completely years ago.
Good code is portable code
UNIX was the first major OS to be written in other than assembly
language, and C was written basically to make UNIX portable. C descends,
via B, from BCPL, which was notable at the time for allowing what is called
"recursion". Recursion is when a section of code named, say, BLA, says,
"OK, do BLA." It's "calling itself". Most programming languages around
1970 were designed so that recursion wouldn't happen, because it's tricky
to implement, easy to get wrong, and can always be done by some other
means.
However, recursion is very useful for programming things that nest like
the unix process and filesystem trees. The vision the boys at Bell Labs
had when they decided on a recursion-capable system language included
things like 20 nested shells each acting like its own little unix, and
filesystems without physical boundaries. In a sense the Internet is
the filesystem they had in mind.
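The shell itself can show recursion off. A sketch of a Bourne-shell
function that lists a directory tree by calling itself on each
sub-directory; the function name is hypothetical:

walk () {
	for f in "$1"/*; do
		echo "$f"
		test -d "$f" && walk "$f"  # a directory? walk it too
	done
}
walk /etc  # print every file under /etc, however deeply nested

Without recursion you'd need to keep an explicit list of directories
still to visit; with it, the nesting of the function calls is that list.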
Portability is durability. Linux runs on, what, 8 completely different
CPU architectures? NT runs on one CPU family, yes? They *lost* the DEC
Alpha, yes? See the above ranting and raving about modularity,
simplicity, etc.
RIGHTS
Copyright 2000 Rick Hohensee
This file is released for redistribution only as part of an intact entire
cLIeNUX Core.