cLIeNUX Personal Integrated Relational Database Environment
DOC DATE
20000610
NAME
PIRDE - cLIeNUX Personal Integrated Relational Database Environment
USAGE
example
PIRDE
interface
PIRDE invokes a regular unix shell configured to emphasize use of the
compiled programs, scripts, special directories, shell subroutines
and variables that comprise the cLIeNUX
Personal Integrated Relational Database Environment.
The various PIRDE commands use all the various unix file IO
redirection facilities such as pipes.
Signals and command return values don't get much use.
pronunciation
Weyll, ah says it lahk in "Ain't that PURDY?!?" muh self.
DESCRIPTION
PIRDE, the command
PIRDE is the database-oriented top-level entry point to the
various features that make up or that are used by the cLIeNUX Personal
Integrated Relational Database Environment. It spawns a regular shell
with PATH, PS1, various shell subroutines and so on defined to focus on
the dataset in question. Enter "help" at the PIRDE prompt for more. PIRDE
itself will be in the default cLIeNUX PATH, as will many regular unix
commands PIRDE uses. PIRDE-specific commands may or may not be in the
default PATH. PIRDE commands specific to a particular database type won't
be in the cLIeNUX PATH, but can be asserted in the PIRDE shell with the
PIRDE setup command. The default PIRDE setup is for email.
To get back to a regular shell from PIRDE do "exit", as per usual for
leaving a sh-style shell. The Bash <tab><tab> trick of listing all
commands in $PATH will tell you what commands you have in the PIRDE
context. (Hi Liena! ;o)
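As a rough illustration of the idea of spawning a shell with PATH and PS1
pointed at one dataset, here is a minimal sketch. It is not the real PIRDE
command: the function name, the PIRDE_DB variable and the bin/ directory
layout are all assumptions made up for the example.

```shell
# Hypothetical sketch only; pirde_shell_sketch, PIRDE_DB and DIR/bin are
# illustrative names, not part of PIRDE.
# Usage: pirde_shell_sketch DIR [sh arguments...]
pirde_shell_sketch() {
    db_dir=$1
    shift
    # Dataset-specific commands go first in PATH; the prompt marks the context.
    # These assignments apply only to the child shell, not to the caller.
    PIRDE_DB="$db_dir" \
    PATH="$db_dir/bin:$PATH" \
    PS1="PIRDE> " \
    sh "$@"
}

# Non-interactive check that the environment reached the child shell.
pirde_shell_sketch /tmp/maildb -c 'echo "$PIRDE_DB"'
```

Run without extra arguments, the function drops you into the sub-shell, and
"exit" returns to the calling shell with its environment untouched.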
PIRDE, the environment
PIRDE is a true relational database system, but that's about all. cLIeNUX
PIRDE is inspired by reldb from volume 20 of the comp.sources.unix archive
on gatekeeper.dec.com, circa 1989. PIRDE is a few tiny C programs for
certain crucial relational database operations and a bunch of Bash, sed
and awk scripts. This approach doesn't run real fast, but a lucid database
schema minimizes wasted CPU activity up front, so the net result of the
PIRDE approach is adequate performance for reasonable datasets, good
portability, superb compactness, maintainability, versatility and almost
no learning curve, at least as compared to a monster like SQL. Big slow
tasks like generating an index database should be rare occurrences. All in
all, it's the right approach for personal databases on a real OS. The many
common unix text utilities already present in cLIeNUX need very little
coaxing to coalesce into a powerful relational database system. The main
thing is designing a robust table format.
indirection
PIRDE builds on the reldb approach with several programs to indirect an
arbitrary static source dataset via an index table. Consider a typical
unix mailx-format email "folder" file. Let's call such a root dataset a
"basis". A basis could also be a directory of binary files such as audio
samples, animations, an entire filesystem, just about anything. A database
for something like personal emails can use PIRDE format index tables to
access the basis using the actual basis data without needing to modify it
at all. That also allows the PIRDE format, which is a very simple and
unix-friendly text format, to be used, independent of the nature of the
basis, except for the basis-type-specific utility to generate the index
table(s). A one meg test emails folder is referenced by a PIRDE index
table of 10 or 20 kilobytes.
In other words, if you make a PIRDE index to something, a PIRDE table
is generated. The PIRDE commands do most manipulations on that table, in
conjunction with PIRDE commands that access the real data via those
indices. The cLIeNUX fetch command is the crux of that indirection. You
can also skip indexing if the data and problem are amenable to that.
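To make the indirection concrete, here is a minimal sketch of pulling a
span of bytes out of an untouched basis file, given two hexadecimal
offsets. It is a stand-in built from tail and head, not the real fetch
command; the function name and the choice to treat the second offset as
exclusive are assumptions.

```shell
# fetch_sketch FILE START END
# Hypothetical stand-in, not the real fetch. START and END are hexadecimal
# byte offsets into FILE; END is treated as exclusive (an assumption).
fetch_sketch() {
    file=$1
    start=$(printf '%d' "0x$2")
    end=$(printf '%d' "0x$3")
    # tail -c +N starts output at byte N, counting from 1;
    # head -c then keeps only the length of the span.
    tail -c +"$((start + 1))" "$file" | head -c "$((end - start))"
}

# Demo on a throwaway basis file: offsets 0x18..0x22 cover the body line.
basis=$(mktemp)
printf 'From alice\nSubject: hi\n\nbody text\n' > "$basis"
fetch_sketch "$basis" 00000018 00000022
rm -f "$basis"
```

The basis file is only ever read, which is the point: the index holds the
offsets and the data stays where it is.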
personal
PIRDE lacks a lot of the functionality often associated with the phrase
"relational database". It's just a database, not a multi-user parallel
e-commerce server. I don't know that it can't become such a beast, but I
wouldn't know what to feed it. Buffering, user permissions and so on are
whatever Linux normally provides for commands and files. Your "query
language" is a unix shell and set of unix commands and shell subroutines.
cLIeNUX goes to some length to see that you already know that language. An
SQL interface is something I do not anticipate attempting myself.
DATA FORMAT
tables
A relational database is composed of, creates, and otherwise munches on
clearly defined things called tables. This gives a reasonably clear
concept around which arbitrary data manipulations can be performed. Tables
are composed of records and fields. All records in a particular table have
the same number of fields, and the fields in all records are in the same
order. A table can therefore be thought of as a two-dimensional array of
columns and rows, the rows being the records and the fields within the
records being the columns. A table carries a description of its internal
format in a header. The header of a PIRDE table is its first 3 `lines'.
text fields
Records in PIRDE are newline-delimited. Fields within a record are
tab-delimited. This means that a field may not contain tabs or newlines.
This means the utility for editing a PIRDE header is your text editor.
This further means that there are only two byte values that can't be in a
PIRDE field, 9 and 10, which are ASCII for tab and unix newline. It's too
annoying to accommodate arbitrary binary data in a field defined as
"text", so PIRDE fields, and thus tables, and thus PIRDE databases as a
whole, are best thought of as containing "text". There is no escape
mechanism to include tabs or newlines in fields, and I don't anticipate
such a mechanism. The simplicity of PIRDE's format is worth the
limitations, and can be side-stepped by indirecting a table to arbitrary
basis data.
Records in PIRDE are basically the same as lines of text, and for the
most part appear as such to unix text utils, including editors. Fields can
be any length, and fields in a column may vary in length, or may be
constrained to a constant length by the column `type'. Fields in an rdbms
may be empty or null, but the "place-holder" for a particular field may
not be absent from a record. You can't remove just a field from a record,
although you can blank or null it out. You can remove just a record from a
table.
A record has at least one field. In other words, within a PIRDE net
table, two adjacent newlines are considered a record with one field which
is null, or empty. The number of columns in an rdbms table is called its
rank, so PIRDE tables are of at least rank 1.
header
The first line of a PIRDE table header contains information external to
or global to that gross table. The first string in line 1 is ".Pt",
the PIRDE `magic' string. That's for use by other programs, like
"sniff/file", as an example of what I mean by external to PIRDE. The
lines of the header are also tab delimited, but the first header line may
have any number of "fields" more than zero. Line 1 isn't actually fields,
it's just delimited the same way, and the first line must have the .Pt
field, and it must be first. Other info that may be appropriate for line 1
is whether or not the table has all regular-size records, a record length
if it is a constant, whether or not the gross table is an index into an
external basis, and so on.
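A program outside PIRDE could recognize a table by that magic string
roughly as follows. This is a minimal sketch with a made-up function name;
it only tests that the first tab-delimited string of line 1 is ".Pt" and
ignores whatever else line 1 carries.

```shell
# is_pirde_table_sketch FILE
# Hypothetical helper: succeeds if the first tab-delimited string on
# line 1 of FILE is the PIRDE magic string ".Pt".
is_pirde_table_sketch() {
    # cut splits on tabs by default; a line with no tab passes through whole.
    [ "$(head -n 1 "$1" | cut -f1)" = ".Pt" ]
}

# Demo on a throwaway file.
tbl=$(mktemp)
printf '.Pt\tindex\nname\n\nann\n' > "$tbl"
if is_pirde_table_sketch "$tbl"; then echo "PIRDE table"; fi
rm -f "$tbl"
```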
field types
The second and third lines of the header do have the same number of
tab-delimited strings as the rank of the table. They match the columns
because they define the columns. The second line is the column names of
the table, and is therefore also the names of the fields within each
record. The third line of the header indicates the data size types of each
column/field. There are two types of fields in PIRDE, constant-size and
variable. Null, blank, or un-typed means "any size string". An un-typed
string may be any length. A table with one or more un-typed columns is
"irregular". A regular table can be traversed more quickly in some
circumstances.
The size of a constant-width column in a table is indicated directly by
the number of characters in that column-place in line 3. That is, a
constant-width 10-character column will have a 10-character field in line
3.
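Reading the widths back out of line 3 takes one short awk script. The
sketch below uses a made-up function name, and the "x" filler characters
in the demo are an assumption, since only the length of each line-3 string
matters here. A width of 0 means the column is un-typed.

```shell
# column_widths_sketch [FILE]
# Hypothetical helper: print the length of each tab-delimited string on
# line 3 of a table, one number per column. 0 means "any size string".
column_widths_sketch() {
    awk -F'\t' 'NR == 3 {
        for (i = 1; i <= NF; i++)
            printf "%d%s", length($i), (i < NF ? " " : "\n")
        exit
    }' "$@"
}

# Demo: a 10-character column, an un-typed column, an 8-character column.
printf '.Pt\nname\tnote\toffset\nxxxxxxxxxx\t\txxxxxxxx\n' | column_widths_sketch
```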
The HEXOFF type is a hexadecimal file offset. A HEXOFF field is
usually affiliated with a file specifier field to identify a source data
file external to the table. A HEXOFF field is 8 characters. A filename
string and one or two file offsets in hexadecimal is how "segment" expects
its arguments. This keeps segment as efficient as possible for a discrete
command.
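Converting between a byte offset and an 8-character hexadecimal string is
a job for printf. A minimal sketch, with made-up function names, and
assuming lowercase digits padded with leading zeros:

```shell
# Hypothetical helpers for 8-character hexadecimal offsets.
to_hexoff_sketch()   { printf '%08x\n' "$1"; }   # decimal -> 8 hex digits
from_hexoff_sketch() { printf '%d\n' "0x$1"; }   # 8 hex digits -> decimal

to_hexoff_sketch 244          # prints 000000f4
from_hexoff_sketch 0000038d   # prints 909
```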
That is all the defined data in a header, 3 lines.
net table
The net table file of a PIRDE table is as described by the
type specifications in the header. The net table has the same number of
tabs in every line, and no metadata other than the delimiters.
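That invariant is easy to test mechanically. The sketch below is not the
real check command, just a made-up function showing the idea: take the tab
count of line 2 as the reference and complain about any later line that
differs. Line 1 is exempt because it may hold any number of strings.

```shell
# check_sketch [FILE]
# Hypothetical stand-in, not the real check. Exits non-zero if any line
# from line 2 onward has a different number of tabs than line 2.
check_sketch() {
    awk '
        # gsub returns the number of tabs; "&" puts each one back unchanged.
        { n = gsub(/\t/, "&") }
        NR == 2 { want = n }
        NR >= 2 && n != want {
            printf "line %d: %d tabs, expected %d\n", NR, n, want
            bad = 1
        }
        END { exit (bad ? 1 : 0) }
    ' "$@"
}

# Demo: the last record is missing a field, so this reports line 5.
printf '.Pt\nfile\ttag\n\t\nmail\t00000000\nmail\n' | check_sketch
```

Counting tabs rather than fields means an empty line is correctly seen as
zero tabs, which is a valid record only in a rank 1 table.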
example
Here are the contents of a small table, shown a la get -t, so that
tabs are visible as "\t" and newlines are visible as "$". This tiny table
is of rank 3, and has columns named "file", "mailx tag", and "body". This
was the format of an early version of an index table into a mail folder.
The types of the columns are all variable-width or "any string", although
they could be sized. They aren't in this example.
(mail.rdth)
3$
file\tmailx tag\tbody$
\t\t$
mail \t00000000\t000000f4$
mail \t000001fa\t0000038d$
mail \t00000d92\t00000f2b$
In this case "mail" is the file the other two columns hold indexes to.
The other two columns are strings representing hexadecimal offsets into
the mail file at certain points in individual email messages. These are
the HEXOFF type, which is how segment wants its arguments. By
the time you read this the PIRDE email setup will probably be 3 tables
called "messages", "people" and "contacts", with various internal
streamlining and featurisms.
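For anyone who wants to poke at that example, here is a minimal sketch
that recreates it byte for byte with printf and then reports its rank and
record count with awk. The function names are made up; the table contents
are exactly the six lines shown above.

```shell
# Hypothetical helpers. make_example_sketch writes the example table to
# standard output, with real tabs and newlines.
make_example_sketch() {
    printf '3\n'
    printf 'file\tmailx tag\tbody\n'
    printf '\t\t\n'
    printf 'mail \t00000000\t000000f4\n'
    printf 'mail \t000001fa\t0000038d\n'
    printf 'mail \t00000d92\t00000f2b\n'
}

# table_shape_sketch: rank is the field count of line 2 (the column
# names); everything after the 3 header lines is a record.
table_shape_sketch() {
    awk -F'\t' '
        NR == 2 { rank = NF }
        END { printf "rank %d, %d records\n", rank, NR - 3 }
    ' "$@"
}

make_example_sketch | table_shape_sketch    # prints: rank 3, 3 records
```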
COMPONENT COMMANDS
PIRDE is composed of discrete commands, and the PIRDE-configured Bash shell.
The basic PIRDE commands are
check
deal
select
between
sort
append
and fetch
Of those, deal is probably the one that least resembles any existing
unix text utility. It's similar to SQL "view" or reldb "project".
It inputs a table, takes column number arguments, and outputs a table
with the columns in the new table derived from the original as specified
in the args.
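The column shuffling described here can be approximated in a few lines of
awk. This is a sketch of the idea with a made-up name, not the real deal
command. Passing line 1 through unchanged is an assumption; a real
implementation might need to update table-global info there.

```shell
# deal_sketch COL...
# Hypothetical stand-in, not the real deal. Reads a table on standard
# input and writes a table holding only the listed columns, in the
# order given. Column numbers start at 1.
deal_sketch() {
    awk -F'\t' -v cols="$*" '
        BEGIN { n = split(cols, want, " ") }
        NR == 1 { print; next }    # line 1 is table-global: pass it through
        {
            # Lines 2 and 3 are dealt like records, so names and types
            # stay lined up with their columns.
            out = $(want[1])
            for (i = 2; i <= n; i++) out = out "\t" $(want[i])
            print out
        }'
}

# Demo: keep columns 3 and 1, in that order.
printf '.Pt\nname\tcity\tage\n\t\t\nann\tOslo\t41\n' | deal_sketch 3 1
```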
fetch implements the PIRDE indexing technique. It is the
fetcher of arbitrary data from files that aren't PIRDE tables. It's lean
and way mean. fetch is intended to work in concert with
utilities for basis data of various types that generate index tables into
such bases. The initial basis type of PIRDE is the unix mailx format mail
folder.
Aside from deal and segment, PIRDE-specific commands are
analogous to, and may be simple wrappers for, unix text utils, but with some
handling for PIRDE tables instead of plain text files. append is
like get/cat, paste and sort are like their unix
namesakes, and select is grep with a tables wrapper.
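The "tables wrapper" mostly amounts to leaving the 3 header lines alone
while the text utility works on the records. Two minimal sketches of that
idea follow. They are not the real select and append: the names are made
up, the pattern is an awk extended regular expression rather than grep
syntax, and append is assumed to take tables with identical headers.

```shell
# select_sketch PATTERN
# Hypothetical stand-in: pass the 3 header lines through, then keep only
# records matching PATTERN. Backslashes in PATTERN are subject to awk's
# -v escape processing.
select_sketch() {
    awk -v pat="$1" 'NR <= 3 || $0 ~ pat'
}

# append_sketch FILE...
# Hypothetical stand-in: concatenate tables, keeping the header of the
# first file only and the records of all of them.
append_sketch() {
    awk 'NR == FNR || FNR > 3' "$@"
}

# Demo: header plus the one matching record.
printf '.Pt\nname\tcity\n\t\nann\tOslo\nbob\tRome\n' | select_sketch Rome
```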
It works the other way too. deal and fetch will
probably exist in plain-file forms in cLIeNUX as well as in .Pt-format
versions.
Other PIRDE-oriented commands and seedocs include check, merge, and just about any
text utility.
RIGHTS
Copyright 2000 Rick Hohensee
This document is released for redistribution only as part of an intact
entire cLIeNUX Core.