cLIeNUX Personal Integrated Relational Database Environment

DOC DATE

20000610

NAME

PIRDE - cLIeNUX Personal Integrated Relational Database Environment

USAGE

        example
        PIRDE

        interface
        PIRDE invokes a regular unix shell configured to emphasize use of the 
	compiled programs, scripts, special directories, shell subroutines
	and variables that comprise the cLIeNUX 
		Personal Integrated Relational Database Environment. 
	The various PIRDE commands use all the various unix file IO 
	redirection facilities such as pipes. 
	Signals and command return values don't get much use.  
	  
	pronunciation
	Weyll, ah says it lahk in "Ain't that PURDY?!?" muh self.

DESCRIPTION

PIRDE, the command

PIRDE is the database-oriented top-level entry point to the various features that make up or that are used by the cLIeNUX Personal Integrated Relational Database Environment. It spawns a regular shell with PATH, PS1, various shell subroutines and so on defined to focus on the dataset in question. Enter "help" at the PIRDE prompt for more. PIRDE itself will be in the default cLIeNUX PATH, as will many regular unix commands PIRDE uses. PIRDE-specific commands may or may not be in the default PATH. PIRDE commands specific to a particular database type won't be in the cLIeNUX PATH, but can be asserted in the PIRDE shell with the PIRDE setup command. The default PIRDE setup is for email.

To get back to a regular shell from PIRDE do "exit", as per usual for leaving a sh-style shell. The Bash <tab><tab> trick of listing all commands in $PATH will tell you what commands you have in the PIRDE context. (Hi Liena! ;o)

PIRDE, the environment

PIRDE is a true relational database system, but that's about all. cLIeNUX PIRDE is inspired by reldb from volume 20 of the comp.sources.unix archive on gatekeeper.dec.com, circa 1989. PIRDE is a few tiny C programs for certain crucial relational database operations and a bunch of Bash, sed and awk scripts. This approach doesn't run real fast, but a lucid database schema minimizes wasted CPU activity up front, so the net result of the PIRDE approach is adequate performance for reasonable datasets, good portability, superb compactness, maintainability, versatility and almost no learning curve, at least as compared to a monster like SQL. Big slow tasks like generating an index database should be rare occurrences. All in all, it's the right approach for personal databases on a real OS. The many common unix text utilities already present in cLIeNUX need very little coaxing to coalesce into a powerful relational database system. The main thing is designing a robust table format.

indirection
PIRDE builds on the reldb approach with several programs to indirect an arbitrary static source dataset via an index table. Consider a typical unix mailx-format email "folder" file. Let's call such a root dataset a "basis". A basis could also be a directory of binary files such as audio samples, animations, an entire filesystem, just about anything. A database for something like personal emails can use PIRDE format index tables to access the basis using the actual basis data without needing to modify it at all. That also allows the PIRDE format, which is a very simple and unix-friendly text format, to be used, independent of the nature of the basis, except for the basis-type-specific utility to generate the index table(s). A one-megabyte test email folder is referenced by a PIRDE index table of 10 or 20 kilobytes.

In other words, if you make a PIRDE index to something, a PIRDE table will be generated that the PIRDE commands do most manipulations on in conjunction with PIRDE commands to access the real data via those indices. The cLIeNUX fetch command is the crux of that indirection. You can also not use indexing if the data and problem are amenable to that.
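The indirection idea can be sketched in a few lines of plain shell. This is not the cLIeNUX fetch command itself. fetch_span is a hypothetical name, and the assumption that the second offset marks the first byte NOT wanted is mine, since the exact convention isn't spelled out here.

```shell
# fetch_span FILE START END
# Hypothetical helper, not the real cLIeNUX fetch. Prints the bytes of FILE
# from hexadecimal offset START up to, but not including, offset END.
# The "END is exclusive" convention is an assumption of this sketch.
fetch_span() {
    start=$((0x$2))                 # 8-digit hex string -> decimal
    end=$((0x$3))
    # bs=1 is slow but keeps the byte arithmetic obvious and portable
    dd if="$1" bs=1 skip="$start" count="$((end - start))" 2>/dev/null
}

# Usage, with one row of an index table supplying the arguments:
#   fetch_span mail 00000000 000000f4
```

The basis file is only ever read, never modified, which is the whole point of indexing it.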

personal
PIRDE lacks a lot of the functionality often associated with the phrase "relational database". It's just a database, not a multi-user parallel e-commerce server. I don't know that it can't become such a beast, but I wouldn't know what to feed it. Buffering, user permissions and so on are whatever Linux normally provides for commands and files. Your "query language" is a unix shell and set of unix commands and shell subroutines. cLIeNUX goes to some length to see that you already know that language. An SQL interface is something I do not anticipate attempting myself.

DATA FORMAT

tables
A relational database is composed of, creates, and otherwise munches on clearly defined things called tables. This gives a reasonably clear concept around which arbitrary data manipulations can be performed. Tables are composed of records and fields. All records in a particular table have the same number of fields, and the fields in all records are in the same order. A table can therefore be thought of as a two-dimensional array of columns and rows, the rows being the records and the fields within the records being the columns. A table carries a description of its internal format in a header. The header of a PIRDE table is its first 3 `lines'.

text fields
Records in PIRDE are newline-delimited. Fields within a record are tab-delimited. This means that a field may not contain tabs or newlines. This means the utility for editing a PIRDE header is your text editor. This further means that there are only two byte values that can't be in a PIRDE field, 9 and 10, which are ASCII for tab and unix newline. It's too annoying to accommodate arbitrary binary data in a field defined as "text", so PIRDE fields, and thus tables, and thus PIRDE databases as a whole, are best thought of as containing "text". There is no escape mechanism to include tabs or newlines in fields, and I don't anticipate such a mechanism. The simplicity of PIRDE's format is worth the limitations, and can be side-stepped by indirecting a table to arbitrary basis data.
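Since there is no escape mechanism, text headed for a field has to lose its tabs and newlines somehow. One lossy way to do that is sketched here. to_field is a hypothetical name, and replacing the two forbidden bytes with spaces is just one possible policy, not something PIRDE prescribes.

```shell
# to_field: hypothetical filter that turns stdin into one legal PIRDE field
# by replacing every tab (byte 9) and newline (byte 10) with a space.
# This loses information on purpose; it is an assumption, not PIRDE policy.
to_field() {
    tr '\t\n' '  '
}

# Usage:
#   printf 'two\nlines' | to_field
```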

Records in PIRDE are basically the same as lines of text, and for the most part appear as such to unix text utils, including editors. Fields can be any length, and fields in a column may vary in length, or may be constrained to a constant length by the column `type'. Fields in an rdbms may be empty or null, but the "place-holder" for a particular field may not be absent from a record. You can't remove just a field from a record, although you can blank or null it out. You can remove just a record from a table.

A record has at least one field. In other words, within a PIRDE net table, two adjacent newlines are considered a record with one field which is null, or empty. The number of columns in an rdbms table is called its rank, so PIRDE tables are of at least rank 1.
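The rank-1 rule matters when counting fields with stock tools, because awk reports NF as 0 for an empty line. A minimal sketch that counts tabs plus one instead follows. rank_of is a hypothetical name, not a PIRDE command.

```shell
# rank_of: hypothetical filter. For each input line, prints its number of
# fields counted as (number of tabs + 1), so an empty line has rank 1.
rank_of() {
    # gsub returns how many tabs it replaced; "&" puts each tab back unchanged
    awk '{ print gsub(/\t/, "&") + 1 }'
}

# Usage: every record of a well-formed net table should give the same number,
# so this should print exactly one line (net_table is a placeholder name):
#   rank_of < net_table | sort -u
```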

header
The first line of a PIRDE table header contains information external to or global to that gross table. The first string in line 1 is ".Pt", the PIRDE `magic' string. That's for use by other programs, like "sniff/file", as an example of what I mean by external to PIRDE. The lines of the header are also tab delimited, but the first header line may have any number of "fields" more than zero. Line 1 isn't actually fields, it's just delimited the same way, and the first line must have the .Pt field, and it must be first. Other info that may be appropriate for line 1 is whether or not the table has all regular-size records, a record length if it is a constant, whether or not the gross table is an index into an external basis, and so on.

field types
The second and third lines of the header do have the same number of tab-delimited strings as the rank of the table. They match the columns because they define the columns. The second line is the column names of the table, and is therefore also the names of the fields within each record. The third line of the header indicates the data size types of each column/field. There are two types of fields in PIRDE, constant-size and variable. Null, blank, or un-typed means "any size string". An un-typed string may be any length. A table with one or more un-typed columns is "irregular". A regular table can be traversed more quickly in some circumstances.

The sizes of constant-width columns in a table are indicated directly by the number of characters in that column-place in line 3. That is, a constant-width 10-character column will have a 10-character field in line 3.

The HEXOFF type is a hexadecimal file offset. A HEXOFF field is usually affiliated with a file specifier field to identify a source data file external to the table. A HEXOFF field is 8 characters. A filename string and one or two file offsets in hexadecimal is how "segment" expects its arguments. This keeps segment as efficient as possible for a discrete command.

That is all the defined data in a header, 3 lines.
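Reading such a header takes very little awk. The sketch below checks the magic string and reports each column's name and width. describe_header is a hypothetical name. The sketch assumes any non-empty string in line 3 means "constant width, this many characters", and it makes no attempt to recognize HEXOFF, since how that type is spelled in line 3 isn't stated here.

```shell
# describe_header TABLE
# Hypothetical helper. Checks the .Pt magic string on line 1, then prints one
# "name<TAB>width" line per column. Width is "any" for an empty type field,
# otherwise the character count of that column's field in header line 3.
describe_header() {
    awk -F '\t' '
        NR == 1 {
            if ($1 != ".Pt") {
                print "not a PIRDE table" > "/dev/stderr"
                exit 1
            }
            next
        }
        NR == 2 { n = split($0, name, "\t"); next }     # column names
        NR == 3 {
            split($0, type, "\t")                       # column size types
            for (i = 1; i <= n; i++)
                printf "%s\t%s\n", name[i], (type[i] == "" ? "any" : length(type[i]))
            exit 0
        }
    ' "$1"
}
```

Nothing past line 3 is read, so this costs the same on a huge table as on a tiny one.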

net table
The net table file of a PIRDE table is as described by the type specifications in the header. The net table has the same number of tabs in every line, and no metadata other than the delimiters.

example
Here are the contents of a small table, shown a la get -t, so that tabs are visible as "\t" and newlines are visible as "$". This tiny table is of rank 3, and has columns named "file", "mailx tag", and "body". This was the format of an early version of an index table into a mail folder. The types of the columns are all variable-width or "any string", although they could be sized. They aren't in this example.

	(mail.rdth)
		3$
		file\tmailx tag\tbody$
		\t\t$
		mail \t00000000\t000000f4$
		mail \t000001fa\t0000038d$
		mail \t00000d92\t00000f2b$

In this case "mail" is the file the other two columns hold indexes to. The other two columns are strings representing hexadecimal offsets into the mail file at certain points in individual email messages. These are the HEXOFF type, which is how segment wants its arguments. By the time you read this the PIRDE email setup will probably be 3 tables called "messages", "people" and "contacts", with various internal streamlining and featurisms.
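To see what those hex strings mean in bytes, the rows can be run through a small shell loop. hexoff_to_dec is a hypothetical name, and the sketch assumes rows of exactly the three-column shape shown above with no empty fields.

```shell
# hexoff_to_dec: hypothetical filter. Reads "file<TAB>hexoff<TAB>hexoff" rows
# on stdin and prints the file name, both offsets in decimal, and the
# distance in bytes between them.
hexoff_to_dec() {
    tab=$(printf '\t')
    while IFS=$tab read -r file tag body; do
        printf '%s\t%d\t%d\t%d\n' "$file" \
            "$((0x$tag))" "$((0x$body))" "$((0x$body - 0x$tag))"
    done
}

# Usage, with the second data row of the example table:
#   printf 'mail\t000001fa\t0000038d\n' | hexoff_to_dec
```

For that row the offsets come out as 506 and 909, so the mailx tag and the body of that message start 403 bytes apart.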

COMPONENT COMMANDS

PIRDE is composed of discrete commands, and the PIRDE-configured Bash shell. The basic PIRDE commands are
	check
	deal
	select
	between
	sort
	append
	and fetch
Of those, deal is probably the one that least resembles any existing unix text utility. It's similar to SQL "view" or reldb "project". It inputs a table, takes column number arguments, and outputs a table with the columns in the new table derived from the original as specified in the args.
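What deal does can be approximated with awk. This is a sketch of the idea, not the real command. deal_sketch is a hypothetical name, and passing header line 1 through untouched while re-dealing lines 2 and 3 along with the records is an assumption of mine.

```shell
# deal_sketch COL... < table
# Hypothetical stand-in for deal. Outputs a table whose columns are the given
# column numbers of the input, in the order given. Columns may repeat.
deal_sketch() {
    awk -F '\t' -v cols="$*" '
        BEGIN { n = split(cols, c, " ") }
        NR == 1 { print; next }         # line 1 is global info, not columns
        {
            out = $(c[1])               # header lines 2 and 3 are dealt too,
            for (i = 2; i <= n; i++)    # so names and types follow their data
                out = out "\t" $(c[i])
            print out
        }
    '
}

# Usage: keep only columns 3 and 1, in that order (mail.idx is a placeholder):
#   deal_sketch 3 1 < mail.idx
```

Unlike cut -f, this can reorder and duplicate columns, which is most of what makes deal its own animal.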

fetch implements the PIRDE indexing technique. It is the fetcher of arbitrary data from files that aren't PIRDE tables. It's lean and way mean. fetch is intended to work in concert with utilities for basis data of various types that generate index tables into such bases. The initial basis type of PIRDE is the unix mailx format mail folder.

Aside from deal and segment, PIRDE-specific commands are analogous to, and may be simple wrappers for, unix text utils, but with some handling for PIRDE tables instead of plain text files. append is like get/cat, paste and sort are like their unix namesakes, and select is grep with a tables wrapper.
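"grep with a tables wrapper" might look something like the following. This is a guess at the shape, not the actual select. select_sketch is a hypothetical name, and the only table handling it does is to pass the 3 header lines through so the output is still a table.

```shell
# select_sketch PATTERN < table
# Hypothetical stand-in for select. Copies the 3 header lines unchanged,
# then lets grep pick records from the net table.
select_sketch() {
    for i in 1 2 3; do
        # the shell's read takes exactly one line, leaving the rest for grep
        IFS= read -r line && printf '%s\n' "$line"
    done
    grep -e "$1"
}

# Usage (people is a placeholder table name):
#   select_sketch Rome < people
```

The header is copied with the shell's own read rather than head because head may swallow more than 3 lines from a pipe, leaving grep short.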

It works the other way too. deal and fetch will probably exist in plain-file forms in cLIeNUX as well as in .Pt-format versions.

Other PIRDE-oriented commands and seedocs include check, merge, and just about any text utility.

RIGHTS

Copyright 2000 Rick Hohensee
This document is released for redistribution only as part of an intact entire cLIeNUX Core.