PIRDE pluck

DOC DATE

April Fools Day 1989, cLIeNUcized June 2000

NAME

pluck - 'pluck' lines from a file.

SYNOPSIS

pluck selfile < datafile [ > result ]

DESCRIPTION

pluck uses a selection file to decide which lines to select from a datafile. It is very fast and is often used in place of select(P) or a PIRDE sort(P)/join(P) combination.

pluck is useful when selecting according to values in a fixed column of the datafile. The data column of interest must be the first column of the data (see deal(P)). The selection must be based on numerical values (see also BUGS, below).

The first column of selfile is assumed to contain numbers corresponding to values to be selected from the datafile.

pluck does not require any sorting of the files. However, there is a limit on the largest number which can be selected.

bpluck (big pluck) can handle arbitrary values in the selection file, including text, but in exchange there is a limit on the size of the selection file. bpluck requires the selection file to have only one column, the column of values to be selected.

SEE ALSO

reldb(P), join(P), sort(P), select(P), fgrep(1).

BUGS AND LIMITATIONS

PIRDE pluck works by reading the numbers in the selection file and inserting '1' into a vector for elements corresponding to values to be selected. The check to see if a line is to be selected therefore only requires a single lookup into a vector. This is extremely fast, even for large datafiles, but the dimension of the vector limits the numerical size of indices to be selected.
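The vector lookup described above can be sketched in Python. This is an illustration only, not the pluck source: the function name, the MAX_INDEX value, whitespace-separated columns and the decision to reject out-of-range selection values are all assumptions, since the real vector dimension and error handling are not stated here.

```python
import io

MAX_INDEX = 100_000  # hypothetical vector dimension; the real limit is not stated


def pluck(selfile, datafile, max_index=MAX_INDEX):
    """Yield the lines of datafile whose first column is listed in selfile."""
    wanted = bytearray(max_index + 1)  # the lookup vector, initially all zeros

    # Pass 1: mark every value found in the first column of the selection file.
    for line in selfile:
        fields = line.split()
        if not fields:
            continue
        value = int(fields[0])
        if not 0 <= value <= max_index:
            # Assumed behaviour: values beyond the vector cannot be selected.
            raise ValueError(f"{value} exceeds the vector dimension {max_index}")
        wanted[value] = 1

    # Pass 2: one vector lookup per data line decides whether it is selected.
    for line in datafile:
        fields = line.split()
        if not fields:
            continue
        try:
            value = int(fields[0])
        except ValueError:
            continue  # non-numeric first column: never selected in this sketch
        if 0 <= value <= max_index and wanted[value]:
            yield line


selection = io.StringIO("3\n7\n")
data = io.StringIO("1 alpha\n3 beta\n7 gamma\n3 delta\n9 epsilon\n")
selected = list(pluck(selection, data))
print("".join(selected), end="")
```

Neither file needs to be sorted, and the cost per data line does not depend on how many values are selected. The price is memory proportional to the largest selectable number, which is where the limit comes from.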

bpluck overcomes this limitation of pluck by using the selection file to build a command for egrep, which searches for the selected strings at the beginning of lines in the datafile. The shell on the user's system limits the length of the search string, which in turn limits the number of items in the selection file.
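A rough Python sketch of the same idea follows. It is not the bpluck script itself: the helper names are hypothetical, Python's re module stands in for egrep, and escaping the items with re.escape is an assumption made to keep the sketch safe for arbitrary text.

```python
import io
import re


def build_pattern(selfile):
    """Join the one-column selection file into a single anchored alternation."""
    items = [line.strip() for line in selfile if line.strip()]
    # "^" anchors the match at the start of the line, i.e. at the first column.
    return "^(" + "|".join(re.escape(item) for item in items) + ")"


def bpluck(selfile, datafile):
    """Return the data lines that begin with any item from the selection file."""
    pattern = re.compile(build_pattern(selfile))
    return [line for line in datafile if pattern.search(line)]


selection_text = "north\neast\n"
data_text = "north 10\nsouth 20\neast 30\nwest 40\n"

pattern_text = build_pattern(io.StringIO(selection_text))
selected = bpluck(io.StringIO(selection_text), io.StringIO(data_text))
print(pattern_text)
print("".join(selected), end="")
```

The pattern grows by one alternative per selection item, so a long selection file produces a long search string. Note also that this sketch only anchors the start of the line, so an item matches any line that begins with it; whether bpluck also delimits the end of the column is not stated here.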

Note that fgrep(1) can be used with the same input as bpluck. This may ease the limit on the number of search items in bpluck, but fgrep will of course search for the strings anywhere in each line of the data file, not just in the first column.
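The difference between the two kinds of match can be shown with a small Python sketch. Both function names are hypothetical, and whitespace-separated columns are assumed; the first function imitates a fixed-string search anywhere in the line, the second compares the first column only.

```python
def match_anywhere(strings, lines):
    """fgrep-like: keep a line if any search string occurs anywhere in it."""
    return [line for line in lines if any(s in line for s in strings)]


def match_first_column(strings, lines):
    """Keep a line only if its first column equals one of the search strings."""
    wanted = set(strings)
    kept = []
    for line in lines:
        fields = line.split()
        if fields and fields[0] in wanted:
            kept.append(line)
    return kept


lines = ["12 apples\n", "30 box12\n", "45 pears\n"]
anywhere = match_anywhere(["12"], lines)
first_only = match_first_column(["12"], lines)
print(anywhere)
print(first_only)
```

Here the line "30 box12" is picked up by the search-anywhere version because "12" appears in its second column, even though its first column is 30.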