tr

seedoc date 19990412

NAME

tr - versatile ASCII character-by-character exchanger/deleter

SYNOPSIS

tr [options] string1 [string2] <_standard_redirection_>
options The string1 and string2 arguments are simply ordered sets of characters, with some tr-specific aliases.

cLIeNUX note

I was writing this seedoc and decided tr would make a lot more sense as three separate commands. cLIeNUX now has "delete", "exchange" and "squeeze", in addition to tr. They're easier to understand, although, at this writing, they still give on-line help for tr. By the time you read this they should each have thier own seedoc. They other three are "commandname aliases" of the original tr. The commandnames actually invoke the tr code with particular options, which saves code. This also means that this seedoc is still authoratative as to the internals of all four variants.

DESCRIPTION

tr performs character-by-character conversions between standard input and standard output based on one or two ordered sets of characters specified as string arguments. The three basic conversions available are

With the delete switch only one string argument is given. If translating rather than deleting, the second set defines the character-by-character translation to be performed.

The -s and -d options tell tr which of the following operations to perform:

The --complement (-c) option replaces set1 with its complement (all of the ASCII characters that are not in set1).

SPECIFYING SETS OF CHARACTERS

string1 and string2 are lists of characters. They are not typical regular expressions or globbing patterns. Most characters simply represent themselves in these strings, but the strings can contain the following tr-specific aliases. (Some of them can be used only in string1 or string2, as noted.)

Backslash escapes. A backslash followed by a character not listed below causes an error message.

\\
A backslash.
\a
Control-G.
\b
Control-H.
\f
Control-L.
\n
Control-J.
\r
Control-M.
\t
Control-I.
\v
Control-K.
\ooo
The character with the value given by ooo, which is 1 to 3 octal digits.

Ranges. The notation `m-n' expands to all of the characters from m through n, in "asciibetical" order. m should collate before n; if it doesn't, an error results. As an example, `0-9' is the same as `0123456789'.

Repeated characters. The notation `[c*n]' in string2 expands to n copies of character c. Thus, `[y*6]' is the same as `yyyyyy'. The notation `[c*]' in string2 expands to as many copies of c as are needed to make set2 as long as set1. If n begins with a 0, it is interpreted in octal, otherwise in decimal.

Character classes. The notation `[:class-name:]' expands to all of the characters in the (predefined) class named class-name. The characters expand in no particular order, except for the `upper' and `lower' classes, which expand in ascending order. When the --delete (-d) and --squeeze-repeats (-s) options are both given, any character class can be used in string2. Otherwise, only the character classes `lower' and `upper' are accepted in string2, and then only if the corresponding character class (`upper' and `lower', respectively) is specified in the same relative position in string1. Doing this specifies case conversion. The class names are given below; an error results when an invalid class name is given.

alnum
Letters and digits.
alpha
Letters.
blank
Horizontal whitespace.
cntrl
Control characters.
digit
Digits.
graph
Printable characters, not including space.
lower
Lowercase letters.
print
Printable characters, including space.
punct
Punctuation characters.
space
Horizontal or vertical whitespace.
upper
Uppercase letters.
xdigit
Hexadecimal digits.

TRANSLATING

tr performs translation when string1 and string2 are both given and the --delete (-d) option is not given. tr translates each character of its input that is in set1 to the corresponding character in set2. Characters not in set1 are passed through unchanged. When a character appears more than once in set1 and the corresponding characters in set2 are not all the same, only the final one is used. For example, these two commands are equivalent:

tr aaa xyz
tr a z

A common use of tr is to convert lowercase characters to uppercase. This can be done in many ways. Here are three of them:

tr abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ
tr a-z A-Z
tr '[:lower:]' '[:upper:]'

When tr is performing translation, set1 and set2 should normally have the same length. If set1 is shorter than set2, the extra characters at the end of set2 are ignored.

On the other hand, making set1 longer than set2 is not portable; POSIX.2 says that the result is undefined. In this situation, the BSD tr pads set2 to the length of set1 by repeating the last character of set2 as many times as necessary. The System V tr truncates set1 to the length of set2.

By default, GNU tr handles this case like the BSD tr does. When the --truncate-set1 (-t) option is given, GNU tr handles this case like the System V tr instead. This option is ignored for operations other than translation.

Acting like the System V tr in this case breaks the relatively common BSD idiom:

tr -cs A-Za-z0-9 '\012'
because it converts only zero bytes (the first element in the complement of set1), rather than all non-alphanumerics, to newlines.  

SQUEEZING REPEATS AND DELETING

When given just the --delete (-d) option, tr removes any input characters that are in set1.

When given just the --squeeze-repeats (-s) option, tr replaces each input sequence of a repeated character that is in set1 with a single occurrence of that character.

When given both the --delete and the --squeeze-repeats options, tr first performs any deletions using set1, then squeezes repeats from any remaining characters using set2.

The --squeeze-repeats option may also be used when translating, in which case tr first performs translation, then squeezes repeats from any remaining characters using set2.

Here are some examples to illustrate various combinations of options:

Remove all zero bytes:

tr -d '\000'

Put all words on lines by themselves. This converts all non-alphanumeric characters to newlines, then squeezes each string of repeated newlines into a single newline:

tr -cs '[a-zA-Z0-9]' '[\n*]'

Convert each sequence of repeated newlines to a single newline:

tr -s '\n'

GNU tr also accepts the following options in any combination with the others.

--help
Print a usage message and exit with a status code indicating success.
--version
Print version information on standard output then exit.
 

WARNING MESSAGES

Setting the environment variable POSIXLY_CORRECT turns off several warning and error messages, for strict compliance with POSIX.2. The messages normally occur in the following circumstances:

1. When the --delete option is given but --squeeze-repeats is not, and string2 is given, GNU tr by default prints a usage message and exits, because string2 would not be used. The POSIX specification says that string2 must be ignored in this case. Silently ignoring arguments is a bad idea.

2. When an ambiguous octal escape is given. For example, \400 is actually \40 followed by the digit 0, because the value 400 octal does not fit into a single byte.

Note that GNU tr does not provide complete BSD or System V compatibility. For example, there is no option to disable interpretation of the POSIX constructs [:alpha:], [=c=], and [c*10]. Also, GNU tr does not delete zero bytes automatically, unlike traditional UNIX versions, which provide no way to preserve zero bytes.

BUGS

The switch and argument interactions, simple as they are, still manage to be rather ugly. tr would make more sense as 3 separate commands for the 3 basic conversions.