Copyright (C) 1994, Digital Equipment Corp.
Modified On Tue Feb 15 10:00:03 PST 1994 By perl
UNSAFE MODULE UnsafeHash EXPORTS Text;

IMPORT TextF, Word;
Hash is the only unsafe procedure in the Text interface, so we move it into its own module.

We will derive an efficient version of Text.Hash starting from the following simple version:
    res := 0;
    i := 0;
    DO i # M ->
      res[i MOD N] := res[i MOD N] XOR t[i];
      i := i + 1
    OD

where

    t       is the text to be hashed
    M       is the number of bytes in t
    t[i]    is byte i of t
    res     is the computed result
    N       is the number of bytes per word
    res[i]  is byte i of res

The numeric value of res will depend on whether the machine is big-endian or little-endian, but the value of res regarded as a sequence of N bytes will not.
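As a reference point for the derivation, the simple version can be modelled in Python. This is only a sketch: the word size N = 4 and the name simple_hash are assumptions made for illustration, and res is kept as a list of N bytes rather than as a machine word.

```python
N = 4  # bytes per word (assumed for illustration)

def simple_hash(t, n=N):
    """Byte-wise reference version: XOR byte i of t into res[i MOD n]."""
    res = [0] * n
    for i, b in enumerate(t):
        res[i % n] ^= b
    return res

res = simple_hash(b"abcde")
print(res)  # [4, 98, 99, 100]

# The byte sequence is the same on every machine, but its numeric value
# depends on the byte order used to read it as a word.
little = int.from_bytes(bytes(res), "little")
big = int.from_bytes(bytes(res), "big")
print(hex(little), hex(big))
```

The first byte is 4 because "a" and "e" both land in res[0] and 0x61 XOR 0x65 = 0x04.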
We would like to derive a more efficient version that uses word operations. We write rotup(w, k) to indicate the word w (regarded as an array of bytes) shifted up (towards increasing indexes) by k places, circularly:

    rotup(w, k)[(i + k) MOD N] = w[i]    for all i

We also write rotdn(w, k) to indicate rotup(w, -k).
We will also need the corresponding shift operators: shiftup is like rotup except that it shifts instead of rotates, that is:

    shiftup(w, k)[i + k] = w[i]    for all i such that i and i+k are both in [0..N-1]

and all other bytes of shiftup(w, k) are 0, and

    shiftdn(w, k) = shiftup(w, -k).
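The four operators can be modelled directly from their definitions. The sketch below treats a word as a Python list of N bytes (an assumption of this model, not how the Modula-3 code represents words), and the last lines check that on a little-endian reading shiftup by one byte is a left shift by eight bits.

```python
def rotup(w, k):
    """rotup(w, k)[(i + k) MOD N] = w[i] for all i."""
    n = len(w)
    out = [0] * n
    for i in range(n):
        out[(i + k) % n] = w[i]
    return out

def rotdn(w, k):
    return rotup(w, -k)

def shiftup(w, k):
    """shiftup(w, k)[i + k] = w[i] where both indexes are in range, else 0."""
    n = len(w)
    out = [0] * n
    for i in range(n):
        if 0 <= i + k < n:
            out[i + k] = w[i]
    return out

def shiftdn(w, k):
    return shiftup(w, -k)

w = [0x11, 0x22, 0x33, 0x44]
print(rotup(w, 1) == [0x44, 0x11, 0x22, 0x33])    # True
print(shiftdn(w, 1) == [0x22, 0x33, 0x44, 0x00])  # True

# On a little-endian reading, byte i holds bits 8i..8i+7, so shifting up
# by one byte matches a left shift by 8 bits (truncated to the word).
x = int.from_bytes(bytes(w), "little")
y = int.from_bytes(bytes(shiftup(w, 1)), "little")
print(y == (x << 8) & 0xFFFFFFFF)                 # True
```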
We begin by transforming the simple loop by a change of coordinates. We introduce temp, which is res rotated so that temp[0] corresponds to res[i MOD N], that is,

    rotup(temp, i) = res    (Q)

(Note that rotup(temp, i) = rotup(temp, i MOD N); in general the second argument to rotup and rotdn only matters modulo N.)
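The invariant Q can also be checked mechanically. The following sketch assumes the update step temp[0] := temp[0] XOR t[i] followed by temp := rotdn(temp, 1), keeps res and temp side by side, and asserts Q after every step. The word size N = 4 and the name check_invariant are assumptions made for illustration.

```python
N = 4  # bytes per word (assumed for illustration)

def rotup(w, k):
    out = [0] * len(w)
    for i, b in enumerate(w):
        out[(i + k) % len(w)] = b
    return out

def rotdn(w, k):
    return rotup(w, -k)

def check_invariant(t):
    """Update res and temp together, asserting Q: rotup(temp, i) = res."""
    res, temp = [0] * N, [0] * N
    assert rotup(temp, 0) == res          # Q holds initially
    for i, b in enumerate(t):
        res[i % N] ^= b                   # original coordinates
        temp[0] ^= b                      # rotated coordinates
        temp = rotdn(temp, 1)
        assert rotup(temp, i + 1) == res  # Q holds for the new i
    return rotup(temp, len(t))            # recover res from temp alone

print(check_invariant(b"abcde"))  # [4, 98, 99, 100]
```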
The invariant Q allows us to transform the simple loop into:

    res := 0;
    temp := 0;
    {Q}
    i := 0;
    DO i # M ->
      {Q}
      res[i MOD N] := res[i MOD N] XOR t[i];
      temp[0] := temp[0] XOR t[i];
      i := i + 1;
      temp := rotdn(temp, 1)
    OD

Proof that rotdn(temp, 1) is correct:

       {Q} i := i + 1; temp := rotdn(temp, 1) {Q}
    == {Hoare Logic}
       Q => Q(i := i+1, temp := rotdn(temp, 1))
    == {Carry out the substitution}
       Q => rotup(rotdn(temp,1), i+1) = res
    == {Since rotup(rotup(x, a), b) = rotup(x, a+b)}
       Q => rotup(temp, i) = res
    == Q => Q
    == TRUE

Now we can eliminate the work on res, and do it only at the end:
    temp := 0;
    i := 0;
    DO i # M ->
      temp[0] := temp[0] XOR t[i];
      i := i + 1;
      temp := rotdn(temp, 1)
    OD;
    {Q}
    res := rotup(temp, M)

Next, we break this loop into three pieces: the first processes the unaligned prefix of the text, the second processes the aligned full words of the text, and the last processes the trailing subword fragment:
    temp := 0;
    i := 0;
    DO i # M AND (ADR(t[i]) MOD N) # 0 ->
      temp[0] := t[i];
      i := i + 1;
      temp := rotdn(temp, 1)
    OD;
    DO i + N <= M ->
      VAR j := i IN
        DO j # i + N ->
          temp[0] := temp[0] XOR t[j];
          j := j + 1;
          temp := rotdn(temp, 1)
        OD
      END;
      i := i + N
    OD;
    DO i # M ->
      temp[0] := temp[0] XOR t[i];
      i := i + 1;
      temp := rotdn(temp, 1)
    OD;
    {Q}
    res := rotup(temp, M)

Now we will change the first loop to use word operations. This loop copies into temp some number of bytes from a single word of memory, preserving the order of the bytes, and leaving the bytes in temp so that the last byte copied is in temp[N-1]. We can achieve this with word operations by loading the appropriate word into temp, shifting down to eliminate any junk bytes that precede the relevant bytes, and then shifting up to eliminate any junk bytes that follow the relevant bytes, if any. In our case, the number of preceding junk bytes (jpre) is just ADR(t[0]) MOD N, and the number of following junk bytes (jpost) is zero if M > N - jpre; otherwise it is N - jpre - M. Thus the first loop above can be replaced by:
    jpre := ADR(t[0]) MOD N;
    IF jpre # 0 ->
      jpost := MAX(0, N - jpre - M);
      temp := Mem[ADR(t[0])-jpre];
      temp := shiftdn(temp, jpre);
      temp := shiftup(temp, jpost+jpre);
      i := N - jpre - jpost
    [] jpre = 0 -> SKIP
    FI

Similarly, we can change the last loop to use word operations.

    IF i # M ->
      jpost := N - (M - i);
      VAR w := Mem[ADR(t[i])] IN
        w := shiftup(w, jpost);
        temp := rotup(temp, jpost);
        temp := temp XOR w;
      END
    [] i = M -> SKIP
    FI

(Note that the rotation of temp to rotup(temp, jpost) could equally well have been written rotdn(temp, M-i). The same rotation that brings temp into alignment with shiftup(w, jpost) also matches the rotation performed by the loop we are refining.)
Finally we change the middle loop to use word operations. Its inner loop rotates temp by one place N times, and consequently has no net rotation. The inner loop also XORs

    t[i], ..., t[i+N-1]

into

    temp[0], ..., temp[N-1]

respectively. Since ADR(t[i]) MOD N = 0, this can be accomplished by a single word operation. The new version of the middle loop is therefore:

    DO i + N <= M ->
      temp := temp XOR Mem[ADR(t[i])];
      i := i + N
    OD;

In the above we write Mem[addr] to indicate the word whose bytes' addresses range from addr to addr+N-1, regarding that word as an array of bytes. We have also (for the first time) used XOR on words instead of bytes.
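The claim about the middle loop, that N single-byte steps amount to one word XOR with no net rotation, can be illustrated with a small model. Words are again lists of N bytes, and N = 4 and the names inner_loop and word_xor are assumptions of this sketch.

```python
N = 4  # bytes per word (assumed for illustration)

def rotdn1(temp):
    """rotdn(temp, 1): byte i moves to index i - 1, circularly."""
    return temp[1:] + temp[:1]

def inner_loop(temp, word):
    """N byte-wise steps: XOR into temp[0], then rotate down by one."""
    temp = list(temp)
    for b in word:
        temp[0] ^= b
        temp = rotdn1(temp)
    return temp

def word_xor(temp, word):
    """The single word operation that replaces the inner loop."""
    return [a ^ b for a, b in zip(temp, word)]

temp = [1, 2, 3, 4]
word = [16, 32, 64, 128]
print(inner_loop(temp, word))  # [17, 34, 67, 132]
print(word_xor(temp, word))    # [17, 34, 67, 132]
```

At step j the byte at index 0 is the original temp[j], because j rotations have already happened, so t[i+j] is XORed into the right position.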
Now we can translate the program into Modula-3. Here is the collected guarded command version:
    temp := 0;
    i := 0;
    jpre := ADR(t[0]) MOD N;
    IF jpre # 0 ->
      jpost := MAX(0, N - jpre - M);
      temp := Mem[ADR(t[0])-jpre];
      temp := shiftdn(temp, jpre);
      temp := shiftup(temp, jpost+jpre);
      i := N - jpre - jpost
    [] jpre = 0 -> SKIP
    FI;
    DO i + N <= M ->
      temp := temp XOR Mem[ADR(t[i])];
      i := i + N
    OD;
    IF i # M ->
      jpost := N - (M - i);
      VAR w := Mem[ADR(t[i])] IN
        w := shiftup(w, jpost);
        temp := rotup(temp, jpost);
        temp := temp XOR w;
      END
    [] i = M -> SKIP
    FI;
    res := rotup(temp, M)

Which in Modula-3 becomes:
PROCEDURE Hash (t: TEXT): INTEGER =
  CONST N = BYTESIZE(INTEGER);
  VAR
    temp := 0;
    p := LOOPHOLE (ADR(t[0]), UNTRACED REF INTEGER);
    m := NUMBER(t^) - 1;
    endp := p + m;
  BEGIN
    VAR
      jpre := Word.And(LOOPHOLE(p, INTEGER), N-1);
      jpost: INTEGER;
    BEGIN
      IF jpre # 0 THEN
        jpost := MAX(0, N - jpre - m);
        temp := LOOPHOLE(p - jpre, UNTRACED REF INTEGER)^;
        temp := Word.Shift(Word.Shift(temp, jpre * -up1), (jpost+jpre) * up1);
        INC(p, N - jpre - jpost)
      END
    END;
    WHILE p + N < endp DO
      temp := Word.Xor(temp, p^);
      INC(p, N)
    END;
    IF littleEndian THEN
      IF p # endp THEN
        VAR
          jpost := N - (endp - p);
          w := Word.Shift(p^, Word.Shift(jpost, lgUp1));
        BEGIN
          temp := Word.Xor(Word.Rotate(temp, Word.Shift(jpost, lgUp1)), w)
        END
      END;
      RETURN Word.Plus(Word.Rotate(temp, Word.Shift(m, lgUp1)), m)
    ELSE
      IF p # endp THEN
        VAR
          jpost := N - (endp - p);
          w := Word.Shift(p^, -Word.Shift(jpost, lgUp1));
        BEGIN
          temp := Word.Xor(Word.Rotate(temp, -Word.Shift(jpost, lgUp1)), w)
        END
      END;
      RETURN Word.Plus(Word.Rotate(temp, -Word.Shift(m, lgUp1)), m)
    END
  END Hash;

In the Modula-3 version we have added the text length into the result before returning it, in order to get a better hash function for texts that contain long strings of repeated characters. Also, instead of multiplying by up1 we have shifted by its base two logarithm lgUp1. These constants are computed below:
VAR
  littleEndian: BOOLEAN;
  ref := NEW(UNTRACED REF INTEGER);
  up1: INTEGER;
  lgUp1: INTEGER;

BEGIN
  <* ASSERT 1 = ADRSIZE(CHAR) *>
  <* ASSERT 0 = Word.And(BYTESIZE(INTEGER), BYTESIZE(INTEGER)-1) *>
  ref^ := 1;
  littleEndian := 1 = LOOPHOLE(ref, UNTRACED REF [0..255])^;
  IF littleEndian THEN
    up1 := BITSIZE(CHAR)
  ELSE
    up1 := -BITSIZE(CHAR)
  END;
  lgUp1 := 0;
  VAR k := 1; BEGIN
    WHILE k # ABS(up1) DO INC(lgUp1); k := k + k END
  END
END UnsafeHash.
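As a cross-check on the derivation, the collected guarded-command version can be modelled in Python and compared with the simple byte-wise loop over every alignment and a range of lengths. This is a sketch under stated assumptions: memory is a bytes object whose indexes play the role of addresses, N = 4, words are lists of N bytes, and the names word_hash, simple_hash and check_all are hypothetical. It does not model the Modula-3 procedure's final addition of the text length.

```python
N = 4  # bytes per word (assumed for illustration)

def rotup(w, k):
    """Rotate the bytes of w up (towards increasing indexes) by k places."""
    out = [0] * N
    for i in range(N):
        out[(i + k) % N] = w[i]
    return out

def shiftup(w, k):
    """Shift the bytes of w up by k places (down if k < 0), filling with 0."""
    out = [0] * N
    for i in range(N):
        if 0 <= i + k < N:
            out[i + k] = w[i]
    return out

def xor(a, b):
    return [x ^ y for x, y in zip(a, b)]

def simple_hash(t):
    res = [0] * N
    for i, b in enumerate(t):
        res[i % N] ^= b
    return res

def word_hash(mem, addr, M):
    """Collected guarded-command version; the text is mem[addr:addr+M]."""
    def Mem(a):                      # the word at addresses a .. a+N-1
        return list(mem[a:a + N])
    temp = [0] * N
    i = 0
    jpre = addr % N
    if jpre != 0:                    # unaligned prefix
        jpost = max(0, N - jpre - M)
        temp = Mem(addr - jpre)
        temp = shiftup(temp, -jpre)  # shiftdn(temp, jpre)
        temp = shiftup(temp, jpost + jpre)
        i = N - jpre - jpost
    while i + N <= M:                # aligned full words
        temp = xor(temp, Mem(addr + i))
        i += N
    if i != M:                       # trailing subword fragment
        jpost = N - (M - i)
        w = shiftup(Mem(addr + i), jpost)
        temp = xor(rotup(temp, jpost), w)
    return rotup(temp, M)

def check_all():
    """Compare both versions for every alignment and lengths 0 .. 4N-1."""
    for addr in range(N, 2 * N):
        for M in range(4 * N):
            text = bytes((37 * k + 11) % 256 for k in range(M))
            # Junk bytes surround the text, as they would in real memory.
            mem = bytes([0xAA]) * addr + text + bytes([0x55]) * (2 * N)
            assert word_hash(mem, addr, M) == simple_hash(text), (addr, M)
    return True

print(check_all())  # True
```

The junk bytes 0xAA and 0x55 around the text stand in for whatever shares a word with the first and last bytes of the text. The check passes only because the shifts discard them.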