From msuinfo!netnews.upenn.edu!dsinc!spool.mu.edu!howland.reston.ans.net!europa.eng.gtefsd.com!news.umbc.edu!olson Thu Sep 15 13:12:41 1994 Path: msuinfo!netnews.upenn.edu!dsinc!spool.mu.edu!howland.reston.ans.net!europa.eng.gtefsd.com!news.umbc.edu!olson From: olson@umbc.edu (Bryan G. Olson; CMSC (G)) Newsgroups: sci.crypt Subject: Re: ** Help me Montgomery ? ** Date: 14 Sep 1994 18:01:46 GMT Organization: University of Maryland, Baltimore County Lines: 135 Message-ID: <357dqa$iq@news.umbc.edu> References: <35700h$nbe@lucy.ee.und.ac.za> NNTP-Posting-Host: umbc7.umbc.edu X-Newsreader: TIN [version 1.2 PL2] Jyri Hamalainen (cat@olddaisy.ee.und.ac.za) wrote: : Could someone explain the basic logic of : "Montgomery modular multiplication without trial division", : method in hardware/ software applications. Peter Montgomery himself sometimes posts to this group, but I'll give an introduction to Montgomery multiplication which may be a bit easier for computer types to follow than his paper: Montgomery, Peter L, "Modular Multiplication Without Trial Division", _Mathematics of Computation_, Vol 44 #170, April 1995, pp 519-521. To use Montgomery multiplication mod N, we define an alternate representation of the integers (0,1,2,...N-1) We translate normal integers to Montgomery representation, do our multiplications with this new representation, then translate back to the normal representation. When computing an exponent mod N, we don't translate back to normal representation between multiplications; we just leave the results in Montgomery representation until all the multiplications are done. To avoid confusion between "mod" as an operation and as notation of equivalence classes, I'll use "x=y mod n" to mean that x is the least residue of y mod n, and "x==y mod n" to mean that x is congruent mod n to y. To form the Montgomery representation, we choose some R which is greater than N and relatively prime to N, and we want division by R to be inexpensive. In crypto work our modulus N usually has no small factors, so we can choose R to be some power of 2 a little bigger than N. Division by any power of two is easy, since we just chop off low order bits. Now to represent the integers in 0..N-1, we use the Montgomery representation M(x) = xR mod N To convert from Montgomery representation to normal representation, we find the inverse of R mod N under multiplication mod N. I'll call it R'. (Actually there are infinitely many inverses; I want the least residue). Then the inverse of M(x) is M'(x) = xR' mod N Since R and N are relatively prime, these functions are one to one on { 0,1,2,...N-1 }. All these integers have a unique representation. Now we find N' such that 0 < N' Organization: Compulink Information eXchange References: <35700h$nbe@lucy.ee.und.ac.za> Date: Thu, 15 Sep 1994 11:03:47 GMT Lines: 109 In article <35700h$nbe@lucy.ee.und.ac.za> cat@olddaisy.ee.und.ac.za (Jyri Hamalainen) writes: > >Could someone explain the basic logic of > >"Montgomery modular multiplication without trial division", > >method in hardware/ software applications. > >Thank you > >jYrI > Let R be a power of the machine word size. It needs to be greater than the modulus N, and coprime to it. Let R' = R^-1 mod N N' = -(N^-1) mod R To compute TR' mod N from T of 0 <= T <= RN: function REDC(T) begin m := (T mod R) * N' mod R; t := (T + m * N)/R; if t >= N then return t-N else return t end; because m * N == T * N' * N mod R (N' * N == -1 mod R) m * N == T * -1 mod R == -T mod R So when we add m * N to T, the result is congruent to 0 mod R. Therefore the result of the division, t, is an integer. In fact, the last line of REDC does not need to be executed if we maintain two extra bits throughout the calculations. Given two numbers x and y between 0 and N-1 inclusive, let z = REDC(xy). Then z == (xy)R^-1 mod N, so (x * R^-1)(y * R^-1) == z * R^-1 mod N. (This implies that there is no need to correct for the multiplications by R^-1 during the exponentiation.) Also, addition and subtraction is unchanged in this form. Also, note that the product T * N' needs only to be carried out at half precision. In order to use REDC, we need to convert our numbers into N-residue form before beginning the calcualtion. To convert an integer x into N-residue form, compute x*R mod N. To convert an N-residue into an integer, multiply it by R^-1 mod N. Multiprecision Case Instead of computing all of m at once, we can do it a digit at a time. This allows us to use a single digit n0', which is -N^-1 mod b instead of -N^-1 mod R. b is the word size of the computer. The result will not be the same as the original algorithm, but T will still be a multiple of R. function REDC(T) begin for i := 0 to n-1 do begin m := T[i] * n0' mod b; T := T + m * N * b^i end return T/R; end Note that this is very similar to long multiplication: begin for i := 0 to n-1 do begin T := T + A[i] * B * b^i end end This takes n^2 + n multiplications. Effectively, the modular multiplication has been reduced to two long multiplications. However, still further performance improvements can be realized by interleaving the two phases of multiplication and reduction: function Mont_Product(A,B) begin T := 0; for i := to to n-1 do begin T := T + A[i] * B * b^i; m := T[i] * n0' mod b; T := T + m * N * b^i; end; return T/R; end This performance improvement is very valuable if using a DSP, because multiplies are very fast, and bookkeeping overhead is to be avoided wherever possible. It still does as many multiplies as the separate multiplication and reduction, however, so the performance gain will be small on a more conventional microprocessor. Andrew. From msuinfo!agate!howland.reston.ans.net!EU.net!ub4b!idefix.CS.kuleuven.ac.be!news.sri.ucl.ac.be!veithen Thu Sep 15 13:13:21 1994 Path: msuinfo!agate!howland.reston.ans.net!EU.net!ub4b!idefix.CS.kuleuven.ac.be!news.sri.ucl.ac.be!veithen From: veithen@dice.ucl.ac.be (Veithen Daniel) Newsgroups: sci.crypt Subject: Re: ** Help me Montgomery ? ** Date: 15 Sep 1994 10:05:31 GMT Organization: Universite Catholique de Louvain, Louvain-la-Neuve, Belgium Lines: 86 Distribution: world Message-ID: <35969b$csj@sci3.sri.ucl.ac.be> NNTP-Posting-Host: sphinx.dice.ucl.ac.be cat@olddaisy.ee.und.ac.za (Jyri Hamalainen) writes: > > Could someone explain the basic logic of > > "Montgomery modular multiplication without trial division", > > method in hardware/ software applications. > > Thank you > > jYrI > Try the following articles about Montgomery's algorithm : @inproceedings{21, author = "Duss\'{e}, S.R. and {Kaliski Jr.}, B.S.", address = "New York", booktitle = "- Proc. Advances in Cryptology - EUROCRYPT '90", editor = "Damg{\aa}rd, I.", pages = "230--244", publisher = "Springer Verlag", series = "Lecture Notes in Computer Sciences", title = "{A cryptographic library for the Motorola DSP56000}", volume = "473", year = "1991", ISBN = "" } @inproceedings{n29, author = "Eldridge, S.E. and Walter, C.D.", booktitle = "IEEE transactions on computers", month = jun, number = "6", organization = "IEEE", pages = "693--699", title = "Hardware implementation of {Montgomery's} modular multiplication algorithm", volume = "42", year = "1993", ISBN = "" } @inproceedings{n14, author = "Walter, C.D.", booktitle = "IEEE transactions on computers", month = mar, number = "3", organization = "IEEE", pages = "376--378", title = "Systolic modular multiplication", volume = "42", year = "1993", ISBN = "" } @inproceedings{10, author = "Arazi, B.", booktitle = "IEEE Journal on selected areas in communications", month = jun, number = "6", organization = "IEEE", pages = "761--769", title = "Double-precision modular multiplication based on a single-precision modular multiplier and a standard CPU" volume = "11", year = "1993", ISBN = "" } Daniel Veithen Laboratoire de Microelectronique de l'Universite Catholique de Louvain Batiment Maxwell tf: (+32) 10 47 81 37 Place du Levant, 3 fax: (+32) 10 47 86 67 B-1348 Louvain-La-Neuve e-mail: veithen@dice.ucl.ac.be Belgium