Lehmer’s improvement of the Euclidean algorithms is based on the observation
that the initial part of the quotient sequence depends only on the most
significant parts of the inputs. The variant of Lehmer’s algorithm used in GMP
splits off the most significant two limbs, as suggested, e.g., in “A
Double-Digit Lehmer-Euclid Algorithm” by Jebelean (see References). The
quotients of two double-limb inputs are collected as a 2 by 2 matrix with
single-limb elements. This is done by the function mpn_hgcd2. The
resulting matrix is applied to the inputs using mpn_mul_1 and
mpn_submul_1. Each iteration usually reduces the inputs by almost one
limb. In the rare case of a large quotient, no progress can be made by
examining just the most significant two limbs, and the quotient is computed
using plain division.
The resulting algorithm is asymptotically O(N^2), just as the Euclidean
algorithm and the binary algorithm. The quadratic part of the work are
the calls to mpn_mul_1 and mpn_submul_1. For small sizes, the
linear work is also significant. There are roughly N calls to the
mpn_hgcd2 function. This function uses a couple of important
optimizations:
mpn_hgcd (see next
section). This means that when called with the most significant two limbs of
two large numbers, the returned matrix does not always correspond exactly to
the initial quotient sequence for the two large numbers; the final quotient
may sometimes be one off.