For inputs larger than GCD_DC_THRESHOLD, GCD is computed via the HGCD
(Half GCD) function, as a generalization to Lehmer’s algorithm.
Let the inputs a,b be of size N limbs each. Put S = floor(N/2) + 1. Then HGCD(a,b) returns a transformation matrix T with non-negative elements, and reduced numbers (c;d) = T^{-1} (a;b). The reduced numbers c,d must be larger than S limbs, while their difference abs(c-d) must fit in S limbs. The matrix elements will also be of size roughly N/2.
The HGCD base case uses Lehmer’s algorithm, but with the above stop condition
that returns reduced numbers and the corresponding transformation matrix
half-way through. For inputs larger than HGCD_THRESHOLD, HGCD is
computed recursively, using the divide and conquer algorithm in “On
Schönhage’s algorithm and subquadratic integer GCD computation” by Möller
(see References). The recursive algorithm consists of these main
steps.
GCD is then implemented as a loop around HGCD, similarly to Lehmer’s
algorithm. Where Lehmer repeatedly chops off the top two limbs, calls
mpn_hgcd2, and applies the resulting matrix to the full numbers, the
sub-quadratic GCD chops off the most significant third of the limbs (the
proportion is a tuning parameter, and 1/3 seems to be more efficient
than, e.g., 1/2), calls mpn_hgcd, and applies the resulting
matrix. Once the input numbers are reduced to size below
GCD_DC_THRESHOLD, Lehmer’s algorithm is used for the rest of the work.
The asymptotic running time of both HGCD and GCD is O(M(N)*log(N)), where M(N) is the time for multiplying two N-limb numbers.