If the exact division algorithm is done with a full subtraction at each stage and the dividend isn’t a multiple of the divisor, then low zero limbs are produced but with a remainder in the high limbs. For dividend a, divisor d, quotient q, and b = 2^mp_bits_per_limb, this remainder r is of the form
a = q*d + r*b^n
n represents the number of zero limbs produced by the subtractions, that being the number of limbs produced for q. r will be in the range 0<=r<d and can be viewed as a remainder, but one shifted up by a factor of b^n.
Carrying out full subtractions at each stage means the same number of cross products must be done as a normal division, but there’s still some single limb divisions saved. When d is a single limb some simplifications arise, providing good speedups on a number of processors.
The functions mpn_divexact_by3, mpn_modexact_1_odd and the
internal mpn_redc_X functions differ subtly in how they return r,
leading to some negations in the above formula, but all are essentially the
same.
Clearly r is zero when a is a multiple of d, and this leads to divisibility or congruence tests which are potentially more efficient than a normal division.
The factor of b^n on r can be ignored in a GCD when d is
odd, hence the use of mpn_modexact_1_odd by mpn_gcd_1 and
mpz_kronecker_ui etc (see Greatest Common Divisor).
Montgomery’s REDC method for modular multiplications uses operands of the form of x*b^-n and y*b^-n and on calculating (x*b^-n)*(y*b^-n) uses the factor of b^n in the exact remainder to reach a product in the same form (x*y)*b^-n (see Modular Powering).
Notice that r generally gives no useful information about the ordinary
remainder a mod d since b^n mod d could be anything.  If
however b^n ≡ 1 mod d, then r is the negative of the
ordinary remainder.  This occurs whenever d is a factor of
b^n-1, as for example with 3 in mpn_divexact_by3.  For a 32 or
64 bit limb other such factors include 5, 17 and 257, but no particular use
has been found for this.