mpn_addmul_1 and mpn_submul_1 are the most important routines
for overall GMP performance.  All multiplications and divisions come down to
repeated calls to these.  mpn_add_n, mpn_sub_n,
mpn_lshift and mpn_rshift are the next most important.
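
As a rough illustration of the work mpn_addmul_1 performs, here is a minimal portable C sketch.  It is not GMP's implementation: the names sketch_addmul_1 and limb_t are hypothetical, and a 32-bit limb is assumed so that a 64-bit type can hold each intermediate product.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical limb type for illustration only; GMP's mp_limb_t is
   normally the native word size of the CPU. */
typedef uint32_t limb_t;

/* Compute rp[0..n-1] += up[0..n-1] * v and return the carry-out limb.
   Hypothetical reference routine, not the GMP function itself. */
limb_t sketch_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* The sum cannot overflow 64 bits:
           (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1. */
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;           /* low half is the result limb */
        carry = (limb_t)(t >> 32);   /* high half carries into the next limb */
    }
    return carry;
}
```

Each iteration is one multiply and two additions with a carry chained through the loop, which is why the speed of this loop matters so much.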

On some CPUs, assembly versions of the internal functions
mpn_mul_basecase and mpn_sqr_basecase give significant speedups,
mainly through avoiding function call overheads.  They can also potentially
make better use of a wide superscalar processor, as can bigger primitives like
mpn_addmul_2 or mpn_addmul_4.
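
To make the function call overhead concrete, here is a hedged sketch of a schoolbook basecase multiply built from one addmul_1-style call per limb of the second operand.  The names sketch_addmul_1, sketch_mul_basecase and limb_t are hypothetical, a 32-bit limb is assumed, and the low part of the result is simply zeroed first to keep the example short.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb, for illustration */

/* rp[0..n-1] += up[0..n-1] * v, returning the carry-out limb. */
limb_t sketch_addmul_1(limb_t *rp, const limb_t *up, size_t n, limb_t v)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        uint64_t t = (uint64_t)up[i] * v + rp[i] + carry;
        rp[i] = (limb_t)t;
        carry = (limb_t)(t >> 32);
    }
    return carry;
}

/* rp[0..un+vn-1] = up[0..un-1] * vp[0..vn-1], with un >= 1 and vn >= 1.
   rp must not overlap either input.  One call is made per limb of vp,
   so small operands pay the call overhead vn times. */
void sketch_mul_basecase(limb_t *rp, const limb_t *up, size_t un,
                         const limb_t *vp, size_t vn)
{
    for (size_t i = 0; i < un; i++)
        rp[i] = 0;
    for (size_t j = 0; j < vn; j++) {
        /* Row j is added in at offset j; its carry-out becomes the
           next, not yet written, high limb of the result. */
        rp[un + j] = sketch_addmul_1(rp + j, up, un, vp[j]);
    }
}
```

An assembly basecase routine can fold the outer loop into the same code, and a primitive such as mpn_addmul_2 handles two limbs of the second operand per pass, halving the number of passes over rp.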

The restrictions on overlaps between sources and destinations
(see Low-level Functions) are designed to facilitate a variety of
implementations.  For example, knowing mpn_add_n won’t have partly
overlapping sources and destination means reading can be done far ahead of
writing on superscalar processors, and loops can be vectorized on a vector
processor, depending on the carry handling.
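
The following sketch of an add_n-style loop may help show where the overlap rule and the carry come in.  It is an illustration under stated assumptions, not GMP's code: sketch_add_n and limb_t are hypothetical names and a 32-bit limb is assumed.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t limb_t;  /* hypothetical 32-bit limb, for illustration */

/* rp[0..n-1] = up[0..n-1] + vp[0..n-1], returning the carry-out (0 or 1).
   rp may be identical to up or vp (a fully in-place add), but must not
   partly overlap them. */
limb_t sketch_add_n(limb_t *rp, const limb_t *up, const limb_t *vp, size_t n)
{
    limb_t carry = 0;
    for (size_t i = 0; i < n; i++) {
        /* Both sources are read before rp[i] is written.  With no
           partial overlap, an implementation may equally load several
           limbs ahead of the store without changing the result. */
        limb_t s = up[i] + vp[i];   /* wraps modulo 2^32 */
        limb_t c1 = s < up[i];      /* carry out of up[i] + vp[i] */
        limb_t r = s + carry;
        limb_t c2 = r < s;          /* carry out of adding the carry-in */
        rp[i] = r;
        carry = c1 | c2;            /* at most one of c1, c2 can be set */
    }
    return carry;
}
```

The limb additions are independent of each other; only the carry links one iteration to the next, which is why the scope for vectorizing depends on how the carry is handled.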