直接按升序计算数字因子而不进行排序?

时间:2015-05-01 18:35:57

标签: algorithm primes enumeration factors

是否有一种有效的算法来按升序排列数字 n 的因子,而不进行排序? “有效”我的意思是:

  1. 该算法通过从 n 的素数幂分解开始,避免了对除数的强力搜索。

  2. 算法的运行时复杂度为O( d log 2 d )或更好,其中 d 是除数名词

  3. 算法的空间复杂度为O( d )。

  4. 该算法避免了排序操作。也就是说,因子按顺序生成而不是按顺序生成并随后排序。虽然枚举使用简单的递归方法然后排序是O( d log 2 d ),但是排序所涉及的所有内存访问都有非常难看的成本。

  5. 一个简单的例子是 n = 360 =2³×3²×5,其 d = 24个因子:{1,2,3,4,5,6 ,8,9,10,12,15,18,20,24,30,36,40,45,60,72,90,120,180,360}。

    更严重的例子是 n = 278282512406132373381723386382308832000 =2⁸×3⁴×5³×7²×11²×13²×17×19×23×29×31×37×41×43×47×53 ×59×61×67×71×73×79,其中 d = 318504960因子(显然这里列出太多了!)。顺便提一下,这个数字具有最大数量的因子,最多可达2 ^ 128。

    我可以发誓几周前我看到了这种算法的描述,带有示例代码,但现在我似乎无法在任何地方找到它。它使用了一些魔术技巧,在每个素数因子的输出列表中维护一个祖先索引列表。 (更新:我用汉明数字混淆因子生成,运算方式类似。)

    更新

    我最终在运行时使用了一个O( d )的解决方案,具有极低的内存开销,可以就地创建O( d )输出,并且比我所知道的任何其他方法快得多。我已经发布了这个解决方案作为答案,使用C源代码。它是另一个贡献者Will Ness在Haskell中提供的一个优化算法的优化简化版本。我选择了Will的答案作为公认的答案,因为它提供了一个非常优雅的解决方案,符合最初规定的所有要求。

2 个答案:

答案 0 :(得分:5)

这个答案给出了一个C实现,我相信它是最快且最节省内存的。

算法概述。此算法基于Will Ness在another answer中引入的自下而上合并方法,但进一步简化,以便合并的列表实际上并不存在存在于记忆的任何地方每个列表的head元素都被整理并保存在一个小数组中,而列表中的所有其他元素都是根据需要即时构建的。使用“幻像列表” - 运行代码的想象力 - 大大减少了内存占用,以及读取和写入的内存访问量,并且还改善了空间局部性,从而显着提高了速度算法。每个级别的因子将按顺序直接写入输出数组中的最终静止位置。

基本思想是使用数学归纳法对主要功率因子分解产生因子。例如,为了产生360的因子,计算72的因子,然后乘以相关的5的幂,在这种情况下{1,5};为了产生72的因子,计算8的因子,然后乘以3的相关幂,在这种情况下{1,3,9};为了产生8的因子,基本情况1乘以2的相关幂,在这种情况下为{1,2,4,8}。因此,在每个归纳步骤中,在现有因子组和素数幂集之间计算笛卡尔积,并通过乘法将结果减少回整数。

下面是数字360的图示。从左到右是存储单元;一行代表一个时间步。时间表示为垂直流动。

Divisors of 360

空间复杂性。产生因子的临时数据结构非常小:只使用O(log 2( n ))空间,其中<​​em> n 是其生成因子的数字。例如,在C中该算法的128位实现中,诸如333,939,014,887,358,848,058,068,063,658,770,598,400(其基数2对数为≈127.97)之类的数字分配5.1 GB来存储其318,504,960因子的列表,但仅使用360 字节产生该列表的额外开销。任何128位数字最多需要大约5 KB的开销。

运行时复杂性。运行时完全取决于素数幂分解的指数(例如,主要签名)。为了获得最佳结果,应该首先处理最大的指数,并且最后的指数应该最后处理,以便运算符在最后阶段由低复杂性合并主导,实际上通常会变成二进制合并。渐近运行时为O( c n d n )),其中 d n )是 n 的除数,其中 c n )是一些常数,取决于 n 的主要签名。也就是说,与主要签名相关联的每个等价类具有不同的常量。由小指数主导的Prime签名具有较小的常数;由大指数主导的主要签名具有更大的常数。因此,运行时复杂性通过主要签名进行聚类。

图表。运行时性能在3.4 GHz上进行了分析。英特尔酷睿i7为66,591个 n 值的样本提供 d n )因子,用于唯一 d (< em> n )高达1.6亿。该图的 n 的最大值为14,550,525,518,294,259,162,294,162,737,840,640,000,在2.35秒内产生159,744,000个因子。

Total runtime

每秒产生的排序因子的数量基本上是上述的反转。通过主要签名进行聚类在数据中是显而易见的。性能还受L1,L2和L3高速缓存大小以及物理RAM限制的影响。

Factors produced per second

源代码。下面附带的是C编程语言(特别是C11)中的概念验证程序。它已经在使用Clang / LLVM的x86-64上进行了测试,尽管它也适用于GCC。

/*==============================================================================

DESCRIPTION

   This is a small proof-of-concept program to test the idea of generating the
   factors of a number in ascending order using an ultra-efficient sortless
   method.


INPUT

   Input is given on the command line, either as a single argument giving the
   number to be factored or an even number of arguments giving the 2-tuples that
   comprise the prime-power factorization of the desired number.  For example,
   the number

      75600 = 2^4 x 3^3 x 5^2 x 7

   can be given by the following list of arguments:

      2 4 3 3 5 2 7 1

   Note:  If a single number is given, it will require factoring to produce its
   prime-power factorization.  Since this is just a small test program, a very
   crude factoring method is used that is extremely fast for small prime factors
   but extremely slow for large prime factors.  This is actually fine, because
   the largest factor lists occur with small prime factors anyway, and it is the
   production of large factor lists at which this program aims to be proficient.
   It is simply not interesting to be fast at producing the factor list of a
   number like 17293823921105882610 = 2 x 3 x 5 x 576460797370196087, because
   it has only 32 factors.  Numbers with tens or hundreds of thousands of
   factors are much more interesting.


OUTPUT

   Results are written to standard output.  A list of factors in ascending order
   is produced, followed by runtime required to generate the list (not including
   time to print it).


AUTHOR

   Todd Lehman
   2015/05/10

*/

//-----------------------------------------------------------------------------
#include <inttypes.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <stdio.h>
#include <stdarg.h>
#include <string.h>
#include <ctype.h>
#include <time.h>
#include <math.h>
#include <assert.h>

//-----------------------------------------------------------------------------
typedef  unsigned int  uint;
typedef  uint8_t       uint8;
typedef  uint16_t      uint16;
typedef  uint32_t      uint32;
typedef  uint64_t      uint64;
typedef  __uint128_t   uint128;

#define  UINT128_MAX  (uint128)(-1)

#define  UINT128_MAX_STRLEN  39

//-----------------------------------------------------------------------------
#define  ARRAY_CAPACITY(x)  (sizeof(x) / sizeof((x)[0]))

//-----------------------------------------------------------------------------
// This structure encode a single prime-power pair for the prime-power
// factorization of numbers, for example 3 to the 4th power.

#pragma pack(push, 8)
typedef struct
{
  uint128  p;  // Prime.
  uint8    e;  // Power (exponent).
}
PrimePower;   // 24 bytes using 8-byte packing
#pragma pack(pop)

//-----------------------------------------------------------------------------
// Prime-power factorization structure.
//
// This structure is sufficient to represent the prime-power factorization of
// all 128-bit values.  The field names ω and Ω are dervied from the standard
// number theory functions ω(n) and Ω(n), which count the number of unique and
// non-unique prime factors of n, respectively.  The field name d is derived
// from the standard number theory function d(n), which counts the number of
// divisors of n, including 1 and n.
//
// The maximum possible value here of ω is 26, which occurs at
// n = 232862364358497360900063316880507363070 = 2 x 3 x 5 x 7 x 11 x 13 x 17 x
// 19 x 23 x 29 x 31 x 37 x 41 x 43 x 47 x 53 x 59 x 61 x 67 x 71 x 73 x 79 x
// 83 x 89 x 97 x 101, which has 26 unique prime factors.
//
// The maximum possible value of Ω here is 127, which occurs at n = 2^127 and
// n = 2^126 x 3, both of which have 127 non-unique prime factors.
//
// The maximum possible value of d here is 318504960, which occurs at
// n = 333939014887358848058068063658770598400 = 2^9 x 3^5 x 5^2 x 7^2 x 11^2 x
// 13^2 x 17 x 19 x 23 x 29 x 31 x 37 x 41 x 43 x 47 x 53 x 59 x 61 x 67 x 71 x
// 73 x 79.
//
#pragma pack(push, 8)
typedef struct
{
  PrimePower  f[32];  // Primes and their exponents.
  uint8       ω;      // Count of prime factors without multiplicity.
  uint8       Ω;      // Count of prime factors with multiplicity.
  uint32      d;      // Count of factors of n, including 1 and n.
  uint128     n;      // Value of n on which all other fields depend.
}
PrimePowerFactorization;  // 656 bytes using 8-byte packing
#pragma pack(pop)

#define  MAX_ω  26
#define  MAX_Ω  127

//-----------------------------------------------------------------------------
// Fatal error:  print error message and abort.

void fatal_error(const char *format, ...)
{
  va_list args;
  va_start(args, format);
  vfprintf(stderr, format, args);
  exit(1);
}

//------------------------------------------------------------------------------
uint128 uint128_from_string(const char *const str)
{
  assert(str != NULL);

  uint128 n = 0;
  for (int i = 0; isdigit(str[i]); i++)
    n = (n * 10) + (uint)(str[i] - '0');

  return n;
}

//------------------------------------------------------------------------------
void uint128_to_string(const uint128 n,
                       char *const strbuf, const uint strbuflen)
{
  assert(strbuf != NULL);
  assert(strbuflen >= UINT128_MAX_STRLEN + 1);

  // Extract digits into string buffer in reverse order.
  uint128 a = n;
  char *s = strbuf;
  do { *(s++) = '0' + (uint)(a % 10); a /= 10; } while (a != 0);
  *s = '\0';

  // Reverse the order of the digits.
  uint l = strlen(strbuf);
  for (uint i = 0; i < l/2; i++)
    { char t = strbuf[i]; strbuf[i] = strbuf[l-1-i]; strbuf[l-1-i] = t; }

  // Verify result.
  assert(uint128_from_string(strbuf) == n);
}

//------------------------------------------------------------------------------
char *uint128_to_static_string(const uint128 n, const uint i)
{
  static char str[2][UINT128_MAX_STRLEN + 1];
  assert(i < ARRAY_CAPACITY(str));
  uint128_to_string(n, str[i], ARRAY_CAPACITY(str[i]));
  return str[i];
}

//------------------------------------------------------------------------------
// Compute sorted list of factors, given a prime-power factorization.

uint128 *compute_factors(const PrimePowerFactorization ppf)
{
  const uint128 n =       ppf.n;
  const uint    d = (uint)ppf.d;
  const uint    ω = (uint)ppf.ω;

  if (n == 0)
    return NULL;

  uint128 *factors = malloc((d + 1) * sizeof(*factors));
  if (!factors)
    fatal_error("Failed to allocate array of %u factors.", d);
  uint128 *const factors_end = &factors[d];


  // --- Seed the factors[] array.

  factors_end[0] = 0;   // Dummy value to simplify looping in bottleneck code.
  factors_end[-1] = 1;  // Seed value.

  if (n == 1)
    return factors;


  // --- Iterate over all prime factors.

  uint range = 1;
  for (uint i = 0; i < ω; i++)
  {
    const uint128 p = ppf.f[i].p;
    const uint    e = ppf.f[i].e;

    // --- Initialize phantom input lists and output list.
    assert(e < 128);
    assert(range < d);
    uint128 *restrict in[128];
    uint128 pe[128], f[128];
    for (uint j = 0; j <= e; j++)
    {
      in[j] = &factors[d - range];
      pe[j] = (j == 0)? 1 : pe[j-1] * p;
      f[j] = pe[j];
    }
    uint active_list_count = 1 + e;
    range *= 1 + e;
    uint128 *restrict out = &factors[d - range];

    // --- Merge phantom input lists to output list, until all input lists are
    //     extinguished.
    while (active_list_count > 0)
    {
      if (active_list_count == 1)
      {
        assert(out == in[0]);
        while (out != factors_end)
          *(out++) *= pe[0];
        in[0] = out;
      }
      else if (active_list_count == 2)
      {
        // This section of the code is the bottleneck of the entire factor-
        // producing algorithm.  Other portions need to be fast, but this
        // *really* needs to be fast; therefore, it has been highly optimized.
        // In fact, it is by far most frequently the case here that pe[0] is 1,
        // so further optimization is warranted in this case.
        uint128 f0 = f[0], f1 = f[1];
        uint128 *in0 = in[0], *in1 = in[1];
        const uint128 pe0 = pe[0], pe1 = pe[1];

        if (pe[0] == 1)
        {
          while (true)
          {
            if (f0 < f1)
              { *(out++) = f0; f0 = *(++in0);
                if (in0 == factors_end) break; }
            else
              { *(out++) = f1; f1 = *(++in1) * pe1; }
          }
        }
        else
        {
          while (true)
          {
            if (f0 < f1)
              { *(out++) = f0; f0 = *(++in0) * pe0;
                if (in0 == factors_end) break; }
            else
              { *(out++) = f1; f1 = *(++in1) * pe1; }
          }
        }

        f[0] = f0; f[1] = f1;
        in[0] = in0; in[1] = in1;
      }
      else if (active_list_count == 3)
      {
        uint128 f0 = f[0], f1 = f[1], f2 = f[2];
        uint128 *in0 = in[0], *in1 = in[1], *in2 = in[2];
        const uint128 pe0 = pe[0], pe1 = pe[1], pe2 = pe[2];

        while (true)
        {
          if (f0 < f1)
          {
            if (f0 < f2)
              { *(out++) = f0; f0 = *(++in0) * pe0;
                if (in0 == factors_end) break; }
            else
              { *(out++) = f2; f2 = *(++in2) * pe2; }
          }
          else
          {
            if (f1 < f2)
              { *(out++) = f1; f1 = *(++in1) * pe1; }
            else
              { *(out++) = f2; f2 = *(++in2) * pe2; }
          }
        }

        f[0] = f0; f[1] = f1, f[2] = f2;
        in[0] = in0; in[1] = in1, in[2] = in2;
      }
      else if (active_list_count >= 3)
      {
        while (true)
        {
          // Chose the smallest multiplier.
          uint k_min = 0;
          uint128 f_min = f[0];
          for (uint k = 0; k < active_list_count; k++)
            if (f[k] < f_min)
              { f_min = f[k]; k_min = k; }

          // Write the output factor, advance the input pointer, and
          // produce a new factor in the array f[] of list heads.
          *(out++) = f_min;
          f[k_min] = *(++in[k_min]) * pe[k_min];
          if (in[k_min] == factors_end)
            { assert(k_min == 0); break; }
        }
      }

      // --- Remove the newly emptied phantom input list.  Note that this is
      //     guaranteed *always* to be the first remaining non-empty list.
      assert(in[0] == factors_end);
      for (uint j = 1; j < active_list_count; j++)
      {
        in[j-1] = in[j];
        pe[j-1] = pe[j];
         f[j-1] =  f[j];
      }
      active_list_count -= 1;
    }

    assert(out == factors_end);
  }


  // --- Validate array of sorted factors.
  #ifndef NDEBUG
  {
    for (uint k = 0; k < d; k++)
    {
      if (factors[k] == 0)
        fatal_error("Produced a factor of 0 at index %u.", k);

      if (n % factors[k] != 0)
        fatal_error("Produced non-factor %s at index %u.",
                    uint128_to_static_string(factors[k], 0), k);

      if ((k > 0) && (factors[k-1] == factors[k]))
        fatal_error("Duplicate factor %s at index %u.",
                    uint128_to_static_string(factors[k], 0), k);

      if ((k > 0) && (factors[k-1] > factors[k]))
        fatal_error("Out-of-order factors %s and %s at indexes %u and %u.",
                    uint128_to_static_string(factors[k-1], 0),
                    uint128_to_static_string(factors[k], 1),
                    k-1, k);
    }
  }
  #endif


  return factors;
}

//------------------------------------------------------------------------------
// Print prime-power factorization of a number.

void print_ppf(const PrimePowerFactorization ppf)
{
  printf("%s = ", uint128_to_static_string(ppf.n, 0));
  if (ppf.n == 0)
  {
    printf("0");
  }
  else
  {
    for (uint i = 0; i < ppf.ω; i++)
    {
      if (i > 0)
        printf(" x ");

      printf("%s", uint128_to_static_string(ppf.f[i].p, 0));

      if (ppf.f[i].e > 1)
        printf("^%"PRIu8"", ppf.f[i].e);
    }
  }
  printf("\n");
}

//------------------------------------------------------------------------------
int compare_powers_ascending(const void *const pf1,
                             const void *const pf2)
{
  const PrimePower f1 = *((const PrimePower *)pf1);
  const PrimePower f2 = *((const PrimePower *)pf2);

  return  (f1.e < f2.e)?  -1:
          (f1.e > f2.e)?  +1:
                           0;  // Not an error; duplicate exponents are common.
}

//------------------------------------------------------------------------------
int compare_powers_descending(const void *const pf1,
                              const void *const pf2)
{
  const PrimePower f1 = *((const PrimePower *)pf1);
  const PrimePower f2 = *((const PrimePower *)pf2);

  return  (f1.e < f2.e)?  +1:
          (f1.e > f2.e)?  -1:
                           0;  // Not an error; duplicate exponents are common.
}

//------------------------------------------------------------------------------
int compare_primes_ascending(const void *const pf1,
                             const void *const pf2)
{
  const PrimePower f1 = *((const PrimePower *)pf1);
  const PrimePower f2 = *((const PrimePower *)pf2);

  return  (f1.p < f2.p)?  -1:
          (f1.p > f2.p)?  +1:
                           0;  // Error; duplicate primes must never occur.
}

//------------------------------------------------------------------------------
int compare_primes_descending(const void *const pf1,
                              const void *const pf2)
{
  const PrimePower f1 = *((const PrimePower *)pf1);
  const PrimePower f2 = *((const PrimePower *)pf2);

  return  (f1.p < f2.p)?  +1:
          (f1.p > f2.p)?  -1:
                           0;  // Error; duplicate primes must never occur.
}

//------------------------------------------------------------------------------
// Sort prime-power factorization.

void sort_ppf(PrimePowerFactorization *const ppf,
              const bool primes_major,      // Best false
              const bool primes_ascending,  // Best false
              const bool powers_ascending)  // Best false
{
  int (*compare_primes)(const void *, const void *) =
    primes_ascending? compare_primes_ascending : compare_primes_descending;

  int (*compare_powers)(const void *, const void *) =
    powers_ascending? compare_powers_ascending : compare_powers_descending;

  if (primes_major)
  {
    mergesort(ppf->f, ppf->ω, sizeof(ppf->f[0]), compare_powers);
    mergesort(ppf->f, ppf->ω, sizeof(ppf->f[0]), compare_primes);
  }
  else
  {
    mergesort(ppf->f, ppf->ω, sizeof(ppf->f[0]), compare_primes);
    mergesort(ppf->f, ppf->ω, sizeof(ppf->f[0]), compare_powers);
  }
}

//------------------------------------------------------------------------------
// Compute prime-power factorization of a 128-bit value.  Note that this
// function is designed to be fast *only* for numbers with very simple
// factorizations, e.g., those that produce large factor lists.  Do not attempt
// to factor large semiprimes with this function.  (The author does know how to
// factor large numbers efficiently; however, efficient factorization is beyond
// the scope of this small test program.)

PrimePowerFactorization compute_ppf(const uint128 n)
{
  PrimePowerFactorization ppf;

  if (n == 0)
  {
    ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 0, .n = 0 };
  }
  else if (n == 1)
  {
    ppf = (PrimePowerFactorization){ .f[0] = { .p = 1, .e = 1 },
                                     .ω = 1, .Ω = 1, .d = 1, .n = 1 };
  }
  else
  {
    ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 1, .n = n };

    uint128 m = n;
    for (uint128 p = 2; p * p <= m; p += 1 + (p > 2))
    {
      if (m % p == 0)
      {
        assert(ppf.ω <= MAX_ω);
        ppf.f[ppf.ω].p = p;
        ppf.f[ppf.ω].e = 0;
        while (m % p == 0)
          { m /= p; ppf.f[ppf.ω].e += 1; }
        ppf.d *= (1 + ppf.f[ppf.ω].e);
        ppf.Ω += ppf.f[ppf.ω].e;
        ppf.ω += 1;
      }
    }
    if (m > 1)
    {
      assert(ppf.ω <= MAX_ω);
      ppf.f[ppf.ω].p = m;
      ppf.f[ppf.ω].e = 1;
      ppf.d *= 2;
      ppf.Ω += 1;
      ppf.ω += 1;
    }
  }

  return ppf;
}

//------------------------------------------------------------------------------
// Parse prime-power factorization from a list of ASCII-encoded base-10 strings.
// The values are assumed to be 2-tuples (p,e) of prime p and exponent e.
// Primes must not exceed 2^128 - 1 and must not be repeated.  Exponents must
// not exceed 2^8 - 1, but can of course be repeated.  The constructed value
// must not exceed 2^128 - 1.

PrimePowerFactorization parse_ppf(const uint pairs, const char *const values[])
{
  assert(pairs <= MAX_ω);

  PrimePowerFactorization ppf;

  if (pairs == 0)
  {
    ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 0, .n = 0 };
  }
  else
  {
    ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 1, .n = 1 };

    for (uint i = 0; i < pairs; i++)
    {
      ppf.f[i].p = uint128_from_string(values[(i*2)+0]);
      ppf.f[i].e = (uint8)strtoumax(values[(i*2)+1], NULL, 10);

      // Validate prime value.
      if (ppf.f[i].p < 2)  // (Ideally this would actually do a primality test.)
        fatal_error("Factor %s is invalid.",
                    uint128_to_static_string(ppf.f[i].p, 0));

      // Accumulate count of unique prime factors.
      if (ppf.ω > UINT8_MAX - 1)
        fatal_error("Small-omega overflow at factor %s^%"PRIu8".",
                    uint128_to_static_string(ppf.f[i].p, 0), ppf.f[i].e);
      ppf.ω += 1;

      // Accumulate count of total prime factors.
      if (ppf.Ω > UINT8_MAX - ppf.f[i].e)
        fatal_error("Big-omega wverflow at factor %s^%"PRIu8".",
                    uint128_to_static_string(ppf.f[i].p, 0), ppf.f[i].e);
      ppf.Ω += ppf.f[i].e;

      // Accumulate total divisor count.
      if (ppf.d > UINT32_MAX / (1 + ppf.f[i].e))
        fatal_error("Divisor count overflow at factor %s^%"PRIu8".",
                    uint128_to_static_string(ppf.f[i].p, 0), ppf.f[i].e);
      ppf.d *= (1 + ppf.f[i].e);

      // Accumulate value.
      for (uint8 k = 1; k <= ppf.f[i].e; k++)
      {
        if (ppf.n > UINT128_MAX / ppf.f[i].p)
          fatal_error("Value overflow at factor %s.",
                      uint128_to_static_string(ppf.f[i].p, 0));
        ppf.n *= ppf.f[i].p;
      }
    }
  }

  return ppf;
}

//------------------------------------------------------------------------------
// Main control.  Parse command line and produce list of factors.

int main(const int argc, const char *const argv[])
{
  bool primes_major     = false;
  bool primes_ascending = false;
  bool powers_ascending = false;

  PrimePowerFactorization ppf;


  // --- Parse prime-power sort specifier (if present).

  uint value_base = 1;
  uint value_count = (uint)argc - 1;
  if ((argc > 1) && (argv[1][0] == '-'))
  {
    static const struct
    {
      char *str; bool primes_major, primes_ascending, powers_ascending;
    }
    sort_options[] =
    {
                        // Sorting criteria:
                        // ----------------------------------------
      { "ep", 0,0,0 },  // Exponents descending, primes descending
      { "Ep", 0,0,1 },  // Exponents ascending, primes descending
      { "eP", 0,1,0 },  // Exponents descending, primes ascending
      { "EP", 0,1,1 },  // Exponents ascending, primes ascending
      { "p",  1,0,0 },  // Primes descending (exponents irrelevant)
      { "P",  1,1,0 },  // Primes ascending (exponents irrelevant)
    };

    bool valid = false;
    for (uint i = 0; i < ARRAY_CAPACITY(sort_options); i++)
    {
      if (strcmp(&argv[1][1], sort_options[i].str) == 0)
      {
        primes_major     = sort_options[i].primes_major;
        primes_ascending = sort_options[i].primes_ascending;
        powers_ascending = sort_options[i].powers_ascending;
        valid = true;
        break;
      }
    }

    if (!valid)
      fatal_error("Bad sort specifier: \"%s\"", argv[1]);

    value_base += 1;
    value_count -= 1;
  }


  // --- Prime factorization from either a number or a raw prime factorization.

  if (value_count == 1)
  {
    uint128 n = uint128_from_string(argv[value_base]);
    ppf = compute_ppf(n);
  }
  else
  {
    if (value_count % 2 != 0)
      fatal_error("Odd number of arguments (%u) given.", value_count);
    uint pairs = value_count / 2;
    ppf = parse_ppf(pairs, &argv[value_base]);
  }


  // --- Sort prime factorization by either the default or the user-overridden
  //     configuration.

  sort_ppf(&ppf, primes_major, primes_ascending, powers_ascending);
  print_ppf(ppf);


  // --- Run for (as close as possible to) a fixed amount of time, tallying the
  //     elapsed CPU time.

  uint128 iterations = 0;
  double cpu_time = 0.0;
  const double cpu_time_limit = 0.10;
  uint128 memory_usage = 0;
  while (cpu_time < cpu_time_limit)
  {
    clock_t clock_start = clock();
    uint128 *factors = compute_factors(ppf);
    clock_t clock_end = clock();
    cpu_time += (double)(clock_end - clock_start) / (double)CLOCKS_PER_SEC;
    memory_usage = sizeof(*factors) * ppf.d;

    if (++iterations == 0) //1)
    {
      for (uint32 i = 0; i < ppf.d; i++)
        printf("%s\n", uint128_to_static_string(factors[i], 0));
    }

    if (factors) free(factors);
  }


  // --- Print the average amount of CPU time required for each iteration.

  uint memory_scale = (memory_usage >= 1e9)? 9:
                      (memory_usage >= 1e6)? 6:
                      (memory_usage >= 1e3)? 3:
                                             0;
  char *memory_units = (memory_scale == 9)? "GB":
                       (memory_scale == 6)? "MB":
                       (memory_scale == 3)? "KB":
                                            "B";

  printf("%s  %"PRIu32" factors  %.6f ms  %.3f ns/factor  %.3f %s\n",
         uint128_to_static_string(ppf.n, 0),
         ppf.d,
         cpu_time/iterations * 1e3,
         cpu_time/iterations * 1e9 / (double)(ppf.d? ppf.d : 1),
         (double)memory_usage / pow(10, memory_scale),
         memory_units);

  return 0;
}

答案 1 :(得分:3)

[我发布这个答案只是为了完整性的形式。我已经选择了别人的答案作为公认的答案。]

算法概述。在搜索生成内存因子列表的最快方法(在我的情况下为64位无符号值)时,我选择了一种实现两个因子的混合算法-dimensional bucket sort,它利用了排序键的内部知识(即,它们只是整数,因此可以计算)。具体方法更接近“ProxMapSort”,但有两个级别的键(主要,次要)而不是一个。 主键只是值的base-2对数。 次要密钥是在第二层存储桶中产生合理传播所需的值的最高有效位数。因素首先产生于未分类因素的临时工作阵列。接下来,分析它们的分布并分配和填充一系列桶索引。最后,使用插入排序将因子直接存储在最终排序的数组中。绝大多数桶只有1个,2个或3个值。示例在源代码中给出,该代码附在本答案的底部。

空间复杂性。内存利用率大约是基于Quicksort的解决方案的4倍,但这实际上相当微不足道,因为在最坏的情况下使用的最大内存(对于64位输入)是5.5 MB,其中4.0 MB仅保留少量几毫秒。

运行时复杂性。性能远远优于手动编码的基于Quicksort的解决方案:对于具有大量因素的数字,它的速度大约是其速度的2.5倍。在我的3.4 GHz。英特尔i7,它以0.0052秒的排序顺序生成184,420个因子18,401,055,938,125,660,800,或每个因子约96个时钟周期,或每秒约3500万个因子。

图表。内存和运行时性能分析了数量高达2⁶⁴-1的主要签名等值类的47,616个规范代表。这些是64位搜索空间中所谓的“高度可分数”。

对于非平凡因子计数,总运行时间比基于Quicksort的解决方案好2.5倍,如下面的log-log图所示:

Total Runtime

每秒产生的排序因子的数量基本上是上述的反转。在每个因素基础上的表现在大约2000个因素的最佳点之后下降,但不是很多。性能受L1,L2和L3高速缓存大小的影响,以及被分解的数量的唯一素因子的数量,其大致上升为输入值的对数。

Sorted Factors Per Second

峰值内存使用率是此对数 - 对数图中的直线,因为它与因子数的基数2对数成比例。请注意,峰值内存使用仅在很短的时间内使用;短暂的工作数组在几毫秒内被丢弃。在丢弃临时数组之后,剩下的是最终的因子列表,这与基于Quicksort的解决方案中的最小使用量相同。

Memory Usage

源代码。下面附带的是C编程语言(特别是C11)中的概念验证程序。它已经在使用Clang / LLVM的x86-64上进行了测试,尽管它也适用于GCC。

 /*==============================================================================

 DESCRIPTION

    This is a small proof-of-concept program to test the idea of "sorting"
    factors using a form of bucket sort.  The method is essentially a 2D version
    of ProxMapSort that has tuned for vast, nonlinear distributions using two
    keys (major, minor) rather than one.  The major key is simply the floor of
    the base-2 logarithm of the value, and the minor key is derived from the most
    significant bits of the value.


 INPUT

    Input is given on the command line, either as a single argument giving the
    number to be factored or an even number of arguments giving the 2-tuples that
    comprise the prime-power factorization of the desired number.  For example,
    the number

       75600 = 2^4 x 3^3 x 5^2 x 7

    can be given by the following list of arguments:

       2 4 3 3 5 2 7 1

    Note:  If a single number is given, it will require factoring to produce its
    prime-power factorization.  Since this is just a small test program, a very
    crude factoring method is used that is extremely fast for small prime factors
    but extremely slow for large prime factors.  This is actually fine, because
    the largest factor lists occur with small prime factors anyway, and it is the
    production of large factor lists at which this program aims to be proficient.
    It is simply not interesting to be fast at producing the factor list of a
    number like 17293823921105882610 = 2 x 3 x 5 x 576460797370196087, because
    it has only 32 factors.  Numbers with tens or hundreds of thousands of
    factors are much more interesting.


 OUTPUT

    Results are written to standard output.  A list of factors in ascending order
    is produced, followed by runtime (in microseconds) required to generate the
    list (not including time to print it).


 STATISTICS

    Bucket size statistics for the 47616 canonical representatives of the prime
    signature equivalence classes of 64-bit numbers:

    ==============================================================
    Bucket size     Total count of factored       Total count of
         b          numbers needing size b      buckets of size b
    --------------------------------------------------------------
         1               47616 (100.0%)         514306458  (76.2%)
         2               47427  (99.6%)         142959971  (21.2%)
         3               43956  (92.3%)          16679329   (2.5%)
         4               27998  (58.8%)            995458   (0.1%)
         5                6536  (13.7%)             33427  (<0.1%)
         6                 400   (0.8%)               729  (<0.1%)
         7                  12  (<0.1%)                18  (<0.1%)
    --------------------------------------------------------------
         ~               47616 (100.0%)         674974643 (100.0%)
    --------------------------------------------------------------

    Thus, no 64-bit number (of the input set) ever requires more than 7 buckets,
    and the larger the bucket size the less frequent it is.  This is highly
    desirable.  Note that although most numbers need at least 1 bucket of size 5,
    the vast majority of buckets (99.9%) are of size 1, 2, or 3, meaning that
    insertions are extremely efficient.  Therefore, the use of insertion sort
    for the buckets is clearly the right choice and is arguably optimal for
    performance.


 AUTHOR

    Todd Lehman
    2015/05/08

 */

 #include <inttypes.h>
 #include <limits.h>
 #include <stdbool.h>
 #include <stdlib.h>
 #include <stdio.h>
 #include <stdarg.h>
 #include <string.h>
 #include <time.h>
 #include <math.h>
 #include <assert.h>

 typedef  unsigned int  uint;
 typedef  uint8_t       uint8;
 typedef  uint16_t      uint16;
 typedef  uint32_t      uint32;
 typedef  uint64_t      uint64;

 #define  ARRAY_CAPACITY(x)  (sizeof(x) / sizeof((x)[0]))

 //-----------------------------------------------------------------------------
 // This structure is sufficient to represent the prime-power factorization of
 // all 64-bit values.  The field names ω and Ω are dervied from the standard
 // number theory functions ω(n) and Ω(n), which count the number of unique and
 // non-unique prime factors of n, respectively.  The field name d is derived
 // from the standard number theory function d(n), which counts the number of
 // divisors of n, including 1 and n.
 //
 // The maximum possible value here of ω is 15, which occurs for example at
 // n = 7378677391061896920 = 2^3 x 3^2 x 5 x 7 x 11 x 13 x 17 x 19 x 23 x 29
 // 31 x 37 x 41 x 43 x 47, which has 15 unique prime factors.
 //
 // The maximum possible value of Ω here is 63, which occurs for example at
 // n = 2^63 and n = 2^62 x 3, both of which have 63 non-unique prime factors.
 //
 // The maximum possible value of d here is 184320, which occurs at
 // n = 18401055938125660800 = 2^7 x 3^4 x 5^2 x 7^2 x 11 x 13 x 17 x 19 x 23 x
 // 29 x 31 x 37 x 41.
 //
 // Maximum possible exponents when exponents are sorted in decreasing order:
 //
 //    Index   Maximum   Bits   Example of n
 //    -----   -------   ----   --------------------------------------------
 //        0        63      6   (2)^63
 //        1        24      5   (2*3)^24
 //        2        13      4   (2*3*5)^13
 //        3         8      4   (2*3*5*7)^8
 //        4         5      3   (2*3*5*7*11)^5
 //        5         4      3   (2*3*5*7*11*13)^4
 //        6         3      2   (2*3*5*7*11*13*17)^3
 //        7         2      2   (2*3*5*7*11*13*17*19)^2
 //        8         2      2   (2*3*5*7*11*13*17*19*23)^2
 //        9         1      1   (2*3*5*7*11*13*17*19*23*29)^1
 //       10         1      1   (2*3*5*7*11*13*17*19*23*29*31)^1
 //       11         1      1   (2*3*5*7*11*13*17*19*23*29*31*37)^1
 //       12         1      1   (2*3*5*7*11*13*17*19*23*29*31*37*41)^1
 //       13         1      1   (2*3*5*7*11*13*17*19*23*29*31*37*41*43)^1
 //       14         1      1   (2*3*5*7*11*13*17*19*23*29*31*37*41*43*47)^1
 //    -----   -------   ----   --------------------------------------------
 //       15        63     37
 //
 #pragma pack(push, 8)
 typedef struct
 {
   uint8   e[16];  // Exponents.
   uint64  p[16];  // Primes in increasing order.
   uint8   ω;      // Count of prime factors without multiplicity.
   uint8   Ω;      // Count of prime factors with multiplicity.
   uint32  d;      // Count of factors of n, including 1 and n.
   uint64  n;      // Value of n on which all other fields of this struct depend.
 }
 PrimePowerFactorization;  // 176 bytes with 8-byte packing
 #pragma pack(pop)

 #define  MAX_ω  15
 #define  MAX_Ω  63

 //-----------------------------------------------------------------------------
 // Fatal error:  print error message and abort.

 void fatal_error(const char *format, ...)
 {
   va_list args;
   va_start(args, format);
   vfprintf(stderr, format, args);
   exit(1);
 }

 //-----------------------------------------------------------------------------
 // Compute 64-bit 2-adic integer inverse.

 uint64 uint64_inv(const uint64 x)
 {
   assert(x != 0);

   uint64 y = 1;
   for (uint i = 0; i < 6; i++)  // 6 = log2(log2(2**64)) = log2(64)
     y = y * (2 - (x * y));

   return y;
 }

 //------------------------------------------------------------------------------
 // Compute 2 to arbitrary power.  This is just a portable and abstract way to
 // write a left-shift operation.  Note that the use of the UINT64_C macro here
 // is actually required, because the result of 1U<<x is not guaranteed to be a
 // 64-bit result; on a 32-bit compiler, 1U<<32 is 0 or is undefined.

 static inline
 uint64 uint64_pow2(x)
 {
   return UINT64_C(1) << x;
 }

 //------------------------------------------------------------------------------
 // Deduce native word size (int, long, or long long) for 64-bit integers.
 // This is needed for abstracting certain compiler-specific intrinsic functions.

 #if UINT_MAX == 0xFFFFFFFFFFFFFFFFU
   #define UINT64_IS_U
 #elif ULONG_MAX == 0xFFFFFFFFFFFFFFFFUL
   #define UINT64_IS_UL
 #elif ULLONG_MAX == 0xFFFFFFFFFFFFFFFFULL
   #define UINT64_IS_ULL
 #else
   //error "Unable to deduce native word size of 64-bit integers."
 #endif

 //------------------------------------------------------------------------------
 // Define abstracted intrinsic function for counting leading zeros.  Note that
 // the value is well-defined for nonzero input but is compiler-specific for
 // input of zero.

 #if   defined(UINT64_IS_U) && __has_builtin(__builtin_clz)
   #define UINT64_CLZ(x) __builtin_clz(x)
 #elif defined(UINT64_IS_UL) && __has_builtin(__builtin_clzl)
   #define UINT64_CLZ(x) __builtin_clzl(x)
 #elif defined(UINT64_IS_ULL) && __has_builtin(__builtin_clzll)
   #define UINT64_CLZ(x) __builtin_clzll(x)
 #else
   #undef UINT64_CLZ
 #endif

 //------------------------------------------------------------------------------
 // Compute floor of base-2 logarithm y = log_2(x), where x > 0.  Uses fast
 // intrinsic function if available; otherwise resorts to hand-rolled method.

 static inline
 uint uint64_log2(uint64 x)
 {
   assert(x > 0);

   #if defined(UINT64_CLZ)
     return 63 - UINT64_CLZ(x);
   #else
     #define S(k) if ((x >> k) != 0) { y += k; x >>= k; }
     uint y = 0; S(32); S(16); S(8); S(4); S(2); S(1); return y;
     #undef S
   #endif
 }

 //------------------------------------------------------------------------------
 // Compute major key, given a nonzero number.  The major key is simply the
 // floor of the base-2 logarithm of the number.

 static inline
 uint major_key(const uint64 n)
 {
   assert(n > 0);
   uint k1 = uint64_log2(n);
   return k1;
 }

 //------------------------------------------------------------------------------
 // Compute minor key, given a nonzero number, its major key, k1, and the
 // bit-size b of major bucket k1.  The minor key, k2, is is computed by first
 // removing the most significant 1-bit from the number, because it adds no
 // information, and then extracting the desired number of most significant bits
 // from the remainder.  For example, given the number n=1463 and a major bucket
 // size of b=6 bits, the keys are computed as follows:
 //
 //    Step 0:  Given number              n = 0b10110110111 = 1463
 //
 //    Step 1:  Compute major key:        k1 = floor(log_2(n)) = 10
 //
 //    Step 2:  Remove high-order 1-bit:  n' = 0b0110110111 = 439
 //
 //    Step 3:  Compute minor key:        k2 = n' >> (k1 - b)
 //                                          = 0b0110110111 >> (10 - 6)
 //                                          = 0b0110110111 >> 4
 //                                          = 0b011011
 //                                          = 27

 static inline
 uint minor_key(const uint64 n, const uint k1, const uint b)
 {
   assert(n > 0); assert(k1 >= 0); assert(b > 0);
   const uint k2 = (uint)((n ^ uint64_pow2(k1)) >> (k1 - b));
   return k2;
 }

 //------------------------------------------------------------------------------
 // Raw unsorted factor.

 #pragma push(pack, 4)

 typedef struct
 {
   uint64  n;   // Value of factor.
   uint32  k1;  // Major key.
   uint32  k2;  // Minor key.
 }
 UnsortedFactor;

 #pragma pop(pack)

 //------------------------------------------------------------------------------
 // Compute sorted list of factors, given a prime-power factorization.

 static uint64 memory_usage;

 uint64 *compute_factors(const PrimePowerFactorization ppf)
 {
   memory_usage = 0;

   if (ppf.n == 0)
     return NULL;

   uint64 *sorted_factors = calloc(ppf.d, sizeof(*sorted_factors));
   if (!sorted_factors)
     fatal_error("Failed to allocate array of %"PRIu32" factors.", ppf.d);
   memory_usage += ppf.d * sizeof(*sorted_factors);

   UnsortedFactor *unsorted_factors = malloc(ppf.d * sizeof(*unsorted_factors));
   if (!unsorted_factors)
     fatal_error("Failed to allocate array of %"PRIu32" factors.", ppf.d);
   memory_usage += ppf.d * sizeof(*unsorted_factors);


   // These arrays are indexed by the major key of a number.
   uint32 major_counts[64];   // Counts of factors in major buckets.
   uint32 major_spans[64];    // Counts rounded up to power of 2.
   uint32 major_bits[64];     // Base-2 logarithm of bucket size.
   uint32 major_indexes[64];  // Indexes into minor array.
   memset(major_counts,  0, sizeof(major_counts));
   memset(major_spans,   0, sizeof(major_spans));
   memset(major_bits,    0, sizeof(major_bits));
   memset(major_indexes, 0, sizeof(major_indexes));


   // --- Step 1:  Produce unsorted list of factors from prime-power
   //     factorization.  At the same time, count groups of factors by their
   //     major keys.
   {
     // This array is for counting in the multi-radix number system dictated by
     // the exponents of the prime-power factorization.  An invariant is that
     // e[i] <= ppf.e[i] for all i (0 < i <ppf.ω).
     uint8 e[MAX_ω];
     for (uint i = 0; i < ppf.ω; i++)
       e[i] = 0;

     // Initialize inverse-prime-powers.  This array allows for division by
     // p[i]**e[i] extremely quickly in the main loop below.  Note that 2-adic
     // inverses are not defined for even numbers (of which 2 is the only prime),
     // so powers of 2 must be handled specially.
     uint64 pe_inv[MAX_ω];
     for (uint i = 0; i < ppf.ω; i++)
     {
       uint64 pe = 1; for (uint j = 1; j <= ppf.e[i]; j++) pe *= ppf.p[i];
       pe_inv[i] = uint64_inv(pe);
     }

     uint64 n = 1;  // Current factor accumulator.
     for (uint k = 0; k < ppf.d; k++)   // k indexes into unsorted_factors[].
     {
       //printf("unsorted_factors[%u] = %"PRIu64"   j = %u\n", k, n, j);
       assert(ppf.n % n == 0);
       unsorted_factors[k].n = n;

       uint k1 = major_key(n);
       assert(k1 < ARRAY_CAPACITY(major_counts));
       unsorted_factors[k].k1 = k1;
       major_counts[k1] += 1;

       // Increment the remainder of the multi-radix number e[].
       for (uint i = 0; i < ppf.ω; i++)
       {
         if (e[i] == ppf.e[i])  // Carrying is occurring.
         {
           if (ppf.p[i] == 2)
             n >>= ppf.e[i];  // Divide n by 2 ** ppf.e[i].
           else
             n *= pe_inv[i];  // Divide n by ppf.p[i] ** ppf.e[i].

           e[i] = 0;
         }
         else  // Carrying is not occurring.
         {
           n *= ppf.p[i];
           e[i] += 1;
           break;
         }
       }
     }
     assert(n == 1);  // n always cycles back to 1, not to ppf.n.

     assert(unsorted_factors[ppf.d-1].n == ppf.n);
   }


   // --- Step 2:  Define the major bits array, the major spans array, the major
   //     index array, and count the total spans.

   uint32 total_spans = 0;
   {
     uint32 k = 0;
     for (uint k1 = 0; k1 < ARRAY_CAPACITY(major_counts); k1++)
     {
       uint32 count = major_counts[k1];
       uint32 bits = (count <= 1)? count : uint64_log2(count - 1) + 1;
       major_bits[k1] = bits;
       major_spans[k1] = (count > 0)? (UINT32_C(1) << bits) : 0;
       major_indexes[k1] = k;
       k += major_spans[k1];
     }
     total_spans = k;
   }


   // --- Step 3:  Allocate and populate the minor counts array.  Note that it
   //     must be initialized to zero.

   uint32 *minor_counts = calloc(total_spans, sizeof(*minor_counts));
   if (!minor_counts)
     fatal_error("Failed to allocate array of %"PRIu32" counts.", total_spans);
   memory_usage += total_spans * sizeof(*minor_counts);

   for (uint k = 0; k < ppf.d; k++)
   {
     const uint64 n = unsorted_factors[k].n;
     const uint k1 = unsorted_factors[k].k1;
     const uint k2 = minor_key(n, k1, major_bits[k1]);
     assert(k2 < major_spans[k1]);
     unsorted_factors[k].k2 = k2;
     minor_counts[major_indexes[k1] + k2] += 1;
   }


   // --- Step 4:  Define the minor indexes array.
   //
   // NOTE:  Instead of allocating a separate array, the earlier-allocated array
   // of minor indexes is simply repurposed here using an alias.

   uint32 *minor_indexes = minor_counts;  // Alias the array for repurposing.

   {
     uint32 k = 0;
     for (uint i = 0; i < total_spans; i++)
     {
       uint32 count = minor_counts[i];  // This array is the same array...
       minor_indexes[i] = k;            // ...as this array.
       k += count;
     }
   }


   // --- Step 5:  Populate the sorted factors array.  Note that the array must
   //              be initialized to zero earlier because values of zero are used
   //              as sentinels in the bucket lists.

   for (uint32 i = 0; i < ppf.d; i++)
   {
     uint64 n = unsorted_factors[i].n;
     const uint k1 = unsorted_factors[i].k1;
     const uint k2 = unsorted_factors[i].k2;

     // Insert factor into bucket using insertion sort (which happens to be
     // extremely fast because we know the bucket sizes are always very small).
     uint32 k;
     for (k = minor_indexes[major_indexes[k1] + k2];
          sorted_factors[k] != 0;
          k++)
     {
       assert(k < ppf.d);
       if (sorted_factors[k] > n)
         { uint64 t = sorted_factors[k]; sorted_factors[k] = n; n = t; }
     }
     sorted_factors[k] = n;
   }


   // --- Step 6:  Validate array of sorted factors.
   {
     for (uint32 k = 1; k < ppf.d; k++)
     {
       if (sorted_factors[k] == 0)
         fatal_error("Produced a factor of 0 at index %"PRIu32".", k);

       if (ppf.n % sorted_factors[k] != 0)
         fatal_error("Produced non-factor %"PRIu64" at index %"PRIu32".",
                     sorted_factors[k], k);

       if (sorted_factors[k-1] == sorted_factors[k])
         fatal_error("Duplicate factor %"PRIu64" at index %"PRIu32".",
                     sorted_factors[k], k);

       if (sorted_factors[k-1] > sorted_factors[k])
         fatal_error("Out-of-order factors %"PRIu64" and %"PRIu64" "
                     "at indexes %"PRIu32" and %"PRIu32".",
                     sorted_factors[k-1], sorted_factors[k], k-1, k);
     }
   }


   free(minor_counts);
   free(unsorted_factors);

   return sorted_factors;
 }

 //------------------------------------------------------------------------------
 // Compute prime-power factorization of a 64-bit value.  Note that this function
 // is designed to be fast *only* for numbers with very simple factorizations,
 // e.g., those that produce large factor lists.  Do not attempt to factor
 // large semiprimes with this function.  (The author does know how to factor
 // large numbers efficiently; however, efficient factorization is beyond the
 // scope of this small test program.)

 PrimePowerFactorization compute_ppf(const uint64 n)
 {
   PrimePowerFactorization ppf;

   if (n == 0)
   {
     ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 0, .n = 0 };
   }
   else if (n == 1)
   {
     ppf = (PrimePowerFactorization){ .p = { 1 }, .e = { 1 },
                                      .ω = 1, .Ω = 1, .d = 1, .n = 1 };
   }
   else
   {
     ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 1, .n = n };

     uint64 m = n;
     for (uint64 p = 2; p * p <= m; p += 1 + (p > 2))
     {
       if (m % p == 0)
       {
         assert(ppf.ω <= MAX_ω);
         ppf.p[ppf.ω] = p;
         ppf.e[ppf.ω] = 0;
         while (m % p == 0)
           { m /= p; ppf.e[ppf.ω] += 1; }
         ppf.d *= (1 + ppf.e[ppf.ω]);
         ppf.Ω += ppf.e[ppf.ω];
         ppf.ω += 1;
       }
     }
     if (m > 1)
     {
       assert(ppf.ω <= MAX_ω);
       ppf.p[ppf.ω] = m;
       ppf.e[ppf.ω] = 1;
       ppf.d *= 2;
       ppf.Ω += 1;
       ppf.ω += 1;
     }
   }

   return ppf;
 }

 //------------------------------------------------------------------------------
 // Parse prime-power factorization from a list of ASCII-encoded base-10 strings.
 // The values are assumed to be 2-tuples (p,e) of prime p and exponent e.
 // Primes must not exceed 2^64 - 1.  Exponents must not exceed 2^8 - 1.  The
 // constructed value must not exceed 2^64 - 1.

 PrimePowerFactorization parse_ppf(const uint pairs, const char *const values[])
 {
   assert(pairs <= MAX_ω);

   PrimePowerFactorization ppf;

   if (pairs == 0)
   {
     ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 0, .n = 0 };
   }
   else
   {
     ppf = (PrimePowerFactorization){ .ω = 0, .Ω = 0, .d = 1, .n = 1 };

     for (uint i = 0; i < pairs; i++)
     {
       ppf.p[i] = (uint64)strtoumax(values[(i*2)+0], NULL, 10);
       ppf.e[i] =  (uint8)strtoumax(values[(i*2)+1], NULL, 10);

       // Validate prime value.
       if (ppf.p[i] < 2)  // (Ideally this would actually do a primality test.)
         fatal_error("Factor %"PRIu64" is invalid.", ppf.p[i]);

       // Accumulate count of unique prime factors.
       if (ppf.ω > UINT8_MAX - 1)
         fatal_error("Small-omega overflow at factor %"PRIu64"^%"PRIu8".",
                     ppf.p[i], ppf.e[i]);
       ppf.ω += 1;

       // Accumulate count of total prime factors.
       if (ppf.Ω > UINT8_MAX - ppf.e[i])
         fatal_error("Big-omega wverflow at factor %"PRIu64"^%"PRIu8".",
                     ppf.p[i], ppf.e[i]);
       ppf.Ω += ppf.e[i];

       // Accumulate total divisor count.
       if (ppf.d > UINT32_MAX / (1 + ppf.e[i]))
         fatal_error("Divisor count overflow at factor %"PRIu64"^%"PRIu8".",
                     ppf.p[i], ppf.e[i]);
       ppf.d *= (1 + ppf.e[i]);

       // Accumulate value.
       for (uint8 k = 1; k <= ppf.e[i]; k++)
       {
         if (ppf.n > UINT64_MAX / ppf.p[i])
           fatal_error("Value overflow at factor %"PRIu64".", ppf.p[i]);
         ppf.n *= ppf.p[i];
       }
     }
   }

   return ppf;
 }

 //------------------------------------------------------------------------------
 // Main control.  Parse command line and produce list of factors.

 int main(const int argc, const char *const argv[])
 {
   PrimePowerFactorization ppf;

   uint values = (uint)argc - 1;  // argc is always guaranteed to be at least 1.

   if (values == 1)
   {
     ppf = compute_ppf((uint64)strtoumax(argv[1], NULL, 10));
   }
   else
   {
     if (values % 2 != 0)
       fatal_error("Odd number of arguments (%u) given.", values);
     uint pairs = values / 2;
     ppf = parse_ppf(pairs, &argv[1]);
   }

   // Run for (as close as possible to) a fixed amount of time, tallying the
   // elapsed CPU time.
   uint64 iterations = 0;
   double cpu_time = 0.0;
   const double cpu_time_limit = 0.05;
   while (cpu_time < cpu_time_limit)
   {
     clock_t clock_start = clock();
     uint64 *factors = compute_factors(ppf);
     clock_t clock_end = clock();
     cpu_time += (double)(clock_end - clock_start) / (double)CLOCKS_PER_SEC;

     if (++iterations == 1)
     {
       for (uint32 i = 0; i < ppf.d; i++)
         printf("%"PRIu64"\n", factors[i]);
     }

     if (factors) free(factors);
   }

   // Print the average amount of CPU time required for each iteration.
   uint mem_scale = (memory_usage >= 1e9)? 9:
                    (memory_usage >= 1e6)? 6:
                    (memory_usage >= 1e3)? 3:
                                           0;
   char *mem_units = (mem_scale == 9)? "GB":
                     (mem_scale == 6)? "MB":
                     (mem_scale == 3)? "KB":
                                        "B";

   printf("%"PRIu64"  %"PRIu32" factors  %.6f ms  %.3f ns/factor  %.3f %s\n",
          ppf.n,
          ppf.d,
          cpu_time/iterations * 1e3,
          cpu_time/iterations * 1e9 / (double)(ppf.d? ppf.d : 1),
          (double)memory_usage / pow(10, mem_scale),
          mem_units);

   return 0;
 }