Question

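On another StackExchange site, the following algorithm (based on arithmetic coding) was presented as an efficient method for generating the result of a 7-sided die when all that is given is a 6-sided die: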

int rand7()
{
  static double a=0, width=7;  // persistent state

  while ((int)(a+width) != (int)a)
  {
    width /= 6;
    a += (rand6()-1)*width;
  }

  int n = (int)a;
  a -= n; 
  a *= 7; width *= 7;
  return (n+1);
}

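Not being a true mathematician, I'll do my best to explain how this algorithm works:

At the start of each call to rand7(), width is a ratio 7^s / 6^t and a is a non-negative value with the property that a + width lies in the interval [0, 7] after the base case. On entering the while loop, width is the maximum value that can be added to a. If floor(a + width) differs from floor(a), then a random selection from {0, width * 1/6, width * 1/3, width * 1/2, width * 2/3, width * 5/6} is added to a, and the exponent t is incremented by 1 (reducing the value of width by a power of 6). Note that after an iteration, the property that a + width lies in the interval [0, 7] remains invariant. The iterations stop once width becomes less than the difference ceil(a) - a.

The loop adds more entropy into a only as long as doing so can actually affect the outcome of the die roll. Intuitively, this builds a random real number in the range [0, 7] using base 6. After leaving the loop, the die roll is taken to be floor(a) + 1, and a is reduced to its fractional part. At this point a + width lies in the interval [0, 1]. To prepare for the next call and to maintain the invariant, both a and width are scaled up by a factor of 7 (for width, this increments the exponent s by 1).

The above explains how the inductive step works. Analysis of the base case is left as an exercise for the interested reader.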
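To make the walk-through above concrete, here is a minimal sketch of the same loop with the persistent state and the die rolls passed in explicitly, so that one call can be traced by hand. The names Rand7State, RollScript and rand7_traced are hypothetical and exist only for this illustration; the arithmetic is meant to mirror rand7() as posted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper types: explicit coder state plus a scripted roll source. */
typedef struct { double a, width; } Rand7State;
typedef struct { const int *rolls; size_t len, used; } RollScript;

/* Same steps as rand7() above, but with the state and the die rolls
   passed in, so that a single call can be traced deterministically. */
int rand7_traced(Rand7State *s, RollScript *in)
{
    /* Refine while the interval [a, a+width] still straddles an integer. */
    while ((int)(s->a + s->width) != (int)s->a) {
        assert(in->used < in->len);      /* the script must be long enough */
        s->width /= 6;
        s->a += (in->rolls[in->used++] - 1) * s->width;
    }
    int n = (int)s->a;   /* the integer part decides the 7-sided roll */
    s->a -= n;           /* keep the unused fractional part           */
    s->a *= 7;           /* rescale both values for the next call     */
    s->width *= 7;
    return n + 1;
}
```

Starting from a = 0 and width = 7 with the scripted rolls 4 then 1, the loop runs twice: the interval becomes roughly [3.5, 4.67] and then [3.5, 3.69], which no longer straddles an integer. The call returns 4 and leaves a = 3.5 and width of about 1.36 for the next call.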

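Of course, from an efficiency standpoint, the use of floating-point arithmetic immediately stands out as a performance drag (assuming that the performance of rand6() is already sufficient and cannot itself be improved). While keeping this arithmetic coding algorithm, what is the best way to remove the use of floating point?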

Answer 1

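Following up on a comment I made, here is a fixed-point version of the algorithm. It uses unsigned 4.60 (that is, the number has 60 bits in its fractional part), which is a few more bits than you can get out of a double: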

#include <stdint.h>

int rand7fixed() {
    static uint64_t a = 0;
    static uint64_t width = 7ULL<<60;            // 7.0 in unsigned 4.60
    static const uint64_t intmask = 0xfULL<<60;  // top four bits: integer part

    while (((a+width)&intmask) != (a&intmask)) {
      width /= 6;
      a += (rand6()-1)*width;
    }

    int n = a >> 60;
    a &=~intmask;
    a *= 7;
    width *= 7;
    return n+1;
}

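The above turns out to be about one-third faster than the double version in the OP (see the benchmark results and notes below). I suspect that the time sink is not the floating-point arithmetic but rather the conversions from double to int.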
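For readers unfamiliar with the 4.60 layout, the following sketch shows how the integer and fractional parts are pulled out of a 64-bit word. The macro and function names (FRAC_BITS, fixed_int_part, fixed_frac_part, fixed_from_int) are hypothetical and are not part of the answer's code.

```c
#include <assert.h>
#include <stdint.h>

#define FRAC_BITS 60   /* fractional bits in the unsigned 4.60 format */

/* Integer part of a 4.60 value: the top four bits. */
unsigned fixed_int_part(uint64_t x)
{
    return (unsigned)(x >> FRAC_BITS);
}

/* Fractional part of a 4.60 value: the low 60 bits. */
uint64_t fixed_frac_part(uint64_t x)
{
    return x & ((UINT64_C(1) << FRAC_BITS) - 1);
}

/* Encode a small whole number (0..15) as a 4.60 value. */
uint64_t fixed_from_int(unsigned n)
{
    assert(n < 16);    /* only four integer bits are available */
    return (uint64_t)n << FRAC_BITS;
}
```

For example, fixed_from_int(7) / 2 represents 3.5: its integer part is 3 and its fractional part is the single bit 1 << 59, that is, one half.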

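As R.. points out, this does not remove the bias; it only reduces it. A simple and reasonably fast unbiased algorithm is to repeatedly generate two rand6() values h and l (each shifted into the range 0 to 5) until at least one of them is non-zero, and then return (h*6+l) % 7 + 1: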

int rand7_2() {
  int hi, lo;
  do {
    hi = rand6() - 1;
    lo = rand6() - 1;
  } while (hi + lo == 0);  // retry only when both values are zero
  return (hi * 6 + lo) % 7 + 1;
}
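The claim that this scheme is unbiased can be checked exhaustively. The sketch below (tally_two_roll_outcomes is a hypothetical name) enumerates all 36 equally likely pairs, rejects only the pair where both values are zero as the text describes, and counts how many of the remaining 35 pairs map to each outcome.

```c
#include <assert.h>

/* Tally, over all 36 equally likely (hi, lo) pairs, how many accepted
   pairs map to each 7-sided outcome when only (0, 0) is rejected.
   counts[k] receives the number of pairs that yield outcome k + 1.
   Returns the number of accepted pairs. */
int tally_two_roll_outcomes(int counts[7])
{
    int accepted = 0;
    for (int k = 0; k < 7; ++k)
        counts[k] = 0;
    for (int hi = 0; hi < 6; ++hi) {
        for (int lo = 0; lo < 6; ++lo) {
            if (hi + lo == 0)
                continue;                 /* the one rejected pair */
            ++counts[(hi * 6 + lo) % 7];
            ++accepted;
        }
    }
    assert(accepted == 35);
    return accepted;
}
```

Each of the seven outcomes is reached by exactly 5 of the 35 accepted pairs, so every outcome has probability 1/7.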

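If you feel the need to reduce the number of calls to rand6, you can use the fact that 6¹² is only slightly more than 7¹¹ to generate 11 7-die rolls at a time. It is still necessary to discard some of the sets of 12 6-rolls in order to remove the bias; the frequency of discarded sets is (6¹²−7¹¹)/6¹², or approximately 1 in 11, so on average you need about 1.20 6-rolls per 7-roll. You could do better by using 25 6-rolls to generate 23 7-rolls (1.13 6-rolls per 7-roll), but that does not quite fit into 64-bit arithmetic, so the marginal saving in calls to rand6 would probably be eaten up by doing the computations in 128 bits.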
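The arithmetic behind these ratios can be reproduced with a few lines of integer code. This is only a sketch under the assumption stated in the text, namely that a whole batch is thrown away whenever its value is at least 7 raised to the number of outputs; ipow and rolls_per_output are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* base**exp in 64-bit arithmetic; the result must stay below 2^64. */
uint64_t ipow(uint64_t base, unsigned exp)
{
    uint64_t r = 1;
    while (exp--) {
        assert(r <= UINT64_MAX / base);   /* guard against overflow */
        r *= base;
    }
    return r;
}

/* Expected rand6() calls per rand7 output when `in` six-sided rolls are
   combined into `out` seven-sided rolls and a batch is discarded
   whenever its value is >= 7**out. */
double rolls_per_output(unsigned in, unsigned out)
{
    double accept = (double)ipow(7, out) / (double)ipow(6, in);
    return (double)in / (accept * (double)out);
}
```

With 6¹² = 2176782336 and 7¹¹ = 1977326743, rolls_per_output(12, 11) works out to roughly 1.2, which is in line with the rand6() call count reported for the 12-for-11 variant in the benchmark further down. The 25-for-23 case cannot be evaluated with this helper because 6²⁵ does not fit in 64 bits, which is exactly the obstacle the text mentions.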

Here is the 11/12 solution:

int rand7_12() {
  static int avail = 0;
  static uint32_t state = 0;
  static const uint32_t discard = 7*7*7*7*7*7*7*7*7*7*7; // 7 ** 11
  static const int out_per_round = 11;
  static const int in_per_round = 12;

  if (!avail) {
    do {
      state = rand6() - 1;
      for (int needed = in_per_round - 1; needed; --needed)
        state = state * 6 + rand6() - 1;
    } while (state >= discard);
    avail = out_per_round;
  }
  int rv = state % 7;
  state /= 7;
  --avail;
  return rv + 1;
}

In theory you should be able to get the ratio down to log₆7, which is about 1.086. For example, you could do that by generating 895 7-rolls from 972 6-rolls, discarding about one in 1600 sets, for an average of 1.087 6-rolls per 7-roll, but you would need 2513-bit arithmetic to hold the state.
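As a short worked check of the theoretical limit: each 6-roll carries log₂6 bits of entropy and each 7-roll needs log₂7 bits, so the number of 6-rolls per 7-roll cannot drop below their quotient. The figures below are computed from that argument and are rounded.

```latex
\[
\frac{\text{6-rolls}}{\text{7-roll}} \;\ge\; \frac{\log_2 7}{\log_2 6} = \log_6 7 \approx 1.0860,
\qquad
\frac{972}{895} \approx 1.0860,
\qquad
\lceil 972 \log_2 6 \rceil = \lceil 2512.58\ldots \rceil = 2513 .
\]
```

The last term is the number of bits needed to hold a value below 6⁹⁷².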

I benchmarked all four functions by calling rand7 700,000,000 times and then printing a histogram of the results. The results:

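                                             User time with
Algorithm     User time     rand6() calls    cycling rand6()
----------    ---------     -------------    ---------------
double        32.6 secs       760223193         13.2 secs
fixed         29.4 secs       760223194          7.9 secs
2 for 1       40.2 secs      1440004276
12 for 11     23.7 secs       840670008

The underlying rand6() implementation above is the Gnu standard C++ library's uniform_int_distribution<>(1,6) using mt19937_64 (a 64-bit Mersenne Twister) as the PRNG. To get a better handle on the amount of time spent in the standard library, I also ran the tests using a simple cycling counter as the pseudo-random number generator; the remaining 13.2 and 7.9 seconds represent (roughly) the time spent in the algorithm itself, from which we can say that the fixed-point algorithm is about 40% faster. (It was hard to get a good reading on the combining algorithms, because the fixed sequence makes branch prediction easier and reduces the number of rand6 calls, but both took less than 5 seconds.)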
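The answer does not show the cycling counter it used. A minimal sketch of what such a stand-in might look like, together with a call counter for the "rand6() calls" column, is given below; rand6_cycling and rand6_calls are hypothetical names, not the author's.

```c
#include <assert.h>

/* Hypothetical instrumentation: total number of rand6 calls so far. */
static unsigned long long rand6_calls = 0;

/* Stand-in for rand6() that cycles 1, 2, ..., 6, 1, ... instead of
   drawing random numbers, so nearly all measured time is the caller's. */
int rand6_cycling(void)
{
    static int next = 0;
    int roll = next + 1;
    next = (next + 1) % 6;
    ++rand6_calls;
    assert(roll >= 1 && roll <= 6);
    return roll;
}
```

A fixed sequence like this is useful only for timing. It is not random, so the histograms it produces say nothing about bias.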

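Finally, in case anyone wants to check for bias, here are the histograms (also including a run of std::uniform_int_distribution(1,7) for reference):

[histogram output not preserved]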

Answer 2

[Edit]

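The method further below still needs improvement, but the following is a simple unbiased method. It is inefficient in that it calls rand6() at least twice. (This assumes rand6() is unbiased.)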

int rand7simple(void) {
  int product;
  do {
    int a = rand6() - 1;
    int b = rand6() - 1;
    product = a*6 + b;  // produce unbiased distributed numbers 0 .. 35
  } while (product >= 35);  // Redo 1 in 36 times
  // produce unbiased distributed numbers 0 .. 34
  return product%7 + 1;
}

rand6() should be called 7 times for every 6 calls to rand7(). Initialize a wide integer state to minimize bias.

Needs testing later. GTG.

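int rand7(void) {
  static int count = -1;
  static unsigned long long state = 0;
  if (count < 0) {
    // First call: seed the wide state with 25 rolls, read as base-6 digits.
    // 6^25 exceeds 2^64, so the unsigned state can wrap around here.
    count = 0;
    for (int i=0; i<25; i++) {
      state *= 6;
      state += rand6() - 1;
    }
  }
  int retval = state % 7;
  state /= 7;

  // Top the state up again: one roll per call, two when count reaches 6.
  int i = (count >= 6) + 1;
  if (++count > 6) count = 0;
  while (i-- > 0) {
    state *= 6;
    state += rand6() - 1;
  }
  return retval + 1;  // map 0..6 to 1..7
}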

Improving the performance of a 7-sided die roll simulation via a 6-sided die implementation
