Question

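I'm solving some Project Euler problems in Ruby, and here I'm specifically talking about problem 25 (What is the index of the first term in the Fibonacci sequence to contain 1000 digits?).

At first I was using Ruby 2.2.3, and I coded the problem as: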

number = 3
a = 1
b = 2

while b.to_s.length < 1000
  a, b = b, a + b
  number += 1
end
puts number

But then I found out that version 2.4.2 has a method called digits, which is exactly what I needed. I changed the code to:

while b.digits.length < 1000

When I compared the two approaches, digits was much slower.

Timings

./025/problem025.rb 0.13s user 0.02s system 80% cpu 0.190 total

./025/problem025.rb 2.19s user 0.03s system 97% cpu 2.275 total

Does anyone know why?

Answer 1

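Ruby's digits

... is implemented in rb_int_digits.
Which for non-small numbers (i.e. most numbers) uses rb_int_digits_bigbase.
Which extracts digit after digit naively with division/modulo by base.
So it should take quadratic time (at least with a small base like 10).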
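Ruby's to_s

... is implemented in int_to_s.
Which uses rb_int2str.
Which for non-small numbers uses rb_big2str.
Which uses rb_big2str1.
Which might use big2str_gmp if available (which sounds/looks like it uses the fast GMP library) or...
... uses big2str_generic.
Which uses big2str_karatsuba (sweet, I recognize that name!).
Which looks like it has to do with...
... Karatsuba's algorithm, which is a fast multiplication algorithm. If you multiply two n-digit numbers the naive way you learned in school, you need n² single-digit products. Karatsuba, on the other hand, only needs about n^1.585, which is a lot better. I didn't read into it any further, but I suspect that what Ruby does here is similarly efficient. Eric Lippert's answer with a base conversion algorithm uses Karatsuba multiplication and says "this [base conversion] algorithm is totally dominated by the cost of the multiplication".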

Comparing quadratic with n^1.585 over number lengths from 1 digit to 1000 digits gives a factor of about 15:

(1..1000).sum { |i| i**2 } / (1..1000).sum { |i| i**1.585 }
=> 15.150583254950678

Which is also roughly the factor you observed. Of course that's a rather naive comparison, but well, why not.
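To make the idea of a sub-quadratic conversion more concrete, here is a minimal sketch of divide-and-conquer base conversion in Ruby. It is not a transcription of big2str_karatsuba; the names SMALL_LIMIT, naive_decimal and dc_decimal are invented for this example, and the split point is just an assumption that keeps both halves about the same size.

```ruby
# Below this limit we fall back to the simple digit-by-digit loop.
SMALL_LIMIT = 10**9

# Hypothetical helper: decimal string via repeated divmod by 10.
def naive_decimal(n)
  return "0" if n.zero?
  out = String.new
  while n > 0
    n, r = n.divmod(10)
    out << r.to_s          # r is a single digit here
  end
  out.reverse
end

# Hypothetical helper: split n around 10**k (k is about half its digit
# count), convert both halves recursively and glue the strings together.
def dc_decimal(n)
  return naive_decimal(n) if n < SMALL_LIMIT

  approx_digits = (n.bit_length * Math.log10(2)).ceil
  k = approx_digits / 2
  high, low = n.divmod(10**k)
  # The low half must fill exactly k digits, so pad it with leading zeros.
  dc_decimal(high) + dc_decimal(low).rjust(k, "0")
end

puts dc_decimal(2**100)
```

With this shape the work is dominated by a few large multiplications and divisions instead of one division per digit, which is why the speed of the underlying multiplication algorithm matters so much.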

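By the way, GMP apparently uses/used a "near O(n * log(n)) FFT-based multiplication algorithm".

Thanks to @Drenmi's answer, which motivated me to dig deeper into the source. I hope I got it right, no guarantees, I'm a Ruby beginner. But that's why I left all the links, so you can check for yourself :-P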

Answer 2

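Integer#digits doesn't just "split" the number. From the docs:

Returns the array including the digits extracted by place-value notation with radix base of int.

This extraction is done even if the base argument is omitted. The relevant source: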

# ruby/numeric.c:4809

while (!FIXNUM_P(num) || FIX2LONG(num) > 0) {
    VALUE qr = rb_int_divmod(num, base);
    rb_ary_push(digits, RARRAY_AREF(qr, 1));
    num = RARRAY_AREF(qr, 0);
}

As you can see, this process involves repeated modulo arithmetic, which likely accounts for the additional run time.
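One way to see the difference directly is to time both calls on a single 1000-digit number. The sketch below uses only the standard Benchmark module; the helper fib_with_at_least and the repeat count are assumptions made for this example, and the timings will vary with the Ruby version and machine.

```ruby
require 'benchmark'

# Hypothetical helper: first Fibonacci number with at least ndigits digits.
def fib_with_at_least(ndigits)
  limit = 10**(ndigits - 1)
  a = 1
  b = 2
  while b < limit
    a, b = b, a + b
  end
  b
end

big = fib_with_at_least(1000)
repeats = 200

# Time only the digit counting, not the Fibonacci computation.
to_s_time   = Benchmark.realtime { repeats.times { big.to_s.length } }
digits_time = Benchmark.realtime { repeats.times { big.digits.length } }

puts "to_s.length:   #{to_s_time.round(4)}s"
puts "digits.length: #{digits_time.round(4)}s"
```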

Answer 3

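Many Ruby methods create objects (strings, arrays, etc.), and object creation in Ruby is "expensive".

For example, to_s creates a string and digits creates an array every time the while condition is evaluated.

If you want to optimize your example, you can do the following: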

# create the smallest possible 1000 digits number
max = 10**999

number = 3
a = 1
b = 2

# do not create objects in while condition
while b < max
  a, b = b, a + b
  number += 1
end
puts number
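As a quick sanity check, the comparison against a precomputed power of ten can be wrapped in a method and checked against the to_s version. The method names below are made up for this sketch; both assume at least 2 digits, because the loop starts at the third Fibonacci number.

```ruby
# Hypothetical helper: index of the first Fibonacci number with ndigits
# digits, found by comparing against 10**(ndigits - 1).
def first_index_by_threshold(ndigits)
  max = 10**(ndigits - 1)   # smallest number with ndigits digits
  number = 3
  a = 1
  b = 2
  while b < max
    a, b = b, a + b
    number += 1
  end
  number
end

# Same search, but counting digits with to_s on every iteration.
def first_index_by_to_s(ndigits)
  number = 3
  a = 1
  b = 2
  while b.to_s.length < ndigits
    a, b = b, a + b
    number += 1
  end
  number
end

puts first_index_by_threshold(1000)
```

Both methods walk the same sequence; the first one only performs an integer comparison per step and allocates no string or array in the loop condition.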

Answer 4

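I have not answered your question, but wish to suggest an improved algorithm for the problem you are addressing. For a given number of decimal digits, n, I have implemented the following algorithm.

Estimate the number of Fibonacci numbers ("FNs"), f, having n or fewer decimal digits.
Compute the f-th and (f-1)-st FNs, and the number of digits m of the f-th FN.
If m >= n, back down from the (f-1)-st FN until the (f-1)-st FN has fewer than n decimal digits, at which time the f-th FN is the smallest FN to have n decimal digits.
If m < n, increase the f-th FN until it has n decimal digits, at which time it is the smallest FN to have n decimal digits.

The key is to compute a close estimate, f, in the first step.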

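Code

AVG_FNs_PER_DIGIT = 4.784971966781667

def first_fibonacci_with_n_digits(n)
  return [1, 1] if n == 1
  idx = (n * AVG_FNs_PER_DIGIT).round
  fn, prev_fn = fib(idx)
  fn.to_s.size >= n ? fib_down(n, fn, prev_fn, idx) : fib_up(n, fn, prev_fn, idx)
end

def fib(idx)
  a = 1
  b = 2
  (idx - 2).times {a, b = b, a + b }
  [b, a]
end

def fib_up(n, b, a, idx)
  loop do
    a, b = b, a + b
    idx += 1
    break [idx, b] if b.to_s.size == n
  end
end

def fib_down(n, b, a, idx)
  loop do
    a, b = b - a, a
    break [idx, b] if a.to_s.size == n - 1
    idx -= 1
  end
end

Benchmarks

When computing each Fibonacci number, two operations are typically performed:

compute the number of digits in the last Fibonacci number computed and terminate if that number equals the target number of digits (it cannot be greater than the target, for reasons made clear in the Explanation section below); else
compute the next number in the Fibonacci sequence.

By contrast, the method I've proposed performs the first step relatively few times.

How important is the first step relative to the second? How does the use of n.to_s.size in the first step compare with the use of n.digits.size? Let's run some benchmarks to find out.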

def use_to_s(ndigits)
  case ndigits
  when 1
    [1, 1]
  else
    a = 1
    b = 2
    idx = 3
    loop do
      break [idx, b] if b.to_s.length == ndigits
      a, b = b, a + b
      idx += 1
    end
  end
end

def use_digits(ndigits)
  case ndigits
  when 1
    [1, 1]
  else
    a = 1
    b = 2
    idx = 3
    loop do
      break [idx, b] if b.digits.size == ndigits
      a, b = b, a + b
      idx += 1
    end
  end
end

require 'fruity'

def test(ndigits)
  nfibs, last_fib = use_to_s(ndigits)
  puts "\nndigits = #{ndigits}, nfibs=#{nfibs}, last_fib=#{last_fib}"
  compare do
    try_use_to_s   { use_to_s(ndigits) }
    try_use_digits { use_digits(ndigits) }
    try_estimate   { first_fibonacci_with_n_digits(ndigits) }
  end
end

test 20
ndigits = 20, nfibs=93, last_fib=12200160415121876738
Running each test 128 times. Test will take about 1 second.
try_estimate is faster than try_use_to_s by 2x ± 0.1
try_use_to_s is faster than try_use_digits by 80.0% ± 10.0%

test 100
ndigits = 100, nfibs=476, last_fib=13447...37757 (90 digits omitted)
Running each test 16 times. Test will take about 4 seconds.
try_estimate is faster than try_use_to_s by 5x ± 0.1
try_use_to_s is faster than try_use_digits by 10x ± 1.0

test 500
ndigits = 500, nfibs=2390, last_fib=13519...63145 (490 digits omitted)
Running each test 2 times. Test will take about 27 seconds.
try_estimate is faster than try_use_to_s by 9x ± 0.1
try_use_to_s is faster than try_use_digits by 60x ± 1.0

test 1000
ndigits = 1000, nfibs=4782, last_fib=10700...27816 (990 digits omitted)
Running each test once. Test will take about 1 minute.
try_estimate is faster than try_use_to_s by 12x ± 10.0
try_use_to_s is faster than try_use_digits by 120x ± 100.0

There are two main takeaways from these results:

"try_estimate" is the fastest because it performs the first step relatively few times; and
the use of to_s is much faster than that of digits.

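Further to the first observation, note that the initial estimates of the index of the first FN having a given number of digits, compared with the actual indices, are as follows:

for 20 digits: 96 est. vs 93 actual
for 100 digits: 479 est. vs 476 actual
for 500 digits: 2392 est. vs 2390 actual
for 1000 digits: 4785 est. vs 4782 actual

The deviation was at most 3, meaning that the number of digits had to be computed for at most 3 FNs to obtain the desired result.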

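Explanation

The only explanation needed for the method given in the Code section above is the derivation of the constant AVG_FNs_PER_DIGIT, which is used to compute an estimate of the index of the first FN having a specified number of digits.

The derivation of this constant comes from the question and selected answer given here. (The Wiki for Fibonacci numbers provides a good overview of the mathematical properties of FNs.)

It is known that the first 7 FNs (including zero) have one digit; thereafter FNs gain an additional digit every 4 or 5 FNs (i.e., sometimes 4, else 5). Therefore, as a very crude calculation, we see that the first FN with n digits, n >= 2, will be no earlier than the 4*n-th FN. For n = 1000, that would be the 4,000th. (In fact, the 4,782nd is the smallest to have 1000 digits.) In other words, we needn't compute the number of digits in the first 4,000 FNs. We can improve on this estimate, however.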

As n approaches infinity, the ratio of the number of n-digit intervals (10**n...10**(n+1)) that contain 5 FNs to the number that contain 4 FNs can be computed as follows:

LOG_10 = Math.log(10)
  #=> 2.302585092994046
GR = (1 + Math.sqrt(5))/2
  #=> 1.618033988749895
LOG_GR = Math.log(GR)
  #=> 0.48121182505960347
RATIO_5to4 = (LOG_10 - 4*LOG_GR)/(5*LOG_GR - LOG_10)
  #=> 3.6505564183095474

where GR is the Golden Ratio.

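In a large number of n-digit intervals, let n₄ be the number of those intervals that contain 4 FNs and n₅ the number that contain 5 FNs. The average number of FNs per interval is therefore (n₄ * 4 + n₅ * 5)/(n₄ + n₅). Since n₅/n₄ converges to RATIO_5to4, n₅ approaches RATIO_5to4 * n₄ in the limit (discarding roundoff errors). If we substitute out n₅ and let

b = 1/(1 + RATIO_5to4)
  #=> 0.21502803321833364

we find that the average number of FNs per n-digit interval converges to

avg = b * 4 + (1-b) * 5
  #=> 4.784971966781667

If fn is the first FN to have n decimal digits, the number of FNs in the sequence up to and including fn can therefore be approximated by

n * avg

For example, the estimate of the index of the first FN to have 1000 decimal digits would be 1000 * avg, which rounds to 4785.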
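The limit above can be cross-checked empirically. Substituting b into avg and simplifying gives Math.log(10)/Math.log(GR), so the sketch below compares that closed form with the observed average number of FNs per digit length. The helper fns_per_digit_length and the cut-off of 500 digits are assumptions made for this check, not part of the method above.

```ruby
GOLDEN_RATIO = (1 + Math.sqrt(5)) / 2
# b*4 + (1-b)*5 with b = 1/(1 + RATIO_5to4) simplifies to this expression.
closed_form = Math.log(10) / Math.log(GOLDEN_RATIO)

# Hypothetical helper: maps each digit length (up to max_digits) to the
# number of Fibonacci numbers having that many decimal digits.
def fns_per_digit_length(max_digits)
  counts = Hash.new(0)
  limit = 10**max_digits
  a = 1
  b = 1
  while a < limit
    counts[a.to_s.length] += 1
    a, b = b, a + b
  end
  counts
end

counts = fns_per_digit_length(500)
# Skip the one-digit interval, which is the known special case.
multi_digit = counts.reject { |digits, _| digits == 1 }
empirical = multi_digit.values.sum.to_f / multi_digit.size

puts "closed form: #{closed_form}"
puts "empirical:   #{empirical}"
```

The empirical average only approaches the limit as more digit lengths are included, so a small difference between the two printed values is expected.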

Ruby's digits method performance
