Improving function performance

Time: 2014-07-23 17:24:13

Tags: ruby performance haskell

I'm putting together a small program that checks for solutions to Brocard's problem, i.e. so-called Brown numbers. I first wrote a draft in Ruby:

class Integer
  def factorial
    f = 1; for i in 1..self; f *= i; end; f
  end
end

boundary = 1000
m = 0

# Brown numbers: pairs of integers (m, n) where n! + 1 == m**2

while m <= boundary

    n = 0

    while n <= boundary
        puts "(#{m},#{n})" if ((n.factorial + 1) == (m ** 2)) 
        n += 1
    end

    m += 1
end

But I found that Haskell is better suited to math, so I asked a question about it earlier and quickly got an answer on how to translate my Ruby code into Haskell:

results :: [(Integer, Integer)] -- Integer instead of Int, to avoid overflow
results =  [(x,y) | x <- [1..1000], y <- [1..1000] , 1 + fac x == y*y]
    where fac n = product [1..n]
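The overflow comment above is easy to demonstrate. This is a minimal sketch that computes `product [1..21]` at two types. It assumes a GHC build where `Int` is 64 bits wide, which is the usual case on 64-bit platforms. The names `factInt`, `factInteger` and `overflowsAt21` are hypothetical helpers for illustration.

```haskell
-- Hypothetical helpers: the same factorial computed at two different types.
factInt :: Int -> Int
factInt n = product [1 .. n]          -- fixed width: in GHC this wraps around on overflow

factInteger :: Integer -> Integer
factInteger n = product [1 .. n]      -- arbitrary precision: never overflows

-- 21! = 51090942171709440000 is larger than maxBound :: Int (about 9.22e18
-- on 64-bit GHC), so converting the Int result back to Integer shows the wrap-around.
overflowsAt21 :: Bool
overflowsAt21 = toInteger (factInt 21) /= factInteger 21
```

20! still fits in a 64-bit `Int`. From 21! onward, only `Integer` gives correct results, which is why the type signature matters.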

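I changed it slightly so that I can run the same computation starting from any number I want. The version above runs from 1 to 1000 (or whatever number is hardcoded), but I want to be able to choose the interval it goes through, ergo: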

pairs :: (Integer, Integer) -> [(Integer, Integer)]
pairs (lower, upper) = [(m, n) | m <- [lower..upper], n <- [lower..upper], 1 + factorial n == m*m]
  where factorial n = product [1..n]

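If possible, I'd like some examples of, or pointers to, optimizations that would speed this up. At the moment, running it over the interval [100..10000] takes a very long time (I stopped it after 45 minutes).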

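PS: the performance optimization is meant for the Haskell implementation (the pairs function), not the Ruby one, in case anyone wonders which function I'm talking about.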

2 answers:

Answer 0 (score: 2)

So, how would you speed up the Ruby implementation? Even though the languages differ, the same kinds of optimization apply: memoization and smarter algorithms.

1. Memoization

Memoization keeps you from computing the same factorials over and over again.

Here is your version of pairs:

pairs :: (Integer, Integer) -> [(Integer, Integer)]
pairs (lower, upper) =  [(m, n) | m <- [lower..upper], n <- [lower..upper], 1 + factorial n == m*m]
    where factorial n = product [1..n]

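How often is factorial called? Well, it's called at least upper - lower times. However, GHC does not automatically remember the results of previous calls, so in practice it is called once per pair (m, n), which is roughly (upper - lower)² calls. Computing a factorial is fairly simple, but it isn't free.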
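A rough count makes this concrete. The count below assumes that `product [1..n]` costs about n multiplications. It also ignores the fact that big-`Integer` multiplications get more expensive as the numbers grow. With L = lower and U = upper, the naive `pairs (L, U)` then performs approximately:

```latex
\sum_{m=L}^{U} \sum_{n=L}^{U} n
  \;=\; (U-L+1)\cdot\frac{(U+L)(U-L+1)}{2}
  \;\in\; O\!\left((U-L)^2\, U\right)
```

For `pairs (0, 1000)`, this is 1001 × 500500 = 501,000,500 multiplications. Computing each factorial only once would need at most U of them.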

What if we generate an infinite list of factorials and simply pick the right one?

pairsMem :: (Integer, Integer) -> [(Integer, Integer)]
pairsMem (lower, upper) =  [(m, n) | m <- [lower..upper], n <- [lower..upper], 1 + factorial n == m*m]
    where factorial  = (factorials!!) . fromInteger
          factorials = scanl (*) 1 [1..]

Now factorials is the list [1,1,2,6,24,…], and factorial just looks up the corresponding value. How do the two versions compare?

Your version

main = print $ pairs (0,1000)
> ghc --make SO.hs -O2 -rtsopts > /dev/null
> ./SO +RTS -s
[(5,4),(11,5),(71,7)]
 204,022,149,768 bytes allocated in the heap
     220,119,948 bytes copied during GC
          41,860 bytes maximum residency (2 sample(s))
          20,308 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     414079 colls,     0 par    2.39s    2.23s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   67.33s  ( 67.70s elapsed)
  GC      time    2.39s  (  2.23s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   69.72s  ( 69.93s elapsed)

  %GC     time       3.4%  (3.2% elapsed)

  Alloc rate    3,030,266,322 bytes per MUT second

  Productivity  96.6% of total user, 96.3% of total elapsed

Roughly 68 seconds of mutator time, about 70 seconds in total.

pairsMem

main = print $ pairsMem (0,1000)
> ghc --make -O2 -rtsopts SO.hs > /dev/null
> ./SO +RTS -s
[(5,4),(11,5),(71,7)]
     551,558,988 bytes allocated in the heap
         644,420 bytes copied during GC
         231,120 bytes maximum residency (2 sample(s))
          71,504 bytes maximum slop
               2 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1159 colls,     0 par    0.00s    0.01s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0002s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.17s  (  2.18s elapsed)
  GC      time    0.00s  (  0.01s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    2.17s  (  2.18s elapsed)

  %GC     time       0.0%  (0.3% elapsed)

  Alloc rate    253,955,217 bytes per MUT second

  Productivity 100.0% of total user, 99.5% of total elapsed

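About two seconds, or only 3% of the original time. Not bad for an almost trivial change. As you can see, though, we use twice as much memory; after all, we keep the factorials in a list. On the other hand, the total number of bytes allocated is only 0.27% of the non-memoized variant, because we no longer need to regenerate the product for every pair.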
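One caveat about pairsMem: `factorials !! n` walks the list from the front on every call. The products are shared, but each lookup still costs O(n) steps. A minimal alternative stores the factorials in an immutable `Data.Array` (from the `array` package that ships with GHC) for O(1) indexing. This sketch assumes `0 <= lower <= upper`, and `pairsArr` is a hypothetical name. It still checks every pair (m, n), so it only speeds up the lookup, not the quadratic search itself.

```haskell
import Data.Array (Array, listArray, (!))

-- Hypothetical variant of pairsMem; assumes 0 <= lower <= upper.
pairsArr :: (Integer, Integer) -> [(Integer, Integer)]
pairsArr (lower, upper) =
  [ (m, n)
  | m <- [lower .. upper]
  , n <- [lower .. upper]
  , 1 + factorial n == m * m
  ]
  where
    -- table ! n == n!, built once per call; listArray only consumes the
    -- first (upper + 1) elements of the infinite scanl list.
    table :: Array Integer Integer
    table = listArray (0, upper) (scanl (*) 1 [1 ..])

    -- O(1) array indexing instead of walking a list with (!!)
    factorial :: Integer -> Integer
    factorial = (table !)
```

Like pairs, it restricts both m and n to the given range.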

pairsMem (100,10000)

What about larger numbers? You said that with (100,10000) you stopped after 45 minutes. How fast is the memoized version?

main = print $ pairsMem (100,10000)
> ghc --make -O2 -rtsopts SO.hs > /dev/null
> ./SO +RTS -s
… 20 minutes later Ctrl+C…

That still takes too long. What else can we do?

2. Smarter pairs

Let's go back to the drawing board. You're checking every pair (m, n) with both m and n in [lower, upper]. Is that reasonable?

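Actually, no, because factorials grow extremely fast. For any natural number x, let f(x) be the number of factorials 0!, 1!, 2!, … that are at most x. We are looking for n! = m² - 1 < m², so for any given m we only need to check the first f(m²) factorials; all the others are larger.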

Just for the record, f(10^100) is 70: only 0! through 69! are at most 10^100.
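The f(10^100) claim is easy to check. In this minimal sketch, `f` is named after the function in the text, and `factorials` is a hypothetical helper that holds 0!, 1!, 2!, …:

```haskell
-- 0!, 1!, 2!, ... as an infinite, lazily evaluated list
factorials :: [Integer]
factorials = scanl (*) 1 [1 ..]

-- f x = how many factorials are <= x. The list is non-decreasing,
-- so takeWhile stops at the first factorial that exceeds x.
f :: Integer -> Int
f x = length (takeWhile (<= x) factorials)
```

In GHCi, `f (10^100)` evaluates to 70. `f (10^16)` evaluates to 19, so for m up to 10^8 only 0! through 18! are candidates.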

Now the strategy is clear: generate only as many factorials as needed and simply check whether m * m - 1 is in that list:

import Data.List (elemIndex)

pairsList :: (Integer, Integer) -> [(Integer, Integer)]
pairsList (lower, upper) =
  [ (m, fromIntegral n)
  | m <- [lower..upper]
    -- fs !! n == n!, so the index of m*m - 1 in fs is n
  , Just n <- [elemIndex (m*m - 1) fs]
  ]
  where
    -- only factorials below upper^2 can equal m*m - 1 when m <= upper
    fs = takeWhile (< upper*upper) $ scanl (*) 1 [1..]

How well does this version perform?

main = print $ pairsList (1, 10^8)
> ghc --make -O2 -rtsopts SO.hs > /dev/null
> ./SO +RTS -s
[(5,4),(11,5),(71,7)]
  21,193,518,276 bytes allocated in the heap
       2,372,136 bytes copied during GC
          58,672 bytes maximum residency (2 sample(s))
          19,580 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     40823 colls,     0 par    0.06s    0.11s     0.0000s    0.0000s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   38.17s  ( 38.15s elapsed)
  GC      time    0.06s  (  0.11s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   38.23s  ( 38.26s elapsed)

  %GC     time       0.2%  (0.3% elapsed)

  Alloc rate    555,212,922 bytes per MUT second

  Productivity  99.8% of total user, 99.8% of total elapsed
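Well, that's under 40 seconds, and for the much larger range (1, 10^8). But what if we use a data structure that provides more efficient lookups?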

3. Using the right data structure

Since we need efficient lookups, we'll use a Set. The function stays almost the same, but fs becomes a Set Integer and the lookup is done with S.lookupIndex:

import qualified Data.Set as S

pairsSet :: (Integer, Integer) -> [(Integer, Integer)]
pairsSet (lower, upper) =
  [ (m, 1 + fromIntegral i)
  | m <- [lower..upper]
  , Just i <- [S.lookupIndex (m*m - 1) fs]
  ]
  where
    -- S.fromList merges the duplicate 1 (0! == 1! == 1), so the set is
    -- {1!, 2!, 3!, ...}: the element at index i is (i + 1)!, hence the "1 +"
    fs = S.fromList . takeWhile (< upper*upper) $ scanl (*) 1 [1..]

And here is how pairsSet performs:

 > ./SO +RTS -s
[(5,4),(11,5),(71,7)]
  18,393,520,096 bytes allocated in the heap
       2,069,872 bytes copied during GC
          58,752 bytes maximum residency (2 sample(s))
          19,580 bytes maximum slop
               1 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     35630 colls,     0 par    0.06s    0.08s     0.0000s    0.0001s
  Gen  1         2 colls,     0 par    0.00s    0.00s     0.0001s    0.0001s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time   18.52s  ( 18.52s elapsed)
  GC      time    0.06s  (  0.08s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time   18.58s  ( 18.60s elapsed)

  %GC     time       0.3%  (0.4% elapsed)

  Alloc rate    993,405,304 bytes per MUT second

  Productivity  99.7% of total user, 99.5% of total elapsed
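The `1 +` in pairsSet is needed only because `S.fromList` merges the duplicate value 1 (0! and 1! are both 1), which shifts every index by one. The sketch below avoids that bookkeeping. It assumes a `Data.Map` from each factorial to its n is acceptable; `Data.Map` comes from the same `containers` package and also has logarithmic lookups. `pairsMap` is a hypothetical name.

```haskell
import qualified Data.Map.Strict as M

-- Hypothetical variant of pairsSet: map each factorial n! straight to n.
pairsMap :: (Integer, Integer) -> [(Integer, Integer)]
pairsMap (lower, upper) =
  [ (m, n)
  | m <- [lower .. upper]
  , Just n <- [M.lookup (m * m - 1) table]
  ]
  where
    -- Keys are n! (only those below upper^2), values are n. fromList keeps
    -- the last value for a duplicate key, so key 1 maps to 1 (from 1!), not 0.
    table :: M.Map Integer Integer
    table = M.fromList . takeWhile ((< upper * upper) . fst) $
              zip (scanl (*) 1 [1 ..]) [0 ..]
```

Like pairsList and pairsSet, it restricts only m to the given range.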

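That concludes our optimization journey. By the way, we've reduced the complexity from O(n³) to O(n log n), where n is the size of the range, because our data structure gives us logarithmic lookups. In fact, the set only holds the factorials below upper², and there are very few of those. For upper = 10^8 there are only the 18 distinct values among 0! through 18!, so each lookup is far cheaper than log n suggests.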
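The f(m) argument can be taken one step further by swapping the roles of m and n. Instead of iterating over every m, iterate over the few factorials n! < upper² and test whether n! + 1 is a perfect square. This is a different technique from the ones in this answer, shown only as a minimal sketch. `isqrt` and `pairsByN` are hypothetical helpers; `isqrt` uses Newton's method so the sketch needs no extra packages.

```haskell
-- Floor of the square root of a non-negative Integer, via Newton's method.
-- Starting from x0 = n (>= sqrt n), the iterates decrease until they stop.
isqrt :: Integer -> Integer
isqrt n
  | n < 0     = error "isqrt: negative argument"
  | n == 0    = 0
  | otherwise = go n
  where
    go x =
      let y = (x + n `div` x) `div` 2
      in if y >= x then x else go y

-- Hypothetical variant: loop over n instead of m. Only factorials with
-- n! < upper^2 can satisfy n! + 1 == m^2 with m <= upper.
pairsByN :: (Integer, Integer) -> [(Integer, Integer)]
pairsByN (lower, upper) =
  [ (m, n)
  | (n, fact) <- zip [0 ..] (takeWhile (< upper * upper) (scanl (*) 1 [1 ..]))
  , let m = isqrt (fact + 1)
  , m * m == fact + 1
  , m >= lower
  ]
```

For example, `pairsByN (1, 10^8)` examines only 19 factorials (0! through 18!) instead of 10^8 values of m. Like pairsSet, it restricts only m to the range.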

Answer 1 (score: 0)

Looking at your code, it seems memoization could be used to speed up the computation of factorial.

For each m, the code computes the factorial of every n, which I don't think is necessary.
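One way to act on this suggestion is to compute the factorials once, outside the loop over m. Then, for each m, walk that shared list only until the factorials exceed m*m - 1. This is a sketch under that assumption, not code from the answer, and `pairsEarlyExit` is a hypothetical name.

```haskell
-- Hypothetical sketch: shared factorials plus an early exit for each m.
pairsEarlyExit :: (Integer, Integer) -> [(Integer, Integer)]
pairsEarlyExit (lower, upper) =
  [ (m, n)
  | m <- [lower .. upper]
  , let target = m * m - 1
    -- factorials are non-decreasing, so stop as soon as they pass the target
  , (n, fact) <- takeWhile ((<= target) . snd) indexed
  , fact == target
  ]
  where
    -- (n, n!) pairs, computed lazily and shared by every m in this call
    indexed :: [(Integer, Integer)]
    indexed = zip [0 ..] (scanl (*) 1 [1 ..])
```

Each m now touches only the handful of factorials up to m*m - 1 instead of every n in the range. Like pairsList, it restricts only m to the range.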