Question

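I'm taking a math course in which we have to do some integer factorizations as an intermediate step in problems. I decided to write a Python program to do this for me (we aren't being tested on our ability to factor, so this is completely above board). The program is as follows: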

#!/usr/bin/env python3

import math
import sys

# Return a list representing the prime factorization of n. The factorization is
#   found using trial division (highly inefficient).
def factorize(n):

    def factorize_helper(n, min_poss_factor):
        if n <= 1:
            return []
        prime_factors = []
        smallest_prime_factor = -1
        for i in range(min_poss_factor, math.ceil(math.sqrt(n)) + 1):
            if n % i == 0:
                smallest_prime_factor = i
                break
        if smallest_prime_factor != -1:
            return [smallest_prime_factor] \
                   + factorize_helper(n // smallest_prime_factor,
                                      smallest_prime_factor)
        else:
            return [n]

    if n < 0:
        print("Usage: " + sys.argv[0] + " n   # where n >= 0")
        return []
    elif n == 0 or n == 1:
        return [n]
    else:
        return factorize_helper(n, 2)

if __name__ == "__main__":
    factorization = factorize(int(sys.argv[1]))
    if len(factorization) > 0:
        print(factorization)

I've been teaching myself some Haskell, so I decided to try rewriting the program in Haskell. That program is as follows:

import System.Environment

-- Return a list containing all factors of n at least x.
factorize' :: (Integral a) => a -> a -> [a]
factorize' n x = smallestFactor
                 : (if smallestFactor == n
                    then []
                    else factorize' (n `quot` smallestFactor) smallestFactor)
    where
        smallestFactor = getSmallestFactor n x
        getSmallestFactor :: (Integral a) => a -> a -> a
        getSmallestFactor n x
            | n `rem` x == 0                          = x
            | x > (ceiling . sqrt . fromIntegral $ n) = n
            | otherwise                               = getSmallestFactor n (x+1)

-- Return a list representing the prime factorization of n.
factorize :: (Integral a) => a -> [a]
factorize n = factorize' n 2

main = do
    argv <- getArgs
    let n = read (argv !! 0) :: Int
    let factorization = factorize n
    putStrLn $ show (factorization)
    return ()

(Note: this requires a 64-bit environment. On 32-bit, import Data.Int and use Int64 as the type annotation on read (argv !! 0).)
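
To see why the note about 64-bit matters, here is a small Python check that the test input used in the timings, 89273487253497, does not fit in a signed 32-bit integer but does fit in a signed 64-bit one. The helper name `fits_signed` is made up for this illustration; Python integers are arbitrary precision, so this only checks ranges and says nothing about how GHC sizes `Int` on a given platform.

```python
def fits_signed(value, bits):
    """Return True if value fits in a signed two's-complement integer of the given width."""
    return -(2 ** (bits - 1)) <= value <= 2 ** (bits - 1) - 1


n = 89273487253497  # the test input used in the timings

print(fits_signed(n, 32))  # False
print(fits_signed(n, 64))  # True
```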

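After writing this, I decided to compare the performance of the two, recognizing that there are better algorithms but that the two programs use essentially the same one. For example, I did the following: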

$ ghc --make -O2 factorize.hs
$ /usr/bin/time -f "%Uu %Ss %E" ./factorize 89273487253497
[3,723721,41117819]
0.18u 0.00s 0:00.23

Then, timing the Python program:

$ /usr/bin/time -f "%Uu %Ss %E" ./factorize.py 89273487253497
[3, 723721, 41117819]
0.09u 0.00s 0:00.09

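Of course, the times vary slightly each time I run one of the programs, but they are always in this range, with the Python program several times faster than the compiled Haskell program. It seems to me that the Haskell version should be able to run faster, and I hope you can give me an idea of how to improve it so that this is the case.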

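I have seen some tips on optimizing Haskell programs, such as the answers to this question, but I can't seem to get my program to run any faster. Are loops that much faster than recursion? Is Haskell's I/O particularly slow? Did I make a mistake in actually implementing the algorithm? Ideally, I'd like an optimized version of the Haskell program that is still relatively easy to read.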

Answer 1

If you compute limit = ceiling . sqrt . fromIntegral $ n only once, instead of once per iteration, then I see the Haskell version being faster:

limit = ceiling . sqrt . fromIntegral $ n
smallestFactor = getSmallestFactor x

getSmallestFactor x
    | n `rem` x == 0 = x
    | x > limit      = n
    | otherwise      = getSmallestFactor (x+1)

With this version, I see:

$ time ./factorizePy.py 89273487253497
[3, 723721, 41117819]

real    0m0.236s
user    0m0.171s
sys     0m0.062s

$ time ./factorizeHs  89273487253497
[3,723721,41117819]

real    0m0.190s
user    0m0.000s
sys     0m0.031s
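
The effect of hoisting the bound can be illustrated with a rough Python sketch that counts how often the square root is evaluated in each variant. The function names and the call counter are invented for this illustration; it mirrors the guard order of getSmallestFactor but says nothing about what GHC's optimizer actually does with the original code.

```python
import math


def smallest_factor_recompute(n, start=2):
    """Trial division that re-evaluates the sqrt bound on every iteration.

    Returns (factor, number_of_sqrt_evaluations). Assumes n >= 2.
    """
    sqrt_calls = 0
    x = start
    while True:
        if n % x == 0:
            return x, sqrt_calls
        sqrt_calls += 1
        if x > math.ceil(math.sqrt(n)):  # bound recomputed each time
            return n, sqrt_calls
        x += 1


def smallest_factor_hoisted(n, start=2):
    """Same search, but the sqrt bound is computed once up front."""
    limit = math.ceil(math.sqrt(n))  # computed exactly once
    sqrt_calls = 1
    x = start
    while True:
        if n % x == 0:
            return x, sqrt_calls
        if x > limit:
            return n, sqrt_calls
        x += 1


# 41117819 is the largest prime factor in the question's example,
# so the search runs all the way up to the bound.
print(smallest_factor_recompute(41117819))
print(smallest_factor_hoisted(41117819))
```

Both variants return the same factor; only the number of square-root evaluations differs, which is the work the answer's change removes from the inner loop.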

Answer 2

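In addition to the key point made by Cactus, there is some room for refactoring and strictness annotations to avoid creating unnecessary thunks. Note in particular that factorize' is lazy: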

factorize' undefined undefined = undefined : undefined

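This isn't necessary, and it forces GHC to allocate several thunks. The same goes for laziness elsewhere. I expect you'll get better performance with: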

{-# LANGUAGE BangPatterns #-}

factorize' :: Integral a => a -> a -> [a]
factorize' n x
  | smallestFactor == n = [smallestFactor]
  | otherwise = smallestFactor : factorize' (n `quot` smallestFactor) smallestFactor
  where
    smallestFactor = getSmallestFactor n (ceiling . sqrt . fromIntegral $ n) x
    getSmallestFactor n !limit x
       | n `rem` x == 0 = x
       | x > limit = n
       | otherwise = getSmallestFactor n limit (x+1)

-- Return a list representing the prime factorization of n.
factorize :: Integral a => a -> [a]
factorize n = factorize' n 2

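I made getSmallestFactor take both n and the limit as arguments. This prevents getSmallestFactor from being allocated as a closure on the heap. I'm not sure whether this is worth the extra argument shuffling; you can try it both ways.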
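
For comparison with the Python program in the question, the shape of this refactoring (a helper that receives n, the limit, and the current candidate explicitly, with the limit computed once per remaining cofactor) can be sketched as below. This is only an analogy: Python does not allocate closures the way GHC does, so it shows the structure, not the performance effect. `math.isqrt` (Python 3.8+) is swapped in here for ceiling . sqrt . fromIntegral as an assumption, to keep the bound in exact integer arithmetic.

```python
import math


def get_smallest_factor(n, limit, x):
    """Smallest factor of n that is >= x, or n itself if none is found up to limit."""
    while True:
        if n % x == 0:
            return x
        if x > limit:
            return n
        x += 1


def factorize(n):
    """Prime factorization of n >= 2 by trial division, smallest factors first."""
    factors = []
    x = 2
    while True:
        limit = math.isqrt(n)  # computed once per remaining cofactor
        p = get_smallest_factor(n, limit, x)
        factors.append(p)
        if p == n:
            return factors
        n //= p
        x = p  # no remaining factor can be smaller than p


print(factorize(89273487253497))  # [3, 723721, 41117819]
```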

Is Haskell slower than Python at naive integer factorization?
