Question

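I have used BangPatterns and lazy ByteString. I don't know what else I can do to improve the performance of this code. Any thoughts or suggestions? It is obviously not the fastest version, because it exceeds the time limit.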

-- Find the sum of all the multiples of 3 or 5 below N
-- Input Format 
-- First line contains T that denotes the number of test cases. This is followed by T lines, each containing an integer, N.

{-# LANGUAGE BangPatterns #-}
{-# OPTIONS_GHC -O2 -optc-O2 #-}
import qualified Data.ByteString.Lazy as L
import Control.Monad (mapM_)

readInt :: L.ByteString -> Int
readInt !s = L.foldl' (\x c -> 10 * x + fromIntegral c - 48) 0 s

main :: IO ()
main = do 
-- don't need the number of inputs, since it is read lazily.
-- split input by lines
  (_:ls) <- L.split 10 `fmap` L.getContents
-- length ls <= 10^5
  mapM_ (print . f . readInt) ls

-- n <= 10^9
f :: Int -> Int
f n = go 0 0
  where
    go !i !a | i == n            = a
    go !i !a | i `mod` 3 == 0
               || i `mod` 5 == 0 = go (i+1) (a+i)
    go !i !a                     = go (i+1) a

Answer 1

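Your use of print in the line

mapM_ (print . f . readInt) ls

may be introducing some overhead, because print depends on the Show instance for Int, which means each result is converted to an inefficient String.

Add the following imports

import qualified Data.ByteString.Builder as BB
import qualified Data.Foldable as F
import Data.List.Split (chunksOf) -- from the "split" package
import System.IO -- for stdout

and try replacing that line with something like

  let resultList = map (f . readInt) ls
  -- intDec emits no separator, so append a newline after each number
  F.mapM_ (BB.hPutBuilder stdout . F.foldMap (\n -> BB.intDec n <> BB.char7 '\n')) (chunksOf 1000 resultList)

This takes chunks of size 1000 from the list of Ints and uses the efficient Builder type and the specialized hPutBuilder function to write them to stdout.

(I added the chunking because otherwise I was afraid that constructing the Builder would force the whole input list into memory. We don't want that, because the list is being read lazily.)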

I'm not sure that this is the main bottleneck, though.
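The chunked Builder output can be sketched without the "split" dependency. This is only an illustration: chunks, renderChunk and writeResults are hypothetical helper names, not part of any library, and the sketch assumes a GHC recent enough that foldMap and <> are exported by the Prelude.

```haskell
import qualified Data.ByteString.Builder as BB
import System.IO (stdout)

-- Hypothetical helper: split a list into pieces of at most k elements.
-- Assumes k > 0 (a non-positive k would never consume the list).
chunks :: Int -> [a] -> [[a]]
chunks _ [] = []
chunks k xs = let (h, t) = splitAt k xs in h : chunks k t

-- Render one chunk as decimal numbers, one per line.
renderChunk :: [Int] -> BB.Builder
renderChunk = foldMap (\n -> BB.intDec n <> BB.char7 '\n')

-- Write the results chunk by chunk, so the whole list is never
-- turned into a single Builder at once.
writeResults :: [Int] -> IO ()
writeResults = mapM_ (BB.hPutBuilder stdout . renderChunk) . chunks 1000

main :: IO ()
main = writeResults [23, 2318, 233168]
```

Whether this beats print by a meaningful margin would have to be measured; the loop in f is likely to dominate either way.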

Answer 2

danidiaz has already discussed the input and output problems to some extent.

One quick way to produce the multiples of 3 or 5 is to use a "wheel" of the sort often used for prime sieves.

multiples3or5 = go 0 $ cycle [3,2,1,3,1,2,3]
  where
    go n (x : xs) = n : go (n+x) xs
    go n [] = error "impossible"

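In fact, since the cyclic list never ends, it is cleaner to use a different type. And since you are using Int, it might as well be specialized and unpacked for performance. Note that the UNPACK pragma is not needed in this context for GHC 7.8 or later.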

data IntStream = {-# UNPACK #-} !Int :> IntStream
infixr 5 :>

wheel :: IntStream
wheel = 3 :> 2 :> 1 :> 3 :> 1 :> 2 :> 3 :> wheel

multiples3or5 = go 0 wheel
  where
    go !n (x :> xs) = n : go (n+x) xs
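As a rough sketch of how the wheel could replace the mod tests in the question's f, the loop below walks the stream directly and accumulates the sum. The name sumBelow is an assumption made for this example. Note that it still does work proportional to n; it only skips the non-multiples.

```haskell
{-# LANGUAGE BangPatterns #-}

-- Infinite stream of strict Ints, as defined above.
data IntStream = !Int :> IntStream
infixr 5 :>

-- Gaps between consecutive multiples of 3 or 5: 0, 3, 5, 6, 9, 10, 12, 15, ...
wheel :: IntStream
wheel = 3 :> 2 :> 1 :> 3 :> 1 :> 2 :> 3 :> wheel

-- Sum of the multiples of 3 or 5 strictly below n (sumBelow is a hypothetical name).
sumBelow :: Int -> Int
sumBelow n = go 0 0 wheel
  where
    go !acc !k (x :> xs)
      | k >= n    = acc                      -- reached the limit: done
      | otherwise = go (acc + k) (k + x) xs  -- add this multiple, step to the next

main :: IO ()
main = print (map sumBelow [10, 100, 1000])  -- prints [23,2318,233168]
```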

As fgv commented, this is in the nature of an anamorphism. You can see this by writing

import Data.List (unfoldr)

multiples3or5 = unfoldr go (0, wheel) where
  go (!n, (x :> xs)) = Just (n, (n+x, xs))

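but note that unfoldr did not become efficient enough to be worth using for anything until base 4.8, which has not yet been officially released.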

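When printing the results, the system has to divide lots of things by 10. I don't know whether those routines are specially optimized, but I do know that GHC's native code generator does not currently optimize division by known divisors unless the divisor is a power of 2. So you may find that you can improve performance by using -fllvm, taking care to use a compatible version of LLVM.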

Edit

See Chad Groft's answer for a better approach.

Answer 3

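If you are really concerned about efficiency, rethink the algorithm. Your main bottleneck is that you are manually summing a bunch of numbers between 1 and N, and that will perform poorly for large N no matter what you do.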

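Instead, think mathematically. The sum of all multiples of 3 or 5 up to N is almost the sum of all multiples of 3 up to N (call it S_3) plus the sum of all multiples of 5 up to N (call it S_5). I say "almost" because some numbers are counted twice; call their sum T. The sum you want is then S_3 + S_5 - T, and each term has a nice closed formula (what is it?). Computing these three numbers is much faster.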
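One way to fill in the closed formula, using the "up to N" convention of this answer: the multiples of k up to N are k, 2k, ..., m_k k, and the double-counted numbers are exactly the multiples of 15, so T = S_15. The symbol m_k is introduced here only for this derivation.

```latex
\begin{align*}
m_k &= \left\lfloor \frac{N}{k} \right\rfloor, \\
S_k &= k\,(1 + 2 + \dots + m_k) = k\,\frac{m_k\,(m_k + 1)}{2}, \\
S_3 + S_5 - T &= S_3 + S_5 - S_{15}.
\end{align*}
% Check with N = 10: S_3 = 3 \cdot 6 = 18, S_5 = 5 \cdot 3 = 15, S_{15} = 0,
% and indeed 3 + 5 + 6 + 9 + 10 = 33 = 18 + 15 - 0.
```

The original problem asks for multiples strictly below N; for that variant, take m_k = ⌊(N - 1)/k⌋ instead.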

Answer 4

Here is the formula, without the "think about it yourself" tutoring of the other answer:

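-- Sum of the multiples of k strictly below n: k * (1 + 2 + ... + d)
sumMultiplesOf :: Integral n => n -> n -> n
sumMultiplesOf k n = d * (1 + d) `div` 2 * k
  where d = (n - 1) `div` k  -- number of multiples of k below n

-- Multiples of 15 are counted by both the 3 and the 5 term, so subtract them once.
sumMultiplesOf3or5 :: Integral n => n -> n
sumMultiplesOf3or5 n = sumMultiplesOf 3 n + sumMultiplesOf 5 n - sumMultiplesOf 15 n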
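To tie the formula back to the input format in the question, here is a minimal sketch of a complete program. The helper solve is a hypothetical name, the functions are specialized to Int on the assumption that Int is 64 bits (the intermediate product is about 1.1e17 for n = 10^9), and plain String I/O is used for brevity rather than speed.

```haskell
-- Closed-form sum of the multiples of k strictly below n.
sumMultiplesOf :: Int -> Int -> Int
sumMultiplesOf k n = d * (d + 1) `div` 2 * k
  where d = (n - 1) `div` k

sumMultiplesOf3or5 :: Int -> Int
sumMultiplesOf3or5 n = sumMultiplesOf 3 n + sumMultiplesOf 5 n - sumMultiplesOf 15 n

-- Hypothetical helper: skip the test-case count, then answer each N on its own line.
solve :: String -> String
solve = unlines . map (show . sumMultiplesOf3or5 . read) . drop 1 . words

main :: IO ()
main = interact solve
```

Each test case now costs a handful of arithmetic operations instead of a loop of up to 10^9 iterations.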

Improving the performance of Haskell code (BangPatterns, lazy ByteString)
