Question

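I was inspired by the post titled "Only fast languages are interesting" to try the problem its author suggests (summing a couple of million numbers from a vector) in Haskell and compare my results with his.

I'm a Haskell newbie, so I don't really know how to time code correctly or how to do this efficiently. My first attempt at the problem is below. Note that I'm not filling the vector with random numbers, because I'm not sure how to do that in a good way. I'm also printing things in order to ensure full evaluation.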

import System.TimeIt

import Data.Vector as V

vector :: IO (Vector Int)
vector = do
  let vec = V.replicate 3000000 10
  print $ V.length vec
  return vec

sumit :: IO ()
sumit = do
  vec <- vector
  print $ V.sum vec

time = timeIt sumit

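Loading this into GHCi and running time tells me that it takes about 0.22s for 3 million numbers and 2.69s for 30 million numbers.

Compared to the blog author's results of 0.02s and 0.18s in Lush, that is quite a lot worse, which leads me to believe this can be done in a better way.

Note: the above code needs the TimeIt package to run. cabal install timeit will get it for you.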

Answer 1

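First of all, realize that GHCi is an interpreter, and it is not designed to be very fast. To get more useful results, you should compile the code with optimizations enabled. This can make a huge difference.

Also, for any serious benchmarking of Haskell code, I recommend using criterion. It uses various statistical techniques to ensure that you are getting reliable measurements.

I modified your code to use criterion and removed the print statements so that we are not timing the I/O.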

import Criterion.Main
import Data.Vector as V

vector :: IO (Vector Int)
vector = do
  let vec = V.replicate 3000000 10
  return vec

sumit :: IO Int
sumit = do
  vec <- vector
  return $ V.sum vec

main = defaultMain [bench "sumit" $ whnfIO sumit]

Compiling this with -O2, I get this result on a pretty slow netbook:

$ ghc --make -O2 Sum.hs
$ ./Sum 
warming up
estimating clock resolution...
mean is 56.55146 us (10001 iterations)
found 1136 outliers among 9999 samples (11.4%)
  235 (2.4%) high mild
  901 (9.0%) high severe
estimating cost of a clock call...
mean is 2.493841 us (38 iterations)
found 4 outliers among 38 samples (10.5%)
  2 (5.3%) high mild
  2 (5.3%) high severe

benchmarking sumit
collecting 100 samples, 8 iterations each, in estimated 6.180620 s
mean: 9.329556 ms, lb 9.222860 ms, ub 9.473564 ms, ci 0.950
std dev: 628.0294 us, lb 439.1394 us, ub 1.045119 ms, ci 0.950

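So I'm getting a mean of just over 9 ms with a standard deviation of less than a millisecond. For the larger test case, I'm getting about 100 ms.

Enabling optimizations is especially important when using the vector package, because it makes heavy use of stream fusion, which in this case is able to eliminate the data structure entirely, turning your program into an efficient, tight loop.

It may also be worthwhile to experiment with the new LLVM-based code generator by using the -fllvm option. It is apparently well suited to numeric code.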
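To make the "tight loop" idea concrete, here is a hand-written sketch of roughly what summing a replicated vector reduces to once the intermediate vector is gone. This is a conceptual illustration only, not the code GHC actually generates, and the name sumReplicated is made up for this example.

```haskell
{-# LANGUAGE BangPatterns #-}
module Main where

-- Hypothetical hand-written counterpart of `V.sum (V.replicate n x)`:
-- no vector is allocated; only a loop counter and a strict accumulator remain.
sumReplicated :: Int -> Int -> Int
sumReplicated n x = go 0 0
  where
    go :: Int -> Int -> Int
    go !i !acc
      | i >= n    = acc                    -- all n elements consumed
      | otherwise = go (i + 1) (acc + x)   -- add one element, move on

main :: IO ()
main = print (sumReplicated 3000000 10)
```

The bang patterns keep the accumulator evaluated at every step, so the loop runs in constant space instead of building a chain of thunks.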

Answer 2

Your original file, first uncompiled, then compiled without optimization, then compiled with a simple optimization flag:

$ runhaskell boxed.hs  
3000000
30000000
CPU time:   0.35s

$ ghc --make boxed.hs -o unoptimized 
$ ./unoptimized
3000000
30000000
CPU time:   0.34s



$ ghc --make -O2 boxed.hs 
$ ./boxed
3000000
30000000
CPU time:   0.09s

A file with import qualified Data.Vector.Unboxed as V instead of import qualified Data.Vector as V (Int is an unboxable type), first without optimization and then with:

$ ghc --make unboxed.hs -o unoptimized
$ ./unoptimized
3000000
30000000
CPU time:   0.27s


$ ghc --make -O2 unboxed.hs 
$ ./unboxed
3000000
30000000
CPU time:   0.04s

So: compile, optimize... and use Data.Vector.Unboxed where possible.
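The unboxed.hs file itself is not shown above. A minimal sketch of what such a file could look like, assuming only the import changes and leaving out the timeIt wrapper, is the following. The qualifier U and the helper sumUnboxed are names chosen for this example.

```haskell
module Main where

import qualified Data.Vector.Unboxed as U

-- Sum n copies of 10 stored in an unboxed vector.
-- Unboxed vectors hold raw Ints rather than pointers to heap objects.
sumUnboxed :: Int -> Int
sumUnboxed n = U.sum (U.replicate n 10)

main :: IO ()
main = do
  let vec = U.replicate 3000000 (10 :: Int)
  print (U.length vec)          -- forces construction, as in the question
  print (U.sum vec)
  print (sumUnboxed 30000000)   -- the larger test case from the question
```

Build it with ghc --make -O2 as shown above; without optimization the advantage over the boxed version is much smaller.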

Answer 3

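Try using an unboxed vector, although I'm not sure whether it makes a noticeable difference in this case. Note also that the comparison is slightly unfair, because the vector package should optimize the vector away entirely (this optimization is called stream fusion).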

Answer 4

If you use big enough vectors, unboxed vectors might become impractical. For me, pure (lazy) lists are quicker once the vector size exceeds 50000000:

import System.TimeIt

sumit :: IO ()
sumit = print . sum $ replicate 50000000 10

main :: IO ()
main = timeIt sumit

I get these times:

Unboxed Vectors
CPU time:   1.00s

List:
CPU time:   0.70s

Edit: I've repeated the benchmark using Criterion and made sumit pure. The code and results are as follows:

Code:

import Criterion.Main

sumit :: Int -> Int
sumit m = sum $ replicate m 10

main :: IO ()
main = defaultMain [bench "sumit" $ nf sumit 50000000]

Results:

warming up
estimating clock resolution...
mean is 7.248078 us (80001 iterations)
found 24509 outliers among 79999 samples (30.6%)
  6044 (7.6%) low severe
  18465 (23.1%) high severe
estimating cost of a clock call...
mean is 68.15917 ns (65 iterations)
found 7 outliers among 65 samples (10.8%)
  3 (4.6%) high mild
  4 (6.2%) high severe

benchmarking sumit
collecting 100 samples, 1 iterations each, in estimated 46.07401 s
mean: 451.0233 ms, lb 450.6641 ms, ub 451.5295 ms, ci 0.950
std dev: 2.172022 ms, lb 1.674497 ms, ub 2.841110 ms, ci 0.950

It looks like print makes a big difference, as should be expected!
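For a quick check without Criterion, one way to keep print out of the measured region is to force the result with evaluate and read the CPU clock around it. This is a rough sketch using only the base library; timePure and sumList are names invented here, and a single run like this is far noisier than Criterion's statistics.

```haskell
module Main where

import Control.Exception (evaluate)
import Data.List (foldl')
import System.CPUTime (getCPUTime)

-- Sum m copies of 10 from a lazy list, using a strict left fold.
sumList :: Int -> Int
sumList m = foldl' (+) 0 (replicate m 10)

-- Force a value to weak head normal form and report the CPU seconds used.
-- For an Int, weak head normal form means the value is fully computed.
timePure :: a -> IO (a, Double)
timePure x = do
  start  <- getCPUTime                 -- CPU time in picoseconds
  result <- evaluate x
  end    <- getCPUTime
  return (result, fromIntegral (end - start) / 1.0e12)

main :: IO ()
main = do
  (total, secs) <- timePure (sumList 50000000)
  -- Printing happens only after the clock has been stopped.
  putStrLn ("sum = " ++ show total)
  putStrLn ("CPU time: " ++ show secs ++ "s")
```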

Doing efficient numerics in Haskell
