Question

所以，我在编写一个程序时遇到了很多麻烦，这个程序是从几个大的整数向量中生成一个协方差矩阵，存储在不同的文件中。我从写作开始

mean xs = realToFrac (sum xs) / realToFrac (length xs)
cov xs ys = mean (zipWith (*) xs ys) - mean xs * mean ys
covmat vectors = [cov xs ys | ys <- vectors, xs <- vectors]

适用于小输入，但你可以看到即使只是“卑鄙”也是多么低效。它在执行总和时将所有x保留在内存中，因为它们将被“length”使用。这是可以解决的，因为：

mean xs = realToFrac thisSum / realToFrac thisLength
    where (thisSum, thisLength) = foldl' (\(s,l) y-> (s+y,l+1)) (0,0) xs

然后我遇到与“cov”相同的问题。当我用这种风格改写“cov”时，它并没有最终使用我的“平均”功能。当我写“covmat”函数时，我仍然有一个级别，这将变得非常复杂。

所以，我有两个目标似乎存在冲突：

遍历每个列表一次，而不将其保留在内存中
将“covmat”分解为更简单，更有意义的功能，特别是“cov”和“mean”

我认为没有办法将这两个目标与我对Haskell的了解统一起来。但从概念上看似乎很简单：所有这些函数都需要“监听”它们进入的相同几个列表的值。在Haskell中有没有办法像这样组织它？如果这需要不同的数据结构或额外的库，我对此持开放态度。

Answer 1

所以，我做了一些挖掘，我想出了以下内容。

编辑：Gist适用于那些喜欢SO格式的用户。

首先，一些卑鄙的实现

module Means where

import Data.List
import Control.Applicative

mean :: (Fractional a1, Real a, Foldable t) => t a -> a1
mean xs = realToFrac (sum xs) / realToFrac (length xs)

mean' :: (Foldable f, Fractional a) => f a -> a
mean' = liftA2 (/) sum (fromIntegral . length)

data Pair = Pair {-# UNPACK #-}!Int {-# UNPACK #-}!Double 

mean'' :: [Double] -> Double
mean'' xs = s / fromIntegral n
  where
    Pair n s = foldl' k (Pair 0 0) xs
    k (Pair n s) x = Pair (n+1) (s+x)

最后一个使用显式严格对构造函数。 IIRC，(,)是懒惰的，所以这应该给我们更好的表现特征。

module Covariance where

import Means

covariance :: (Fractional a, Real a1) => [a1] -> [a1] -> a
covariance xs ys = mean (zipWith (*) xs ys) - mean xs * mean ys

covariance' :: Fractional a => [a] -> [a] -> a
covariance' xs ys = mean' (zipWith (*) xs ys) - mean' xs * mean' ys

covariance'' :: [Double] -> [Double] -> Double
covariance'' xs ys = mean'' (zipWith (*) xs ys) - mean'' xs * mean'' ys

covariance''' :: [Double] -> [Double] -> Double
covariance''' xs ys =
    let mx = mean'' xs
        my = mean'' ys
    in
    sum (zipWith (\x y -> (x - mx) * (y - my)) xs ys) / fromIntegral (length xs)

我尝试了cov的几个版本，使用了不同的平均选项，然后是一个丑陋的＆＃34;性能＆＃34;版本

我将一个简单的Main与一些硬编码列表放在一起进行测试。

module Main where

import Means
import Covariance

v1 = [1000000..2000000]

v2 = [2000000..3000000]

main :: IO ()
main = do
  -- let cov = covariance v1 v2
  -- let cov = covariance' v1 v2
  -- let cov = covariance'' v1 v2
  let cov = covariance''' v1 v2
  print cov

使用-rtsopts进行编译并使用+RTS -s运行，我收到了以下分配信息。

covariance：

8.33335e10
     531,816,984 bytes allocated in the heap
     890,566,720 bytes copied during GC
     148,609,912 bytes maximum residency (11 sample(s))
      15,649,528 bytes maximum slop
             374 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       981 colls,     0 par    0.385s   0.389s     0.0004s    0.0012s
  Gen  1        11 colls,     0 par    0.445s   0.584s     0.0531s    0.2084s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    0.194s  (  0.168s elapsed)
  GC      time    0.830s  (  0.973s elapsed)
  EXIT    time    0.001s  (  0.029s elapsed)
module Main where
  Total   time    1.027s  (  1.172s elapsed)

  %GC     time      80.9%  (83.0% elapsed)

  Alloc rate    2,741,140,975 bytes per MUT second

  Productivity  19.1% of total user, 16.8% of total elapsed

covariance'：

8.333350000320508e10
     723,822,456 bytes allocated in the heap
     891,626,240 bytes copied during GC
     185,629,664 bytes maximum residency (11 sample(s))
       3,693,296 bytes maximum slop
             435 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0      1372 colls,     0 par    0.388s   0.392s     0.0003s    0.0010s
  Gen  1        11 colls,     0 par    0.388s   0.551s     0.0501s    0.1961s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    0.227s  (  0.202s elapsed)
  GC      time    0.777s  (  0.943s elapsed)
  EXIT    time    0.001s  (  0.029s elapsed)
  Total   time    1.006s  (  1.176s elapsed)

  %GC     time      77.2%  (80.2% elapsed)

  Alloc rate    3,195,430,190 bytes per MUT second

  Productivity  22.8% of total user, 19.6% of total elapsed

covariance''：

8.333350000320508e10
     456,108,392 bytes allocated in the heap
     394,432,096 bytes copied during GC
      79,295,648 bytes maximum residency (15 sample(s))
      21,161,776 bytes maximum slop
             201 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       861 colls,     0 par    0.085s   0.089s     0.0001s    0.0005s
  Gen  1        15 colls,     0 par    0.196s   0.274s     0.0182s    0.0681s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    0.124s  (  0.106s elapsed)
  GC      time    0.282s  (  0.362s elapsed)
  EXIT    time    0.001s  (  0.021s elapsed)
  Total   time    0.408s  (  0.491s elapsed)

  %GC     time      69.1%  (73.7% elapsed)

  Alloc rate    3,681,440,521 bytes per MUT second

  Productivity  30.9% of total user, 25.9% of total elapsed

covariance'''：

8.333349999886264e10
     336,108,336 bytes allocated in the heap
     202,943,312 bytes copied during GC
      47,493,864 bytes maximum residency (10 sample(s))
      13,578,520 bytes maximum slop
             115 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       633 colls,     0 par    0.053s   0.055s     0.0001s    0.0002s
  Gen  1        10 colls,     0 par    0.089s   0.131s     0.0131s    0.0472s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    0.095s  (  0.086s elapsed)
  GC      time    0.142s  (  0.186s elapsed)
  EXIT    time    0.001s  (  0.011s elapsed)
  Total   time    0.240s  (  0.286s elapsed)

  %GC     time      59.2%  (65.1% elapsed)

  Alloc rate    3,522,631,228 bytes per MUT second

  Productivity  40.8% of total user, 34.1% of total elapsed

正如您所看到的，很多分配取决于我们使用的平均值。通过使用mean''和严格的对构造函数，我们得到了最大的推动，即使使用了朴素的zipWith实现。

我正在尝试使用weigh连接实现，所以我可能会有更多的数据。

除了调整组件函数之外，我还不知道处理covmat的更高效的方法，但严格的对构造函数至少应该改善你的空间特性而不管你做了什么。

编辑：weigh结果

Case                                  Allocated    GCs
naive mean                          723,716,168  1,382
applicative mean                    723,714,736  1,382
optimized mean, naive zipWith       456,000,688    875
optimized mean, hand-tuned zipWith  336,000,672    642

第二次编辑：

我抓住了加布里埃尔的精彩foldl，看看我们可以获得什么样的表现，而不必用明确的严格对来手动调整平均值。

import qualified Control.Foldl as L

mean''' :: [Double] -> Double
mean''' = L.fold (liftA2 (/) L.sum L.genericLength)

covariance'''' :: [Double] -> [Double] -> Double
covariance'''' xs ys = mean''' (zipWith (*) xs ys) - mean''' xs * mean''' ys

covariance''''' :: [Double] -> [Double] -> Double
covariance''''' xs ys = let mx = mean''' xs
                            my = mean''' ys
                        in
                        mean''' (zipWith (\x y -> (x - mx) * (y - my)) xs ys)

分配结果：

covariance''''：

8.333350000320508e10
     336,108,272 bytes allocated in the heap
     222,635,752 bytes copied during GC
      61,198,528 bytes maximum residency (10 sample(s))
      13,627,544 bytes maximum slop
             140 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       633 colls,     0 par    0.052s   0.054s     0.0001s    0.0003s
  Gen  1        10 colls,     0 par    0.105s   0.155s     0.0155s    0.0592s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    0.110s  (  0.099s elapsed)
  GC      time    0.156s  (  0.209s elapsed)
  EXIT    time    0.001s  (  0.014s elapsed)
  Total   time    0.269s  (  0.323s elapsed)

  %GC     time      58.1%  (64.5% elapsed)

  Alloc rate    3,054,641,122 bytes per MUT second

  Productivity  41.8% of total user, 34.9% of total elapsed

covariance'''''：

8.333349999886264e10
     336,108,232 bytes allocated in the heap
     202,942,400 bytes copied during GC
      47,493,816 bytes maximum residency (10 sample(s))
      13,578,568 bytes maximum slop
             115 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       633 colls,     0 par    0.057s   0.059s     0.0001s    0.0003s
  Gen  1        10 colls,     0 par    0.086s   0.126s     0.0126s    0.0426s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    0.096s  (  0.087s elapsed)
  GC      time    0.143s  (  0.184s elapsed)
  EXIT    time    0.001s  (  0.011s elapsed)
  Total   time    0.241s  (  0.285s elapsed)

  %GC     time      59.2%  (64.7% elapsed)

  Alloc rate    3,504,449,342 bytes per MUT second

  Productivity  40.8% of total user, 34.5% of total elapsed

weigh结果：

foldl mean                          336,000,568    642
foldl mean, tuned zipWith           336,000,568    642

总之，看起来foldl实施是您最好的选择。它非常清楚它正在做什么，并且采用一些非常奇特的技巧来有效地传输输入，达到或超过我们的手动调整的结果。您可以使用另一种数据结构从所有这些中获取额外的果汁，但这对于简单列表来说是非常好的性能。：d

第三次编辑：

我之前从未使用过爱德华的folds，所以我可能会做一些非常愚蠢的事情，但我也尝试过使用这些实现。

import qualified Data.Fold as F

sumL :: Num a => F.L a a
sumL = F.L id (+) 0

lengthL :: Num b => F.L a b
lengthL = F.L id (const . (+1)) 0

mean'''' :: [Double] -> Double
mean'''' = flip F.run (liftA2 (/) sumL lengthL)

covariance'''''' :: [Double] -> [Double] -> Double
covariance'''''' xs ys = mean'''' (zipWith (*) xs ys) - mean'''' xs * mean'''' ys

covariance''''''' :: [Double] -> [Double] -> Double
covariance''''''' xs ys = let mx = mean'''' xs
                              my = mean'''' ys
                        in
                        mean'''' (zipWith (\x y -> (x - mx) * (y - my)) xs ys)

分配结果：

covariance''''''：

8.333350000320508e10
     456,108,488 bytes allocated in the heap
     394,432,096 bytes copied during GC
      79,295,648 bytes maximum residency (15 sample(s))
      21,161,776 bytes maximum slop
             201 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       861 colls,     0 par    0.089s   0.092s     0.0001s    0.0003s
  Gen  1        15 colls,     0 par    0.198s   0.276s     0.0184s    0.0720s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    0.135s  (  0.119s elapsed)
  GC      time    0.287s  (  0.367s elapsed)
  EXIT    time    0.001s  (  0.019s elapsed)
  Total   time    0.425s  (  0.506s elapsed)

  %GC     time      67.6%  (72.5% elapsed)

  Alloc rate    3,388,218,993 bytes per MUT second

  Productivity  32.3% of total user, 27.1% of total elapsed

covariance'''''''：

8.333349999886264e10
     456,108,552 bytes allocated in the heap
     291,275,200 bytes copied during GC
      62,670,040 bytes maximum residency (11 sample(s))
      15,029,432 bytes maximum slop
             172 MB total memory in use (0 MB lost due to fragmentation)

                                     Tot time (elapsed)  Avg pause  Max pause
  Gen  0       862 colls,     0 par    0.068s   0.070s     0.0001s    0.0003s
  Gen  1        11 colls,     0 par    0.149s   0.210s     0.0191s    0.0570s

  INIT    time    0.000s  (  0.002s elapsed)
  MUT     time    0.118s  (  0.104s elapsed)
  GC      time    0.217s  (  0.280s elapsed)
  EXIT    time    0.001s  (  0.016s elapsed)
  Total   time    0.337s  (  0.403s elapsed)

  %GC     time      64.3%  (69.6% elapsed)

  Alloc rate    3,870,870,585 bytes per MUT second

  Productivity  35.7% of total user, 29.9% of total elapsed

weigh结果：

folds mean                          456,000,784    875
folds mean, tuned zipWith           456,000,888    871

另一个编辑：我也使用folds而不是L'尝试了L选项，但结果是相同的。

将Haskell函数分解为子任务，无需额外的列表遍历

1 个答案: