Question

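Suppose I have a function f that takes some input and produces a number. Inside f, a list is created from the input and then reduced (e.g. using foldl' g) to produce the final number. Since the intermediate list is going to be reduced anyway, is it possible to apply the reducing function g without ever materializing the intermediate list? The goal is to limit the memory used to store (or "express", if "store" is not the accurate word) the list.

To illustrate, the function foldPairProduct below takes O(N1 * N2) space for the intermediate list (the space actually consumed may be more complicated because of expressions and lazy evaluation, but I assume it is proportional or worse). Here N1 and N2 are the sizes of the two input lists.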

foldPairProduct :: (Num a, Ord a)  => (a -> a -> a) -> [a] -> [a] -> a
foldPairProduct f xs ys = foldl1 f [ x*y | x <- xs, y <- ys]

An alternative implementation of the same logic is foldPairProduct', which takes O(2 * 2) space.

foldPairProduct' :: Num a => (Maybe a -> Maybe a -> Maybe a) -> [a] -> [a] -> Maybe a  
foldPairProduct' _ _ [] = Nothing
foldPairProduct' _ [] _ = Nothing
foldPairProduct' f (x:xs) (y:ys) = 
  foldl1 f [Just $ x*y, foldPairProduct' f [x] ys, foldPairProduct' f xs [y], 
            foldPairProduct' f xs ys]

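The situation is worse for foldCrossProduct, whose implementation is similar to that of foldPairProduct except that it accepts multiple lists as input. The space complexity of its intermediate list (still in the imperative-language sense) is O(N1 * N2 * ... * Nk), where k is the length of the [[a]].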

foldCrossProduct :: Num a => (a -> a -> a) -> [[a]]  -> a
foldCrossProduct f xss = foldl1 f (crossProduct xss)

crossProduct :: Num a => [[a]] -> [a]
crossProduct [] = []
crossProduct (xs:[]) = xs
crossProduct (xs:xss) = [x * y | x <- xs, y <- crossProduct xss]

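If we followed the implementation idea of foldPairProduct', the space complexity would be k^2, which is far more space-efficient. My questions are:

I implemented foldPairProduct' for a pair of lists, but implementing it for an arbitrary number of lists does not seem straightforward.
I don't mean to compare Haskell with imperative languages, but is there an implementation that uses constant space (in other words, one that never represents an intermediate list of the length mentioned above)? Maybe a monad would help, but I am very new to them.
Does the compiler actually work its magic? That is, does it notice that the list is intermediate and about to be reduced, and figure out a way to evaluate it space-efficiently? After all, that is what I thought lazy evaluation and compiler optimization were designed for.
Any comments are welcome. Thanks.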

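Update 1

Performance tests confirmed the "space complexity" analysis of foldPairProduct and foldCrossProduct, based on varying the input sizes N1, N2, N3 and observing the number of bytes copied by GC.

The tests contradicted my analysis of foldPairProduct', which surprisingly showed N1 * N2 or even worse space usage. This is probably because the recursive calls are evaluated inefficiently. The results are below (with the same ghc settings as Yuras).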

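Update 2

Here are some further experiments, added as I learn from the comments and answers. For foldPairProduct, the total memory in use is consistent with the space complexity explained by Daniel Fischer.

~~For foldCrossProduct, although Daniel's complexity analysis makes sense to me, the results do not show linear memory usage.~~ Following Daniel's advice, I swapped x <- xs and y <- crossproduct ys, and it does indeed achieve linear space complexity.

For foldCrossProduct (max) [[1..100],[1..n], [1..1000]] with n = 100, 1000, 10000, 100000, the memory used is 2, 2, 3, 14 MB.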

foldPairProduct [1..n] [1..10000]

foldCrossProduct (max) [[1..100],[1..n], [1..1000]]

foldPairProduct [1..10000] [1..n]

n = 100
  120,883,320 bytes allocated in the heap 
   56,867,728 bytes copied during GC
      428,384 bytes maximum residency (50 sample(s)) 
       98,664 bytes maximum slop
            3 MB total memory in use (0 MB lost due to fragmentation)     

n = 1000
 1,200,999,280 bytes allocated in the heap 
   569,837,360 bytes copied during GC   
       428,384 bytes maximum residency (500 sample(s))
        99,744 bytes maximum slop 
             3 MB total memory in use (0 MB lost due to fragmentation) 
n = 10000

  12,002,152,040 bytes allocated in the heap
   5,699,468,024 bytes copied during GC 
         428,384 bytes maximum residency (5000 sample(s))
          99,928 bytes maximum slop 
               3 MB total memory in use (0 MB lost due to fragmentation)

n = 100000

 120,013,672,800 bytes allocated in the heap 
  56,997,625,608 bytes copied during GC 
         428,384 bytes maximum residency (50000 sample(s)) 
          99,984 bytes maximum slop 
               3 MB total memory in use (0 MB lost due to fragmentation)

foldPairProduct [1..n] [1..n]

n = 100

     121,438,536 bytes allocated in the heap 
          55,920 bytes copied during GC     
          32,408 bytes maximum residency (1 sample(s)) 
          19,856 bytes maximum slop  
               1 MB total memory in use (0 MB lost due to fragmentation)

n = 1000

   1,201,511,296 bytes allocated in the heap 
         491,864 bytes copied during GC     
          68,392 bytes maximum residency (1 sample(s)) 
          20,696 bytes maximum slop                   
               1 MB total memory in use (0 MB lost due to fragmentation)

n = 10000

  12,002,232,056 bytes allocated in the heap 
   5,712,004,584 bytes copied during GC     
         428,408 bytes maximum residency (5000 sample(s)) 
          98,688 bytes maximum slop 
               3 MB total memory in use (0 MB lost due to fragmentation)

n = 100000

 120,009,432,816 bytes allocated in the heap
  81,694,557,064 bytes copied during GC 
       4,028,408 bytes maximum residency (10002 sample(s))
         769,720 bytes maximum slop 
              14 MB total memory in use (0 MB lost due to fragmentation)

foldCrossProduct (max) [[1..n],[1..100],[1..1000]]

n = 100
 1,284,024 bytes allocated in the heap
    15,440 bytes copied during GC
    32,336 bytes maximum residency (1 sample(s))
    19,920 bytes maximum slop                  
         1 MB total memory in use (0 MB lost due to fragmentation)  

n = 1000
 120,207,224 bytes allocated in the heap  
     114,848 bytes copied during GC 
      68,336 bytes maximum residency (1 sample(s)) 
      24,832 bytes maximum slop 
           1 MB total memory in use (0 MB lost due to fragmentation)  

n = 10000

  12,001,432,024 bytes allocated in the heap 
   5,708,472,592 bytes copied during GC 
         428,336 bytes maximum residency (5000 sample(s)) 
          99,960 bytes maximum slop 
               3 MB total memory in use (0 MB lost due to fragmentation) 

n = 100000
 1,200,013,672,824 bytes allocated in the heap 
   816,574,713,664 bytes copied during GC 
         4,028,336 bytes maximum residency (100002 sample(s)) 
           770,264 bytes maximum slop 
                14 MB total memory in use (0 MB lost due to fragmentation)

foldCrossProduct (max) [[1..100],[1..n],[1..1000]]

n = 100
     105,131,320 bytes allocated in the heap 
      38,697,432 bytes copied during GC     
         427,832 bytes maximum residency (34 sample(s)) 
         209,312 bytes maximum slop 
               3 MB total memory in use (0 MB lost due to fragmentation)

n = 1000
   1,041,254,480 bytes allocated in the heap 
     374,148,224 bytes copied during GC 
         427,832 bytes maximum residency (334 sample(s))
         211,936 bytes maximum slop 
               3 MB total memory in use (0 MB lost due to fragmentation)

n = 10000
  10,402,479,240 bytes allocated in the heap 
   3,728,429,728 bytes copied during GC     
         427,832 bytes maximum residency (3334 sample(s))
         215,936 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

foldPairProduct' [1..n] [1..n]

n = 100
     105,131,344 bytes allocated in the heap 
      38,686,648 bytes copied during GC  
         431,408 bytes maximum residency (34 sample(s)) 
         205,456 bytes maximum slop 
               3 MB total memory in use (0 MB lost due to fragmentation)

n = 1000
   1,050,614,504 bytes allocated in the heap
     412,084,688 bytes copied during GC 
       4,031,456 bytes maximum residency (53 sample(s)) 
       1,403,976 bytes maximum slop
              15 MB total memory in use (0 MB lost due to fragmentation)    
n = 10000
    quit after over 1362 MB total memory in use (0 MB lost due to fragmentation)

Answer 1

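(OK, I was wrong: it will not work in constant space, because one of the lists is used multiple times, so it will most likely have linear space complexity.)

Did you try compiling your test program with optimizations enabled? Your foldPairProduct looks fine to me, and I would expect it to work in constant space.

ADD: Yes, it works in constant space (3 MB total memory in use):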

shum@shum-laptop:/tmp/shum$ cat test.hs 

foldPairProduct f xs ys = foldl1 f [ x*y | x <- xs, y <- ys]

n :: Int
n = 10000

main = print $ foldPairProduct (+) [1..n] [1..n]
shum@shum-laptop:/tmp/shum$ ghc --make -fforce-recomp -O test.hs 
[1 of 1] Compiling Main             ( test.hs, test.o )
Linking test ...
shum@shum-laptop:/tmp/shum$ time ./test +RTS -s
2500500025000000
  10,401,332,232 bytes allocated in the heap
   3,717,333,376 bytes copied during GC
         428,280 bytes maximum residency (3335 sample(s))
         219,792 bytes maximum slop
               3 MB total memory in use (0 MB lost due to fragmentation)

                                    Tot time (elapsed)  Avg pause  Max pause
  Gen  0     16699 colls,     0 par    4.27s    4.40s     0.0003s    0.0009s
  Gen  1      3335 colls,     0 par    1.52s    1.52s     0.0005s    0.0012s

  INIT    time    0.00s  (  0.00s elapsed)
  MUT     time    2.23s  (  2.17s elapsed)
  GC      time    5.79s  (  5.91s elapsed)
  EXIT    time    0.00s  (  0.00s elapsed)
  Total   time    8.02s  (  8.08s elapsed)

  %GC     time      72.2%  (73.2% elapsed)

  Alloc rate    4,659,775,665 bytes per MUT second

  Productivity  27.8% of total user, 27.6% of total elapsed


real    0m8.085s
user    0m8.025s
sys 0m0.040s
shum@shum-laptop:/tmp/shum$

Answer 2

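There are specific optimisations for the creation/modification/consumption of lists, called loop fusion. Because Haskell is pure and non-strict, there are a number of laws, such as map f . map g == map (f . g).

If the compiler for some reason fails to recognise the code and produces suboptimal code (after passing the -O flag), I would look into stream fusion in detail to see what prevents it.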
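The map law quoted in this answer can be checked on a small input. The following is only a sketch: the names unfused, fused and lawHolds and the sample functions are made up for illustration, and it demonstrates the law itself, not what GHC's optimiser actually does to any particular program.

```haskell
-- Illustration of the law  map f . map g == map (f . g).
-- All names here are hypothetical and exist only for this example.

-- Written as two passes; without fusion this allocates an intermediate list.
unfused :: [Int] -> [Int]
unfused = map (+ 1) . map (* 2)

-- Written as a single pass over the input.
fused :: [Int] -> [Int]
fused = map ((+ 1) . (* 2))

-- True when both pipelines agree on the given input.
lawHolds :: [Int] -> Bool
lawHolds xs = unfused xs == fused xs
```

Loading this into GHCi and evaluating lawHolds on a few lists is enough to see that the two forms agree.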

Answer 3

foldPairProduct :: (Num a, Ord a)  => (a -> a -> a) -> [a] -> [a] -> a
foldPairProduct f xs ys = foldl1 f [ x*y | x <- xs, y <- ys]

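can be a good memory citizen. The second argument, ys, is used repeatedly, so it must be completely in memory during the computation. The intermediate list, however, is produced lazily as it is consumed, so it contributes only a constant amount of memory, giving an overall O(length ys) space complexity. Of course, length xs * length ys list cells must still be produced and consumed, so the overall allocation is O(length xs * length ys) [assuming each a value uses bounded space]. The number of bytes copied during GC (and hence the time needed for GC) can be reduced drastically by providing a larger allocation area with +RTS -A1M. The number drops from

3,717,333,376 bytes copied during GC

for the default setting to

20,445,728 bytes copied during GC

and the time from GC time 4.88s to GC time 0.07s, for xs == ys = [1 .. 10000] :: [Int] and f = (+).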
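For comparison, the intermediate list can also be left out of the source entirely by nesting two folds. This is a minimal sketch, not part of the original answer: the name foldPairProductNested and the explicit starting accumulator z are assumptions made here (the original uses foldl1 and needs no starting value).

```haskell
import Data.List (foldl')

-- Hypothetical variant of foldPairProduct: instead of folding over the
-- comprehension [x*y | x <- xs, y <- ys], nest two strict left folds.
-- The products are visited in the same order as in the comprehension.
-- The starting accumulator z is an assumption of this sketch.
foldPairProductNested :: Num a => (b -> a -> b) -> b -> [a] -> [a] -> b
foldPairProductNested f z xs ys =
  foldl' (\acc x -> foldl' (\acc' y -> f acc' (x * y)) acc ys) z xs
```

Note that ys is still traversed once per element of xs, so it stays in memory; this is in line with the O(length ys) bound, and foldl' only forces the accumulator to weak head normal form.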

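But that depends on the strictness analyser doing its job, which it does well if the type the function is used at is e.g. Int and known during compilation, and the combination function is known to be strict. If the code is not specialised, or if the combination function is not known to be strict, the fold will produce a thunk of O(length xs * length ys) size. That problem can be alleviated by using the stricter foldl1'.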
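As a minimal sketch of that change, here is the same function with foldl1' from Data.List in place of foldl1. The name foldPairProductStrict is hypothetical and only used to keep it apart from the original.

```haskell
import Data.List (foldl1')

-- Same shape as the original foldPairProduct, but with the strict fold.
-- foldl1' forces the accumulator to weak head normal form at every step,
-- so for flat types such as Int no chain of thunks builds up even when
-- the strictness analyser cannot help. Like foldl1, it fails on an
-- empty list of products.
foldPairProductStrict :: Num a => (a -> a -> a) -> [a] -> [a] -> a
foldPairProductStrict f xs ys = foldl1' f [x * y | x <- xs, y <- ys]
```

For example, foldPairProductStrict (+) [1..3] [1..2] sums the six products of the two lists.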

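foldPairProduct' :: Num a => (Maybe a -> Maybe a -> Maybe a) -> [a] -> [a] -> Maybe a
foldPairProduct' _ _ [] = Nothing
foldPairProduct' _ [] _ = Nothing
foldPairProduct' f (x:xs) (y:ys) =
  foldl1 f [Just $ x*y, foldPairProduct' f [x] ys, foldPairProduct' f xs [y],
            foldPairProduct' f xs ys]

runs straight into the problem of insufficient strictness. The values wrapped by the Just constructor cannot be made strict by the compiler here, since they might not be needed for the overall result, so the fold will often produce a thunk of O(length xs * length ys) size under a Just. Of course, for some f, like const, it behaves well as is. To be a good memory citizen when all values are used, you must use a sufficiently strict combination function f that also forces the value under the Just in the result (if it is a Just); using foldl1' helps too. With that, it could have O(length ys + length xs) space complexity (the lists xs and ys are used more than once, so both are kept around for reuse).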
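One way to get a combination function that forces the value under the Just might look like the sketch below. The helper strictLift and the name foldPairProductS are hypothetical, and treating Nothing as "no value yet" is an assumption about how the question's f is meant to behave. Whether this actually reaches the O(length ys + length xs) bound has not been measured here.

```haskell
import Data.List (foldl1')

-- Hypothetical helper: lift a binary function to Maybe and force the
-- combined value before wrapping it, so no thunk hides under the Just.
-- Nothing is treated as "no value yet" and is skipped.
strictLift :: (a -> a -> a) -> Maybe a -> Maybe a -> Maybe a
strictLift g (Just a) (Just b) = let c = g a b in c `seq` Just c
strictLift _ Nothing  r        = r
strictLift _ l        Nothing  = l

-- foldPairProduct' from the question, with foldl1' in place of foldl1
-- and the product forced before it is wrapped in Just.
foldPairProductS :: Num a => (Maybe a -> Maybe a -> Maybe a) -> [a] -> [a] -> Maybe a
foldPairProductS _ _ [] = Nothing
foldPairProductS _ [] _ = Nothing
foldPairProductS f (x:xs) (y:ys) =
  foldl1' f
    [ let p = x * y in p `seq` Just p
    , foldPairProductS f [x] ys
    , foldPairProductS f xs [y]
    , foldPairProductS f xs ys
    ]
```

It would be called as foldPairProductS (strictLift (+)) xs ys, or with strictLift max for the maximum product.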

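foldCrossProduct :: Num a => (a -> a -> a) -> [[a]]  -> a
foldCrossProduct f xss = foldl1 f (crossProduct xss)

crossProduct :: Num a => [[a]] -> [a]
crossProduct [] = []
crossProduct (xs:[]) = xs
crossProduct (xs:xss) = [x * y | x <- xs, y <- crossProduct xss]

Although GHC does little CSE (common subexpression elimination), the list crossProduct xss will (probably) be shared between the different x here, producing O(N2*...*Nk) space complexity. If the order of the elements in the list doesn't matter, reordering to

crossProduct (xs:xss) = [x * y | y <- crossProduct xss, x <- xs]

helps. Then crossProduct xss need not be in memory all at once, so it can be produced and consumed incrementally; only xs must be remembered, because it is used multiple times. For the recursive calls, the first of the remaining lists must be shared, so that would produce an overall O(N1+...+Nk-1) space complexity.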
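Put together, the reordered version might look like the following sketch. The generator swap is the one described in this answer; using foldl1' instead of foldl1 is an addition made here, not something this part of the answer prescribes.

```haskell
import Data.List (foldl1')

-- crossProduct with the generators swapped: the recursive product is the
-- outer generator and is traversed only once, while xs is the inner
-- generator and is traversed repeatedly (so only xs has to be retained).
crossProduct :: Num a => [[a]] -> [a]
crossProduct []       = []
crossProduct [xs]     = xs
crossProduct (xs:xss) = [x * y | y <- crossProduct xss, x <- xs]

-- Using foldl1' here is an assumption of this sketch. Like foldl1, it
-- fails when the product list is empty.
foldCrossProduct :: Num a => (a -> a -> a) -> [[a]] -> a
foldCrossProduct f xss = foldl1' f (crossProduct xss)
```

The elements come out in a different order than in the original definition, so this only applies when, as stated, the order does not matter to f.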

Reducing a list on the fly in Haskell