Question

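In the program below, I expected only cycle3 to run in constant memory (it explicitly ties the knot). However, for reasons I can't understand, cycle2 also runs in constant memory. I expected cycle2 to do exactly the same work as cycle1, because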

xs' = xs ++ xs'
xs' = xs ++ xs ++ xs' -- substitute value of xs'
xs' = xs ++ xs ++ xs ++ xs' -- substitute value of xs' again
xs' = xs ++ xs ++ xs ++ ... -- and so on

Can someone explain what I'm missing here?

module Main where

import System.Environment (getArgs)

cycle1 :: [a] -> [a]
cycle1 [] = error "empty list"
cycle1 xs = xs ++ cycle1 xs

cycle2 :: [a] -> [a]
cycle2 [] = error "empty list"
cycle2 xs = xs' where xs' = xs ++ xs'

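cycle3 :: [a] -> [a]
cycle3 [] = error "empty list"
cycle3 xs = let
  xs' = go xs' xs   -- tie the knot: the copy's last cell points back to xs'
  in xs'
  where
    -- copy the list, replacing its final [] with a pointer to `start`
    go :: [a] -> [a] -> [a]
    go start [y] = y : start
    go start (y:ys) = y : go start ys
    go _ [] = error "unreachable: cycle3 rejects the empty list"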

testMem :: (Show a) => ([a] -> [a]) -> [a] -> IO ()
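testMem f xs = print (xs', xs') -- the first component is infinite, so only it is ever printed; the second keeps a live reference to the head of xs', so nothing already printed can be garbage-collected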
  where
    xs' = f xs

main :: IO ()
main = do
  args <- getArgs
  let mCycleFunc = case args of
        ["1"] -> Just cycle1
        ["2"] -> Just cycle2
        ["3"] -> Just cycle3
        _ -> Nothing
  case mCycleFunc of
    Just cycleFunc -> testMem cycleFunc [0..8]
    Nothing -> putStrLn "Valid args are one of {1, 2, 3}."

Answer 1

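cycle1 creates a new list every time one cycle is consumed. The reason for this should be fairly obvious.

cycle2, however, does not. It creates a variable xs' that is used in its own definition. In cycle1, the recursive call cycle1 xs has to be evaluated anew every time the previous copy is used up, but cycle2 contains no recursive function call at all. It just refers back to the same variable, whose value is already known.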
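One way to watch this difference is to instrument both definitions with Debug.Trace. The sketch below is an illustration only: cycle1T and cycle2T are hypothetical names for traced copies of cycle1 and cycle2. In GHCi or an unoptimised build you would expect the cycle1T message once per lap that gets started (three times while taking 7 elements from a 3-element list) and the cycle2T message only once; with optimisations the exact number of trace messages is not guaranteed. Trace output goes to stderr, so it may interleave oddly with the printed lists.

```haskell
import Debug.Trace (trace)

-- cycle1 with a trace message: every call builds a fresh copy of xs,
-- so the message appears once per lap through the list.
cycle1T :: [a] -> [a]
cycle1T [] = error "empty list"
cycle1T xs = trace "cycle1T: building a new lap" (xs ++ cycle1T xs)

-- cycle2 with a trace message on the binding: xs' is a single thunk
-- whose tail eventually points back to xs' itself, so its body is
-- evaluated only once.
cycle2T :: [a] -> [a]
cycle2T [] = error "empty list"
cycle2T xs = xs'
  where
    xs' = trace "cycle2T: evaluating xs'" (xs ++ xs')

main :: IO ()
main = do
  print (take 7 (cycle1T [1, 2, 3 :: Int]))
  print (take 7 (cycle2T [1, 2, 3 :: Int]))
```

Both lines print the same list; only the number of trace messages differs, which is exactly the "same values, different work" distinction the question is asking about.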

Answer 2

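It boils down to sharing or not sharing equal thunks. Two equal thunks are thunks that are guaranteed to produce the same result. In the case of cycle1, every time you hit the [] at the end of xs, you create a new thunk for cycle1 xs. New memory has to be allocated for that thunk, and its value has to be computed from scratch, which allocates new list pairs as you go through it.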

I think the way cycle2 avoids this becomes easier to understand if you rename xs' to result (I also removed the "error on []" case):

cycle2 :: [a] -> [a]
cycle2 xs = result 
    where result = xs ++ result

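This definition is semantically equivalent to cycle1 (it produces the same result for the same argument), but the key to understanding its memory usage is to look at it in terms of the thunks it creates. When you run the compiled code for this function, the only thing it does right away is create a thunk for result. You can think of a thunk as a mutable object, more or less like this (completely made-up pseudocode):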

type Thunk a = union { NotDone (ThunkData a), Done a }
type ThunkData a = struct { force :: t0 -> ... -> tn -> a
                          , subthunk0 :: t0
                          , ...
                          , subthunkn :: tn }

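This is a record that holds either pointers to the thunks for the values it needs, together with a pointer to the code that forces those thunks, or simply the computed result. In the case of cycle2, the thunk for result points to the object code for (++) and to the thunks for xs and result. That last part means the thunk for result has a pointer to itself, which explains the constant-space behavior: the last step of forcing result is to make it point back to itself.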
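The NotDone/Done pseudocode can be turned into a small runnable toy model with an IORef. This is only a sketch of the "overwrite the cell with its result" idea, not GHC's actual heap representation (there are no blackholes or indirections here); ThunkState, Thunk, delay, force and demo are all hypothetical names chosen for the illustration.

```haskell
import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Toy model of the pseudocode: a thunk is a mutable cell holding either
-- a suspended computation (NotDone) or its computed value (Done).
data ThunkState a = NotDone (IO a) | Done a

newtype Thunk a = Thunk (IORef (ThunkState a))

-- Create an unevaluated thunk from a suspended computation.
delay :: IO a -> IO (Thunk a)
delay act = Thunk <$> newIORef (NotDone act)

-- Force a thunk: run the computation the first time, then overwrite the
-- cell with the result so every later force just reads it back.
force :: Thunk a -> IO a
force (Thunk ref) = do
  st <- readIORef ref
  case st of
    Done v -> pure v
    NotDone act -> do
      v <- act
      writeIORef ref (Done v)
      pure v

-- Force the same thunk twice and count how often its body actually ran.
demo :: IO (Int, Int, Int)
demo = do
  runs <- newIORef (0 :: Int)
  t <- delay (modifyIORef' runs (+ 1) >> pure (sum [1 .. 100 :: Int]))
  a <- force t
  b <- force t
  n <- readIORef runs
  pure (a, b, n)

main :: IO ()
main = demo >>= print -- prints (5050,5050,1)
```

Whoever holds a pointer to the same Thunk sees the stored result; a freshly created Thunk (the cycle1 situation) starts again from NotDone and redoes the work.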

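In the case of cycle1, on the other hand, the thunk holds the code for (++), a thunk for xs, and a new thunk that computes cycle1 xs from scratch. In principle, the compiler could recognize that references to the latter thunk can be replaced with references to the "parent" thunk, but the compiler doesn't do that; in cycle2, by contrast, it can't help doing it (one instantiated binding of a variable = one thunk).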
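The "pointer to itself" versus "new thunk" distinction can be probed directly with GHC's System.Mem.StableName: walk one full lap through the list and ask whether you are back at the very same heap cell. This is a sketch under the assumption of GHC's runtime; sharesAfterOneLap is a hypothetical helper. Equal stable names guarantee the same object, while unequal names are only a strong hint of distinct objects. With default settings I would expect False for cycle1 and True for cycle2.

```haskell
import Control.Exception (evaluate)
import System.Mem.StableName (makeStableName)

cycle1 :: [a] -> [a]
cycle1 [] = error "empty list"
cycle1 xs = xs ++ cycle1 xs

cycle2 :: [a] -> [a]
cycle2 [] = error "empty list"
cycle2 xs = xs' where xs' = xs ++ xs'

-- Does the list reached after one full lap through xs start at the same
-- heap cell as the list itself? Both values are forced to WHNF first so
-- that we compare evaluated cons cells rather than unevaluated thunks.
sharesAfterOneLap :: ([Int] -> [Int]) -> [Int] -> IO Bool
sharesAfterOneLap f xs = do
  ys <- evaluate (f xs)                   -- first cons cell
  rest <- evaluate (drop (length xs) ys)  -- cell reached after one lap
  n1 <- makeStableName ys
  n2 <- makeStableName rest
  pure (n1 == n2)

main :: IO ()
main = do
  sharesAfterOneLap cycle1 [1, 2, 3] >>= print
  sharesAfterOneLap cycle2 [1, 2, 3] >>= print
```

For cycle2 the tail after one lap is result itself, so the list is genuinely cyclic in memory; for cycle1 it is the result of a fresh call, which is why the retained list in testMem keeps growing.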

Note that this self-referential thunk behavior can be captured in a suitable implementation of fix:

-- | Find the least fixed point of @f@.  This implementation should produce
-- self-referential thunks, and thus run in constant space.
fix :: (a -> a) -> a
fix f = result
    where result = f result

cycle4 :: [a] -> [a]
cycle4 xs = fix (xs++)
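As far as I know, base's Data.Function.fix is defined with the same kind of self-referential let (fix f = let x = f x in x), so the sketch below simply imports it instead of redefining it. The empty-list guard on cycle4 is an added assumption to mirror cycle1 to cycle3, and fibs is a hypothetical second example of the same knot-tying idea: each element is computed once and then shared by the elements after it.

```haskell
import Data.Function (fix)

-- cycle4 from the answer, built on base's fix.
cycle4 :: [a] -> [a]
cycle4 [] = error "empty list"
cycle4 xs = fix (xs ++)

-- Another self-referential list: `self` is the list being defined,
-- so later elements reuse already-computed earlier ones.
fibs :: [Integer]
fibs = fix (\self -> 0 : 1 : zipWith (+) self (tail self))

main :: IO ()
main = do
  print (take 7 (cycle4 "abc")) -- "abcabca"
  print (take 10 fibs)          -- [0,1,1,2,3,5,8,13,21,34]
```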

Unexpected memory usage of cycle functions
