Question

I'm running into some unexpected behavior with polymorphic records in Haskell: certain values are not cached when I expect them to be.

Here is a minimal example:

{-# LANGUAGE RankNTypes #-}
import Debug.Trace

-- Prints out two "hello"s
data Translation = Trans { m :: forall a . Floating a => a }

g :: Floating a => a -> a
g x = x + 1

f :: Floating a => a -> a
f x = trace "hello" $ x - 2.0

-- Only one "hello"
-- data Translation = Trans { m :: Float }
--
-- f :: Float -> Float
-- f x = trace "hello" $ x - 2.0

main :: IO ()
main = do
    let trans = Trans { m = f 1.5 }
    putStrLn $ show $ m trans
    putStrLn $ show $ m trans

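In this example, I expected that once the value f 1.5 had been computed and stored in the field m, it would not be computed again the next time it was accessed. Instead, it appears to be recomputed on every access to the record field, as shown by "hello" being printed twice.

On the other hand, if we remove the polymorphism from the field, the value is cached as expected and "hello" is printed only once.

I suspect this is caused by an interaction with typeclasses (which are treated like records) that prevents memoization. However, I don't fully understand why.

I realize that compiling with -O2 makes the problem go away. However, this behavior occurs in a much larger system where compiling with -O2 seems to have no effect, so I'd like to understand the underlying problem in order to fix the performance issues in the larger system.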

Answer 1

Hold my beer.

{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE ConstraintKinds #-}
import Debug.Trace

data Dict c where Dict :: c => Dict c

-- An isomorphism between explicit dictionary-passing style (Dict c -> a)
-- and typeclass constraints (c => a) exists:
from :: (c => a) -> (Dict c -> a)
from v Dict = v

to :: (Dict c -> a) -> (c => a)
to f = f Dict

data Translation = Trans { m :: forall a . Floating a => a }

f1, f2 :: Dict (Floating a) -> a -> a
f1 = trace "hello" $ \Dict x -> x - 2.0
f2 = \Dict -> trace "hello" $ \x -> x - 2.0

main = do
    let trans1 = Trans { m = to (flip f1 1.5) }
        trans2 = Trans { m = to (flip f2 1.5) }
    putStrLn "trans1"
    print (m trans1)
    print (m trans1)
    putStrLn "trans2"
    print (m trans2)
    print (m trans2)

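Take a second to predict what this will print before you run it. Then go ask your GHC whether she agrees with your guess.

Clear as mud?

The basic distinction you need to draw is right here in this tiny example: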

> g = trace "a" $ \() -> trace "b" ()
> g ()
a
b
()
> g ()
b
()

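There is a separate distinction here: caching a function versus caching its outputs. The latter, simply put, is never done in GHC (though see the discussion of what is going on with your optimized build below). The former may sound silly, but it is actually not as silly as you might think. You could imagine writing a function that is, say, id if the Collatz conjecture is true and not otherwise. In that situation it makes complete sense to test the Collatz conjecture only once, and then cache whether we should behave as id or as not forever afterwards.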
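Here is a small sketch of the "cache the function, not its outputs" idea. The Collatz conjecture cannot actually be checked, so the hypothetical helper collatzReachesOne only tests starting values 1 to 10000 as a stand-in. The name chooser is also made up. The expensive check sits outside any lambda, so we would expect it (and its trace message) to run once, when chooser is first forced. Both calls should then reuse the selected function. The exact trace behavior can vary with optimization settings.

```haskell
import Debug.Trace (trace)

-- Hypothetical, decidable stand-in for "the Collatz conjecture is true":
-- only starting values 1..10000 are checked.
collatzReachesOne :: Int -> Bool
collatzReachesOne 1 = True
collatzReachesOne n
  | even n    = collatzReachesOne (n `div` 2)
  | otherwise = collatzReachesOne (3 * n + 1)

-- The check is not under any lambda, so it runs when 'chooser' itself is
-- first forced; afterwards the chosen function (id or not) is reused.
chooser :: Bool -> Bool
chooser = trace "checking Collatz up to 10000" $
  if all collatzReachesOne [1 .. 10000] then id else not

main :: IO ()
main = do
  print (chooser True)
  print (chooser False)
```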

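Once you understand this basic fact, the next leap you have to accept is that in GHC, typeclass constraints are compiled to functions. (The arguments to such a function are typeclass dictionaries, which say how each of the typeclass's methods behaves.) GHC itself constructs these dictionaries and passes them around for you, and in most cases this is completely transparent to the user.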
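Here is a rough sketch of what dictionary passing looks like if you write it by hand. NumDict, doubleDict, floatDict and fExplicit are all hypothetical names. GHC's real Floating dictionary holds many more methods plus superclass dictionaries, and GHC builds and passes it implicitly. The sketch only shows that a constrained function becomes an ordinary function with one extra argument.

```haskell
-- Hypothetical hand-written "dictionary" with just the two operations
-- that  f x = x - 2.0  needs.
data NumDict a = NumDict
  { fromRationalD :: Rational -> a   -- how to interpret a literal
  , minusD        :: a -> a -> a     -- how to subtract
  }

doubleDict :: NumDict Double
doubleDict = NumDict { fromRationalD = fromRational, minusD = (-) }

floatDict :: NumDict Float
floatDict = NumDict { fromRationalD = fromRational, minusD = (-) }

-- Roughly the shape of  f :: Floating a => a -> a  after elaboration:
-- the constraint has become an explicit first argument.
fExplicit :: NumDict a -> a -> a
fExplicit d x = minusD d x (fromRationalD d 2)

main :: IO ()
main = do
  print (fExplicit doubleDict 1.5)
  print (fExplicit floatDict 1.5)
```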

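But the upshot of this compilation strategy is this: a value of a polymorphic but class-constrained type is a function, even if no function arrow appears in its type. That is,

f 1.5 :: Floating a => a

looks like a plain old value, but it is actually a function that takes a Floating a dictionary and produces a value of type a. So every computation that goes into producing that a is redone from scratch each time this function is applied (read: each time it is used at a particular monomorphic type). After all, the exact value produced depends crucially on how the typeclass's methods behave.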
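This sketch shows why the stored value cannot simply be computed once. It uses a field value whose result visibly depends on the instance. The record is the one from the question, and root2 is a hypothetical name. Each use of m root2 supplies a different Floating dictionary. The Double and Float results are genuinely different values, so no single result could have been cached ahead of time.

```haskell
{-# LANGUAGE RankNTypes #-}

-- Same shape as the question's record.
data Translation = Trans { m :: forall a. Floating a => a }

-- sqrt is a Floating method, so the stored value is only determined
-- once a concrete Floating dictionary is supplied.
root2 :: Translation
root2 = Trans { m = sqrt 2 }

main :: IO ()
main = do
  print (m root2 :: Double)  -- computed with the Double dictionary
  print (m root2 :: Float)   -- computed again, with the Float dictionary
```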

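That leaves only one question: why do optimizations change things in your case? I believe what is happening there is called specialization. The compiler tries to notice when polymorphic things are used at a statically known monomorphic type, and creates a binding for that type. It goes something like this: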

-- starting point
main = do
    let trans = \dict -> trace "hello" $ minus dict (fromRational dict (3%2)) (fromRational dict (2%1))
    print (trans dictForDouble)
    print (trans dictForDouble)

-- specialization
main = do
    let trans = \dict -> trace "hello" $ minus dict (fromRational dict (3%2)) (fromRational dict (2%1))
    let transForDouble = trans dictForDouble
    print transForDouble
    print transForDouble

-- inlining
main = do
    let transForDouble = trace "hello" $ minus dictForDouble (fromRational dictForDouble (3%2)) (fromRational dictForDouble (2%1))
    print transForDouble
    print transForDouble

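In this last version, the function-ness is gone; it is "as if" GHC had cached the output of trans applied to the dictionary dictForDouble. (If you compile with optimizations and -ddump-simpl, you will see that it goes even further, using constant propagation to turn the minus ... expression into just D# -0.5##. Neat!)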
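If optimization does not rescue a larger program, you can do the specialization step above by hand. Apply the polymorphic field once at the concrete type you need, and bind the result to a monomorphic name. This is a minimal sketch that assumes the consumer only needs Double. The name transForDouble is borrowed from the pseudocode above; GHC does not generate it here. Because transForDouble is an ordinary top-level Double, it should be evaluated at most once, so "hello" is expected to appear only once.

```haskell
{-# LANGUAGE RankNTypes #-}
import Debug.Trace (trace)

data Translation = Trans { m :: forall a. Floating a => a }

f :: Floating a => a -> a
f x = trace "hello" $ x - 2.0

trans :: Translation
trans = Trans { m = f 1.5 }

-- Manual "specialization": supply the Double dictionary once and name the
-- result. It is a plain lazy value, so later uses share its evaluation.
transForDouble :: Double
transForDouble = m trans

main :: IO ()
main = do
  print transForDouble
  print transForDouble
```

In a larger system, the same idea applies wherever a polymorphic record field is read repeatedly at one type. Read it once at that type, then pass the monomorphic result around.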

Answer 2

{-# LANGUAGE RankNTypes #-}

import Debug.Trace

--Does not get cached
data Translation = Trans { m :: forall a. Floating a => a }

f :: Floating a => a -> a
f x = trace "f" $ x - 2.0

Because a is a rigid type variable, fixed by the type the context expects (forall a. Floating a => a), you would have to cache the context as well.
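One way to act on "cache the context as well" is to cache one result per concrete type you actually use. This sketch is an assumption about how you might structure that; it is not part of the answer. The hypothetical PerType record stores a Double copy and a Float copy. Record fields are lazy, so each copy is computed at most once, when it is first demanded.

```haskell
{-# LANGUAGE RankNTypes #-}
import Debug.Trace (trace)

-- Hypothetical cache: one lazily computed result per concrete type.
data PerType = PerType { atDouble :: Double, atFloat :: Float }

-- Instantiate the polymorphic value once at each type we care about.
cachePerType :: (forall a. Floating a => a) -> PerType
cachePerType v = PerType { atDouble = v, atFloat = v }

f :: Floating a => a -> a
f x = trace "f" $ x - 2.0

main :: IO ()
main = do
  let cached = cachePerType (f 1.5)
  print (atDouble cached)
  print (atDouble cached)  -- reuses the already evaluated Double field
  print (atFloat cached)   -- forces the Float field for the first time
```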

--Does get cached
data Translation' = Trans' { m' :: Float }

f' :: Float -> Float
f' x = trace "f'" $ x - 2.0

Because this is a value of type Float, it can be computed once and cached from then on.

main :: IO ()
main = do
    let
        trans = Trans { m = f 1.5 }
        trans' = Trans' { m' = f' 1.5}

    putStrLn $ show $ (m trans :: Double)
    putStrLn $ show $ (m trans :: Float)
    -- ^ you can evaluate it with 2 different contexts

    putStrLn $ show $ (m' trans' :: Float)
    putStrLn $ show $ (m' trans' :: Float)
    -- ^ context fixed

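Note that the former is not cached, whether or not compiler optimizations are turned on.

When both uses are at Float and you turn on optimizations, the problem goes away.

If you compile the larger system with optimizations and it is still inefficient by some measure, I would suspect the problem lies elsewhere.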

Unexpected caching behavior with polymorphic records in Haskell
