I've been learning Haskell, and I've written a few statistics functions that work on lists of Int or Float.
Here are two examples:
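import Data.List (genericLength)

mean :: (Real a, Floating b) => [a] -> b
mean xs = realToFrac (sum xs) / genericLength xs

-- As written, this yields the list of squared deviations, so the result type is [b].
sumOfSquares :: (Real a, Floating b) => [a] -> [b]
sumOfSquares xs = [(mean xs - realToFrac x) ^^ 2 | x <- xs]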
However, I'm now wondering whether I could improve the running time by writing sumOfSquares differently. Like this:
sumOfSquares xs = [(m - realToFrac x) ^^ 2 | x <- xs] where m = mean xs
Or like this:
sumOfSquares xs = [(m - realToFrac x) ^^ 2 | x <- xs, let m = mean xs]
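I don't know how Haskell handles this behind the scenes. In the original version, does the computer evaluate mean xs once for every element of xs? I assume Haskell can't optimize the comprehension in the background. Am I wrong?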
Answer 0 (score: 2)
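Desugaring produces different code for each. You're considering three implementations here. For each one, I'll show the list comprehension followed by what it looks like once the comprehension is desugared.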
-- With no let / where
sumOfSquares xs = [(mean xs - realToFrac x) ^^ 2 | x <- xs]
sumOfSquares xs = map (\x -> (mean xs - realToFrac x) ^^ 2) xs
-- With let inside the comprehension
sumOfSquares xs = [(m - realToFrac x) ^^ 2 | x <- xs, let m = mean xs]
sumOfSquares xs = map (\x -> let m = mean xs in (m - realToFrac x) ^^ 2) xs
-- With where outside the comprehension
sumOfSquares xs = [(m - realToFrac x) ^^ 2 | x <- xs] where m = mean xs
sumOfSquares xs = map (\x -> (m - realToFrac x) ^^ 2) xs where m = mean xs
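So the first two are essentially the same. If you had the world's most naive compiler (or you turned optimizations off), you would get fairly inefficient code that recomputes the mean on every iteration. The third avoids this by placing the computation in a where block, outside the lambda. That doesn't guarantee the value won't be recomputed at each step (a Haskell implementation is free to evaluate things however it likes, as long as the result is equivalent to lazy evaluation), but it does improve your odds if your compiler isn't very clever.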
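If you want to observe the difference yourself, here is a minimal sketch using Debug.Trace. The names meanTraced, noSharing, letInside, and whereOutside are made up for this illustration, and everything is specialized to Double to keep the types simple. Each time the mean is actually evaluated, a message goes to stderr.

```haskell
import Debug.Trace (trace)

-- Hypothetical helper: a mean that reports each time it is evaluated.
meanTraced :: [Double] -> Double
meanTraced xs = trace "mean evaluated" (sum xs / fromIntegral (length xs))

-- No let / where: the call to meanTraced sits inside the lambda.
noSharing :: [Double] -> [Double]
noSharing xs = map (\x -> (meanTraced xs - x) ^^ 2) xs

-- let inside the lambda: still one binding per element.
letInside :: [Double] -> [Double]
letInside xs = map (\x -> let m = meanTraced xs in (m - x) ^^ 2) xs

-- where outside the lambda: a single binding shared by every element.
whereOutside :: [Double] -> [Double]
whereOutside xs = map (\x -> (m - x) ^^ 2) xs
  where m = meanTraced xs
```

Loading this in GHCi (which does not optimize by default) and forcing each result, for example with print (noSharing [1, 2, 3]), should show the message once per element for the first two and once in total for whereOutside. Compiled with -O, the counts may well change, which is the point of the next paragraph.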
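In practice, however, GHC (and most other compilers) will perform an optimization known as floating out. Basically, if it sees a let block that can be moved outside of a repeated computation, it is free to do so, because there are no side effects whose behavior could change as a result. In general, GHC will take expressions of the form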
\x -> let y = something in f x y
and transform them into
let y = something in \x -> f x y
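This optimization is allowed as long as x does not appear as a free variable in something, and most compilers are quite good at doing it, so you shouldn't need to worry about it.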
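To make the free-variable condition concrete, here is a small sketch. The function names are hypothetical and only illustrate the rule; whether GHC actually floats a given binding depends on the optimization level and the surrounding code.

```haskell
-- 'total' does not mention x, so it is a candidate for floating out.
floatable :: [Double] -> [Double]
floatable ys = map (\x -> let total = sum ys in x + total) ys

-- The same function with the binding floated out by hand.
floatedByHand :: [Double] -> [Double]
floatedByHand ys = let total = sum ys in map (\x -> x + total) ys

-- 'square' mentions x, so it has to stay inside the lambda.
notFloatable :: [Double] -> [Double]
notFloatable ys = map (\x -> let square = x * x in x + square) ys
```

floatable and floatedByHand always return the same list; the only possible difference is how often sum ys gets evaluated. In notFloatable there is nothing to share, because the bound value is different for every element.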
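There is an excellent StackOverflow answer about many of the common optimizations GHC performs, which partly inspired this answer. I recommend checking it out if you want to learn more about efficiency in Haskell.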