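Do any functional-language compilers or runtimes reduce all chained iterations into a single one when they are applied? From the programmer's point of view we can optimize functional code with constructs such as laziness and streams, but I am curious about the other side of the story. My functional example is written in Scala, but please do not limit your answers to that language.
The functional way: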
// I assume the following line of code will go
// through the collection 3 times, one for creating it
// one for filtering it and one for summing it
val sum = (1L to 1000000L).filter(_ % 2 == 0).sum // => 250000500000
I would like the compiler to optimize this into the imperative equivalent of:
/* One iteration only */
long sum, i;
for (i = 1L, sum = 0L; i <= 1000000L; i++) {
    if (i % 2 == 0)
        sum += i;
}
Answer 0 (score: 14)
Similar code (with parameters for the start and the end, so that compile-time evaluation is not possible)
val :: Int -> Int -> Int
val low high = sum $ filter even [low .. high]
computes the sum in a single traversal and in small constant memory. [low .. high] is syntactic sugar for enumFromTo low high, and the definition of enumFromTo for Int is basically
enumFromTo x y
    | y < x     = []
    | otherwise = go x
      where
        go k = k : if k == y then [] else go (k+1)
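(In reality, GHC's implementation uses unboxed Int# to make the worker go more efficient, but that has no influence on the semantics; for the other Integral types the definition is similar.)
filter is defined as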
filter :: (a -> Bool) -> [a] -> [a]
filter _pred []    = []
filter pred (x:xs)
    | pred x    = x : filter pred xs
    | otherwise = filter pred xs
and sum as
sum l = sum' l 0
  where
    sum' []     a = a
    sum' (x:xs) a = sum' xs (a+x)
So even without any optimizations, evaluation proceeds as follows:
sum' (filter even (enumFromTo 1 6)) 0
-- Now it must be determined whether the first argument of sum' is [] or not
-- For that, the application of filter must be evaluated
-- For that, enumFromTo must be evaluated
~> sum' (filter even (1 : go 2)) 0
-- Now filter knows which equation to use, unfortunately, `even 1` is False
~> sum' (filter even (go 2)) 0
~> sum' (filter even (2 : go 3)) 0
-- 2 is even, so
~> sum' (2 : filter even (go 3)) 0
~> sum' (filter even (go 3)) (0+2)
-- Once again, sum asks whether filter is done or not, so filter demands another value or []
-- from go
~> sum' (filter even (3 : go 4)) 2
~> sum' (filter even (go 4)) 2
~> sum' (filter even (4 : go 5)) 2
~> sum' (4 : filter even (go 5)) 2
~> sum' (filter even (go 5)) (2+4)
~> sum' (filter even (5 : go 6)) 6
~> sum' (filter even (go 6)) 6
~> sum' (filter even (6 : [])) 6
~> sum' (6 : filter even []) 6
~> sum' (filter even []) (6+6)
~> sum' [] 12
~> 12
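That is of course less efficient than the loop, because for each element of the enumeration a list cell has to be produced, and then for each element that passes the filter another list cell has to be produced, only to be consumed immediately by the sum.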
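For comparison, this is roughly what "the loop" looks like when written by hand in Haskell. It is only a sketch for illustration, not something GHC produced, and the name sumEvensLoop is made up here. It follows the same stopping rule as the enumFromTo definition above (compare with the upper bound before incrementing) and keeps the accumulator evaluated, so no list cells are created at all.

```haskell
-- Hypothetical hand-fused equivalent of: sum (filter even [low .. high]).
-- No intermediate list is built; the running total is forced at every step.
sumEvensLoop :: Int -> Int -> Int
sumEvensLoop low high
  | high < low = 0            -- empty range, same case as enumFromTo's y < x
  | otherwise  = go low 0
  where
    go k acc =
      let acc' = if even k then acc + k else acc
      in  acc' `seq` (if k == high then acc' else go (k + 1) acc')
```

In GHCi, sumEvensLoop 1 6 evaluates to 12, the same value as the step-by-step reduction above.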
Let us check that the memory usage of the list version is indeed small:
module Main (main) where
import System.Environment (getArgs)
main :: IO ()
main = do
    args <- getArgs
    let (low, high) = case args of
                        (a:b:_) -> (read a, read b)
                        _       -> error "Want two args"
    print $ sum $ filter even [low :: Int .. high]
and run it,
$ ./sumEvens +RTS -s -RTS 1 1000000
250000500000
40,071,856 bytes allocated in the heap
12,504 bytes copied during GC
44,416 bytes maximum residency (2 sample(s))
21,120 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 75 colls, 0 par 0.00s 0.00s 0.0000s 0.0000s
Gen 1 2 colls, 0 par 0.00s 0.00s 0.0002s 0.0003s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.01s ( 0.01s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.01s ( 0.01s elapsed)
%GC time 6.1% (7.6% elapsed)
Alloc rate 4,367,976,530 bytes per MUT second
Productivity 91.8% of total user, 115.8% of total elapsed
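It allocated about 40MB for the 500,000 list cells (1) plus some change, but the maximum residency was only about 44KB. Running it with an upper limit of 10 million, the overall allocation (and the running time) grows by a factor of 10 (minus the constant part), but the maximum residency stays the same.
(1) GHC fuses the enumeration and the filter, and produces only the even numbers in the range, of type Int. Unfortunately, it cannot fuse away the sum (at least in the GHC version used for this measurement), because sum is a left fold and GHC's fusion framework only fuses right folds.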
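To make the left-fold versus right-fold distinction in footnote (1) concrete, here is a small sketch, not taken from GHC's libraries, that writes the same sum once as a right fold and once as a strict left fold. The names sumEvensR and sumEvensL are made up for illustration. foldr is among the consumers that GHC's list-fusion rules are documented to handle, so the first version is at least eligible for fusion with filter and the enumeration. Whether it actually fuses, and how the second version is treated, depends on the GHC version and the optimization flags, so it is worth inspecting the Core (for example with -ddump-simpl) rather than assuming.

```haskell
import Data.List (foldl')

-- Right-fold formulation: foldr is a consumer that the foldr/build
-- rewrite rules know about, so this pipeline is a candidate for fusion.
sumEvensR :: Int -> Int -> Int
sumEvensR low high = foldr (+) 0 (filter even [low .. high])

-- Strict left-fold formulation: runs in constant stack space, and is the
-- shape that the list version of sum in this answer has.
sumEvensL :: Int -> Int -> Int
sumEvensL low high = foldl' (+) 0 (filter even [low .. high])
```

Both functions return the same result. Keep in mind that foldr (+) 0 is not tail recursive, so it may need stack space proportional to the number of summed elements.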
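Now, to fuse the sum as well, one has to do a lot of work teaching GHC to do that with rewrite rules. Fortunately, that work has already been done for many algorithms in the vector package, and if we use that,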
module Main where
import qualified Data.Vector.Unboxed as U
import System.Environment (getArgs)
val :: Int -> Int -> Int
val low high = U.sum . U.filter even $ U.enumFromN low (high - low + 1)
main :: IO ()
main = do
    args <- getArgs
    let (low, high) = case args of
                        (a:b:_) -> (read a, read b)
                        _       -> error "Want two args"
    print $ val low high
we get a faster program that no longer allocates any list cells; the pipeline really is rewritten into a loop:
$ ./sumFilter +RTS -s -RTS 1 10000000
25000005000000
72,640 bytes allocated in the heap
3,512 bytes copied during GC
44,416 bytes maximum residency (1 sample(s))
17,024 bytes maximum slop
1 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 0 colls, 0 par 0.00s 0.00s 0.0000s 0.0000s
Gen 1 1 colls, 0 par 0.00s 0.00s 0.0001s 0.0001s
INIT time 0.00s ( 0.00s elapsed)
MUT time 0.01s ( 0.01s elapsed)
GC time 0.00s ( 0.00s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 0.01s ( 0.01s elapsed)
%GC time 1.0% (1.2% elapsed)
Alloc rate 7,361,805 bytes per MUT second
Productivity 97.7% of total user, 111.5% of total elapsed
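The vector package achieves this through stream fusion. The following is a heavily simplified model of that idea, not the package's actual code: every name in it (Step, Stream, enumFromToS, filterS, sumS, sumEvensS) is invented for this sketch, and the real library is considerably more elaborate (for instance, it hides the state type instead of exposing it as a type parameter). The point is that the producer and the filter are plain, non-recursive step functions, and the only recursion lives in the consumer, which is what gives the compiler a chance to turn the whole pipeline into one loop.

```haskell
-- One step of a stream: finished, advance without output, or emit a value.
data Step s a = Done | Skip s | Yield a s

-- A stream is a step function together with its current state.
data Stream s a = Stream (s -> Step s a) s

-- Counts from low to high. Assumes high < maxBound so k + 1 cannot overflow.
enumFromToS :: Int -> Int -> Stream Int Int
enumFromToS low high = Stream next low
  where
    next k
      | k > high  = Done
      | otherwise = Yield k (k + 1)

-- Filtering only wraps the step function; rejected elements become Skip.
filterS :: (a -> Bool) -> Stream s a -> Stream s a
filterS p (Stream next s0) = Stream next' s0
  where
    next' s = case next s of
      Done       -> Done
      Skip s'    -> Skip s'
      Yield x s'
        | p x       -> Yield x s'
        | otherwise -> Skip s'

-- The consumer is the only recursive function: a strict accumulating loop.
sumS :: Stream s Int -> Int
sumS (Stream next s0) = go 0 s0
  where
    go acc s = acc `seq` case next s of
      Done       -> acc
      Skip s'    -> go acc s'
      Yield x s' -> go (acc + x) s'

sumEvensS :: Int -> Int -> Int
sumEvensS low high = sumS (filterS even (enumFromToS low high))
```

None of these functions builds a list or a vector; sumS pulls one element at a time straight from the step functions. How well a given GHC version optimizes this toy model is not something the sketch guarantees.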
In case anyone is interested, this is the Core that GHC generates for the worker of val:
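Answer 1 (score: 2)
In theory, as one commenter wrote, the compiler could reduce this to the result at compile time. It is not inconceivable to achieve that with some macros, but it is unlikely in the general case.
If you insert a .view call, you get lazy semantics in Scala, so only one iteration is performed, although not as simply as in the imperative code: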
val lz = (1L to 1000000L).view.filter(_ % 2 == 0) // SeqView (lazy)!
lz.sum
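P.S. Your assumption that there would otherwise be three iterations is wrong. (1L to 1000000L) creates a NumericRange, which does not involve any iteration over the elements. So .view saves you one iteration.
Answer 2 (score: 2)
I published two blog posts on this topic a few years ago: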
http://jnordenberg.blogspot.de/2010/03/scala-stream-fusion-and-specialization.html
http://jnordenberg.blogspot.de/2010/05/scala-stream-fusion-and-specialization.html
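Note that the specialization and optimizations performed by the Scala compiler have improved a lot since then (and probably those in Hotspot as well), so the results may be even better today.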