我想问你Haskell和C ++编译器是否可以用同样的方式优化函数调用。 请看下面的代码。在下面的示例中,Haskell明显快于C ++。
我听说Haskell可以编译为LLVM,可以通过LLVM传递进行优化。另外我听说Haskell有一些重要的优化。 但是以下示例应该能够以相同的性能工作。 我想问一下:
(我正在使用LLVM-3.2和GHC-7.6)。
C ++代码:
#include <cstdio>
#include <cstdlib>
int b(const int x){
return x+5;
}
int c(const int x){
return b(x)+1;
}
int d(const int x){
return b(x)-1;
}
int a(const int x){
return c(x) + d(x);
}
int main(int argc, char* argv[]){
printf("Starting...\n");
long int iternum = atol(argv[1]);
long long int out = 0;
for(long int i=1; i<=iternum;i++){
out += a(iternum-i);
}
printf("%lld\n",out);
printf("Done.\n");
}
使用clang++ -O3 main.cpp
haskell代码:
module Main where
import qualified Data.Vector as V
import System.Environment
b :: Int -> Int
b x = x + 5
c x = b x + 1
d x = b x - 1
a x = c x + d x
main = do
putStrLn "Starting..."
args <- getArgs
let iternum = read (head args) :: Int in do
putStrLn $ show $ V.foldl' (+) 0 $ V.map (\i -> a (iternum-i))
$ V.enumFromTo 1 iternum
putStrLn "Done."
使用ghc -O3 --make -fforce-recomp -fllvm ghc-test.hs
速度结果:
Running testcase for program 'cpp/a.out'
-------------------
cpp/a.out 100000000 0.0% avg time: 105.05 ms
cpp/a.out 200000000 11.11% avg time: 207.49 ms
cpp/a.out 300000000 22.22% avg time: 309.22 ms
cpp/a.out 400000000 33.33% avg time: 411.7 ms
cpp/a.out 500000000 44.44% avg time: 514.07 ms
cpp/a.out 600000000 55.56% avg time: 616.7 ms
cpp/a.out 700000000 66.67% avg time: 718.69 ms
cpp/a.out 800000000 77.78% avg time: 821.32 ms
cpp/a.out 900000000 88.89% avg time: 923.18 ms
cpp/a.out 1000000000 100.0% avg time: 1025.43 ms
Running testcase for program 'hs/main'
-------------------
hs/main 100000000 0.0% avg time: 70.97 ms (diff: 34.08)
hs/main 200000000 11.11% avg time: 138.95 ms (diff: 68.54)
hs/main 300000000 22.22% avg time: 206.3 ms (diff: 102.92)
hs/main 400000000 33.33% avg time: 274.31 ms (diff: 137.39)
hs/main 500000000 44.44% avg time: 342.34 ms (diff: 171.73)
hs/main 600000000 55.56% avg time: 410.65 ms (diff: 206.05)
hs/main 700000000 66.67% avg time: 478.25 ms (diff: 240.44)
hs/main 800000000 77.78% avg time: 546.39 ms (diff: 274.93)
hs/main 900000000 88.89% avg time: 614.12 ms (diff: 309.06)
hs/main 1000000000 100.0% avg time: 682.32 ms (diff: 343.11)
修改 当然,我们无法比较语言的速度,而是实现的速度。
但我很好奇Ghc和C ++编译器能否以同样的方式优化函数调用
我根据您的帮助使用新的基准和代码编辑了问题:)
答案 0 :(得分:21)
如果你的目标是让你的C ++编译器运行得那么快,那么你 我希望使用编译器可以使用的数据结构。
module Main where
import qualified Data.Vector as V
b :: Int -> Int
b x = x + 5
c x = b x + 1
d x = b x - 1
a x = c x + d x
main = do
putStrLn "Starting..."
putStrLn $ show $ V.foldl' (+) 0 $ V.map a $ V.enumFromTo 1 100000000
putStrLn "Done."
GHC能够完全消除循环,只需插入一个常量 最终的组件。在我的计算机上,现在它的运行时间为&lt; 0.002s,何时 使用与最初指定的相同的优化标志。
作为后续基于@Yuras的评论,由核心产生的核心 基于矢量的解决方案和流融合解决方案在功能上 相同。
main_$s$wfoldlM'_loop [Occ=LoopBreaker]
:: Int# -> Int# -> Int#
main_$s$wfoldlM'_loop =
\ (sc_s2hW :: Int#) (sc1_s2hX :: Int#) ->
case <=# sc1_s2hX 100000000 of _ {
False -> sc_s2hW;
True ->
main_$s$wfoldlM'_loop
(+#
sc_s2hW
(+#
(+# (+# sc1_s2hX 5) 1)
(-# (+# sc1_s2hX 5) 1)))
(+# sc1_s2hX 1)
}
$wloop_foldl [Occ=LoopBreaker]
:: Int# -> Int# -> Int#
$wloop_foldl =
\ (ww_s1Rm :: Int#) (ww1_s1Rs :: Int#) ->
case ># ww1_s1Rs 100000000 of _ {
False ->
$wloop_foldl
(+#
ww_s1Rm
(+#
(+# (+# ww1_s1Rs 5) 1)
(-# (+# ww1_s1Rs 5) 1)))
(+# ww1_s1Rs 1);
True -> ww_s1Rm
}
唯一真正的区别是选择比较操作 终止条件。两个版本都编译为紧密尾递归循环 这可以通过LLVM轻松优化。
答案 1 :(得分:10)
ghc不会融合列表(不惜一切代价避免成功?)
以下是使用stream-fusion包的版本:
module Main where
import Prelude hiding (map, foldl)
import Data.List.Stream
import Data.Stream (enumFromToInt, unstream)
import Text.Printf
import Control.Exception
import System.CPUTime
b :: Int -> Int
b x = x + 5
c x = b x + 1
d x = b x - 1
a x = c x + d x
main = do
putStrLn "Starting..."
putStrLn $ show $ foldl (+) 0 $ map (\z -> a z) $ unstream $ enumFromToInt 1 100000000
putStrLn "Done."
我没有安装llvm来与你的结果进行比较,但它比你的版本快10倍(没有llvm编译)。
我认为矢量融合的表现应该更快。
答案 2 :(得分:7)
正如其他人所指出的,你不是在比较等效的算法。由于Yuras pointed out GHC不会使列表融合。你的Haskell版本实际上将分配整个列表,它将一次懒惰地完成一个单元格,但它将完成。下面的版本在算法上更接近您的C版本。在我的系统上,它与C版本同时运行。
{-# LANGUAGE BangPatterns #-}
module Main where
import Text.Printf
import Control.Exception
import System.CPUTime
import Data.List
a,b,c :: Int -> Int
b x = x + 5
c x = b x + 1
d x = b x - 1
a !x = c x + d x
-- Don't allocate a list, iterate and increment as the C version does.
applyTo !acc !n
| n > 100000000 = acc
| otherwise = applyTo (acc + a n) (n + 1)
main = do
putStrLn "Starting..."
print $ applyTo 0 1
putStrLn "Done."
将其与time
进行比较:
ghc -O3 bench.hs -fllvm -fforce-recomp -o bench-hs && time ./bench-hs
[1 of 1] Compiling Main ( bench.hs, bench.o )
Linking bench-hs ...
Starting...
10000001100000000
Done.
./bench-hs 0.00s user 0.00s system 0% cpu 0.003 total
与C相比:
clang++ -O3 bench.cpp -o bench && time ./bench
Starting...
10000001100000000
Done.
./bench 0.00s user 0.00s system 0% cpu 0.004 total