I'm trying to understand how to use profiling. This is a solution to the USACO 2013 problem "Line of Sight".
import Data.Array.Unboxed
import Data.List
import Data.Int

-- Normalise an angle into [0, 2*pi]
angle !a | a > 2 * pi = a - 2 * pi
angle !a | a < 0 = a + 2 * pi
angle !a = a

-- For each cow (x,y), store the two tangent angles to the circle of radius r
tans :: Int64 -> [[Int64]] -> UArray (Int,Int) Double
tans r cs = listArray ((0,0), (length cs - 1, 1)) $ concatMap f cs where
    f :: [Int64] -> [Double]
    f [x,y] = [angle a2, angle a1] where
        phi | y == 0 = if x < 0 then pi else 0.0
            | otherwise = (fromIntegral $ signum y) * (acos $ (fromIntegral x) / d)
        d = sqrt $ fromIntegral $ x*x + y*y
        z = sqrt $ fromIntegral $ x*x + y*y - r*r
        a1 = phi + (acos $ (fromIntegral r)/d)
        a2 = phi - (acos $ (fromIntegral r)/d)

-- Do the angular intervals [a1,a2] and [a1',a2'] overlap (with wrap-around)?
overlap !a1 !a2 !a1' !a2'
    | a1 < a2 && a1' < a2' = a1 <= a2' && a1' <= a2
    | a1 > a2 && a1' > a2' = overlap (a1 - 2*pi) a2 (a1' - 2*pi) a2'
    | a1 > a2 && a1' <= pi = overlap (a1 - 2*pi) a2 a1' a2'
    | a1 > a2 = overlap a1 (a2 + 2*pi) a1' a2'
    | a1 <= pi = overlap a1 a2 (a1' - 2*pi) a2'
    | otherwise = overlap a1 a2 a1' (a2' + 2 * pi)

-- Count the pairs of cows whose intervals overlap
solve cows = length $ [ 1
                      | i <- [0..n]
                      , j <- [i+1..n]
                      , let a1 = cows ! (i,0)
                      , let a2 = cows ! (i,1)
                      , let a1' = cows ! (j,0)
                      , let a2' = cows ! (j,1)
                      , overlap a1 a2 a1' a2' ] where
    ((0,0),(n,1)) = bounds cows

main = do
    ls <- getContents
    let ([n, r]: cows ) = map (map read . words) $ lines ls
    print $ solve $ tans r cows
I'm using the sample data set 5.in from http://www.usaco.org/current/data/sight.zip and get the following profile:
$ ghc -O2 -XBangPatterns -ddump-simpl sight3.hs
$ ./sight3 < 5.in
...
Sun Dec 01 23:35 2013 Time and Allocation Profiling Report  (Final)

   sight3.EXE +RTS -p -hd -RTS

total time  =       10.46 secs   (10459 ticks @ 1000 us, 1 processor)
total alloc = 1,847,301,536 bytes  (excludes profiling overheads)

COST CENTRE  MODULE  %time  %alloc
solve        Main     65.2    30.7
overlap      Main     14.4     0.0
solve.a2'    Main      8.9    32.5
solve.a1'    Main      8.6    32.5
main.(...)   Main      2.8     4.0
                                                          individual     inherited
COST CENTRE    MODULE                     no.    entries  %time %alloc  %time %alloc
MAIN           MAIN                        49          0    0.0    0.0  100.0  100.0
main           Main                        99          0    0.0    0.1   99.9  100.0
main.r         Main                       110          1    0.0    0.0    0.0    0.0
tans           Main                       105          1    0.0    0.0    0.0    0.1
tans.f         Main                       106      10000    0.0    0.1    0.0    0.1
tans.f.a1      Main                       112      10000    0.0    0.0    0.0    0.0
angle          Main                       111      20000    0.0    0.0    0.0    0.0
tans.f.d       Main                       109      10000    0.0    0.0    0.0    0.0
tans.f.phi     Main                       108      10000    0.0    0.0    0.0    0.0
tans.f.a2      Main                       107      10000    0.0    0.0    0.0    0.0
solve          Main                       104          1   65.2   30.7   97.1   95.7
overlap        Main                       117   64368980   14.4    0.0   14.4    0.0
solve.a2'      Main                       116   49995000    8.9   32.5    8.9   32.5
solve.a1'      Main                       115   49995000    8.6   32.5    8.6   32.5
solve.a2       Main                       114       9999    0.0    0.0    0.0    0.0
solve.a1       Main                       113       9999    0.0    0.0    0.0    0.0
solve.(...)    Main                       103          1    0.0    0.0    0.0    0.0
solve.n        Main                       102          1    0.0    0.0    0.0    0.0
main.cows      Main                       101          1    0.0    0.0    0.0    0.0
main.(...)     Main                       100          1    2.8    4.0    2.8    4.0
CAF            GHC.IO.Encoding.CodePage    83          0    0.0    0.0    0.0    0.0
CAF            GHC.IO.Handle.Internals     82          0    0.0    0.0    0.0    0.0
CAF            Text.Read.Lex               79          0    0.1    0.0    0.1    0.0
CAF            GHC.IO.Encoding             75          0    0.0    0.0    0.0    0.0
CAF            GHC.Int                     71          0    0.0    0.0    0.0    0.0
CAF            GHC.IO.Handle.FD            67          0    0.0    0.0    0.0    0.0
CAF:main1      Main                        63          0    0.0    0.0    0.0    0.0
main           Main                        98          1    0.0    0.0    0.0    0.0
CAF:lvl3_r3iU  Main                        59          0    0.0    0.0    0.0    0.0
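What gets allocated in solve.a1' and solve.a2'? I would have thought that, being strict, they allocate nothing (and the computation is no different from solve.a1).
How can I find out what the CPU time inside solve is spent on? I expected overlap to be the most expensive part, with the enclosing loop very cheap by comparison.
(For stray readers, I'll add that this is purely a profiling exercise. I do have a solution that is hundreds of times faster, but even with plain lists it is rather boring from a profiling standpoint.)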
Answer 0 (score: 3)
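GHC is not able to fuse the length computation with the construction of the list, i.e. it allocates list cells.
If you rewrite solve as an explicit loop, the allocation goes away: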
solve cows = n `seq` go 0 0 1 n
  where
    (_,(n,_)) = bounds cows
    go count i j n | i > n     = count
                   | j > n     = go count (i+1) (i+2) n
                   | overlap (cows ! (i,0)) (cows ! (i,1)) (cows ! (j,0)) (cows ! (j,1))
                               = go (count + 1) i (j + 1) n
                   | otherwise = go count i (j + 1) n
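A variant of the same idea, sketched here as an assumption rather than something measured for this answer: keep the two index ranges from the question but consume them with strict left folds, so that no list of hits has to be built just to take its length. The names countPairs, countPairsList and the predicate p are hypothetical; p stands in for the overlap test on rows i and j. Whether GHC fuses the enumerations away depends on the compiler version and flags, so the allocation figures should be checked in a profile.

```haskell
{-# LANGUAGE BangPatterns #-}
import Data.List (foldl')

-- Hypothetical helper: count the pairs (i, j) with 0 <= i < j <= n
-- for which the predicate p holds, accumulating strictly instead of
-- building a list of hits.
countPairs :: Int -> (Int -> Int -> Bool) -> Int
countPairs n p = foldl' outer 0 [0 .. n]
  where
    outer !acc i = foldl' (inner i) acc [i + 1 .. n]
    inner i !acc j
      | p i j     = acc + 1
      | otherwise = acc

-- Reference version in the style of the question's solve, for comparison.
countPairsList :: Int -> (Int -> Int -> Bool) -> Int
countPairsList n p = length [ () | i <- [0 .. n], j <- [i + 1 .. n], p i j ]
```

With the question's data this would be called roughly as countPairs n (\i j -> overlap (cows ! (i,0)) (cows ! (i,1)) (cows ! (j,0)) (cows ! (j,1))).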
As for why the allocation is attributed to a1' and a2', I don't know.
CPU usage is dominated by the go function, which probably means the array accesses. overlap accounts for only about 15% of the total run time.
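One way to probe how the time splits between the array reads and overlap, offered as a sketch and not as something measured for this answer, is to wrap the individual pieces in manual cost centres using GHC's SCC pragma. The function name pairHit and the cost-centre labels are hypothetical, and the predicate is passed in as the argument ov only to keep the block self-contained; in the question's program it would be overlap.

```haskell
import Data.Array.Unboxed

-- Hypothetical instrumented version of the test done in the inner loop.
-- `ov` stands in for the question's `overlap`.
pairHit :: (Double -> Double -> Double -> Double -> Bool)
        -> UArray (Int, Int) Double -> Int -> Int -> Bool
pairHit ov cows i j =
  let a1  = {-# SCC "read_i0" #-} (cows ! (i, 0))
      a2  = {-# SCC "read_i1" #-} (cows ! (i, 1))
      a1' = {-# SCC "read_j0" #-} (cows ! (j, 0))
      a2' = {-# SCC "read_j1" #-} (cows ! (j, 1))
  in  {-# SCC "overlap_call" #-} (ov a1 a2 a1' a2')
```

The pragmas are ignored unless the program is compiled with profiling enabled. Cost centres can get in the way of optimisations, so the resulting numbers are indicative of where the time goes rather than exact.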
Edit: here is a (less readable) version in which two of the array accesses are moved out of the inner loop:
solve !cows = n `seq` go 0 0
  where
    (_,(n,_)) = bounds cows
    go !count !i | i >= n    = count
                 | otherwise = go2 count i (i+1) (cows ! (i,0)) (cows ! (i,1))
    go2 !count !i !j !a1 !a2 | j > n     = go count (i+1)
                             | overlap a1 a2 (cows ! (j,0)) (cows ! (j,1))
                                         = go2 (count+1) i (j+1) a1 a2
                             | otherwise = go2 count i (j+1) a1 a2
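If the array accesses really are what dominates, one experiment (an assumption on my part, not something I have profiled) is to bypass the bounds check and the tuple index arithmetic with unsafeAt from Data.Array.Base. The helper name cowAngle is hypothetical, and the flat index 2*i + k is only valid for an array whose bounds are ((0,0),(n,1)), as produced by tans in the question.

```haskell
import Data.Array.Base (unsafeAt)
import Data.Array.Unboxed

-- Hypothetical helper: read element (i, k) of the cows array without a
-- bounds check. Assumes the bounds are ((0,0),(n,1)), so that the
-- row-major flat index of (i, k) is 2*i + k. Passing an index outside
-- the array is undefined behaviour.
cowAngle :: UArray (Int, Int) Double -> Int -> Int -> Double
cowAngle cows i k = cows `unsafeAt` (2 * i + k)
```

Replacing cows ! (j,0) and cows ! (j,1) in go2 by cowAngle cows j 0 and cowAngle cows j 1 and profiling again would show whether the indexing cost matters.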