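To learn about GHC's parallel strategies, I wrote a simple particle simulator that, given a particle's position, velocity, and acceleration, projects the particle's path forward.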
import Control.Parallel.Strategies
-- Use phantom a to store axis.
newtype Pos a = Pos Double deriving Show
newtype Vel a = Vel Double deriving Show
newtype Acc a = Acc Double deriving Show
newtype TimeStep = TimeStep Double deriving Show
-- Phantom axis
data X
data Y
-- Position, velocity, acceleration for a particle.
data Particle = Particle (Pos X) (Pos Y) (Vel X) (Vel Y) (Acc X) (Acc Y) deriving (Show)
stepParticle :: TimeStep -> Particle -> Particle
stepParticle ts (Particle x y xv yv xa ya) =
  Particle x' y' xv' yv' xa' ya'
  where
    (x', xv', xa') = step ts x xv xa
    (y', yv', ya') = step ts y yv ya
-- Given a position, velocity, and accel, calculate the pos, vel, acc after
-- a given TimeStep.
step :: TimeStep -> Pos a -> Vel a -> Acc a -> (Pos a, Vel a, Acc a)
step (TimeStep ts) (Pos p) (Vel v) (Acc a) = (Pos p', Vel v', Acc a)
  where
    v' = ts * a + v
    p' = ts * v + p
-- Build a list of lazy infinite lists, one per particle, of each
-- particle's travel with each update a TimeStep apart. Evaluate each
-- inner list in parallel.
simulateParticlesPar :: TimeStep -> [Particle] -> [[Particle]]
simulateParticlesPar ts = withStrategy (parList (parBuffer 250 particleStrategy))
                        . fmap (simulateParticle ts)
-- Build a lazy infinite list of the particle's travel with each
-- update being a TimeStep apart.
simulateParticle :: TimeStep -> Particle -> [Particle]
simulateParticle ts m = m' : simulateParticle ts m'
  where
    m' = stepParticle ts m
particleStrategy :: Strategy Particle
particleStrategy (Particle (Pos x) (Pos y) (Vel xv) (Vel yv) (Acc xa) (Acc ya)) = do
  x' <- rseq x
  y' <- rseq y
  xv' <- rseq xv
  yv' <- rseq yv
  xa' <- rseq xa
  ya' <- rseq ya
  return $ Particle (Pos x') (Pos y') (Vel xv') (Vel yv') (Acc xa') (Acc ya')
main :: IO ()
main = do
  let world = replicate 100 (Particle (Pos 0) (Pos 0) (Vel 1) (Vel 1) (Acc 0) (Acc 0))
      ts = TimeStep 0.1
  print $ fmap (take 10000) (simulateParticlesPar ts world)
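For each particle, I create a lazy infinite list projecting the particle's path into the future. I start with 100 of these particles and project them all forward; my intention is to project each of them forward in parallel (roughly one spark per infinite list). If I project these lists forward far enough, I would expect a significant speedup. Unfortunately, I see a slight slowdown.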
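The stated intent, roughly one spark per particle, can be sketched as below. This is an illustration under assumptions, not a confirmed fix for the program above: `P`, `stepP`, `trajectory`, and `simulateCoarse` are hypothetical names, the particle is reduced to a single axis with strict fields, and the trajectory is cut to a finite length before it is handed to the strategy so that `rdeepseq` can force it completely inside one spark.

```haskell
import Control.DeepSeq (NFData (..))
import Control.Parallel.Strategies (parList, rdeepseq, withStrategy)

-- Hypothetical one-axis stand-in for Particle: position, velocity, acceleration.
-- The fields are strict, so a constructed value is already fully evaluated.
data P = P !Double !Double !Double deriving (Eq, Show)

instance NFData P where
  rnf (P _ _ _) = ()  -- matching the constructor is enough with strict fields

-- Same update rule as `step` in the question, for a single axis.
stepP :: Double -> P -> P
stepP ts (P p v a) = P (ts * v + p) (ts * a + v) a

-- The first n states after the starting state.
trajectory :: Int -> Double -> P -> [P]
trajectory n ts = take n . tail . iterate (stepP ts)

-- One spark per particle; each spark forces its whole finite trajectory.
simulateCoarse :: Int -> Double -> [P] -> [[P]]
simulateCoarse n ts = withStrategy (parList rdeepseq) . map (trajectory n ts)

main :: IO ()
main = do
  let world = replicate 100 (P 0 1 0)
      paths = simulateCoarse 10000 0.1 world
  -- Print a small summary instead of every state, so that formatting the
  -- output does not dominate the measured run time.
  print (sum [p | P p _ _ <- map last paths])
```

With this shape the program creates 100 sparks instead of one per list element, which makes it easier to see in the `SPARKS` line and in threadscope whether the work is actually being distributed.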
Compilation:
ghc phys.hs -rtsopts -threaded -eventlog -O2
With 1 thread:
$ ./phys +RTS -N1 -sstderr -ls > /dev/null
24,264,983,224 bytes allocated in the heap
441,881,088 bytes copied during GC
1,942,848 bytes maximum residency (104 sample(s))
75,880 bytes maximum slop
7 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 46820 colls, 0 par 0.82s 0.88s 0.0000s 0.0039s
Gen 1 104 colls, 0 par 0.23s 0.23s 0.0022s 0.0037s
TASKS: 4 (1 bound, 3 peak workers (3 total), using -N1)
SPARKS: 1025000 (25 converted, 0 overflowed, 0 dud, 28680 GC'd, 996295 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 9.90s ( 10.09s elapsed)
GC time 1.05s ( 1.11s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 10.95s ( 11.20s elapsed)
Alloc rate 2,451,939,648 bytes per MUT second
Productivity 90.4% of total user, 88.4% of total elapsed
gc_alloc_block_sync: 0
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 0
My Intel i5 has 2 cores and 4 threads, and with -N4 the program is 2x slower (total time ~20 seconds) than with -N1.
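When comparing `-N` settings, it can help to confirm what the runtime actually sees. The following is a minimal sketch using `getNumCapabilities` and `getNumProcessors` from `GHC.Conc`; `describe` is a hypothetical helper. The processor count typically includes logical CPUs, so a 2-core, 4-thread chip would likely report 4, although that is an assumption about the platform.

```haskell
import GHC.Conc (getNumCapabilities, getNumProcessors)

-- Format the two counts for display (hypothetical helper).
describe :: Int -> Int -> String
describe caps procs =
  "capabilities: " ++ show caps ++ ", processors: " ++ show procs

main :: IO ()
main = do
  caps  <- getNumCapabilities  -- Haskell capabilities, as set by +RTS -N
  procs <- getNumProcessors    -- CPUs reported by the operating system
  putStrLn (describe caps procs)
```

Built with `-threaded -rtsopts` and run with `+RTS -N2`, the first number should match the `-N` value.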
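I have spent quite a bit of time trying different strategies, such as chunking the outer list (so each spark gets multiple streams to project forward) and using rpar for each field in particleStrategy, but I have not seen any speedup yet.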
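For reference, chunking the outer list can be expressed with `parListChunk` from `Control.Parallel.Strategies`. The sketch below is not the exact variant tried above: `path` and `chunkedPaths` are hypothetical, and each path is a finite list of positions under constant velocity so that `rdeepseq` terminates.

```haskell
import Control.Parallel.Strategies (parListChunk, rdeepseq, withStrategy)

-- Hypothetical stand-in for one particle's path: the first n positions of a
-- particle starting at 0 and moving with constant velocity v.
path :: Int -> Double -> Double -> [Double]
path n ts v = take n (tail (iterate (+ ts * v) 0))

-- Split the outer list into groups of chunkSize paths. Each group becomes one
-- spark, and rdeepseq fully evaluates every path in that group.
chunkedPaths :: Int -> Int -> Double -> [Double] -> [[Double]]
chunkedPaths chunkSize n ts =
  withStrategy (parListChunk chunkSize rdeepseq) . map (path n ts)

main :: IO ()
main = do
  let velocities = replicate 100 1
      paths = chunkedPaths 25 10000 0.1 velocities
  -- 100 paths in chunks of 25 gives 4 sparks.
  print (sum (map last paths))
```

The chunk size trades spark overhead against load balance: fewer, larger chunks cost less to schedule but leave less room for the runtime to even out the work across HECs.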
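Below is a zoomed-in section of the event log under threadscope. As you can see, I get almost no concurrency. Most of the work is done by HEC0, with some activity from HEC1 interleaved, but only one HEC is working at a time. This is quite representative of all the strategies I have tried.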
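As a sanity check, I have also run some example programs from "Parallel and Concurrent Programming in Haskell" and saw slowdowns on those programs too, even though I used the same parameters that gave significant speedups in the book! I am beginning to think something is wrong with my ghc.
With 2 threads: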
$ ./phys +RTS -N2 -sstderr -ls > /dev/null
24,314,635,280 bytes allocated in the heap
457,603,240 bytes copied during GC
1,962,152 bytes maximum residency (104 sample(s))
119,824 bytes maximum slop
7 MB total memory in use (0 MB lost due to fragmentation)
Tot time (elapsed) Avg pause Max pause
Gen 0 46555 colls, 46555 par 1.40s 0.85s 0.0000s 0.0048s
Gen 1 104 colls, 103 par 0.42s 0.25s 0.0024s 0.0043s
Parallel GC work balance: 16.85% (serial 0%, perfect 100%)
TASKS: 6 (1 bound, 5 peak workers (5 total), using -N2)
SPARKS: 1025000 (1023572 converted, 0 overflowed, 0 dud, 1367 GC'd, 61 fizzled)
INIT time 0.00s ( 0.00s elapsed)
MUT time 11.07s ( 11.20s elapsed)
GC time 1.82s ( 1.10s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 12.89s ( 12.30s elapsed)
Alloc rate 2,196,259,905 bytes per MUT second
Productivity 85.9% of total user, 90.0% of total elapsed
gc_alloc_block_sync: 9222
whitehole_spin: 0
gen[0].sync: 0
gen[1].sync: 2393
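$ ghc --version
The Glorious Glasgow Haskell Compilation System, version 7.8.3
Installed from: https://ghcformacosx.github.io/
OS X 10.10.2
Update
I found an OS X threaded RTS performance regression in the GHC tracker: https://ghc.haskell.org/trac/ghc/ticket/7602. I am hesitant to blame the compiler, but my -N4 output supports this hypothesis. The "Parallel GC work balance" is terrible.
On the other hand, I don't know whether this explains my threadscope output, which shows a lack of any concurrency.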