As an exercise, I ported these Scala and Java Akka examples to Frege. It works fine, but it runs much slower (11 s) than the Scala counterpart (540 ms).
module mmhelloworld.akkatutorialfregecore.Pi where
import mmhelloworld.akkatutorialfregecore.Akka

data PiMessage = Calculate |
                 Work {start :: Int, nrOfElements :: Int} |
                 Result {value :: Double} |
                 PiApproximation {pi :: Double, duration :: Duration}

data Worker = private Worker where
    calculatePiFor :: Int -> Int -> Double
    calculatePiFor !start !nrOfElements = loop start nrOfElements 0.0 f where
        loop !curr !n !acc f = if n == 0 then acc
                               else loop (curr + 1) (n - 1) (f acc curr) f
        f !acc !i = acc + (4.0 * fromInt (1 - (i `mod` 2) * 2) / fromInt (2 * i + 1))

    onReceive :: Mutable s UntypedActor -> PiMessage -> ST s ()
    onReceive actor Work{start=start, nrOfElements=nrOfElements} = do
        sender <- actor.sender
        self <- actor.getSelf
        sender.tellSender (Result $ calculatePiFor start nrOfElements) self

data Master = private Master {
    nrOfWorkers :: Int,
    nrOfMessages :: Int,
    nrOfElements :: Int,
    listener :: MutableIO ActorRef,
    pi :: Double,
    nrOfResults :: Int,
    workerRouter :: MutableIO ActorRef,
    start :: Long } where

    initMaster :: Int -> Int -> Int -> MutableIO ActorRef -> MutableIO UntypedActor -> IO Master
    initMaster nrOfWorkers nrOfMessages nrOfElements listener actor = do
        props <- Props.forUntypedActor Worker.onReceive
        router <- RoundRobinRouter.new nrOfWorkers
        context <- actor.getContext
        workerRouter <- props.withRouter router >>= (\p -> context.actorOf p "workerRouter")
        now <- currentTimeMillis ()
        return $ Master nrOfWorkers nrOfMessages nrOfElements listener 0.0 0 workerRouter now

    onReceive :: MutableIO UntypedActor -> Master -> PiMessage -> IO Master
    onReceive actor master Calculate = do
        self <- actor.getSelf
        let tellWorker start = master.workerRouter.tellSender (work start) self
            work start = Work (start * master.nrOfElements) master.nrOfElements
        forM_ [0 .. master.nrOfMessages - 1] tellWorker
        return master
    onReceive actor master (Result newPi) = do
        let (!newNrOfResults, !pi) = (master.nrOfResults + 1, master.pi + newPi)
        when (newNrOfResults == master.nrOfMessages) $ do
            self <- actor.getSelf
            now <- currentTimeMillis ()
            duration <- Duration.create (now - master.start) TimeUnit.milliseconds
            master.listener.tellSender (PiApproximation pi duration) self
            actor.getContext >>= (\context -> context.stop self)
        return master.{pi=pi, nrOfResults=newNrOfResults}

data Listener = private Listener where
    onReceive :: MutableIO UntypedActor -> PiMessage -> IO ()
    onReceive actor (PiApproximation pi duration) = do
        println $ "Pi approximation: " ++ show pi
        println $ "Calculation time: " ++ duration.toString
        actor.getContext >>= ActorContext.system >>= ActorSystem.shutdown

calculate nrOfWorkers nrOfElements nrOfMessages = do
    system <- ActorSystem.create "PiSystem"
    listener <- Props.forUntypedActor Listener.onReceive >>= flip system.actorOf "listener"
    let constructor = Master.initMaster nrOfWorkers nrOfMessages nrOfElements listener
        newMaster = StatefulUntypedActor.new constructor Master.onReceive
    factory <- UntypedActorFactory.new newMaster
    masterActor <- Props.fromUntypedFactory factory >>= flip system.actorOf "master"
    masterActor.tell Calculate
    getLine >> return () --Not to exit until done

main _ = calculate 4 10000 10000

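Am I doing something wrong with Akka, or is the slowness caused by laziness in Frege? For example, when I initially used fold (the strict fold) in place of loop in Worker.calculatePiFor, it took 27 seconds.

Dependencies: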
Answer 0 (score: 6)
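I am not really familiar with Actors, but assuming that the tightest loop is indeed the loop function, you could avoid passing the function f as an argument.

For one, applications of a passed function cannot take advantage of the strictness of the function that is actually passed. Instead, code generation must conservatively assume that the passed function takes its arguments lazily and returns a lazy result.

Second, in our case you use f only once, so it can be inlined. (This is how it is done in the Scala code in the article you linked.)

Look at the code generated for the tail recursion in the following sample, which mimics your code:
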
test b c = loop 100 0 f
    where
        loop 0 !acc f = acc
        loop n !acc f = loop (n-1) (acc + f (acc-1) (acc+1)) f -- tail recursion
        f x y = 2*x + 7*y

Here is what we get:

// arg$2f is the accumulator
arg$2 = arg$2f + (int)frege.runtime.Delayed.<java.lang.Integer>forced(
      f_3237.apply(PreludeBase.INum_Int._minusƒ.apply(arg$2f, 1)).apply(
            PreludeBase.INum_Int._plusƒ.apply(arg$2f, 1)
          ).result()
    );

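You can see here that f is called lazily, which forces all the argument expressions to be computed lazily as well. Note how many method calls this requires!

In your case the code should still look something like this:
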
(double)Delayed.<Double>forced(f.apply(acc).apply(curr).result())
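
This means that two closures are built with the boxed values acc and curr, and then the result is computed: the function f is called with the unboxed arguments, and its result is boxed again, only to be unboxed once more (forced) for the next iteration of the loop.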
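
The round trip described above can be imitated in plain Java. The following is only a rough analogy, not the code Frege emits: the class name PassedVsDirect and the use of Supplier and Function in place of the Frege runtime's delayed values and function objects are assumptions made for illustration. It contrasts a loop that receives f as an opaque curried function over delayed, boxed arguments with a loop that calls a static method on primitives.

```java
import java.util.function.Function;
import java.util.function.Supplier;

// Rough Java analogy only; these are NOT the Frege runtime classes.
final class PassedVsDirect {
    // Direct, strict version of f x y = 2*x + 7*y on primitive ints.
    static int f(int x, int y) {
        return 2 * x + 7 * y;
    }

    // The same f as an opaque curried function value: each argument arrives
    // as a boxed, delayed thunk and the result is a delayed thunk as well.
    static final Function<Supplier<Integer>, Function<Supplier<Integer>, Supplier<Integer>>> F_LAZY =
        x -> y -> () -> 2 * x.get() + 7 * y.get();

    // Loop that receives f as a parameter: two thunks and one result thunk
    // are allocated per iteration, and the result is unboxed again.
    static int loopPassed(int n, int acc,
            Function<Supplier<Integer>, Function<Supplier<Integer>, Supplier<Integer>>> fn) {
        while (n != 0) {
            final int a = acc; // lambdas need an effectively final copy
            acc = acc + fn.apply(() -> a - 1).apply(() -> a + 1).get();
            n--;
        }
        return acc;
    }

    // Loop that calls f directly: primitives only, no allocation.
    static int loopDirect(int n, int acc) {
        while (n != 0) {
            acc = acc + f(acc - 1, acc + 1);
            n--;
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(loopPassed(5, 0, F_LAZY));
        System.out.println(loopDirect(5, 0));
    }
}
```

Both loops compute the same value; the difference is only in how much boxing and allocation happens per iteration, which is the cost the generated code above is paying.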

Now compare this with the following version, where we simply do not pass f and call it directly instead:

test b c = loop 100 0
    where
        loop 0 !acc = acc
        loop n !acc = loop (n-1) (acc + f (acc-1) (acc+1))
        f x y = 2*x + 7*y

We get:

arg$2 = arg$2f + f(arg$2f - 1, arg$2f + 1);

Much better! Finally, in the example above we can do without a function call altogether:

loop n !acc = loop (n-1) (acc + f) where
    f = 2*x + 7*y
    x = acc-1
    y = acc+1

This yields:

final int y_3236 = arg$2f + 1;
final int x_3235 = arg$2f - 1;
...
arg$2 = arg$2f + ((2 * x_3235) + (7 * y_3236));
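
Please try this out and let us know what happens. The main performance gain should come from not passing f; the inlining will probably be done by the JIT anyway.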
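
For orientation, here is a hedged Java sketch of what the worker's inner loop amounts to once f is no longer passed around and only primitives are involved. It is not the output of the Frege compiler. The class name PiChunks and the helper approximate are hypothetical; approximate simply sums the chunks sequentially, using the same start = k * nrOfElements split that the Master sends out, with no Akka involved.

```java
// Sequential sketch under stated assumptions; no actors, no Frege runtime.
final class PiChunks {
    // Mirrors calculatePiFor: sums nrOfElements terms of the series
    // starting at index start, with the term computation inlined.
    static double calculatePiFor(int start, int nrOfElements) {
        double acc = 0.0;
        int curr = start;
        for (int n = nrOfElements; n != 0; n--) {
            acc = acc + (4.0 * (1 - (curr % 2) * 2) / (2 * curr + 1));
            curr++;
        }
        return acc;
    }

    // Hypothetical helper: adds up the chunks the Master would hand out.
    static double approximate(int nrOfMessages, int nrOfElements) {
        double pi = 0.0;
        for (int k = 0; k < nrOfMessages; k++) {
            pi += calculatePiFor(k * nrOfElements, nrOfElements);
        }
        return pi;
    }

    public static void main(String[] args) {
        System.out.println(approximate(4, 10000));
    }
}
```

A loop of this shape is what the strict, directly called version should compile down to, so it is a reasonable baseline when comparing timings.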

The extra cost with fold is probably because you also had to create a list before applying it.
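
The list overhead can be illustrated with another rough Java analogy. It is an assumption-laden sketch, not a model of Frege's lazy lists: the class name FoldVsLoop is hypothetical, and the list here is built eagerly, whereas Frege would produce its list cells lazily. The point is only that the fold-style version allocates and boxes one element per index before any arithmetic happens, while the loop version does not.

```java
import java.util.ArrayList;
import java.util.List;

// Eager Java analogy; Frege lists are lazy, so the real cost profile differs.
final class FoldVsLoop {
    // One term of the series used in calculatePiFor.
    static double term(int i) {
        return 4.0 * (1 - (i % 2) * 2) / (2 * i + 1);
    }

    // Fold style: materialise the indices first, then reduce over them.
    static double piByFold(int start, int nrOfElements) {
        List<Integer> indices = new ArrayList<>(nrOfElements);
        for (int i = start; i < start + nrOfElements; i++) {
            indices.add(i); // boxes every index
        }
        double acc = 0.0;
        for (Integer i : indices) {
            acc += term(i); // unboxes every index again
        }
        return acc;
    }

    // Loop style: same summation order, no intermediate list.
    static double piByLoop(int start, int nrOfElements) {
        double acc = 0.0;
        for (int i = start; i < start + nrOfElements; i++) {
            acc += term(i);
        }
        return acc;
    }

    public static void main(String[] args) {
        System.out.println(piByFold(0, 10000) == piByLoop(0, 10000));
    }
}
```

Both methods add the terms in the same order, so they return identical results; only the allocation behaviour differs.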