I have actors that need to do very long-running, computationally heavy work, but the computation itself can be done incrementally. So although the complete computation takes hours to finish, the intermediate results are actually useful, and I want to be able to return them in response to any request. This is pseudocode for what I want to do:
var intermediateResult = ...
loop {
  while (mailbox.isEmpty && computationNotFinished)
    intermediateResult = computationStep(intermediateResult)
  receive {
    case GetCurrentResult => sender ! intermediateResult
    ...other messages...
  }
}
Answer 0 (score: 6)
The best way to do this is very close to what you are already doing:
case class Continue(todo: ToDo)

class Worker extends Actor {
  var state: IntermediateState = _

  def receive = {
    case Work(x) =>
      // do the first chunk of work and schedule the continuation
      val (next, todo) = calc(state, x)
      state = next
      self ! Continue(todo)
    case Continue(todo) if todo.isEmpty => // done
    case Continue(todo) =>
      // do one more chunk, then re-enqueue the rest behind whatever
      // else is waiting in the mailbox
      val (next, rest) = calc(state, todo)
      state = next
      self ! Continue(rest)
  }

  // domain-specific computation step; consumes part of the remaining work
  def calc(state: IntermediateState, todo: ToDo): (IntermediateState, ToDo)
}
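Because the mailbox is processed between the self-sent Continue messages, a query such as the GetCurrentResult message from the question could be answered from the same receive block. A minimal sketch (GetCurrentResult comes from the question, not from the code above):

    case GetCurrentResult =>
      // state is always observed between two complete calc steps
      sender ! state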
When an actor sends messages to itself, Akka's internal processing basically runs them in a while loop; the number of messages processed in one go is determined by the throughput setting of the actor's dispatcher (default 5), and after this amount of processing the thread is returned to the pool and the continuation is enqueued to the dispatcher as a new task. Hence there are two tunable parameters in the above solution: do more computation steps per Continue message, or raise the throughput setting to improve throughput at the cost of fairness. The original question mentions hundreds of such actors running, presumably on common hardware that does not have hundreds of CPUs, so the settings should be chosen such that each batch takes no longer than ca. 10 ms.
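For reference, a minimal sketch of how the throughput setting could be raised for these workers; batch-dispatcher is a made-up dispatcher name, and the value 20 is only an example to be tuned against the cost of one calc step:

import akka.actor.{ActorSystem, Props}
import com.typesafe.config.ConfigFactory

val config = ConfigFactory.parseString("""
  batch-dispatcher {
    type = Dispatcher
    executor = "fork-join-executor"
    throughput = 20   // process up to 20 messages per actor before yielding the thread
  }
""")
val system = ActorSystem("calc", config.withFallback(ConfigFactory.load()))
val worker = system.actorOf(Props[Worker].withDispatcher("batch-dispatcher"), "worker")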
Let's play with Fibonacci a bit:
Welcome to Scala version 2.10.0-RC1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_07).
Type in expressions to have them evaluated.
Type :help for more information.
scala> def fib(x1: BigInt, x2: BigInt, steps: Int): (BigInt, BigInt) = if(steps>0) fib(x2, x1+x2, steps-1) else (x1, x2)
fib: (x1: BigInt, x2: BigInt, steps: Int)(BigInt, BigInt)
scala> def time(code: =>Unit) { val start = System.currentTimeMillis; code; println("code took " + (System.currentTimeMillis - start) + "ms") }
time: (code: => Unit)Unit
scala> time(fib(1, 1, 1000))
code took 1ms
scala> time(fib(1, 1, 1000))
code took 1ms
scala> time(fib(1, 1, 10000))
code took 5ms
scala> time(fib(1, 1, 100000))
code took 455ms
scala> time(fib(1, 1, 1000000))
code took 17172ms
This means that fib_100000 takes half a second in what is presumably a well-optimized loop. Now let's play with actors a bit:
scala> case class Cont(steps: Int, batch: Int)
defined class Cont
scala> val me = inbox()
me: akka.actor.ActorDSL.Inbox = akka.actor.dsl.Inbox$Inbox@32c0fe13
scala> val a = actor(new Act {
var s: (BigInt, BigInt) = _
become {
case Cont(x, y) if y < 0 => s = (1, 1); self ! Cont(x, -y)
case Cont(x, y) if x > 0 => s = fib(s._1, s._2, y); self ! Cont(x - 1, y)
case _: Cont => me.receiver ! s
}
})
a: akka.actor.ActorRef = Actor[akka://repl/user/$c]
scala> time{a ! Cont(1000, -1); me.receive(10 seconds)}
code took 4ms
scala> time{a ! Cont(10000, -1); me.receive(10 seconds)}
code took 27ms
scala> time{a ! Cont(100000, -1); me.receive(10 seconds)}
code took 632ms
scala> time{a ! Cont(1000000, -1); me.receive(30 seconds)}
code took 17936ms
This is already quite interesting: given that each step takes long enough (there are huge BigInt additions going on behind the scenes in the last row), the actor adds hardly any overhead. Now let's see what happens when we do the smaller computations in a more batched way:
scala> time{a ! Cont(10000, -10); me.receive(30 seconds)}
code took 462ms
This is quite close to the result of the direct variant above.
The conclusion for almost all applications: sending messages to yourself is not expensive, just keep each actual processing step slightly larger than a few hundred nanoseconds.
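To illustrate the first of the two knobs mentioned above, here is a hedged sketch of a Continue handler that performs several calc steps per message; batchSize is a hypothetical tuning parameter and not part of the answer's code:

    case Continue(todo) =>
      // run up to batchSize steps before returning to the mailbox, so that each
      // message carries clearly more than a few hundred nanoseconds of real work
      var (s, rest) = (state, todo)
      var i = 0
      while (i < batchSize && !rest.isEmpty) {
        val (next, remaining) = calc(s, rest)
        s = next
        rest = remaining
        i += 1
      }
      state = s
      if (!rest.isEmpty) self ! Continue(rest)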
Answer 1 (score: 4)
I assume from your comment to Roland Kuhn's answer that you have work that can be thought of as recursive, at least in chunks. If this is not the case, I don't think there is any clean solution to your problem, and you will have to deal with complicated pattern-matching blocks.
If my assumption is correct, I would schedule the computation asynchronously and let the actor stay free to answer other messages. The key point is to use the monadic capabilities of Future and keep a simple receive block. You have to handle three messages (StartComputation, ChangeState, GetState).
You would end up with something like the following:
def receive = {
  case StartComputation(myData) => expensiveStuff(myData)
  case ChangeState(newState)    => this.state = newState
  case GetState                 => sender ! this.state
}
Then you can exploit the map method on Future by defining your own recursive map:
def mapRecursive[A](f: Future[A], handler: A => A, exitConditions: A => Boolean): Future[A] = {
  f.flatMap { a =>
    if (exitConditions(a))
      f
    else {
      val newFuture = f.flatMap { a => Future(handler(a)) }
      mapRecursive(newFuture, handler, exitConditions)
    }
  }
}
Once you have this tool, everything becomes easier. Look at the following example:
def main(args: Array[String]) {
  val baseFuture: Future[Int] = Promise.successful(64)
  val newFuture: Future[Int] = mapRecursive(baseFuture,
    (a: Int) => {
      val result = a / 2
      println("Additional step done: the current a is " + result)
      result
    }, (a: Int) => (a <= 1))
  val one = Await.result(newFuture, Duration.Inf)
  println("Computation finished, result = " + one)
}
Its output is:
Additional step done: the current a is 32
Additional step done: the current a is 16
Additional step done: the current a is 8
Additional step done: the current a is 4
Additional step done: the current a is 2
Additional step done: the current a is 1
Computation finished, result = 1
You can see from this what you should do inside your expensiveStuff method:
def expensiveStuff(myData: MyData): Future[MyData] = {
  val firstResult = Promise.successful(myData)
  val handler: MyData => MyData = (myData) => {
    val result = myData.copy(myData.value / 2)
    self ! ChangeState(result)
    result
  }
  val exitCondition: MyData => Boolean = (myData: MyData) => myData.value == 1
  mapRecursive(firstResult, handler, exitCondition)
}
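As a side note on the design, the Future returned by expensiveStuff does not have to be awaited inside the actor; with Akka's pipe pattern the final result can be delivered back as an ordinary message. A minimal sketch, reusing the three messages from above:

import akka.pattern.pipe

def receive = {
  case StartComputation(myData) =>
    import context.dispatcher // ExecutionContext for map and pipeTo
    // when the recursive computation completes, send the final state
    // back to this actor as a regular ChangeState message
    expensiveStuff(myData).map(ChangeState(_)).pipeTo(self)
  case ChangeState(newState) => this.state = newState
  case GetState => sender ! this.state
}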
EDIT - in more detail
If you don't want to block the actor, which processes messages from its mailbox in a thread-safe and sequential way, the only thing you can do is have the computation executed on a different thread. This is exactly what a high-performance non-blocking receive amounts to.
However, you are right that the approach I proposed pays a significant performance penalty: every step is executed in a different future, which may not be necessary at all. You can therefore recurse inside the handler to obtain single-threaded or multi-threaded execution. There is no magic formula after all:
def recurseFuture[A](entryFuture: Future[A], handler: A => A, exitCondition: A => Boolean, maxNestedRecursion: Long = Long.MaxValue): Future[A] = {
  def recurse(a: A, handler: A => A, exitCondition: A => Boolean, maxNestedRecursion: Long, currentStep: Long): Future[A] = {
    if (exitCondition(a))
      Promise.successful(a)
    else if (currentStep == maxNestedRecursion)
      Promise.successful(handler(a)).flatMap(a => recurse(a, handler, exitCondition, maxNestedRecursion, 0))
    else
      recurse(handler(a), handler, exitCondition, maxNestedRecursion, currentStep + 1)
  }
  entryFuture.flatMap { a => recurse(a, handler, exitCondition, maxNestedRecursion, 0) }
}
I enhanced my handler method for testing purposes:
val handler: Int => Int = (a: Int) => {
  val result = a / 2
  println("Additional step done: the current a is " + result + " on thread " + Thread.currentThread().getName)
  result
}
Approach 1: recurse the handler into itself, so that everything is executed on a single thread.
println("Starting strategy with all the steps on the same thread")
val deepestRecursion: Future[Int] = recurseFuture(baseFuture,handler, exitCondition)
Await.result(deepestRecursion, Duration.Inf)
println("Completed strategy with all the steps on the same thread")
println("")
Approach 2: recursion on the handler itself, up to a limited depth.
println("Starting strategy with the steps grouped by three")
val threeStepsInSameFuture: Future[Int] = recurseFuture(baseFuture,handler, exitCondition,3)
val threeStepsInSameFuture2: Future[Int] = recurseFuture(baseFuture,handler, exitCondition,4)
Await.result(threeStepsInSameFuture, Duration.Inf)
Await.result(threeStepsInSameFuture2, Duration.Inf)
println("Completed strategy with all the steps grouped by three")
executorService.shutdown()
Answer 2 (score: 2)
You shouldn't use actors for long-running computations, as they will block the threads that are supposed to run the actors' code.
I would try a design that uses a separate Thread/ThreadPool for the computations and uses AtomicReferences to store/query the intermediate results, along the lines of the following pseudocode:
val cancelled = new AtomicBoolean(false)
val intermediateResult = new AtomicReference[IntermediateResult]()

object WorkerThread extends Thread {
  override def run {
    while (!cancelled.get) {
      intermediateResult.set(computationStep(intermediateResult.get))
    }
  }
}

loop {
  react {
    case StartComputation => WorkerThread.start()
    case CancelComputation => cancelled.set(true)
    case GetCurrentResult => sender ! intermediateResult.get
  }
}
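For comparison, a minimal Akka-flavoured sketch of the same idea; IntermediateResult, computationStep, the initial result and the message types are assumed from the question and the pseudocode above:

import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}
import akka.actor.Actor

class ComputationActor(initialResult: IntermediateResult) extends Actor {
  private val cancelled = new AtomicBoolean(false)
  private val intermediateResult = new AtomicReference[IntermediateResult](initialResult)

  // dedicated thread, so the long computation never blocks the actor's dispatcher
  private val workerThread = new Thread(new Runnable {
    def run(): Unit = {
      while (!cancelled.get) {
        intermediateResult.set(computationStep(intermediateResult.get))
      }
    }
  })

  def receive = {
    case StartComputation  => workerThread.start()
    case CancelComputation => cancelled.set(true)
    case GetCurrentResult  => sender ! intermediateResult.get
  }
}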
Answer 3 (score: 1)
This is a classic concurrency problem. You want several routines/actors (or whatever you want to call them). The code below is mostly correct Go, with variable names that are longer than usual for the sake of context. The first routine handles queries and intermediate results:
func serveIntermediateResults(
    computationChannel chan *IntermediateResult,
    queryChannel chan chan<- *IntermediateResult) {
    var latestIntermediateResult *IntermediateResult // initial result
    for {
        select {
        // an update arrives
        case result, notClosed := <-computationChannel:
            if !notClosed {
                // the computation has finished, stop checking
                computationChannel = nil
            } else {
                // remember the latest result (assigning, not shadowing, the outer variable)
                latestIntermediateResult = result
            }
        // a query arrived
        case queryResponseChannel, notClosed := <-queryChannel:
            if !notClosed {
                // no more queries, so we're done
                return
            }
            // respond with the latest result
            queryResponseChannel <- latestIntermediateResult
        }
    }
}
Inside the long computation, you update the intermediate result wherever appropriate:
func longComputation(intermediateResultChannel chan *IntermediateResult) {
    for notFinished {
        // lots of stuff
        intermediateResultChannel <- currentResult
    }
    close(intermediateResultChannel)
}
Finally, to ask for the current result, you have a wrapper that makes it nicer:
func getCurrentResult() *IntermediateResult {
    responseChannel := make(chan *IntermediateResult)
    // queryChannel was given to the intermediate result server routine earlier
    queryChannel <- responseChannel
    return <-responseChannel
}
}