Incremental processing in Akka actors

Time: 2012-10-12 03:56:34

Tags: scala akka actor

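I have actors that need to do very long-running, computationally expensive work, but the computation itself can be done incrementally. The complete computation takes hours to finish, but the intermediate results are very useful, and I'd like to answer any requests for them. This is pseudocode for what I want to do: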

var intermediateResult = ...
loop {
  while (mailbox.isEmpty && computationNotFinished)
    intermediateResult = computationStep(intermediateResult)

  receive {
    case GetCurrentResult => sender ! intermediateResult
    // ...other messages...
  }
}

4 answers:

Answer 0 (score: 6)

The best way to do this is very close to what you are already doing:

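import akka.actor.Actor

// IntermediateState, ToDo (which needs an isEmpty method) and the Work message
// are application-specific and assumed to be defined elsewhere.
case class Continue(todo: ToDo)

abstract class Worker extends Actor {
  var state: IntermediateState = _
  def receive = {
    case Work(x) =>
      val (next, todo) = calc(state, x)
      state = next
      self ! Continue(todo)   // the rest is queued behind any messages already waiting
    case Continue(todo) if todo.isEmpty => // done
    case Continue(todo) =>
      val (next, rest) = calc(state, todo)
      state = next
      self ! Continue(rest)
  }
  // Performs one bounded chunk of work and returns the new state plus what remains.
  def calc(state: IntermediateState, todo: ToDo): (IntermediateState, ToDo)
}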
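The ordering argument behind this pattern can be shown without Akka. Below is a minimal single-threaded sketch. A mutable.Queue stands in for the actor's mailbox and a while loop stands in for the dispatcher. ToyWorker, its summing "computation" and the GetCurrentResult message are hypothetical. Each step enqueues its continuation behind whatever is already waiting, so a query that arrives mid-computation gets the intermediate state:

```scala
import scala.collection.mutable

object SelfMessageSketch {
  sealed trait Msg
  final case class Continue(remaining: Int) extends Msg
  case object GetCurrentResult extends Msg

  // Hypothetical worker: "computes" 1 + 2 + ... + n, one term per message.
  // The queue plays the mailbox; run() plays the dispatcher.
  final class ToyWorker {
    private val mailbox = mutable.Queue.empty[Msg]
    var state: Long = 0L
    var done: Boolean = false
    val replies = mutable.ArrayBuffer.empty[Long]

    def tell(msg: Msg): Unit = mailbox.enqueue(msg)

    private def receive(msg: Msg): Unit = msg match {
      case Continue(0) => done = true              // nothing left to compute
      case Continue(k) =>
        state += k                                 // one small computation step
        tell(Continue(k - 1))                      // like self ! Continue(rest)
      case GetCurrentResult => replies += state    // like sender ! state
    }

    // Process messages one at a time, in arrival order, until the mailbox is empty.
    def run(): Unit = while (mailbox.nonEmpty) receive(mailbox.dequeue())
  }

  def demo(): (Long, Seq[Long]) = {
    val w = new ToyWorker
    w.tell(Continue(10))
    w.tell(GetCurrentResult) // arrives while the computation is still running
    w.run()
    (w.state, w.replies.toList)
  }
}
```

Here demo() answers the query with 10, the state after the first step, and finishes with 55. In Akka the dispatcher processes up to "throughput" messages per scheduling round instead of draining the queue in one loop, but messages interleave the same way.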

Edit: more background

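When an actor sends messages to itself, Akka processes them internally in what is basically a while loop. The number of messages processed in one go is set by the throughput setting of the actor's dispatcher (default 5). After that many messages the thread is returned to the pool, and the continuation is enqueued with the dispatcher as a new task. The solution above therefore has two tunables: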

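  • process several steps per message (if the individual processing steps are very small)
  • increase the throughput setting to gain throughput at the cost of fairness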
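The first knob can be sketched as a pure function that the Continue handler would call. It advances the state by at most "batch" steps and reports how many remain, and that count would become the payload of the next Continue. runBatch and step are hypothetical names. The step here is one Fibonacci iteration, matching the measurements below:

```scala
object BatchSketch {
  // One hypothetical computation step: advance a Fibonacci pair.
  def step(s: (BigInt, BigInt)): (BigInt, BigInt) = (s._2, s._1 + s._2)

  // Perform at most `batch` of the `remaining` steps. Return the new state and
  // the number of steps still left (what the next Continue message would carry).
  def runBatch(s: (BigInt, BigInt), remaining: Int, batch: Int): ((BigInt, BigInt), Int) = {
    val n = math.min(batch, remaining)
    var cur = s
    var i = 0
    while (i < n) { cur = step(cur); i += 1 }
    (cur, remaining - n)
  }
}
```

The actor would then send Continue(rest) to itself after each call and stop when rest reaches 0. The second knob is configuration, not code: the throughput setting of the dispatcher.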

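The original problem seems to have hundreds of these actors running, presumably on ordinary hardware without hundreds of CPUs. The throughput setting should therefore be chosen so that each batch takes no longer than about 10 ms.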
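One variant, not taken from the answer, bounds each batch by elapsed time instead of a fixed step count. This is a minimal sketch. It assumes the budget of roughly 10 ms suggested above and a step function supplied by the caller, and runForBudget is a hypothetical helper. It always performs at least one step, so every Continue message makes progress:

```scala
object TimeBudgetSketch {
  // Run `step` until `remaining` reaches 0 or the time budget is used up.
  // Always performs at least one step (if any remain) so the caller progresses.
  // Returns the new state and the number of steps still to do.
  def runForBudget[S](s: S, remaining: Long, budgetNanos: Long)(step: S => S): (S, Long) = {
    val deadline = System.nanoTime() + budgetNanos
    var cur = s
    var left = remaining
    var first = true
    while (left > 0 && (first || System.nanoTime() < deadline)) {
      cur = step(cur)
      left -= 1
      first = false
    }
    (cur, left)
  }

  // Assumed per-batch budget of about 10 ms.
  val TenMillis: Long = 10L * 1000 * 1000
}
```

Reading System.nanoTime() on every iteration adds a small cost per step. For very cheap steps it may be better to check the clock only every few hundred iterations.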

Performance evaluation

Let's play a bit with Fibonacci:

Welcome to Scala version 2.10.0-RC1 (Java HotSpot(TM) 64-Bit Server VM, Java 1.7.0_07).
Type in expressions to have them evaluated.
Type :help for more information.

scala> def fib(x1: BigInt, x2: BigInt, steps: Int): (BigInt, BigInt) = if(steps>0) fib(x2, x1+x2, steps-1) else (x1, x2)
fib: (x1: BigInt, x2: BigInt, steps: Int)(BigInt, BigInt)

scala> def time(code: =>Unit) { val start = System.currentTimeMillis; code; println("code took " + (System.currentTimeMillis - start) + "ms") }
time: (code: => Unit)Unit

scala> time(fib(1, 1, 1000))
code took 1ms

scala> time(fib(1, 1, 1000))
code took 1ms

scala> time(fib(1, 1, 10000))
code took 5ms

scala> time(fib(1, 1, 100000))
code took 455ms

scala> time(fib(1, 1, 1000000))
code took 17172ms

So in what is probably a well-optimized loop, fib_100000 takes about half a second. Now let's play with actors:

scala> case class Cont(steps: Int, batch: Int)
defined class Cont

scala> val me = inbox()
me: akka.actor.ActorDSL.Inbox = akka.actor.dsl.Inbox$Inbox@32c0fe13

scala> val a = actor(new Act {
  var s: (BigInt, BigInt) = _
  become {
    case Cont(x, y) if y < 0 => s = (1, 1); self ! Cont(x, -y)
    case Cont(x, y) if x > 0 => s = fib(s._1, s._2, y); self ! Cont(x - 1, y)
    case _: Cont => me.receiver ! s
   }
})
a: akka.actor.ActorRef = Actor[akka://repl/user/$c]

scala> time{a ! Cont(1000, -1); me.receive(10 seconds)}
code took 4ms

scala> time{a ! Cont(10000, -1); me.receive(10 seconds)}
code took 27ms

scala> time{a ! Cont(100000, -1); me.receive(10 seconds)}
code took 632ms

scala> time{a ! Cont(1000000, -1); me.receive(30 seconds)}
code took 17936ms

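This is already interesting. If each step takes long enough (as in the last line, where huge BigInts are involved behind the scenes), the actor adds hardly any overhead. Now let's see what happens when smaller computations are done in a more batched way: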

scala> time{a ! Cont(10000, -10); me.receive(30 seconds)}
code took 462ms

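This is very close to the 455 ms of the direct fib(1, 1, 100000) call above. 10000 messages of 10 steps each perform the same 100000 iterations.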
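The equivalence behind this comparison can be checked without actors. Cont(10000, -10) first resets the state to (1, 1) and then performs 10000 batches of 10 fib iterations. The following pure-Scala sketch mirrors those state transitions. batchedFib is a hypothetical helper, and fib is the function from the REPL session above:

```scala
object BatchedFib {
  // Same tail-recursive fib as in the REPL session.
  @annotation.tailrec
  def fib(x1: BigInt, x2: BigInt, steps: Int): (BigInt, BigInt) =
    if (steps > 0) fib(x2, x1 + x2, steps - 1) else (x1, x2)

  // Mirrors the actor's handling of Cont(messages, -batch): reset to (1, 1),
  // then each "message" advances the pair by `batch` iterations.
  def batchedFib(messages: Int, batch: Int): (BigInt, BigInt) = {
    var s: (BigInt, BigInt) = (BigInt(1), BigInt(1))
    var i = 0
    while (i < messages) { s = fib(s._1, s._2, batch); i += 1 }
    s
  }
}
```

batchedFib(10000, 10) produces the same pair as fib(1, 1, 100000). Any timing difference between the two REPL runs therefore comes from message passing, not from extra arithmetic.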

Conclusion

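For almost any application, sending messages to self is cheap. Just keep each actual processing step somewhat longer than a few hundred nanoseconds.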

Answer 1 (score: 4)

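From your comment to Roland Kuhn, I assume that your work can be treated as recursive, at least in blocks. If that is not the case, I don't think there is a clean solution to your problem, and you will have to deal with complicated pattern-matching blocks.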

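If my assumption is correct, I would schedule the computation asynchronously and leave the actor free to answer other messages. The key is to use the monadic capabilities of Future and keep the receive block simple. You have to handle three messages (StartComputation, ChangeState, GetState).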

You end up with the following receive block:

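def receive = {
  case StartComputation(myData) => expensiveStuff(myData)  // returns at once; the work runs in Futures
  case ChangeState(newState)    => this.state = newState   // progress update sent by the computation
  case GetState                 => sender ! this.state      // answer queries with the latest state
}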

You can then take advantage of flatMap on Future by defining your own recursive map:

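import scala.concurrent.{ExecutionContext, Future}

// Applies handler step by step, each step in its own Future, until exitConditions holds.
def mapRecursive[A](f: Future[A], handler: A => A, exitConditions: A => Boolean)
                   (implicit ec: ExecutionContext): Future[A] =
  f.flatMap { a =>
    if (exitConditions(a))
      Future.successful(a)
    else
      mapRecursive(Future(handler(a)), handler, exitConditions)
  }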

With this tool in place, everything becomes easier. Consider the following example:

import scala.concurrent.{Await, Future}
import scala.concurrent.duration.Duration
import scala.concurrent.ExecutionContext.Implicits.global

def main(args: Array[String]): Unit = {
  val baseFuture: Future[Int] = Future.successful(64)
  val newFuture: Future[Int] = mapRecursive(baseFuture,
    (a: Int) => {
      val result = a / 2
      println("Additional step done: the current a is " + result)
      result
    }, (a: Int) => a <= 1)

  val one = Await.result(newFuture, Duration.Inf)
  println("Computation finished, result = " + one)
}

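Its output is:

Additional step done: the current a is 32
Additional step done: the current a is 16
Additional step done: the current a is 8
Additional step done: the current a is 4
Additional step done: the current a is 2
Additional step done: the current a is 1
Computation finished, result = 1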

The same can be done in the expensiveStuff method:
  // Inside the actor. An implicit ExecutionContext must be in scope, e.g. import context.dispatcher
  def expensiveStuff(myData: MyData): Future[MyData] = {
    val firstResult = Future.successful(myData)
    val handler: MyData => MyData = (myData) => {
      val result = myData.copy(value = myData.value / 2)
      self ! ChangeState(result)   // report progress back to the actor
      result
    }
    val exitCondition: MyData => Boolean = (myData: MyData) => myData.value <= 1
    mapRecursive(firstResult, handler, exitCondition)
  }
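The progress-reporting part can be exercised with the standard library alone. In this sketch report is a hypothetical stand-in for self ! ChangeState(result). MyData is assumed to be a case class with a single value field. mapRecursive is repeated, with an implicit ExecutionContext parameter, so the block compiles on its own:

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object ReportingSketch {
  final case class MyData(value: Int)

  def mapRecursive[A](f: Future[A], handler: A => A, exitCondition: A => Boolean)
                     (implicit ec: ExecutionContext): Future[A] =
    f.flatMap { a =>
      if (exitCondition(a)) Future.successful(a)
      else mapRecursive(Future(handler(a)), handler, exitCondition)
    }

  // Halve the value step by step and publish every intermediate state via `report`.
  def expensiveStuff(myData: MyData)(report: MyData => Unit)
                    (implicit ec: ExecutionContext): Future[MyData] = {
    val handler: MyData => MyData = d => {
      val result = d.copy(value = d.value / 2)
      report(result) // in the actor: self ! ChangeState(result)
      result
    }
    mapRecursive(Future.successful(myData), handler, (d: MyData) => d.value <= 1)
  }

  def demo(): (MyData, List[Int]) = {
    import ExecutionContext.Implicits.global
    val seen = new ConcurrentLinkedQueue[Int]() // thread-safe: steps may run on different threads
    val done = Await.result(expensiveStuff(MyData(64))(d => { seen.add(d.value); () }), 10.seconds)
    val buf = List.newBuilder[Int]
    val it = seen.iterator()
    while (it.hasNext) buf += it.next()
    (done, buf.result())
  }
}
```

Each step starts only after the previous Future has completed. The reports therefore arrive in order (32, 16, 8, 4, 2, 1) even though they may run on different pool threads.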

Edit: more detail

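An actor processes the messages in its mailbox synchronously and in a thread-safe way. If you don't want to block it, the only option is to run the computation on a different thread. That is exactly what a high-performance non-blocking receive amounts to.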

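However, you are right that my proposed approach carries a high performance penalty. Every step runs in a separate Future, which may not be necessary at all. You can instead let the handler recurse into itself, choosing between single-threaded and multi-threaded execution. There is no magic formula: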

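  • If you want to schedule asynchronously and keep the cost to a minimum, all the work should be done by one thread.
  • That, however, may keep other work from starting: if every thread in the pool is busy, Futures queue up. You may therefore want to split the operation into several Futures, so that some new work can be scheduled before the old work finishes even when the pool is fully used.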

def recurseFuture[A](entryFuture: Future[A], handler: A => A, exitCondition: A => Boolean,
                     maxNestedRecursion: Long = Long.MaxValue)
                    (implicit ec: ExecutionContext): Future[A] = {
  // Runs up to maxNestedRecursion steps synchronously on the current thread,
  // then continues in a new task so other queued work can get a turn.
  def recurse(a: A, handler: A => A, exitCondition: A => Boolean, maxNestedRecursion: Long, currentStep: Long): Future[A] = {
    if (exitCondition(a))
      Future.successful(a)
    else if (currentStep == maxNestedRecursion)
      Future.successful(handler(a)).flatMap(a => recurse(a, handler, exitCondition, maxNestedRecursion, 0))
    else
      recurse(handler(a), handler, exitCondition, maxNestedRecursion, currentStep + 1)
  }
  entryFuture.flatMap { a => recurse(a, handler, exitCondition, maxNestedRecursion, 0) }
}

For testing, I extended my handler to print the thread name:

  val handler: Int => Int = (a: Int) => {
      val result = a / 2
      println("Additional step done: the current a is " + result + " on thread " + Thread.currentThread().getName)
      result
    }

Approach 1: let the handler recurse into itself without limit, so that every step runs on the same thread.

    println("Starting strategy with all the steps on the same thread")
    val deepestRecursion: Future[Int] = recurseFuture(baseFuture,handler, exitCondition)
    Await.result(deepestRecursion, Duration.Inf)
    println("Completed strategy with all the steps on the same thread")
    println("")

Approach 2: let the handler recurse into itself only up to a limited depth.

println("Starting strategy with the steps grouped by three")
val threeStepsInSameFuture: Future[Int] = recurseFuture(baseFuture, handler, exitCondition, 3)
val fourStepsInSameFuture: Future[Int] = recurseFuture(baseFuture, handler, exitCondition, 4)
Await.result(threeStepsInSameFuture, Duration.Inf)
Await.result(fourStepsInSameFuture, Duration.Inf)
println("Completed strategy with the steps grouped by three and by four")
executorService.shutdown() // the pool behind the implicit ExecutionContext (definition not shown)
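Here is a self-contained version of this experiment. It assumes a fixed pool of two threads and reuses the start value 64 and the exit condition a <= 1 from the earlier example. RecurseFutureSketch and run are hypothetical names. Instead of printing, the handler records the names of the threads it ran on:

```scala
import java.util.concurrent.{ConcurrentHashMap, Executors}
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object RecurseFutureSketch {
  def recurseFuture[A](entry: Future[A], handler: A => A, exitCondition: A => Boolean,
                       maxNestedRecursion: Long = Long.MaxValue)
                      (implicit ec: ExecutionContext): Future[A] = {
    // Up to maxNestedRecursion steps run synchronously, then the rest hops to a new task.
    def recurse(a: A, currentStep: Long): Future[A] =
      if (exitCondition(a)) Future.successful(a)
      else if (currentStep == maxNestedRecursion)
        Future.successful(handler(a)).flatMap(next => recurse(next, 0))
      else recurse(handler(a), currentStep + 1)
    entry.flatMap(a => recurse(a, 0))
  }

  // Returns the final value and how many distinct pool threads ran a step.
  def run(maxNested: Long): (Int, Int) = {
    val executor = Executors.newFixedThreadPool(2)
    implicit val ec: ExecutionContext = ExecutionContext.fromExecutorService(executor)
    val threads = ConcurrentHashMap.newKeySet[String]()
    val handler: Int => Int = a => { threads.add(Thread.currentThread().getName); a / 2 }
    try {
      val f = recurseFuture(Future.successful(64), handler, (a: Int) => a <= 1, maxNested)
      (Await.result(f, 10.seconds), threads.size())
    } finally executor.shutdown()
  }
}
```

With the default maxNestedRecursion every step runs inside a single task, so run(Long.MaxValue) reports one thread. With a small limit the steps are spread over several tasks, and which pool threads they land on depends on scheduling.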

Answer 2 (score: 2)

You should not use actors for long-running computations, because they block the threads that are supposed to run the actors' code.

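I would try a design where a separate Thread/ThreadPool does the computation and AtomicReferences store and serve the intermediate results, along the lines of the following pseudocode: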

import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}

// Pseudocode: IntermediateResult and computationStep are placeholders.
val cancelled = new AtomicBoolean(false)
val intermediateResult = new AtomicReference[IntermediateResult]()

object WorkerThread extends Thread {
  override def run(): Unit = {
    while (!cancelled.get) {
      intermediateResult.set(computationStep(intermediateResult.get))
    }
  }
}

loop {
  react {
    case StartComputation  => WorkerThread.start()
    case CancelComputation => cancelled.set(true)
    case GetCurrentResult  => sender ! intermediateResult.get
  }
}
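The same idea can be made runnable with the standard library alone. This sketch replaces the loop/react part with plain method calls. It assumes a hypothetical computation that sums 1 + 2 + ... + n, one term per step. Only the worker thread writes to the AtomicReference, so a plain get/set pair is enough, and any thread can read the latest snapshot at any time:

```scala
import java.util.concurrent.atomic.{AtomicBoolean, AtomicReference}

object AtomicIntermediate {
  // Hypothetical intermediate result: how many steps are done and the sum so far.
  final case class Progress(step: Int, sum: Long)

  def computationStep(p: Progress): Progress = Progress(p.step + 1, p.sum + p.step + 1)

  final class Computation(totalSteps: Int) {
    private val cancelled = new AtomicBoolean(false)
    private val intermediateResult = new AtomicReference[Progress](Progress(0, 0L))

    // Single writer: only this thread updates intermediateResult.
    private val worker = new Thread(new Runnable {
      def run(): Unit =
        while (!cancelled.get && intermediateResult.get.step < totalSteps)
          intermediateResult.set(computationStep(intermediateResult.get))
    })
    worker.setDaemon(true)

    def start(): Unit = worker.start()
    def cancel(): Unit = cancelled.set(true)
    def current: Progress = intermediateResult.get // safe to call from any thread
    def awaitDone(): Unit = worker.join()
  }
}
```

Calling current while the worker is running returns whatever step it has reached. After awaitDone(), a new Computation(1000) reports Progress(1000, 500500).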

Answer 3 (score: 1)

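This is a classic concurrency problem. You want several routines/actors (or whatever you call them). The code below is mostly valid Go, with long variable names to carry the context. The first routine handles queries and intermediate results: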

func serveIntermediateResults(
        computationChannel chan *IntermediateResult,
        queryChannel chan chan<- *IntermediateResult) {
    var latestIntermediateResult *IntermediateResult // initial result
    for {
        select {
        // an update arrives
        case result, ok := <-computationChannel:
            if !ok {
                // the computation has finished; a nil channel is never selected again
                computationChannel = nil
            } else {
                // assign to the outer variable (":=" in the case clause would only shadow it)
                latestIntermediateResult = result
            }
        // a query arrived
        case queryResponseChannel, ok := <-queryChannel:
            if !ok {
                // no more queries, so we're done
                return
            }
            // respond with the latest result
            queryResponseChannel <- latestIntermediateResult
        }
    }
}

During the long computation, publish the intermediate result whenever it makes sense:

func longComputation(intermediateResultChannel chan *IntermediateResult) {
    // notFinished and currentResult stand for the computation's own state
    for notFinished {
        // lots of stuff
        intermediateResultChannel <- currentResult
    }
    close(intermediateResultChannel)
}

Finally, a small wrapper makes asking for the current result more convenient:

func getCurrentResult() *IntermediateResult {
     responseChannel := make(chan *IntermediateResult)
     // queryChannel was given to the intermediate result server routine earlier
     queryChannel<-responseChannel
     return <-responseChannel
}