Question

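I have built a distributed streaming machine learning model using Akka's actor model. The model is trained asynchronously by sending training instances (training data) to an actor. Training on this data takes computation time and changes the actor's state.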

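Currently I'm using historical data to train the model. I want to run a set of differently configured models that are trained on the same data and see how different ensemble metrics vary. Essentially, here is a much less complicated simulation of what I have, with Thread.sleep(1) and the data Array standing in for computation time and state.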

import akka.actor.ActorDSL._
import akka.actor.{ActorRef, ActorSystem}
import scala.util.Random

implicit val as = ActorSystem()

case object Report

case class Model(dataSize: Int) {
  val modelActor: ActorRef = actor(new Act {
    val data = Array.fill(dataSize)(0)
    become {
      case trainingData: Int => {
        // Screw with the state of the actor and pretend that it takes time
        Thread.sleep(1)
        data(Math.abs(Random.nextInt % dataSize)) = trainingData
      }
      case Report => {
        println(s"Finished $dataSize")
        context.stop(self)
      }
    }
  })

  def train(trainingInstance: Int) = modelActor ! trainingInstance

  def report: Unit = modelActor ! Report
}

val trainingData = Array.fill(5000)(Random.nextInt)

val dataSizeParams = (1 to 500)

Next I use a for loop to vary the parameters (represented by dataSizeParams):

for {
  param <- dataSizeParams
} {
  // make model with params
  val model = Model(param)
  for {
    trainingInstance <- trainingData
  } {
    model.train(trainingInstance)
  }
  model.report
}

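The for loop is definitely the wrong way to do what I want. It kicks off all the different models in parallel. It works well when dataSizeParams is in the range 1 to 500, but if I bump that up to something high, EACH of my models starts taking up a noticeable chunk of memory. What I came up with is the code below. Basically I have a model master that controls the number of models running at once based on the number of Run messages it has received. Each model now holds a reference to this master actor and sends it a message when it has finished processing: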

// Alternative that doesn't use a for loop and instead controls concurrency through what I'm calling a master actor
case object ImDone
case object Run

case class Model(dataSize: Int, master: ActorRef) {
  val modelActor: ActorRef = actor(new Act {
    val data = Array.fill(dataSize)(0)
    become {
      case trainingData: Int => {
        // Screw with the state of the actor and pretend that it takes time
        Thread.sleep(1)
        data(Math.abs(Random.nextInt % dataSize)) = trainingData
      }
      case Report => {
        println(s"Finished $dataSize")
        master ! ImDone
        context.stop(self)
      }
    }
  })

  def train(trainingInstance: Int) = modelActor ! trainingInstance

  def report: Unit = modelActor ! Report
}

val master: ActorRef = actor(new Act {
  val paramRuns = dataSizeParams.iterator
  become {
    case Run => {
      if (paramRuns.hasNext) {
        val model = Model(paramRuns.next(), self)
        for {
          trainingInstance <- trainingData
        } {
          model.train(trainingInstance)
        }
        model.report
      } else {
        println("No more to run")
        context.stop(self)
      }
    }
    case ImDone =>  {
      self ! Run
    }
  }
})

master ! Run

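There is nothing wrong with the master code (that I can see). I can tightly control the number of models spawned at once, but I feel like I'm missing a much easier/cleaner/out-of-the-box way to do this. Also, I was wondering whether there is any neat way to throttle the number of models running at the same time by, say, looking at the CPU and memory usage of the system.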

Answer 1

You're looking for the work-pulling pattern. I highly recommend this blog post by the Akka developers:

http://letitcrash.com/post/29044669086/balancing-workload-across-nodes-with-akka-2

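We use a variant of this on top of Akka's clustering features to avoid rogue concurrency. By having worker actors pull work instead of having a supervisor push it, you can gracefully control the amount of work (and therefore the CPU and memory usage) simply by limiting the number of worker actors.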
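
To make the idea concrete, here is a minimal single-JVM sketch of work pulling with classic Akka actors. It is not the code from the blog post: the message names (GiveMeWork, TrainModel, Finished), the Promise used to report completion, and the simplification that a worker trains a whole model inside one message are all assumptions made for illustration.

```scala
import akka.actor.{Actor, ActorRef, ActorSystem, Props}
import scala.concurrent.{Await, Future, Promise}
import scala.concurrent.duration._
import scala.util.Random

// Hypothetical message protocol for this sketch.
case object GiveMeWork                      // worker -> master: "I am idle"
final case class TrainModel(dataSize: Int)  // master -> worker: one model to train
final case class Finished(dataSize: Int)    // worker -> master: model is done

object Worker {
  def props(master: ActorRef, trainingData: Array[Int]): Props =
    Props(new Worker(master, trainingData))
}

class Worker(master: ActorRef, trainingData: Array[Int]) extends Actor {
  // Ask for the first job as soon as the worker starts.
  override def preStart(): Unit = master ! GiveMeWork

  def receive: Receive = {
    case TrainModel(dataSize) =>
      // Stand-in for real training: the state only lives while this job runs.
      val state = Array.fill(dataSize)(0)
      trainingData.foreach { x => state(Random.nextInt(dataSize)) = x }
      master ! Finished(dataSize)
      master ! GiveMeWork // pull the next job only after finishing this one
  }
}

class Master(params: Iterator[Int], workerCount: Int, trainingData: Array[Int],
             done: Promise[Vector[Int]]) extends Actor {
  private var activeWorkers = workerCount
  private var finished = Vector.empty[Int]

  // The number of workers is the concurrency limit.
  override def preStart(): Unit =
    (1 to workerCount).foreach { i =>
      context.actorOf(Worker.props(self, trainingData), s"worker-$i")
    }

  def receive: Receive = {
    case GiveMeWork if params.hasNext =>
      sender() ! TrainModel(params.next())
    case GiveMeWork =>
      // No work left: retire this worker and finish once all are retired.
      context.stop(sender())
      activeWorkers -= 1
      if (activeWorkers == 0) {
        done.success(finished)
        context.system.terminate()
      }
    case Finished(dataSize) =>
      finished :+= dataSize
  }
}

object PullDemo {
  def run(params: Range, workerCount: Int): Future[Vector[Int]] = {
    require(workerCount > 0, "need at least one worker")
    val system = ActorSystem("pull-demo")
    val trainingData = Array.fill(5000)(Random.nextInt())
    val done = Promise[Vector[Int]]()
    system.actorOf(Props(new Master(params.iterator, workerCount, trainingData, done)), "master")
    done.future
  }

  def main(args: Array[String]): Unit = {
    val finished = Await.result(run(1 to 20, workerCount = 4), 1.minute)
    println(s"Trained ${finished.size} models, at most 4 at a time")
  }
}
```

Because a worker only asks for more work after it has reported its result, at most workerCount models hold their state in memory at once, however many parameters are queued in the master.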

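This has a few advantages over pure routers: it is easier to track failures (as described in that post), and work won't languish in a mailbox (where it could potentially be lost).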
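
As for picking the number of worker actors from CPU and memory, one rough heuristic is to take the smaller of the core count and the number of models that fit in free heap. The sketch below uses only java.lang.Runtime; the function names and the 64 MB per-model footprint are assumptions, and you would need to measure your own models to get a meaningful figure.

```scala
object WorkerSizing {
  /** Workers allowed given core count, free heap and an estimated per-model footprint. */
  def maxWorkers(cores: Int, freeBytes: Long, perModelBytes: Long): Int = {
    require(cores > 0 && perModelBytes > 0, "cores and perModelBytes must be positive")
    val byMemory = math.max(0L, freeBytes) / perModelBytes
    // Never go below one worker, never above the number of cores.
    math.max(1L, math.min(cores.toLong, byMemory)).toInt
  }

  /** Heap the JVM could still hand out: max heap minus what is in use right now. */
  def freeHeapBytes(): Long = {
    val rt = Runtime.getRuntime
    rt.maxMemory() - (rt.totalMemory() - rt.freeMemory())
  }

  def main(args: Array[String]): Unit = {
    val perModelBytes = 64L * 1024 * 1024 // assumed footprint; measure your own models
    val n = maxWorkers(Runtime.getRuntime.availableProcessors(), freeHeapBytes(), perModelBytes)
    println(s"Would start $n workers")
  }
}
```

This only looks at the JVM heap at startup, so treat the result as a starting point rather than a guarantee.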

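Also, if you're using remoting, I recommend that you do not send large amounts of data in the message. Let the worker nodes pull the data from another source themselves when triggered. We use S3.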
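
A sketch of what "small message, data fetched by the worker" can look like. Everything here is hypothetical: DataStore, InMemoryStore, TrainJob and Trainer are illustrative names, and the in-memory map merely stands in for S3 or whatever store you use.

```scala
// Hypothetical abstraction over the external data source.
trait DataStore {
  def load(key: String): Array[Int]
}

// In-memory stand-in so the sketch runs without any external service.
final class InMemoryStore(blobs: Map[String, Array[Int]]) extends DataStore {
  def load(key: String): Array[Int] =
    blobs.getOrElse(key, throw new NoSuchElementException(s"no data under $key"))
}

// The message stays small: a parameter and a key, never the data set itself.
final case class TrainJob(dataSize: Int, dataKey: String)

object Trainer {
  /** What a worker node would do when a TrainJob arrives; returns instances processed. */
  def handle(job: TrainJob, store: DataStore): Int = {
    val trainingData = store.load(job.dataKey) // fetched by the worker, not sent in the message
    val state = Array.fill(job.dataSize)(0)
    trainingData.zipWithIndex.foreach { case (x, i) => state(i % job.dataSize) = x }
    trainingData.length
  }
}
```

Only TrainJob would cross the wire; the training data itself is read on the node that does the work.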

Controlling the spawning of Akka actors that consume a noticeable amount of memory
