Question

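I define my helper map function as a separate def in a helper object, and it does not "see" the accumulator defined earlier in the code. The Spark docs seem to recommend keeping "remote" functions inside an object, but how do I make those functions work with these accumulators?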

object mainlogic{
    val counter = sc.accumulator(0)
    val data = sc.textFile(...)// load logic here
    val myrdd = data.mapPartitionsWithIndex(mapFunction)
}

object helper{
  def mapFunction(...)={
      counter+=1 // not compiling
  }
}

Answer 1

Something like this needs to be passed in as a parameter, just like in any other code:

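import org.apache.spark.Accumulator

object mainlogic{
    val counter = sc.accumulator(0)
    val data = sc.textFile(...)// load logic here
    // Partially apply the helper so the accumulator is passed in explicitly
    val myrdd = data.mapPartitionsWithIndex(helper.mapFunction(counter, _, _))
}

object helper{
  def mapFunction(counter: Accumulator[Int], ...)={
      counter+=1 // compiles: counter is now a parameter in scope
  }
}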

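Be sure to keep in mind this note from the docs:

For accumulator updates performed inside actions only, Spark guarantees that each task's update to the accumulator will only be applied once, i.e. restarted tasks will not update the value. In transformations, users should be aware that each task's update may be applied more than once if tasks or job stages are re-executed.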
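
To illustrate the distinction in that note, here is a minimal sketch that updates the accumulator inside an action (foreach) rather than a transformation such as mapPartitionsWithIndex. It assumes the Spark 1.x accumulator API used in the question (sc.accumulator and Accumulator[Int]); later Spark releases offer sc.longAccumulator instead. The names Helper.countLine and AccumulatorInAction, the local master setting, and the sample data are hypothetical and chosen only for this example.

```scala
import org.apache.spark.{Accumulator, SparkConf, SparkContext}

object Helper {
  // The accumulator arrives as an explicit parameter, so this object
  // does not need to see any field defined elsewhere.
  def countLine(counter: Accumulator[Int], line: String): Unit =
    counter += 1
}

object AccumulatorInAction {
  def main(args: Array[String]): Unit = {
    // Hypothetical local setup, for illustration only.
    val conf = new SparkConf().setAppName("accumulator-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)

    val counter = sc.accumulator(0)
    val data = sc.parallelize(Seq("a", "b", "c")) // stand-in for the real input

    // foreach is an action, so each task's update is applied only once,
    // even if a task is restarted.
    data.foreach(line => Helper.countLine(counter, line))

    // The accumulated value is read on the driver.
    println(counter.value)
    sc.stop()
  }
}
```

If the count has to be produced inside a transformation, as in the answer's mapPartitionsWithIndex call, treat the value as approximate, since re-executed tasks or stages can add to it again.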
