Why isn't my initialized object being passed to my Spark tasks?

Date: 2018-06-19 16:45:42

Tags: scala apache-spark

object MyObj extends Serializable {
    private var a: String = _
    private var b: String = _
    // other vars handled in init
    ...
    def init(x: String, y: String): Unit = {
        this.a = x
        this.b = y
    }
    def exec(z: String): String = {
        // want to return the value x that was set in init
        this.a
    }
}

Then my Spark part:

MyObj.init("one", "two")
myData.map({s => 
                 MyObj.exec(s.toString) 
             }).saveAsTextFile(outFile)

However, when this runs in Spark, all of the attributes set by init are null by the time exec reads them. How can I fix this?

Update, follow-up:

//works
myData.map({s => 
                 MyObj.init("one", "two")
                 MyObj.exec(s.toString) 
             }).saveAsTextFile(outFile)


//fails with a serialization error
val one= "one"
myData.map({s => 
                 MyObj.init(one, "two")
                 MyObj.exec(s.toString) 
             }).saveAsTextFile(outFile)
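
A likely cause of the second failure (a guess, since the full error isn't shown): if one is a field of a non-serializable enclosing class, or a REPL line object, then referencing it inside the closure drags the whole enclosing instance into the task. A common workaround, sketched under that assumption (oneLocal is a hypothetical name), is to copy the field into a plain local val so the closure captures only the String:

//sketch: copying into a local val keeps the closure from
//capturing the enclosing (non-serializable) object
val oneLocal = one
myData.map({s =>
                 MyObj.init(oneLocal, "two")
                 MyObj.exec(s.toString)
             }).saveAsTextFile(outFile)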

已通过广播测试更新2(失败)-

MyObj.init("one", "two")
val myObj = sc.broadcast(MyObj)     
distData.map({s => 
                  myObj.value.exec(s.toString) 
             }).saveAsTextFile(outFile)
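
For what it's worth, broadcasting a Scala object is unlikely to help here: a serializable object deserializes back to the singleton module on each executor (via the compiler-generated readResolve), so the driver-side mutable state never arrives. A sketch of an alternative, broadcasting the plain values instead (params is a hypothetical name):

val params = sc.broadcast(("one", "two"))
distData.map({s =>
                  MyObj.init(params.value._1, params.value._2)
                  MyObj.exec(s.toString)
             }).saveAsTextFile(outFile)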

1 Answer:

Answer 0 (score: 0):

You initialized the object in the driver JVM, but not in the executors, where the parallel code actually runs.

MyObj.init("one", "two")  // Run in driver
myData.map({s => MyObj.exec(s.toString)})  // Run in executors
.saveAsTextFile(outFile)

This should work:

myData.map({s => 
     MyObj.init("one", "two")
     MyObj.exec(s.toString)
})
.saveAsTextFile(outFile) 
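
Note that this calls init once per record. If initialization is expensive, a common refinement (a sketch, assuming init is safe to call more than once per JVM) is mapPartitions, which runs it once per partition:

//sketch: initialize once per partition instead of once per element
myData.mapPartitions({iter =>
     MyObj.init("one", "two")
     iter.map(s => MyObj.exec(s.toString))
})
.saveAsTextFile(outFile)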

Depending on your use case, a common pattern is to let the object initialize itself lazily, like a Java-style singleton:

object MyObject {
   // Evaluated once per JVM, on first access, so each executor
   // initializes itself the first time exec runs there
   lazy val (a, b) = init()

   private def init(): (String, String) = {
      // Initialize a and b. Maybe connect to REST endpoints or
      // databases, or just read environment variables
      // (the variable names here are placeholders)
      (sys.env.getOrElse("MY_A", "one"), sys.env.getOrElse("MY_B", "two"))
   }

   def exec(z: String): String = {
      // your stuff
      a
   }
}
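
With the lazy singleton, nothing has to be shipped from the driver; each executor JVM initializes itself on first use:

//usage sketch: the first call to exec on each executor triggers init
myData.map(s => MyObject.exec(s.toString)).saveAsTextFile(outFile)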