object MyObj extends Serializable {
  private var a: String = _
  private var b: String = _
  // other vars handled in init
  ...
  def init(x: String, y: String): Unit = {
    this.a = x
  }
  def exec(z: String): String = {
    // want to return value x from init
    this.a
  }
}
Then there is my Spark part:
MyObj.init("one", "two")
myData.map({s =>
MyObj.exec(s.toString)
}).saveAsTextFile(outFile)
However, when exec runs in Spark, all the attributes that init was supposed to set are null. How can I fix this?
Update (follow-up):
//works
myData.map({s =>
MyObj.init("one", "two")
MyObj.exec(s.toString)
}).saveAsTextFile(outFile)
//can't serialize error
val one= "one"
myData.map({s =>
MyObj.init(one, "two")
MyObj.exec(s.toString)
}).saveAsTextFile(outFile)
Update 2, tested with a broadcast variable (failed):
MyObj.init("one", "two")
val myObj = sc.broadcast(MyObj)
distData.map({s =>
myObj.value.exec(s.toString)
}).saveAsTextFile(outFile)
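Answer 0 (score: 0)
You initialized the object in the driver JVM, but not in the executors, which are the JVMs that actually run the parallel code. Each executor loads its own copy of MyObj, and in that copy init has never been called.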
MyObj.init("one", "two") // Run in driver
myData.map({s => MyObj.exec(s.toString)}) // Run in executors
.saveAsTextFile(outFile)
This should work, because init now runs inside the closure, on the executor:
myData.map({s =>
MyObj.init("one", "two")
MyObj.exec(s.toString)
})
.saveAsTextFile(outFile)
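Calling init for every record is redundant when the arguments never change. A minimal sketch of a per-partition variant follows, assuming the function is handed to RDD.mapPartitions, which takes an Iterator[T] => Iterator[U]. The names PerPartitionInit and initThenExec are hypothetical, MyObj is a trimmed copy of the object from the question, and a plain Iterator stands in for one partition so the block runs without Spark.

```scala
// Trimmed copy of the question's object, kept only to make the sketch self-contained.
object MyObj extends Serializable {
  private var a: String = _
  def init(x: String, y: String): Unit = { a = x }
  def exec(z: String): String = a
}

object PerPartitionInit {
  // Same shape that RDD.mapPartitions expects: Iterator[T] => Iterator[U].
  def initThenExec(rows: Iterator[String]): Iterator[String] = {
    MyObj.init("one", "two") // runs once per partition, in the JVM that processes it
    rows.map(MyObj.exec)     // exec now sees the initialized state
  }

  def main(args: Array[String]): Unit = {
    // Stand-in for a single partition. With Spark this would look like:
    //   myData.map(_.toString).mapPartitions(initThenExec).saveAsTextFile(outFile)
    val out = initThenExec(Iterator("r1", "r2")).toList
    println(out) // List(one, one)
  }
}
```

Since init takes only literals here, the closure captures nothing from the driver. If the arguments come from driver-side values, copying them into local vals right before the closure is one common way to avoid dragging a non-serializable enclosing class into it.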
Depending on your use case, a common pattern is to use the object like a Java singleton that initializes itself lazily:
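object MyObject {
  // Evaluated on first access, once per JVM (so once per executor)
  lazy val (a, b) = init()
  private def init(): (String, String) = {
    // Initialize a, b. Maybe connect to REST endpoints or
    // databases, or just use environment variables to set them up
    ("one", "two") // placeholder values
  }
  def exec(param1: String): String = {
    // your stuff (add further parameters as needed)
    a
  }
}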
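To make the lazy-singleton behaviour concrete, here is a small sketch in plain Scala, with no Spark involved. The names Settings, LazyDemo and initCalls are hypothetical and exist only for this illustration, and the values "one" and "two" are placeholders. It shows that init runs on first use and only once per JVM, which is what lets each executor initialize its own copy without any call from the driver.

```scala
object Settings {
  // Hypothetical counter, present only to show how often init() runs.
  var initCalls: Int = 0

  // Evaluated on first access, at most once per JVM.
  lazy val (a, b) = init()

  private def init(): (String, String) = {
    initCalls += 1
    ("one", "two") // placeholder values
  }

  def exec(z: String): String = s"$a:$z"
}

object LazyDemo {
  def main(args: Array[String]): Unit = {
    println(Settings.initCalls) // 0, nothing has touched a or b yet
    println(Settings.exec("x")) // one:x, the first access triggers init()
    println(Settings.exec("y")) // one:y, init() is not run again
    println(Settings.initCalls) // 1
  }
}
```

The trade-off is that init cannot take arguments from the driver, so any configuration has to come from something each executor can read by itself, such as environment variables or a config file.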