The following code fails with an org.apache.spark.SparkException: Task not serializable exception:
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

class Foo[T](rdd: RDD[T])(implicit kt: ClassTag[T]) {
  def die() {
    rdd.map(_ => Array[T]()).count()
  }
}
val x = sc.parallelize(Array(1, 2, 3, 4, 5))
val foo = new Foo(x)
foo.die()
because Foo is not serializable. Why does the function literal passed to map cause Foo to be serialized when it references the implicit ClassTag parameter, and how can I get around this? The same approach works when Foo operates on Int rather than on the type parameter T. My actual code is attempting toArray, but it's the same problem. Thanks!
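For comparison, here is a minimal sketch of the concrete-type case that does work (the class and method names are mine, for illustration only):

import org.apache.spark.rdd.RDD

// Hypothetical concrete-type variant: Array[Int]() picks up ClassTag.Int
// statically at the call site, so the lambda captures nothing from the
// enclosing instance and serializes cleanly.
class IntFoo(rdd: RDD[Int]) {
  def live() {
    rdd.map(_ => Array[Int]()).count()
  }
}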
Edit: I'm running this in spark-shell. Here is the serialization stack:
Serialization stack:
- object not serializable (class: $iwC$$iwC$Foo, value: $iwC$$iwC$Foo@4ce1292a)
- field (class: $iwC$$iwC$Foo$$anonfun$die$1, name: $outer, type: class $iwC$$iwC$Foo)
- object (class $iwC$$iwC$Foo$$anonfun$die$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
... 56 more
Answer 0 (score 0):
This isn't specific to implicit parameters or ClassTag. The compiler expands Array[T]() to Array[T]()(kt), and because kt is a constructor parameter referenced inside a method, it is accessed as a field of Foo; the lambda passed to map therefore captures the enclosing Foo instance (the $outer in the stack above). Copying the ClassTag into a local val first means the closure captures only that serializable value instead. The following works:
import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

class Foo[T](rdd: RDD[T])(implicit kt: ClassTag[T]) {
  def die() {
    val localKt = kt
    rdd.map(_ => Array[T]()(localKt)).count()
  }
}
val x = sc.parallelize(Array(1, 2, 3, 4, 5))
val foo = new Foo(x)
foo.die()
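As a side note, another way out (my own sketch, not part of the answer above) is to make Foo itself serializable; ClassTag and RDD both implement Serializable, so the captured $outer reference can be shipped, at the cost of serializing the whole instance with each task:

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Alternative sketch: declare the enclosing class Serializable so the
// closure's $outer reference can be serialized. This works but ships the
// entire Foo instance with every task rather than just the ClassTag.
class Foo[T](rdd: RDD[T])(implicit kt: ClassTag[T]) extends Serializable {
  def die() {
    rdd.map(_ => Array[T]()).count()
  }
}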