ClassTag causing Spark to serialize the object

Asked: 2015-08-07 03:15:49

Tags: scala serialization apache-spark

The following code fails with an org.apache.spark.SparkException: Task not serializable exception:

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

class Foo[T](rdd: RDD[T])(implicit kt: ClassTag[T]) {
  def die() {
    // fails with Task not serializable
    rdd.map(_ => Array[T]()).count()
  }
}

val x = sc.parallelize(Array(1, 2, 3, 4, 5))
val foo = new Foo(x)
foo.die()

It fails because Foo is not serializable. Why does the function literal passed to map cause Foo to be serialized when it references the implicit ClassTag parameter? How can I work around this? The code works when Foo operates on Int instead of a generic T. My actual code is trying to call toArray, but it hits the same problem. Thanks!

Edit: I'm running this with spark-shell. Here is the serialization stack:

Serialization stack:
    - object not serializable (class: $iwC$$iwC$Foo, value: $iwC$$iwC$Foo@4ce1292a)
    - field (class: $iwC$$iwC$Foo$$anonfun$die$1, name: $outer, type: class $iwC$$iwC$Foo)
    - object (class $iwC$$iwC$Foo$$anonfun$die$1, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
    ... 56 more

1 Answer:

Answer 0 (score: 0):

This is not specific to the implicit ClassTag. kt is a constructor parameter that becomes a field of Foo, so referencing it inside the function passed to map makes the closure capture `this`, and Spark then has to serialize the entire Foo instance. Copying the ClassTag into a local val first avoids that capture. The following works:

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

class Foo[T](rdd: RDD[T])(implicit kt: ClassTag[T]) {
  def die() {
    // copy the implicit into a local val so the closure
    // captures only the ClassTag, not the enclosing Foo
    val localKt = kt
    rdd.map(_ => Array[T]()(localKt)).count()
  }
}

val x = sc.parallelize(Array(1, 2, 3, 4, 5))
val foo = new Foo(x)
foo.die()
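
The fixed closure captures only localKt, and ClassTags are serializable. Alternatively, declaring Foo as Serializable should also make the example run, but at the cost of shipping the whole instance to the executors. The same local-val trick applies to the toArray use case mentioned in the question; here is a minimal sketch (the class and method names are hypothetical, not from the original post):

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

class Bar[T](rdd: RDD[T])(implicit kt: ClassTag[T]) {
  // Collect each partition into a single Array[T] without capturing `this`.
  def partitionArrays(): RDD[Array[T]] = {
    val localKt = kt  // local copy; the closure below captures only this val
    rdd.mapPartitions(it => Iterator(it.toArray(localKt)))
  }
}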