org.apache.spark.SparkException:任务不能用lambda序列化

时间:2017-03-03 08:08:18

标签: scala apache-spark serialization

我对scala和spark非常新。现在我遇到一个让我非常困惑的问题。请给我一个建议。

我使用lambda从RDD[myEntityClass]制作RDD[Array[String]]。但是我遇到了一个错误,它表示将String解析为Long的值为null。为了研究这个,我实现了一个方法,使我能够使用断点。

但是现在我得到了org.apache.spark.SparkException: Task not serializable而且我找不到错误。下面是我的代码片段,请帮助我,如果你能找到任何东西。

def makingData() : RDD[MyEntityClass] = {
  .
  .
  data.map(row => toMyEntityClass(row))
}

def toMyEntityClass(row : Array[String]) : MyEntityClass = {
  var id = row(0).toLong
  var name = row(1)
  var code = row(2).toLong
  var parentId = row(3).toLong
  var status = row(4)

  MyEntityClass(id, name, code, parentId, status)
}

=====更新了问题=====

我正在更新我的问题以回复您的建议。我已经将MyEntityClass作为案例类,如下所示。

case class MyEntityClass(id: Long, name: String, code: Long, parentId: Long, status: String)

=====附加堆栈跟踪=====

Task not serializable
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2030)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:314)
at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:313)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
at org.apache.spark.rdd.RDD.map(RDD.scala:313)
at com.myproject.repository.MyRepositorySpec.getDummyData(MyRepositorySpec.scala:40)
at com.myproject.repository.MyRepositorySpec$$anonfun$3.apply(MyRepositorySpec.scala:66)
at com.myproject.repository.MyRepositorySpec$$anonfun$3.apply(MyRepositorySpec.scala:65)
at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
at org.scalatest.Transformer.apply(Transformer.scala:22)
at org.scalatest.Transformer.apply(Transformer.scala:20)
at org.scalatest.FlatSpecLike$$anon$1.apply(FlatSpecLike.scala:1681)
at org.scalatest.Suite$class.withFixture(Suite.scala:1031)
at org.scalatest.FlatSpec.withFixture(FlatSpec.scala:1691)
at org.scalatest.FlatSpecLike$class.invokeWithFixture$1(FlatSpecLike.scala:1678)
at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1690)
at org.scalatest.FlatSpecLike$$anonfun$runTest$1.apply(FlatSpecLike.scala:1690)
at org.scalatest.SuperEngine.runTestImpl(Engine.scala:287)
at org.scalatest.FlatSpecLike$class.runTest(FlatSpecLike.scala:1690)
at org.scalatest.FlatSpec.runTest(FlatSpec.scala:1691)
at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1748)
at org.scalatest.FlatSpecLike$$anonfun$runTests$1.apply(FlatSpecLike.scala:1748)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:394)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:382)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:382)
at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:371)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:408)
at org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:382)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:382)
at org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:377)
at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:459)
at org.scalatest.FlatSpecLike$class.runTests(FlatSpecLike.scala:1748)
at org.scalatest.FlatSpec.runTests(FlatSpec.scala:1691)
at org.scalatest.Suite$class.run(Suite.scala:1320)
at org.scalatest.FlatSpec.org$scalatest$FlatSpecLike$$super$run(FlatSpec.scala:1691)
at org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1794)
at org.scalatest.FlatSpecLike$$anonfun$run$1.apply(FlatSpecLike.scala:1794)
at org.scalatest.SuperEngine.runImpl(Engine.scala:519)
at org.scalatest.FlatSpecLike$class.run(FlatSpecLike.scala:1794)
at org.scalatest.FlatSpec.run(FlatSpec.scala:1691)
at org.scalatest.tools.SuiteRunner.run(SuiteRunner.scala:46)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1340)
at org.scalatest.tools.Runner$$anonfun$doRunRunRunDaDoRunRun$1.apply(Runner.scala:1334)
at scala.collection.immutable.List.foreach(List.scala:318)
at org.scalatest.tools.Runner$.doRunRunRunDaDoRunRun(Runner.scala:1334)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1011)
at org.scalatest.tools.Runner$$anonfun$runOptionallyWithPassFailReporter$2.apply(Runner.scala:1010)
at org.scalatest.tools.Runner$.withClassLoaderAndDispatchReporter(Runner.scala:1500)
at org.scalatest.tools.Runner$.runOptionallyWithPassFailReporter(Runner.scala:1010)
at org.scalatest.tools.Runner$.run(Runner.scala:850)
at org.scalatest.tools.Runner.run(Runner.scala)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.runScalaTest2(ScalaTestRunner.java:138)
at org.jetbrains.plugins.scala.testingSupport.scalaTest.ScalaTestRunner.main(ScalaTestRunner.java:28)
Caused by: java.io.NotSerializableException: org.scalatest.Assertions$AssertionsHelper
Serialization stack:
- object not serializable (class: org.scalatest.Assertions$AssertionsHelper, value: org.scalatest.Assertions$AssertionsHelper@45e639ee)
- field (class: org.scalatest.FlatSpec, name: assertionsHelper, type: class org.scalatest.Assertions$AssertionsHelper)
- object (class com.myproject.repository.MyRepositorySpec, MyRepositorySpec)
- field (class: com.myproject.repository.MyRepositorySpec$$anonfun$getDummyData$1, name: $outer, type: class com.myproject.repository.MyRepositorySpec)
- object (class com.myproject.repository.MyRepositorySpec$$anonfun$getDummyData$1, <function1>)
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:84)
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:301)
... 61 more

1 个答案:

答案 0 :(得分:1)

从上面给出的代码中,我了解到你想转换 RDD[Array[String]]RDD[MyEntityClass]

我们在这里有两个选择..

  • 制作一个案例类MyEntityClass,默认为Serializable。 例如

    case MyEntityClass(id : Long, name : String, code : String, parentId : Long, status : String)

  • 创建一个普通的课程MyEntityClass with Serializable然后它有资格进行序列化... 注意:一般情况下,当案例类超过22个字段时使用此方法(产品问题)如果你正在使用&lt; scala 2.10

  

编辑:确认MyEntityClass是案例类后,和   粘贴序列化调试器堆栈跟踪,显示   MyRepositorySpec只是一个扩展FlatSpec的测试类   makingData()和toMyEntityClass()。您正在使用您的测试类   在闭包内部是导致此异常的原因

如果出现以下错误,则很明显

  由引起的:java.io.NotSerializableException:   org.scalatest.Assertions $ AssertionsHelper序列化堆栈:    - 对象不可序列化(类:org.scalatest.Assertions $ AssertionsHelper,value:   org.scalatest.Assertions$AssertionsHelper@45e639ee)    - field(class:org.scalatest.FlatSpec,name:assertionsHelper,type:class org.scalatest.Assertions $ AssertionsHelper)    - object(类com.myproject.repository.MyRepositorySpec,MyRepositorySpec)    - field(类:com.myproject.repository.MyRepositorySpec $$ anonfun $ getDummyData $ 1,   名:

解决方案:将MyRepositorySpec设为Serializable