Scala: Task not serializable in RDD map, caused by json4s "implicit val formats = DefaultFormats"

Time: 2017-06-02 01:30:12

Tags: scala apache-spark json4s

The following program tries to call 3 functions for each row (inside an RDD map):

    import org.json4s._
    import org.json4s.jackson.JsonMethods._
    implicit val formats = DefaultFormats

    class TagCalculation extends Serializable {
      def test1(x: String) = x + " test1"
      def test2(x: String) = x + "test2"
      def test3(x: String) = x + "test3"
      def test5(arg1: java.lang.Integer, arg2: String, arg3: scala.collection.immutable.$colon$colon[Any]) = "test mix2"
    }

    val df = sqlContext.createDataFrame(Seq((1, "Android"), (2, "iPhone")))
    val get_test = new TagCalculation
    val field = Array("test1", "test2", "test3")

    val bb = df.rdd.map(row => {
      val reValue1 = "start"
      val ret = for (every <- field)
        yield {
          val test_para = Array(reValue1)
          val argtypes = test_para.map(_.getClass)
          val method4 = get_test.getClass.getMethod(every, argtypes: _*)
          val bbq = method4.invoke(get_test, test_para: _*)
          if (field.last == every)
            bbq
        }
      ret.last
    })

But it fails with the following error:

    org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
        at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:2032)
        at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:314)
        at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:313)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:306)
        at org.apache.spark.rdd.RDD.map(RDD.scala:313)
        ........
        at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:120)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
    Caused by: java.io.NotSerializableException: org.json4s.DefaultFormats$

Any pointers?

It may be caused by "implicit val formats = DefaultFormats", but I need to extract the value before the "map".
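One common workaround, sketched below, is to scope the implicit formats to a driver-side helper instead of making it a field of the enclosing class, so the closure passed to rdd.map never captures a reference to the non-serializable DefaultFormats$ object. This is only a sketch under that assumption; the extractBeforeMap name, the jsonString parameter, and the "device" field are hypothetical stand-ins for whatever value is extracted before the map:

    import org.json4s._
    import org.json4s.jackson.JsonMethods._

    // Sketch: keep the implicit Formats local to a driver-side helper so the
    // closure passed to rdd.map does not capture org.json4s.DefaultFormats$.
    // `jsonString` and the "device" field are hypothetical examples.
    def extractBeforeMap(jsonString: String): String = {
      implicit val formats: Formats = DefaultFormats // local val, not a class field
      (parse(jsonString) \ "device").extract[String]
    }

Depending on how the enclosing class is structured, marking the field @transient instead may have the same effect, since it is then skipped during closure serialization.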

1 Answer:

Answer 0 (score: 1):

The problem is that you define the TagCalculation class inside the calling class where the object is initialized and used. Because a nested class keeps a reference to its enclosing instance, serializing it forces Spark to serialize the calling class as well, including its non-serializable formats field. Simply move TagCalculation outside the calling class, or make it a separate class, and the NotSerializableException should go away.
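A minimal sketch of the restructuring the answer describes (the Main driver object is hypothetical, standing in for the original calling class): TagCalculation becomes a top-level class, so serializing the get_test instance no longer drags a reference to the enclosing class, and its DefaultFormats field, into the closure:

    import org.apache.spark.sql.SQLContext

    // Top-level class: instances carry no hidden $outer reference, so they
    // serialize on their own.
    class TagCalculation extends Serializable {
      def test1(x: String) = x + " test1"
      def test2(x: String) = x + "test2"
      def test3(x: String) = x + "test3"
    }

    // Hypothetical driver, standing in for the original calling class.
    object Main {
      def run(sqlContext: SQLContext): Unit = {
        val df = sqlContext.createDataFrame(Seq((1, "Android"), (2, "iPhone")))
        val get_test = new TagCalculation
        val bb = df.rdd.map(row => get_test.test1("start"))
        bb.collect().foreach(println)
      }
    }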