org.apache.spark.SparkException: Task not serializable - passing an RDD

Date: 2015-11-06 04:53:34

Tags: java apache-spark

I have three classes, and I am getting a "Task not serializable" error. The full stack trace is below.

The first class is a serializable Person:

public class Person implements Serializable
{
    private String name;
    private int age;

    public String getName()
    {
        return name;
    }

    public void setName(String name)
    {
        this.name = name;
    }

    public int getAge()
    {
        return age;
    }

    public void setAge(int age)
    {
        this.age = age;
    }
}

This class reads from a text file and maps each line to the Person class:

public class sqlTestv2 implements Serializable
{
    private int appID;
    private int dataID;
    private JavaSparkContext sc;

    public sqlTestv2(int appID, int dataID, JavaSparkContext sc)
    {
        this.appID = appID;
        this.dataID = dataID;
        this.sc = sc;
    }

    public JavaRDD<Person> getDataRDD()
    {
        JavaRDD<String> test = sc.textFile("hdfs://localhost:8020/user/cloudera/people.txt");

        JavaRDD<Person> people = test.map(
            new Function<String, Person>() {
                public Person call(String line) throws Exception {
                    String[] parts = line.split(",");

                    Person person = new Person();
                    person.setName(parts[0]);
                    person.setAge(Integer.parseInt(parts[1].trim()));

                    return person;
                }
            });

        return people;
    }
}

This class retrieves the RDD and performs operations on it:

public class sqlTestv1 implements Serializable
{
    public static void main(String[] arg) throws Exception
    {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("wordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);
        sqlTestv2 v2 = new sqlTestv2(1, 1, sc);
        JavaRDD<Person> test = v2.getDataRDD();

        DataFrame schemaPeople = sqlContext.createDataFrame(test, Person.class);
        schemaPeople.registerTempTable("people");

        DataFrame df = sqlContext.sql("SELECT age FROM people");

        df.show();
    }
}

The full error:

    Exception in thread "main" org.apache.spark.SparkException: Task not serializable
        at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
        at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
        at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
        at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
        at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:294)
        at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:293)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
        at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
        at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
        at org.apache.spark.rdd.RDD.map(RDD.scala:293)
        at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:90)
        at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:47)
        at com.oreilly.learningsparkexamples.mini.java.sqlTestv2.getDataRDD(sqlTestv2.java:54)
        at com.oreilly.learningsparkexamples.mini.java.sqlTestv1.main(sqlTestv1.java:41)
    Caused by: java.io.NotSerializableException: org.apache.spark.api.java.JavaSparkContext
    Serialization stack:
        - object not serializable (class: org.apache.spark.api.java.JavaSparkContext, value: org.apache.spark.api.java.JavaSparkContext@3c3b144b)
        - field (class: com.oreilly.learningsparkexamples.mini.java.sqlTestv2, name: sc, type: class org.apache.spark.api.java.JavaSparkContext)
        - object (class com.oreilly.learningsparkexamples.mini.java.sqlTestv2, com.oreilly.learningsparkexamples.mini.java.sqlTestv2@3752fdda)
        - field (class: com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1, name: this$0, type: class com.oreilly.learningsparkexamples.mini.java.sqlTestv2)
        - object (class com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1, com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1@70c171ec)
        - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
        - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, )
        at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
        at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
        at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)

2 answers:

Answer 0 (score: 3):

The stack trace tells you the answer: it is the JavaSparkContext you are passing into sqlTestv2. You should pass sc to the method, not store it on the class.
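The mechanism behind this answer can be demonstrated with plain JDK serialization, no Spark required (all class names below are hypothetical stand-ins): an anonymous inner class that touches an outer field keeps an implicit `this$0` reference to its enclosing instance, so serializing the function also serializes the outer object and every field it holds, including a non-serializable one like the JavaSparkContext. Moving the non-serializable resource out of the captured object (here, by using a static nested class with no enclosing reference) removes the problem.

```java
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class CaptureDemo {

    // Stands in for JavaSparkContext: any non-serializable resource.
    static class FakeContext { }

    static class Outer implements Serializable {
        // Non-serializable field, like the 'sc' field in sqlTestv2.
        private FakeContext ctx = new FakeContext();

        // Anonymous inner class referencing an outer field: it captures
        // 'this' as the hidden this$0 field, dragging 'ctx' along.
        Serializable badFunction() {
            return new Serializable() {
                Object peek() { return ctx; }
            };
        }

        // Static nested class: no enclosing-instance reference to serialize.
        static Serializable goodFunction() {
            return new SafeFn();
        }
    }

    static class SafeFn implements Serializable { }

    // Returns true if the object survives Java serialization.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serializes(new Outer().badFunction())); // false
        System.out.println(serializes(Outer.goodFunction()));      // true
    }
}
```

This is exactly what the "field (name: this$0)" entry in the serialization stack above is pointing at; passing sc as a method parameter instead of a field means the captured enclosing instance no longer contains it.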

Answer 1 (score: 1):

You can add the "transient" modifier to sc so that it is not serialized.
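A minimal plain-JDK sketch of what "transient" does (class names are hypothetical stand-ins): the serializer skips transient fields entirely, so a non-serializable member like the JavaSparkContext no longer breaks serialization; after deserialization the field simply comes back as null.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    // Non-serializable stand-in for JavaSparkContext.
    static class FakeContext { }

    static class Holder implements Serializable {
        // transient: skipped by Java serialization, so the
        // non-serializable FakeContext is never written out.
        transient FakeContext sc = new FakeContext();
        String name = "people-job";
    }

    // Round-trips an object through Java serialization.
    static Holder roundTrip(Holder h) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buf)) {
                out.writeObject(h);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                     new ByteArrayInputStream(buf.toByteArray()))) {
                return (Holder) in.readObject();
            }
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Holder copy = roundTrip(new Holder());
        System.out.println(copy.sc);    // null: transient fields are not serialized
        System.out.println(copy.name);  // people-job
    }
}
```

Note the trade-off: on the executors the transient field will be null, so this only works when the shipped function never actually touches sc there (a driver-side context cannot be used on executors anyway).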