I have three classes, and I am getting a
Task not serializable
error. The full stack trace is below.
The first class is a serializable Person:
public class Person implements Serializable
{
    private String name;
    private int age;

    public String getName()
    {
        return name;
    }

    public void setName(String name)
    {
        this.name = name;
    }

    public int getAge()
    {
        return age;
    }

    public void setAge(int age)
    {
        this.age = age;
    }
}
This class reads from a text file and maps the lines to the Person class:
public class sqlTestv2 implements Serializable
{
    private int appID;
    private int dataID;
    private JavaSparkContext sc;

    public sqlTestv2(int appID, int dataID, JavaSparkContext sc)
    {
        this.appID = appID;
        this.dataID = dataID;
        this.sc = sc;
    }

    public JavaRDD<Person> getDataRDD()
    {
        JavaRDD<String> test = sc.textFile("hdfs://localhost:8020/user/cloudera/people.txt");
        JavaRDD<Person> people = test.map(
            new Function<String, Person>() {
                public Person call(String line) throws Exception {
                    String[] parts = line.split(",");
                    Person person = new Person();
                    person.setName(parts[0]);
                    person.setAge(Integer.parseInt(parts[1].trim()));
                    return person;
                }
            });
        return people;
    }
}
And this retrieves the RDD and performs operations on it:
public class sqlTestv1 implements Serializable
{
    public static void main(String[] arg) throws Exception
    {
        SparkConf conf = new SparkConf().setMaster("local").setAppName("wordCount");
        JavaSparkContext sc = new JavaSparkContext(conf);
        SQLContext sqlContext = new org.apache.spark.sql.SQLContext(sc);

        sqlTestv2 v2 = new sqlTestv2(1, 1, sc);
        JavaRDD<Person> test = v2.getDataRDD();

        DataFrame schemaPeople = sqlContext.createDataFrame(test, Person.class);
        schemaPeople.registerTempTable("people");
        DataFrame df = sqlContext.sql("SELECT age FROM people");
        df.show();
    }
}
The full error:

Exception in thread "main" org.apache.spark.SparkException: Task not serializable
	at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:315)
	at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:305)
	at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:132)
	at org.apache.spark.SparkContext.clean(SparkContext.scala:1893)
	at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:294)
	at org.apache.spark.rdd.RDD$$anonfun$map$1.apply(RDD.scala:293)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
	at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:108)
	at org.apache.spark.rdd.RDD.withScope(RDD.scala:286)
	at org.apache.spark.rdd.RDD.map(RDD.scala:293)
	at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:90)
	at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:47)
	at com.oreilly.learningsparkexamples.mini.java.sqlTestv2.getDataRDD(sqlTestv2.java:54)
	at com.oreilly.learningsparkexamples.mini.java.sqlTestv1.main(sqlTestv1.java:41)
Caused by: java.io.NotSerializableException: org.apache.spark.api.java.JavaSparkContext
Serialization stack:
	- object not serializable (class: org.apache.spark.api.java.JavaSparkContext, value: org.apache.spark.api.java.JavaSparkContext@3c3b144b)
	- field (class: com.oreilly.learningsparkexamples.mini.java.sqlTestv2, name: sc, type: class org.apache.spark.api.java.JavaSparkContext)
	- object (class com.oreilly.learningsparkexamples.mini.java.sqlTestv2, com.oreilly.learningsparkexamples.mini.java.sqlTestv2@3752fdda)
	- field (class: com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1, name: this$0, type: class com.oreilly.learningsparkexamples.mini.java.sqlTestv2)
	- object (class com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1, com.oreilly.learningsparkexamples.mini.java.sqlTestv2$1@70c171ec)
	- field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
	- object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, )
	at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
	at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
	at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
Answer 0 (score: 3)
The stack trace tells you the answer: it is the JavaSparkContext you are passing into sqlTestv2. The anonymous Function holds an implicit reference to its enclosing instance, so Spark tries to serialize the whole sqlTestv2 object, including the sc field. You should pass sc to the method, not store it on the class.
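The mechanism behind this answer can be shown without Spark at all. The sketch below is illustrative (the class names FakeContext and Outer are made up, not from the original code): an anonymous inner class carries a hidden this$0 field pointing at its enclosing instance, so serializing it drags in the enclosing object and every field it holds, while a static nested class has no such reference.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class ClosureCaptureDemo {
    // Stands in for JavaSparkContext: a field that cannot be serialized.
    static class FakeContext { }

    static class Outer implements Serializable {
        private FakeContext ctx = new FakeContext(); // non-serializable field

        // Anonymous inner class: implicitly captures Outer.this via the
        // synthetic this$0 field, dragging ctx into the serialized graph.
        Serializable capturingObject() {
            return new Serializable() { };
        }

        // Static nested class: holds no reference to the enclosing instance.
        static class Standalone implements Serializable { }
    }

    // Returns true if Java serialization of o succeeds.
    static boolean serializes(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (IOException e) { // NotSerializableException lands here
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(serializes(new Outer().capturingObject())); // false
        System.out.println(serializes(new Outer.Standalone()));        // true
    }
}
```

This is exactly what the serialization stack in the question shows: the walk goes sqlTestv2$1 (the anonymous Function) → this$0 → sqlTestv2 → sc, and fails at sc. Making getDataRDD take the JavaSparkContext as a parameter (a local variable is not captured the same way a field is) or using a static nested Function class breaks that chain.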
Answer 1 (score: 1)
You can add the transient modifier to sc so that it is not serialized.
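A minimal pure-JDK sketch of what transient does (the Holder class and its fields are invented for illustration): the marked field is simply skipped during serialization and comes back as null after deserialization. Note the caveat this implies for the question: a transient sc would be null inside tasks on the workers, which is fine only as long as the closure never actually uses it there.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
    static class Holder implements Serializable {
        String name = "kept";
        // Thread is not Serializable; transient tells the serializer to skip it,
        // so writeObject no longer throws NotSerializableException.
        transient Thread worker = new Thread();
    }

    // Serialize to bytes and read back, as Spark's JavaSerializer would.
    static Holder roundTrip(Holder h) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(h);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Holder) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Holder copy = roundTrip(new Holder());
        System.out.println(copy.name);           // kept
        System.out.println(copy.worker == null); // true: transient fields deserialize as null
    }
}
```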