I have a transformation:
JavaRDD<Tuple2<String, Long>> mappedRdd = myRDD.values().map(
    new Function<Pageview, Tuple2<String, Long>>() {
        @Override
        public Tuple2<String, Long> call(Pageview pageview) throws Exception {
            String key = pageview.getUrl().toString();
            Long value = getDay(pageview.getTimestamp());
            return new Tuple2<>(key, value);
        }
    });
Pageview is a type defined in Pageview.java.
I then register that class with Spark:
Class[] c = new Class[1];
c[0] = Pageview.class;
sparkConf.registerKryoClasses(c);
However, I get this error:
Exception in thread "main" org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:166)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:158)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:1623)
    at org.apache.spark.rdd.RDD.map(RDD.scala:286)
    at org.apache.spark.api.java.JavaRDDLike$class.map(JavaRDDLike.scala:89)
    at org.apache.spark.api.java.AbstractJavaRDDLike.map(JavaRDDLike.scala:46)
    at org.apache.gora.tutorial.log.ExampleSpark.run(ExampleSpark.java:100)
    at org.apache.gora.tutorial.log.ExampleSpark.main(ExampleSpark.java:53)
Caused by: java.io.NotSerializableException: org.apache.gora.tutorial.log.ExampleSpark
Serialization stack:
    - object not serializable (class: org.apache.gora.tutorial.log.ExampleSpark, value: org.apache.gora.tutorial.log.ExampleSpark@1a2b4497)
    - field (class: org.apache.gora.tutorial.log.ExampleSpark$1, name: this$0, type: class org.apache.gora.tutorial.log.ExampleSpark)
    - object (class org.apache.gora.tutorial.log.ExampleSpark$1, org.apache.gora.tutorial.log.ExampleSpark$1@4ab2775d)
    - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
    - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, )
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:38)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:80)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:164)
    ... 7 more
When I debug the code, I see that JavaSerializer.scala is called even though there is a class named KryoSerializer.
PS 1: I don't want to use the Java serializer, but implementing Serializable on Pageview does not solve the problem either.
PS 2: This does not solve the problem either:
...
//String key = pageview.getUrl().toString();
//Long value = getDay(pageview.getTimestamp());
String key = "Dummy";
Long value = 1L;
return new Tuple2<>(key, value);
...
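Answer 0 (score: 4)
I have run into this issue many times with Java code. The serialization stack in the trace shows why: the anonymous Function (ExampleSpark$1) holds a this$0 reference to the enclosing ExampleSpark instance, which is not serializable. While using Java serialization, I would either make the class that contains the code Serializable or, if you don't want to do that, make the Function a static member of the class.
Here is a code snippet of the second solution.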
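import org.apache.spark.api.java.function.Function;
import scala.Tuple2;

public class Test {
    // Static field: the anonymous class has no enclosing instance, so no
    // this$0 reference to a non-serializable outer object is captured.
    private static final Function<Pageview, Tuple2<String, Long>> s =
        new Function<Pageview, Tuple2<String, Long>>() {
            @Override
            public Tuple2<String, Long> call(Pageview pageview) throws Exception {
                String key = pageview.getUrl().toString();
                // getDay must itself be static to be callable from this static context.
                Long value = getDay(pageview.getTimestamp());
                return new Tuple2<>(key, value);
            }
        };
}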
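The difference between the two placements can be reproduced without Spark, using plain Java serialization. The following is only a minimal sketch: ClosureCaptureDemo, SerializableFunction, Driver, getDayStatic and isJavaSerializable are hypothetical names invented for this illustration, Driver stands in for ExampleSpark, and the body of getDay (milliseconds divided by the length of a day) is an assumption, since the question does not show it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ClosureCaptureDemo {

    // Hypothetical stand-in for Spark's Function interface, which is also Serializable.
    interface SerializableFunction<T, R> extends Serializable {
        R call(T input) throws Exception;
    }

    // Hypothetical stand-in for the driver class (ExampleSpark in the question).
    // It deliberately does NOT implement Serializable.
    static class Driver {
        // Assumed implementation: whole days since the epoch.
        long getDay(long millis) { return millis / 86_400_000L; }

        static long getDayStatic(long millis) { return millis / 86_400_000L; }

        // Created in an instance method and calling an instance method:
        // the anonymous class keeps a reference to Driver.this.
        SerializableFunction<Long, Long> instanceScoped() {
            return new SerializableFunction<Long, Long>() {
                @Override
                public Long call(Long millis) {
                    return getDay(millis);
                }
            };
        }

        // Created in a static context: there is no enclosing instance to capture.
        static final SerializableFunction<Long, Long> STATIC_SCOPED =
            new SerializableFunction<Long, Long>() {
                @Override
                public Long call(Long millis) {
                    return getDayStatic(millis);
                }
            };
    }

    // Returns true if Java serialization can write the object graph rooted at o.
    static boolean isJavaSerializable(Object o) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(o);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // prints "false": serializing the function drags in the non-serializable Driver
        System.out.println(isJavaSerializable(new Driver().instanceScoped()));
        // prints "true": nothing but the function itself is serialized
        System.out.println(isJavaSerializable(Driver.STATIC_SCOPED));
    }
}
```

The same reasoning suggests why registering Pageview with Kryo does not help in the question: the object that fails to serialize is the enclosing driver class reached through the function, not the Pageview records.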