Unable to use Kryo

Posted: 2017-11-28 09:45:00

Tags: clojure kryo flambo

I am creating a Spark context like this:
(ns something
  (:require [flambo.conf :as conf]
            [flambo.api :as f]))

(def c (-> (conf/spark-conf)
           (conf/master "spark://formcept008.lan:7077")
           (conf/app-name "clustering")))  ;; app-name
(def sc (f/spark-context c))

Then I create an RDD:

(def rdd (f/parallelize sc DATA))

Now, whenever I run an action on this data, such as (f/take rdd 3), I get this error:

  

17/11/28 14:35:00 ERROR Utils:遇到异常       org.apache.spark.SparkException:无法使用Kryo注册类           在org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:129)           在org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:274)           在org.apache.spark.serializer.KryoSerializerInstance。(KryoSerializer.scala:259)           在org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:175)           at org.apache.spark.rdd.ParallelCollectionPartition $$ anonfun $ readObject $ 1.apply $ mcV $ sp(ParallelCollectionRDD.scala:79)           在org.apache.spark.rdd.ParallelCollectionPartition $$ anonfun $ readObject $ 1.apply(ParallelCollectionRDD.scala:70)           在org.apache.spark.rdd.ParallelCollectionPartition $$ anonfun $ readObject $ 1.apply(ParallelCollectionRDD.scala:70)           at org.apache.spark.util.Utils $ .tryOrIOException(Utils.scala:1273)           在org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)           at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)           at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)           at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)           at java.lang.reflect.Method.invoke(Method.java:498)           at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)           at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909)           at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)           at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)           在java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)           at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)           at 
java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)           在org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)           在org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)           在org.apache.spark.executor.Executor $ TaskRunner.run(Executor.scala:253)           在java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)           at java.util.concurrent.ThreadPoolExecutor $ Worker.run(ThreadPoolExecutor.java:617)           在java.lang.Thread.run(Thread.java:745)       引起:java.lang.ClassNotFoundException:flambo.kryo.BaseFlamboRegistrator           at java.net.URLClassLoader.findClass(URLClassLoader.java:381)           at java.lang.ClassLoader.loadClass(ClassLoader.java:424)           at java.lang.ClassLoader.loadClass(ClassLoader.java:357)           at java.lang.Class.forName0(Native Method)           在java.lang.Class.forName(Class.java:348)           在org.apache.spark.serializer.KryoSerializer $$ anonfun $ newKryo $ 5.apply(KryoSerializer.scala:124)           在org.apache.spark.serializer.KryoSerializer $$ anonfun $ newKryo $ 5.apply(KryoSerializer.scala:124)           在scala.collection.TraversableLike $$ anonfun $ map $ 1.apply(TraversableLike.scala:234)           在scala.collection.TraversableLike $$ anonfun $ map $ 1.apply(TraversableLike.scala:234)           在scala.collection.IndexedSeqOptimized $ class.foreach(IndexedSeqOptimized.scala:33)           at scala.collection.mutable.ArrayOps $ ofRef.foreach(ArrayOps.scala:186)           在scala.collection.TraversableLike $ class.map(TraversableLike.scala:234)           在scala.collection.mutable.ArrayOps $ ofRef.map(ArrayOps.scala:186)           在org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:124)           ......还有27个       17/11/28 14:35:00 ERROR Executor:阶段0.0(TID 0)中任务0.0的异常       
java.lang.IllegalStateException:未读块数据           at java.io.ObjectInputStream $ BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2449)           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1385)           at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)           在java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)           at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)           at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)           at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)           在org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)           在org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)           在org.apache.spark.executor.Executor $ TaskRunner.run(Executor.scala:253)           在java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)           at java.util.concurrent.ThreadPoolExecutor $ Worker.run(ThreadPoolExecutor.java:617)           在java.lang.Thread.run(Thread.java:745)

Any ideas?

2 Answers:

Answer 0 (score: 0)

It looks like flambo is somehow not making it onto your classpath, which is why you get:

java.lang.ClassNotFoundException: flambo.kryo.BaseFlamboRegistrator

Are you running this from the REPL, or through a lein or boot task?

If you are using Leiningen, check your classpath (lein classpath) and your dependency tree (lein deps :tree).

Also, run lein clean to make sure your target folder isn't causing the problem.

Stack trace analysis: "Failed to register classes with Kryo" is caused by the missing class flambo.kryo.BaseFlamboRegistrator.
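To confirm the diagnosis from a REPL, you can try loading the registrator class by name. This is a minimal sketch; the namespace example.classcheck and the function class-available? are hypothetical helpers, not part of flambo. Keep in mind that a true result only shows the class is visible to the JVM you call it from (for example the driver REPL); Spark executors load classes from their own classpath, and the trace above comes from an executor.

```clojure
(ns example.classcheck)

(defn class-available?
  "Returns true if class-name can be loaded in this JVM, false otherwise.
  Only checks the JVM it runs in (e.g. the driver), not Spark executors."
  [class-name]
  (try
    (Class/forName class-name)
    true
    (catch ClassNotFoundException _
      false)))

;; Usage in the driver REPL:
;; (class-available? "flambo.kryo.BaseFlamboRegistrator")
```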

Answer 1 (score: 0)

Solved. Add all of the project's jars to the Spark configuration with:
;; goes inside the (-> (conf/spark-conf) ...) threading form
(conf/jars (map #(.getPath %) (.getURLs (java.lang.ClassLoader/getSystemClassLoader))))

That registers all the classes. Since the issue is resolved, this question can be closed.
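The .getURLs call relies on the system class loader being a URLClassLoader, which holds on Java 8 and earlier (the sun.reflect frames in the trace point to such a runtime) but not on Java 9 and later, where that call fails. A hedged alternative is to read the java.class.path system property instead. The sketch below keeps only .jar entries; the namespace example.jars and the function classpath-jars are hypothetical, and whether you also need to ship directory entries such as target/classes depends on how your project is built.

```clojure
(ns example.jars
  (:require [clojure.string :as str])
  (:import [java.io File]
           [java.util.regex Pattern]))

(defn classpath-jars
  "Returns the .jar entries of this JVM's classpath as a vector of path strings.
  Reads the java.class.path system property, so it does not depend on the
  system class loader being a URLClassLoader."
  []
  (->> (str/split (System/getProperty "java.class.path")
                  ;; quote the separator (\":\" or \";\") so it is matched literally
                  (re-pattern (Pattern/quote File/pathSeparator)))
       (filter #(str/ends-with? % ".jar"))
       vec))

;; Usage, assuming flambo.conf is required :as conf as in the question:
;; (def c (-> (conf/spark-conf)
;;            (conf/master "spark://formcept008.lan:7077")
;;            (conf/app-name "clustering")
;;            (conf/jars (classpath-jars))))
```

Filtering to .jar files means only packaged dependencies (flambo among them) are shipped to the executors; str/ends-with? needs Clojure 1.8 or later.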