I am creating a Spark context using flambo -

(ns something
  (:require [flambo.conf :as conf]
            [flambo.api :as f]))

(def c (-> (conf/spark-conf)
           (conf/master "spark://formcept008.lan:7077")
           (conf/app-name "clustering")))
(def sc (f/spark-context c))
Then I create an RDD -

(f/parallelize sc DATA)

Now, when I perform any operation on this data, such as (f/take rdd 3), I get the following error -
17/11/28 14:35:00 ERROR Utils: Exception encountered
org.apache.spark.SparkException: Failed to register classes with Kryo
    at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:129)
    at org.apache.spark.serializer.KryoSerializerInstance.borrowKryo(KryoSerializer.scala:274)
    at org.apache.spark.serializer.KryoSerializerInstance.<init>(KryoSerializer.scala:259)
    at org.apache.spark.serializer.KryoSerializer.newInstance(KryoSerializer.scala:175)
    at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1.apply$mcV$sp(ParallelCollectionRDD.scala:79)
    at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1.apply(ParallelCollectionRDD.scala:70)
    at org.apache.spark.rdd.ParallelCollectionPartition$$anonfun$readObject$1.apply(ParallelCollectionRDD.scala:70)
    at org.apache.spark.util.Utils$.tryOrIOException(Utils.scala:1273)
    at org.apache.spark.rdd.ParallelCollectionPartition.readObject(ParallelCollectionRDD.scala:70)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1058)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1909)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:253)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: flambo.kryo.BaseFlamboRegistrator
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:124)
    at org.apache.spark.serializer.KryoSerializer$$anonfun$newKryo$5.apply(KryoSerializer.scala:124)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
    at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:186)
    at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:186)
    at org.apache.spark.serializer.KryoSerializer.newKryo(KryoSerializer.scala:124)
    ... 27 more
17/11/28 14:35:00 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
java.lang.IllegalStateException: unread block data
    at java.io.ObjectInputStream$BlockDataInputStream.setBlockDataMode(ObjectInputStream.java:2449)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1385)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2018)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1942)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1808)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1353)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:373)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:253)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Any ideas on this?
Answer 0 (score: 0)
It seems flambo somehow does not end up on your classpath, which is why you get:

java.lang.ClassNotFoundException: flambo.kryo.BaseFlamboRegistrator

Are you running this from a REPL, or with lein, or as a boot task?
If you are using Leiningen, check your classpath (lein classpath) and your dependency tree (lein deps :tree).
Also, run lein clean to make sure your target folder is not causing the problem.
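To make that classpath check concrete, here is a minimal sketch of filtering the ':'-separated classpath for the flambo jar. The paths below are hypothetical stand-ins; in a real project you would pipe the output of lein classpath directly instead of assigning a string.

```shell
# Hypothetical classpath string standing in for the output of `lein classpath`.
cp="/home/u/.m2/repository/yieldbot/flambo/0.8.2/flambo-0.8.2.jar:/home/u/.m2/repository/org/clojure/clojure/1.8.0/clojure-1.8.0.jar"
# Split on ':' and look for the flambo jar; no output means flambo is missing.
echo "$cp" | tr ':' '\n' | grep flambo
```

If grep prints nothing, flambo is not on the classpath that launched your driver, and the executors will fail exactly as in the stack trace above.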
Stack trace analysis: Failed to register classes with Kryo because flambo.kryo.BaseFlamboRegistrator is missing from the classpath.
Answer 1 (score: 0)
Solved. Fixed it by adding all of the project's jars to the Spark configuration:

(conf/jars (map #(.getPath %)
                (.getURLs (java.lang.ClassLoader/getSystemClassLoader))))

That way all the classes get registered on the executors. Since this issue is resolved, I am closing it.
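An alternative sketch that avoids enumerating classpath entries at runtime: build a single uberjar and hand it to Spark at submit time, so the whole project (flambo included) is shipped to every executor. The project name and main class below are assumptions for illustration, not from the original post; they depend on your project.clj.

```shell
# Build a standalone jar containing the project plus all of its dependencies.
lein uberjar
# Submit it to the same master; Spark distributes the application jar to the executors.
spark-submit \
  --master spark://formcept008.lan:7077 \
  --class clustering.core \
  target/clustering-0.1.0-SNAPSHOT-standalone.jar
```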