I am using Spark to run an existing Java package that uses java.util.logging.Logger, and I get the following error:
org.apache.spark.SparkException: Task not serializable
at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:911)
at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:910)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
at org.apache.spark.rdd.RDD.foreach(RDD.scala:910)
at org.apache.spark.api.java.JavaRDDLike$class.foreach(JavaRDDLike.scala:332)
at org.apache.spark.api.java.AbstractJavaRDDLike.foreach(JavaRDDLike.scala:46)
at edu.uth.clamp.nlp.main.RunPipelineWithSpark.processFolder(RunPipelineWithSpark.java:271)
at edu.uth.clamp.nlp.main.RunPipelineWithSpark.process(RunPipelineWithSpark.java:179)
at edu.uth.clamp.nlp.main.RunPipelineWithSpark.main(RunPipelineWithSpark.java:136)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: java.util.logging.Logger
Serialization stack:
- object not serializable (class: java.util.logging.Logger, value: java.util.logging.Logger@a23dc07)
- field (class: edu.uth.clamp.nlp.ner.CRFNameEntityRecognizer, name: logger, type: class java.util.logging.Logger)
- object (class edu.uth.clamp.nlp.ner.CRFNameEntityRecognizer, edu.uth.clamp.nlp.ner.CRFNameEntityRecognizer@5199fdf9)
- field (class: edu.uth.clamp.nlp.uima.NameEntityUIMA, name: recognizer, type: class edu.uth.clamp.nlp.ner.CRFNameEntityRecognizer)
- object (class edu.uth.clamp.nlp.uima.NameEntityUIMA, edu.uth.clamp.nlp.uima.NameEntityUIMA@23a84ec4)
- writeObject data (class: java.util.ArrayList)
Answer 0 (score: 2)
Check whether you are trying to serialize the logger instance. If so, make the logger field static or transient.
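A minimal sketch of why static or transient helps, using plain Java serialization without Spark. The class Recognizer, its fields, and the roundTrip helper are hypothetical stand-ins for something like CRFNameEntityRecognizer and are not taken from the original package.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.logging.Logger;

public class LoggerFieldDemo {

    // Hypothetical stand-in for a class that holds a logger field.
    public static class Recognizer implements Serializable {
        private static final long serialVersionUID = 1L;

        // Static fields belong to the class, so Java serialization skips them.
        private static final Logger LOGGER =
                Logger.getLogger(Recognizer.class.getName());

        // Transient fields are skipped too, but are null after deserialization
        // because field initializers are not re-run for Serializable classes.
        private transient Logger instanceLogger = Logger.getLogger("instance");

        private final String modelName;

        public Recognizer(String modelName) {
            this.modelName = modelName;
        }

        public String getModelName() {
            return modelName;
        }

        public Logger getInstanceLogger() {
            return instanceLogger;
        }

        public void recognize(String text) {
            // The static logger is always available, on any JVM.
            LOGGER.fine("recognizing: " + text);
        }
    }

    // Serializes the object to a byte array and reads it back,
    // roughly what happens when a closure is shipped to another JVM.
    public static Recognizer roundTrip(Recognizer r)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(r);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Recognizer) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Recognizer copy = roundTrip(new Recognizer("demo-model"));
        copy.recognize("some text");
        System.out.println(copy.getModelName());
        System.out.println(copy.getInstanceLogger() == null);
    }
}
```

With a plain (non-static, non-transient) Logger field, writeObject would throw java.io.NotSerializableException, as in the stack trace above. The static variant is usually the simpler choice, since a transient logger has to be re-created before it is used on the receiving side.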
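Answer 1 (score: 1)
Spark expects the functions passed to RDD / DStream transformations to be serializable. Because java.util.logging.Logger is not serializable, you should not use logger-related code inside such a function. You can replace the logging with a simple println, or try the options suggested here:
Apache Spark logging within Scala
Note that logging is fine in driver code. Also make sure the function does not reference any non-serializable variable defined outside it. To better understand serialization problems caused by closures, read up on the concept of closures (doc, doc2).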
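Answer 2 (score: 1)
Try using @transient lazy val when creating the logger object. It is also best to use it inside the closure, so that Spark itself can take care of this.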
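@transient lazy val is Scala syntax. Since the package in the question is Java, a rough Java analogue is a transient field behind a lazy accessor. The sketch below assumes a hypothetical Annotator class and uses plain Java serialization to imitate shipping the object to an executor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Locale;
import java.util.logging.Logger;

public class LazyLoggerDemo {

    // Hypothetical class that would be captured by a Spark closure.
    public static class Annotator implements Serializable {
        private static final long serialVersionUID = 1L;

        // Not serialized; null until first use on each JVM.
        private transient Logger logger;

        // Lazy accessor: re-creates the logger after deserialization.
        private Logger logger() {
            if (logger == null) {
                logger = Logger.getLogger(Annotator.class.getName());
            }
            return logger;
        }

        public boolean loggerInitialized() {
            return logger != null;
        }

        public String annotate(String text) {
            logger().fine("annotating: " + text);
            return text.toUpperCase(Locale.ROOT);
        }
    }

    // Serializes and deserializes the object in memory.
    public static Annotator roundTrip(Annotator a)
            throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(a);
        }
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Annotator) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Annotator copy = roundTrip(new Annotator());
        System.out.println(copy.loggerInitialized());
        System.out.println(copy.annotate("abc"));
        System.out.println(copy.loggerInitialized());
    }
}
```

The accessor is not synchronized. That is acceptable here only because Logger.getLogger returns the same named logger on repeated calls, so a race would at worst repeat the lookup.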
Answer 3 (score: 0)
Your code probably looks something like
NameEntityUIMA nameEntity = ...;
JavaRDD<SomeType> rdd = ...;
rdd.foreach(x -> /* code using nameEntity */);
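foreach must serialize its argument in order to send it to every node. Because the argument uses nameEntity, nameEntity has to be serialized as well, but it cannot be (and because of how Java serialization is designed, this is only detected at run time rather than reported as a compile error). Instead, you want to create nameEntity on each partition. You could do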
JavaRDD<SomeType> rdd = ...;
rdd.foreach(x -> {
    NameEntityUIMA nameEntity = ...;
    /* code using nameEntity */
});
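But this creates a new nameEntity for every element of the RDD, which will perform very badly. Use foreachPartition instead, so that nameEntity is created once per partition.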
Answer 4 (score: 0)
Logger is not serializable, and most likely you are trying to access it from the executors. I suggest defining it as lazy. The drawback is that you then should not use the logger in the driver. Another, less elegant option is to have a separate logger for the executors.