spark java.util.logging.Logger

Time: 2016-08-11 20:33:56

Tags: logging apache-spark

I am using Spark to run an existing Java package that uses java.util.logging.Logger, and I get this error:

org.apache.spark.SparkException: Task not serializable
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:304)
    at org.apache.spark.util.ClosureCleaner$.org$apache$spark$util$ClosureCleaner$$clean(ClosureCleaner.scala:294)
    at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:122)
    at org.apache.spark.SparkContext.clean(SparkContext.scala:2055)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:911)
    at org.apache.spark.rdd.RDD$$anonfun$foreach$1.apply(RDD.scala:910)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.RDD.foreach(RDD.scala:910)
    at org.apache.spark.api.java.JavaRDDLike$class.foreach(JavaRDDLike.scala:332)
    at org.apache.spark.api.java.AbstractJavaRDDLike.foreach(JavaRDDLike.scala:46)
    at edu.uth.clamp.nlp.main.RunPipelineWithSpark.processFolder(RunPipelineWithSpark.java:271)
    at edu.uth.clamp.nlp.main.RunPipelineWithSpark.process(RunPipelineWithSpark.java:179)
    at edu.uth.clamp.nlp.main.RunPipelineWithSpark.main(RunPipelineWithSpark.java:136)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:731)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.NotSerializableException: java.util.logging.Logger
Serialization stack:
    - object not serializable (class: java.util.logging.Logger, value: java.util.logging.Logger@a23dc07)
    - field (class: edu.uth.clamp.nlp.ner.CRFNameEntityRecognizer, name: logger, type: class java.util.logging.Logger)
    - object (class edu.uth.clamp.nlp.ner.CRFNameEntityRecognizer, edu.uth.clamp.nlp.ner.CRFNameEntityRecognizer@5199fdf9)
    - field (class: edu.uth.clamp.nlp.uima.NameEntityUIMA, name: recognizer, type: class edu.uth.clamp.nlp.ner.CRFNameEntityRecognizer)
    - object (class edu.uth.clamp.nlp.uima.NameEntityUIMA, edu.uth.clamp.nlp.uima.NameEntityUIMA@23a84ec4)
    - writeObject data (class: java.util.ArrayList)

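5 answers:

Answer 0: (score: 2)

Check whether you are trying to serialize the logger instance. Make the logger field static or transient; Java serialization skips both kinds of field.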
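
To make the difference concrete, here is a minimal sketch using plain Java serialization (no Spark involved). The class names are hypothetical stand-ins for something like CRFNameEntityRecognizer, and `canSerialize` is a helper invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.logging.Logger;

class LoggerFieldDemo {

    // Mirrors the failing shape: the logger is ordinary instance state.
    static class InstanceLoggerRecognizer implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Logger logger = Logger.getLogger("demo.instance");
    }

    // Static fields belong to the class, so serialization never writes them.
    static class StaticLoggerRecognizer implements Serializable {
        private static final long serialVersionUID = 1L;
        private static final Logger LOGGER = Logger.getLogger("demo.static");
    }

    // Transient fields are skipped too, but are null after deserialization.
    static class TransientLoggerRecognizer implements Serializable {
        private static final long serialVersionUID = 1L;
        private transient Logger logger = Logger.getLogger("demo.transient");
    }

    // Returns true if the object graph can be written with Java serialization.
    static boolean canSerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (IOException e) {
            // NotSerializableException is a subclass of IOException.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("instance field:  " + canSerialize(new InstanceLoggerRecognizer()));  // false
        System.out.println("static field:    " + canSerialize(new StaticLoggerRecognizer()));    // true
        System.out.println("transient field: " + canSerialize(new TransientLoggerRecognizer())); // true
    }
}
```

A static logger is usually the simpler choice, because a transient one has to be re-created somewhere after the object is deserialized.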

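Answer 1: (score: 1)

Spark expects the functions passed to RDD / DStream transformations to be serializable. Since java.util.logging.Logger is not serializable, the function must not capture a Logger instance, whether directly or through a field of an object it references. You can replace the logging with simple println calls, or try the options suggested here:

Apache Spark logging within Scala

Note that logging in driver code is fine. Also make sure the function does not reference any non-serializable variable defined outside it. To better understand serialization problems caused by closures, read up on the concept of closures: doc, doc2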
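
The capture problem can be reproduced without Spark. The sketch below is only an illustration: `SerializableTask` is a hypothetical interface standing in for Spark's serializable function types, and `Recognizer` stands in for the class holding the logger.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.logging.Logger;

class ClosureCaptureDemo {

    // Hypothetical analogue of a serializable function interface.
    interface SerializableTask extends Serializable {
        void run(String element);
    }

    // Declared Serializable, but its Logger field is not.
    static class Recognizer implements Serializable {
        private static final long serialVersionUID = 1L;
        private final Logger logger = Logger.getLogger("demo.recognizer");

        void process(String element) {
            logger.info("processing " + element);
        }
    }

    // Returns "ok", or the exception type and message if serialization fails.
    static String trySerialize(Object obj) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return "ok";
        } catch (IOException e) {
            return e.getClass().getSimpleName() + ": " + e.getMessage();
        }
    }

    // The lambda captures `recognizer`, so serializing the lambda
    // also serializes the recognizer and its logger field.
    static SerializableTask capturingTask() {
        Recognizer recognizer = new Recognizer();
        return element -> recognizer.process(element);
    }

    // Nothing is captured from the enclosing scope, so there is nothing extra to serialize.
    static SerializableTask selfContainedTask() {
        return element -> System.out.println("processing " + element);
    }

    public static void main(String[] args) {
        System.out.println("capturing:      " + trySerialize(capturingTask()));
        System.out.println("self-contained: " + trySerialize(selfContainedTask()));
    }
}
```

The first call reports a NotSerializableException for java.util.logging.Logger, the same root cause as in the question, and the second succeeds. As with Spark, the failure only shows up at runtime.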

Answer 2: (score: 1)

In Scala, try declaring the logger as a @transient lazy val. It is best to use it inside the closure, so that Spark can take care of it by itself.
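
Java has no lazy val, but a transient field with a lazily initializing getter gives roughly the same behavior. This is a sketch under that assumption; `Recognizer`, `log()` and `roundTrip` are names made up for the example, and the round trip only imitates shipping an object to another JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.logging.Logger;

class LazyLoggerDemo {

    static class Recognizer implements Serializable {
        private static final long serialVersionUID = 1L;

        // Skipped by serialization; null again after deserialization.
        private transient Logger logger;

        // Creates the logger on first use, on whichever JVM calls it.
        private Logger log() {
            if (logger == null) {
                logger = Logger.getLogger(Recognizer.class.getName());
            }
            return logger;
        }

        String loggerName() {
            return log().getName();
        }
    }

    // Serializes and deserializes the object in memory.
    static Recognizer roundTrip(Recognizer original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(original);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Recognizer) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Recognizer original = new Recognizer();
        original.loggerName();                  // logger now exists on the sending side
        Recognizer copy = roundTrip(original);  // still serializable: the field is transient
        System.out.println(copy.loggerName()); // logger re-created on first use in the copy
    }
}
```

The getter is not synchronized. That is tolerable here because Logger.getLogger returns the same logger for the same name, so a race would at worst look it up twice.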

Answer 3: (score: 0)

Your code probably looks something like

NameEntityUIMA nameEntity = ...;
JavaRDD<SomeType> rdd = ...;
rdd.foreach(x -> /* code using nameEntity */);

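foreach has to serialize its argument in order to send it to every node. Because the argument uses nameEntity, that object needs to be serialized too, but it cannot be (and because of how Java serialization is designed, this is only detected at runtime instead of producing a compile error). Instead, you want to create nameEntity on each partition. You could do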

JavaRDD<SomeType> rdd = ...;
rdd.foreach(x -> {
    NameEntityUIMA nameEntity = ...;
    /* code using nameEntity */
});

But this creates a new nameEntity for every element of the RDD, which will perform very badly. Instead, use foreachPartition.
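
The cost difference is easy to see without a cluster. The sketch below imitates partitions with plain lists and counts how many times the expensive object is built; `ExpensiveAnnotator` is a hypothetical stand-in for NameEntityUIMA. In Spark, the body of the outer loop in `perPartition` would be the function passed to foreachPartition, which receives an Iterator over one partition's elements.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class PerPartitionDemo {

    // Counts how many annotators have been constructed.
    static final AtomicInteger CREATED = new AtomicInteger();

    // Hypothetical stand-in for an expensive, non-serializable object.
    static class ExpensiveAnnotator {
        ExpensiveAnnotator() {
            CREATED.incrementAndGet();
        }

        int annotate(String doc) {
            return doc.length();
        }
    }

    // Per-element shape: one annotator per document.
    static int perElement(List<List<String>> partitions) {
        CREATED.set(0);
        for (List<String> partition : partitions) {
            for (String doc : partition) {
                new ExpensiveAnnotator().annotate(doc);
            }
        }
        return CREATED.get();
    }

    // Per-partition shape: one annotator per partition, reused for every document in it.
    static int perPartition(List<List<String>> partitions) {
        CREATED.set(0);
        for (List<String> partition : partitions) {
            ExpensiveAnnotator annotator = new ExpensiveAnnotator();
            Iterator<String> docs = partition.iterator();
            while (docs.hasNext()) {
                annotator.annotate(docs.next());
            }
        }
        return CREATED.get();
    }

    public static void main(String[] args) {
        List<List<String>> partitions = Arrays.asList(
                Arrays.asList("a", "bb", "ccc"),
                Arrays.asList("dddd", "eeeee"));
        System.out.println("per element:   " + perElement(partitions));   // 5
        System.out.println("per partition: " + perPartition(partitions)); // 2
    }
}
```

Because the annotator is created inside the function that runs on the worker, it never has to be serialized, and its logger field stops being a problem.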

Answer 4: (score: 0)

Logger is not serializable, and most likely you are trying to access it from the executors. I would suggest defining it as lazy.

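The drawback is that you should then not use the logger in the driver. Another, less elegant option is to have a separate logger for the executors.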