Spark map method throws a serialization exception

Date: 2015-08-05 15:32:14

Tags: java hadoop serialization apache-spark

I'm new to Spark and I'm running into a serialization problem in my map function. Here are the relevant parts of the code:

private Function<Row, String> SparkMap() throws IOException {
        return new Function<Row, String>() {
            public String call(Row row) throws IOException {
                /* some code */
            }
        };
    }

public static void main(String[] args) throws Exception {
        MyClass myClass = new MyClass();
        // sc is a JavaSparkContext created earlier (omitted from this excerpt)
        SQLContext sqlContext = new SQLContext(sc);
        DataFrame df = sqlContext.load(args[0], "com.databricks.spark.avro");

        JavaRDD<String> output = df.javaRDD().map(myClass.SparkMap());
    }

Here is the error log:

Caused by: java.io.NotSerializableException: myPackage.MyClass
Serialization stack:
    - object not serializable (class: myPackage.MyClass, value: myPackage.MyClass@281c8380)
    - field (class: myPackage.MyClass$1, name: this$0, type: class myPackage.MyClass)
    - object (class myPackage.MyClass$1, myPackage.MyClass$1@28ef1bc8)
    - field (class: org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, name: fun$1, type: interface org.apache.spark.api.java.function.Function)
    - object (class org.apache.spark.api.java.JavaPairRDD$$anonfun$toScalaFunction$1, <function1>)
    at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:40)
    at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
    at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:81)
    at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:312)
    ... 12 more

If I declare the SparkMap method as static, it runs fine. How is that possible?

1 Answer:

Answer 0 (score: 2)

The exception says it clearly:

MyClass

Simply make MyClass Serializable.
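
A minimal sketch of what that fix could look like (the imports and the placeholder call body are assumptions, not the asker's actual code):

import java.io.Serializable;

import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Row;

// Making the outer class Serializable lets Spark serialize the anonymous
// Function together with the MyClass instance it captures through its
// implicit this$0 reference (the field shown in the serialization stack).
public class MyClass implements Serializable {

    private Function<Row, String> SparkMap() {
        return new Function<Row, String>() {
            public String call(Row row) {
                /* some code */
                return row.toString(); // placeholder body for illustration
            }
        };
    }
}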

It works when declared static because in that case only the function itself is serialized, not the whole MyClass object.
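
A sketch of that static variant, under the same assumptions as above; because the anonymous Function is created in a static context, it holds no hidden reference to a MyClass instance, so Spark only needs to serialize the Function object:

// Hypothetical static version: no enclosing MyClass instance is captured,
// so MyClass itself does not need to be Serializable.
private static Function<Row, String> SparkMap() {
    return new Function<Row, String>() {
        public String call(Row row) {
            /* some code */
            return row.toString(); // placeholder body for illustration
        }
    };
}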