Question

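I am trying to use Spark GraphX. Before that, I want to prepare my vertex and edge RDDs from DataFrames, and for this I used the JavaRDD map function. But I get an error. I tried various ways to fix it. I made the whole class serializable, but that did not work. I also implemented both Function and Serializable in one class and used it in the map function, but that did not work either. Thanks in advance for any help.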

    //add long unique id for vertex dataframe and get javaRdd
    JavaRDD<Row> ff = vertex_dataframe.javaRDD().zipWithIndex().map(new Function<Tuple2<Row, java.lang.Long>, Row>() {
        public Row call(Tuple2<Row, java.lang.Long> rowLongTuple2) throws Exception {
            return RowFactory.create(rowLongTuple2._1().getString(0), rowLongTuple2._2());
        }
    });

I made the Function class serializable as shown below.

    public abstract class SerialiFunJRdd<T1, R> implements Function<T1, R>, java.io.Serializable {

    }

Answer 1

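I suggest you read up on serializing non-static inner classes in Java. You are creating a non-static inner class inside the map call, and it is not serializable even if you mark it Serializable: an anonymous class created in an instance method keeps an implicit reference to its enclosing instance, so serializing the function also tries to serialize the enclosing object. You have to make it static first.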

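    // Inside the method: pass the static function instead of an inline anonymous class.
    JavaRDD<Row> ff = vertex_dataframe.javaRDD().zipWithIndex().map(mapFunc);

    // Declared as a class member: created in a static context, so it holds no enclosing instance.
    static SerialiFunJRdd<Tuple2<Row, java.lang.Long>, Row> mapFunc = new SerialiFunJRdd<Tuple2<Row, java.lang.Long>, Row>() {
        @Override
        public Row call(Tuple2<Row, java.lang.Long> rowLongTuple2) throws Exception {
            return RowFactory.create(rowLongTuple2._1().getString(0), rowLongTuple2._2());
        }
    };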
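
The difference between the two forms can be reproduced without Spark, using plain Java serialization. The following is only a minimal sketch under stated assumptions: `SerializableFunction`, `Driver`, and `canSerialize` are hypothetical names invented for this illustration and are not part of the Spark API. The anonymous class here reads a field of the enclosing object so that the hidden reference is guaranteed to exist; as far as I know, older Java compilers keep that reference even when it is unused, which would match the code in the question.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

public class ClosureSerializationDemo {

    // Hypothetical stand-in for a serializable function interface (not Spark's API).
    public interface SerializableFunction<T, R> extends Serializable {
        R call(T input) throws Exception;
    }

    // Hypothetical driver class; deliberately NOT Serializable.
    public static class Driver {
        private final int offset;

        public Driver(int offset) {
            this.offset = offset;
        }

        // Anonymous class created in an instance method. It reads `offset`,
        // so it holds a hidden reference to the enclosing Driver instance.
        public SerializableFunction<String, Integer> instanceBound() {
            return new SerializableFunction<String, Integer>() {
                @Override
                public Integer call(String input) {
                    return input.length() + offset;
                }
            };
        }
    }

    // Anonymous class created in a static context: there is no enclosing
    // instance, so only the function object itself gets serialized.
    public static final SerializableFunction<String, Integer> STATIC_FUNCTION =
            new SerializableFunction<String, Integer>() {
                @Override
                public Integer call(String input) {
                    return input.length();
                }
            };

    // Returns true if Java serialization accepts the object graph,
    // false if some object in the graph is not Serializable.
    public static boolean canSerialize(Object value) {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(value);
            return true;
        } catch (NotSerializableException e) {
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(canSerialize(new Driver(1).instanceBound())); // prints false
        System.out.println(canSerialize(STATIC_FUNCTION));               // prints true
    }
}
```

The first call fails because writing the function object also has to write the non-serializable `Driver` it points to; the static version has nothing extra to drag along.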

org.apache.spark.SparkException: Task not serializable in Java
