NullPointerException for a class field in Spark

Posted: 2016-06-23 08:04:54

Tags: java apache-spark

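My class looks like this. I keep getting the result "is null". If I remove the transient modifier from testClass, I get a "Task not serializable" error instead, even though TestClass implements Serializable. So why is the testClass object in MergeLog null?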

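import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.hive.HiveContext;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class MergeLog implements Serializable {
    private static final Logger LOGGER = LoggerFactory.getLogger(MergeLog.class);
    // transient fields are skipped when MergeLog is serialized and sent to executors
    private transient SparkConf conf = new SparkConf().setAppName("log join");
    private transient JavaSparkContext sc = new JavaSparkContext(conf);
    private HiveContext hiveContext = new HiveContext(sc.sc());
    private transient TestClass testClass = new TestClass();

    public void process() {
        JavaRDD<String> people = sc.textFile("/user/people.txt");
        String schemaString = "name age";
        List<StructField> fields = new ArrayList<StructField>();
        for (String fieldName : schemaString.split(" ")) {
            fields.add(DataTypes.createStructField(fieldName, DataTypes.StringType, true));
        }
        StructType schema = DataTypes.createStructType(fields);

        JavaRDD<Row> rowRDD = people.map(
            new Function<String, Row>() {
                @Override
                public Row call(String record) throws Exception {
                    String[] fields = record.split(",");
                    return RowFactory.create(fields[0], fields[1].trim());
                }
            });
        DataFrame peopleDataFrame = hiveContext.createDataFrame(rowRDD, schema);
        JavaRDD<String> javaRDD = peopleDataFrame.toJavaRDD().map(
            new Function<Row, String>() {
                @Override
                public String call(Row row) throws Exception {
                    // testClass is read through MergeLog.this, which reaches the
                    // executor without its transient fields
                    if (testClass == null) {
                        return "is null";
                    }
                    return testClass.calc(row);
                }
            });
        // map() is lazy: an action is needed before call() actually runs
        System.out.println(javaRDD.collect());
    }

    public static void main(String[] args) {
        MergeLog mergeLog = new MergeLog();
        mergeLog.process();
    }
}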

class TestClass implements Serializable {
    public String calc(Row row) {
        return row.mkString();
    }
}

Answer (score: 1):

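The TestClass instance is created on the driver side, and because the field is transient, the instance is not sent to the workers. When MergeLog is deserialized on a worker, Java serialization does not re-run field initializers, so the transient testClass field is left null. Create a new TestClass instance inside the function you pass to map instead: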
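JavaRDD<String> javaRDD = peopleDataFrame.toJavaRDD().map(
    new Function<Row, String>() {
        @Override
        public String call(Row row) throws Exception {
            // Created inside call(), so a fresh instance is built on the executor
            // for each row instead of relying on the transient driver-side field
            return new TestClass().calc(row);
        }
    });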
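The mechanism described above can be reproduced without Spark. The sketch below is a minimal pure-Java illustration, assuming the anonymous function is shipped with standard Java serialization; `TransientDemo`, `SerFunction`, `Helper`, `Driver` and `roundTrip` are hypothetical names. `SerFunction` stands in for Spark's `Function`, `Driver` plays the role of `MergeLog`, and `roundTrip` roughly approximates sending a task to a worker.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    // Hypothetical stand-in for Spark's Function: a serializable single-method interface.
    interface SerFunction<T, R> extends Serializable {
        R call(T t) throws Exception;
    }

    // Plays the role of TestClass.
    static class Helper implements Serializable {
        String calc(String s) { return "calc:" + s; }
    }

    // Plays the role of MergeLog: Serializable, with a transient helper field.
    static class Driver implements Serializable {
        private transient Helper helper = new Helper();

        // The anonymous class reads the outer field, so it captures Driver.this.
        SerFunction<String, String> usesField() {
            return new SerFunction<String, String>() {
                @Override
                public String call(String s) {
                    return helper == null ? "is null" : helper.calc(s);
                }
            };
        }

        // Creates the helper inside call(), i.e. on the "worker" side.
        SerFunction<String, String> createsInside() {
            return new SerFunction<String, String>() {
                @Override
                public String call(String s) {
                    return new Helper().calc(s);
                }
            };
        }
    }

    // Serialize then deserialize an object, loosely mimicking task shipping.
    @SuppressWarnings("unchecked")
    static <T> T roundTrip(T obj) throws Exception {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Driver driver = new Driver();
        System.out.println(driver.usesField().call("row"));                // calc:row
        System.out.println(roundTrip(driver.usesField()).call("row"));     // is null
        System.out.println(roundTrip(driver.createsInside()).call("row")); // calc:row
    }
}
```

The first call runs against the original object and sees the field. After the round trip, the deserialized `Driver` has skipped its field initializer, so the same function sees `null`, while the version that creates its helper inside `call()` keeps working.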

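Also, the Row class is not serializable, which is why you get the not-serializable exception when you remove transient from the testClass field. Pass only the values you need from the Row to the class for processing.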
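The advice to pass only the needed values rests on a general Java serialization rule: a Serializable object that holds a reference to a non-serializable object cannot be serialized, while one holding only plain values such as `String`s can. The sketch below illustrates that rule in plain Java; `Record` is a hypothetical non-serializable type (not Spark's `Row`), and `PassFieldsDemo`, `HoldsRecord`, `HoldsFields` and `serializes` are hypothetical names too. In the Spark code, the equivalent would be reading the columns with something like `row.getString(0)` and `row.getString(1)` inside `call()` and passing those strings on.

```java
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class PassFieldsDemo {

    // Hypothetical record type that does NOT implement Serializable.
    static class Record {
        final String name;
        final String age;
        Record(String name, String age) { this.name = name; this.age = age; }
    }

    // Holds the whole record: serialization fails on the Record field.
    static class HoldsRecord implements Serializable {
        final Record record;
        HoldsRecord(Record record) { this.record = record; }
    }

    // Holds only the values it needs; String is Serializable.
    static class HoldsFields implements Serializable {
        final String name;
        final String age;
        HoldsFields(String name, String age) { this.name = name; this.age = age; }
        String calc() { return name + "," + age; }
    }

    // Returns true if standard Java serialization succeeds for obj.
    static boolean serializes(Object obj) throws Exception {
        try (ObjectOutputStream out = new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(obj);
            return true;
        } catch (NotSerializableException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        Record r = new Record("alice", "30");
        System.out.println(serializes(new HoldsRecord(r)));             // false
        System.out.println(serializes(new HoldsFields(r.name, r.age))); // true
    }
}
```

Extracting plain values before handing them to a helper keeps the helper's serialized form limited to types that are known to be serializable.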