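My class looks like this, and I also get the result "is null". If I remove the transient modifier from testClass, I get a "Task not serializable" error instead, even though TestClass implements Serializable. So why is the testClass object in MergeLog null?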
    public class MergeLog implements Serializable {
        private static final Logger LOGGER = LoggerFactory.getLogger(LogFormat.class);
        private transient SparkConf conf = new SparkConf().setAppName("log join");
        private transient JavaSparkContext sc = new JavaSparkContext(conf);
        private HiveContext hiveContext = new org.apache.spark.sql.hive.HiveContext(sc.sc());
        private transient TestClass testClass = new TestClass();

        public void process() {
            JavaRDD<String> people = sc.textFile("/user/people.txt");
            String schemaString = "name age";
            List<StructField> fields = new ArrayList<StructField>();
            for (String fieldName : schemaString.split(" ")) {
                fields.add(DataTypes.createStructField(fieldName, DataTypes.StringType, true));
            }
            StructType schema = DataTypes.createStructType(fields);
            JavaRDD<Row> rowRDD = people.map(
                new Function<String, Row>() {
                    @Override
                    public Row call(String record) throws Exception {
                        String[] fields = record.split(",");
                        return RowFactory.create(fields[0], fields[1].trim());
                    }
                });
            DataFrame peopleDataFrame = hiveContext.createDataFrame(rowRDD, schema);
            JavaRDD<String> javaRDD = peopleDataFrame.toJavaRDD().map(
                new Function<Row, String>() {
                    @Override
                    public String call(Row row) throws Exception {
                        String ins = null;
                        // testClass is a field of the enclosing MergeLog instance
                        if (testClass == null) {
                            return "is null";
                        } else {
                            ins = testClass.calc(row);
                        }
                        return ins;
                    }
                });
        }

        public static void main(String[] args) {
            MergeLog mergeLog = new MergeLog();
            mergeLog.process();
        }
    }

    class TestClass implements Serializable {
        public String calc(Row row) {
            return row.mkString();
        }
    }
Answer 0 (score: 1)
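The TestClass instance is created on the driver side, and because the field is transient, the instance is not passed to the workers. Create a new TestClass instance inside the function instead:

    peopleDataFrame.toJavaRDD().map(
        new Function<Row, String>() {
            @Override
            public String call(Row row) throws Exception {
                // Instantiate on the worker, inside the task
                return new TestClass().calc(row);
            }
        });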
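To see why the transient field shows up as null, here is a minimal sketch using plain Java serialization, with no Spark involved. The names `TransientDemo`, `Holder`, `Helper`, and `roundTrip` are made up for illustration; `Holder` stands in for MergeLog and `Helper` for TestClass. It assumes the function and its enclosing object reach the worker through ordinary Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {

    // Hypothetical stand-in for TestClass from the question.
    static class Helper implements Serializable {
        private static final long serialVersionUID = 1L;

        String calc(String s) {
            return s.toUpperCase();
        }
    }

    // Hypothetical stand-in for MergeLog: one transient field, one regular field.
    static class Holder implements Serializable {
        private static final long serialVersionUID = 1L;

        transient Helper skipped = new Helper(); // not written to the stream
        Helper kept = new Helper();              // written to the stream
    }

    // Serialize to a byte array and read the object back.
    static Holder roundTrip(Holder in) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(in);
        }
        try (ObjectInputStream is =
                 new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (Holder) is.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Holder copy = roundTrip(new Holder());
        // Field initializers do not run again on deserialization,
        // so the transient field stays null in the copy.
        System.out.println("transient field is null: " + (copy.skipped == null));
        System.out.println("regular field is null: " + (copy.kept == null));
    }
}
```

The deserialized copy has `skipped == null` and a non-null `kept`, which matches the "is null" result in the question.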
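Also, the Row class is not serializable, so when you remove transient from testClass you get the not-serializable exception. Pass only the parameters you need from the Row to the class for processing.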
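One way to follow that last suggestion, sketched under assumptions: `FieldCalc` is a hypothetical replacement for TestClass, and its two-argument `calc` takes plain String values rather than a Row. The comment in `main` shows how it might be called from inside `call(Row row)`, assuming both columns are strings as in the question's schema.

```java
import java.io.Serializable;

// Hypothetical replacement for TestClass: works on plain values, not on Row.
public class FieldCalc implements Serializable {
    private static final long serialVersionUID = 1L;

    // Receives only the two column values it needs.
    public String calc(String name, String age) {
        return name + "," + age.trim();
    }

    public static void main(String[] args) {
        // Inside the Spark function this might look like:
        //   return new FieldCalc().calc(row.getString(0), row.getString(1));
        System.out.println(new FieldCalc().calc("Alice", " 30"));
    }
}
```

Because the helper depends only on String arguments, it can be exercised without a Spark context.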