I am using Spark 2.3.1 with Java. I have an object that wraps a Dataset, and I want to be able to serialize and deserialize that object.
My code is as follows:
But the serialization does not seem to complete correctly. The load() function shows the following behavior:
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;

public class MyClass implements Serializable {
    private static final long serialVersionUID = -189012460301698744L;

    public Dataset<Row> dataset;

    public MyClass(final Dataset<Row> dataset) {
        this.dataset = dataset;
    }

    /**
     * Save the current instance of MyClass into a file as a serialized object.
     */
    public void save(final String filepath, final String filename) throws Exception {
        File file = new File(filepath);
        file.mkdirs();
        file = new File(filepath + "/" + filename);
        try (final ObjectOutputStream oos = new ObjectOutputStream(new FileOutputStream(file))) {
            oos.writeObject(this);
        }
    }

    /**
     * Create a new MyClass from a serialized MyClass object.
     */
    public static MyClass load(final String filepath) throws Exception {
        final File file = new File(filepath);
        final MyClass myclass;
        try (final ObjectInputStream ois = new ObjectInputStream(new FileInputStream(file))) {
            myclass = (MyClass) ois.readObject();
        }
        System.out.println("test 1 : " + myclass);
        System.out.println("test 2 : " + myclass.dataset);
        myclass.dataset.printSchema();
        return myclass;
    }

    // Some other functions
}
and a java.lang.NullPointerException is thrown on printSchema().
What am I missing to serialize the object correctly?
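As an aside, the symptom here (a field that comes back null after readObject()) is exactly what plain Java serialization produces whenever a field is not actually written to the stream. The following minimal sketch reproduces it using a transient field; the class and field names are hypothetical, and this is only an analogy for the symptom, not Spark's actual mechanism:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class TransientDemo {
    static class Wrapper implements Serializable {
        private static final long serialVersionUID = 1L;
        // transient: excluded from the serialized form,
        // so it is null after deserialization
        transient String payload;
        Wrapper(String payload) { this.payload = payload; }
    }

    public static void main(String[] args) throws Exception {
        // Serialize the wrapper to an in-memory buffer
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new Wrapper("hello"));
        }
        // Deserialize it again: the transient field was never stored
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            Wrapper w = (Wrapper) ois.readObject();
            System.out.println("payload = " + w.payload); // prints "payload = null"
        }
    }
}
```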
Answer (score: 3)
Spark Datasets are only meaningful within the scope of the session that was used to create them, so serializing a Dataset makes no sense. If you want to preserve the data, write the Dataset to persistent storage. If you want to preserve the logic, keep the code (the methods) that builds the Dataset. Don't try to serialize the Dataset itself.
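A minimal sketch of the approach the answer recommends: persist the data (here as Parquet, one of several storage formats Spark supports) instead of Java-serializing the wrapper, and rebuild the Dataset inside whatever session is current at load time. The class name MyClassPersist and the choice of Parquet are illustrative assumptions, not part of the question:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class MyClassPersist {
    public Dataset<Row> dataset;

    public MyClassPersist(final Dataset<Row> dataset) {
        this.dataset = dataset;
    }

    // Persist the data itself, not the Java object wrapping it.
    public void save(final String path) {
        dataset.write().mode("overwrite").parquet(path);
    }

    // Recreate the Dataset inside the caller's current session.
    public static MyClassPersist load(final SparkSession spark, final String path) {
        return new MyClassPersist(spark.read().parquet(path));
    }
}
```

Note that load() now takes a SparkSession explicitly: the recreated Dataset belongs to that session, which is exactly the constraint the answer describes.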