I am trying to convert a Dataset to Java objects. The schema looks like this:
root
|-- deptId: long (nullable = true)
|-- depName: string (nullable = true)
|-- employee: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- firstName: string (nullable = true)
| | |-- lastName: string (nullable = true)
| | |-- phno: array (nullable = true)
| | | |-- element: integer (containsNull = true)
I created POJO classes like this:
class Department {
    private Long deptId;
    private String depName;
    private List<Employee> employee;
    // plus getters, setters, and a no-argument constructor
}
class Employee {
    private String firstName;
    private String lastName;
    private List<Long> phno;
    // plus getters, setters, and a no-argument constructor
}
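For completeness, here is a minimal sketch of what the elided accessors might look like. It assumes the list property is named employee so that it matches the schema field and the getEmployee() calls used in the conversion code; the wrapper class BeanShapeSketch and the propertyNames helper are hypothetical and only there to keep the sketch in one file. As far as I understand, Encoders.bean matches columns to bean properties by name, so listing the property names with java.beans.Introspector is a quick way to compare them against the schema.

```java
import java.beans.IntrospectionException;
import java.beans.Introspector;
import java.beans.PropertyDescriptor;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical wrapper class, only used to keep the sketch in a single file.
class BeanShapeSketch {

    public static class Employee implements Serializable {
        private String firstName;
        private String lastName;
        private List<Long> phno = new ArrayList<>();

        public Employee() { }

        public String getFirstName() { return firstName; }
        public void setFirstName(String firstName) { this.firstName = firstName; }
        public String getLastName() { return lastName; }
        public void setLastName(String lastName) { this.lastName = lastName; }
        public List<Long> getPhno() { return phno; }
        public void setPhno(List<Long> phno) { this.phno = phno; }
    }

    public static class Department implements Serializable {
        private Long deptId;
        private String depName;
        private List<Employee> employee = new ArrayList<>();

        public Department() { }

        public Long getDeptId() { return deptId; }
        public void setDeptId(Long deptId) { this.deptId = deptId; }
        public String getDepName() { return depName; }
        public void setDepName(String depName) { this.depName = depName; }
        public List<Employee> getEmployee() { return employee; }
        public void setEmployee(List<Employee> employee) { this.employee = employee; }
    }

    // Returns the names of all properties that have both a getter and a setter.
    public static Set<String> propertyNames(Class<?> beanClass) {
        Set<String> names = new TreeSet<>();
        try {
            // Stop at Object.class so the synthetic "class" property is excluded.
            for (PropertyDescriptor pd
                    : Introspector.getBeanInfo(beanClass, Object.class).getPropertyDescriptors()) {
                if (pd.getReadMethod() != null && pd.getWriteMethod() != null) {
                    names.add(pd.getName());
                }
            }
        } catch (IntrospectionException e) {
            throw new IllegalStateException("Cannot introspect " + beanClass, e);
        }
        return names;
    }

    public static void main(String[] args) {
        // Compare these names by eye against the fields printed by printSchema().
        System.out.println(propertyNames(Department.class));
        System.out.println(propertyNames(Employee.class));
    }
}
```

Any property name that does not appear in the schema (or the other way round) is a likely candidate for a mismatch when the bean encoder is applied.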
Here is the code I am using for the conversion:
// parquetFilePath is a placeholder for the path to the Parquet file
Dataset<Row> ds = this.spark.read().parquet(parquetFilePath);
Dataset<Department> departmentDataset =
        ds.as(Encoders.bean(Department.class));
JavaRDD<String> rdd =
        departmentDataset.toJavaRDD().map((Function<Department, String>) v -> {
            StringBuilder sb = new StringBuilder();
            sb.append("deptId").append(v.getDeptId());
            if (!CollectionUtil.isListNullOrEmpty(v.getEmployee())) {
                Employee first = v.getEmployee().get(0);
                sb.append("FirstName").append(first.getFirstName());
                // Read the phone numbers only once we know an employee exists.
                if (!CollectionUtil.isListNullOrEmpty(first.getPhno())) {
                    sb.append("Ph number").append(first.getPhno().get(0));
                }
            }
            return sb.toString();
        });
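CollectionUtil is a project helper that is not shown in this question. The following is only a hypothetical stand-in, assuming isListNullOrEmpty does what its name suggests; the real helper may differ.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for the CollectionUtil helper referenced in the mapping code.
final class CollectionUtil {
    private CollectionUtil() { }

    // True when the list reference is null or the list contains no elements.
    public static boolean isListNullOrEmpty(List<?> list) {
        return list == null || list.isEmpty();
    }
}

class CollectionUtilDemo {
    public static void main(String[] args) {
        System.out.println(CollectionUtil.isListNullOrEmpty(null));                    // true
        System.out.println(CollectionUtil.isListNullOrEmpty(Collections.emptyList())); // true
        System.out.println(CollectionUtil.isListNullOrEmpty(Arrays.asList(1L, 2L)));   // false
    }
}
```

With a helper like this, the guard has to be applied to the employee list first and to the phone list of one specific employee second, since calling get(0) on an empty list throws.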
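However, the bean-encoder conversion does not work: the job fails with org.apache.hadoop.yarn.server.nodemanager.containermanager.runtime.ContainerExecutionException.
I can do the conversion with Row-based constructors instead, but then I have to hard-code the column names, like this: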
public Department(Row row) {
    this.employee = new ArrayList<>();
    this.deptId = (Long) row.getAs("deptId");
    List<Row> rowList = row.getList(row.fieldIndex("employee"));
    if (rowList != null) {
        for (Row r : rowList) {
            employee.add(new Employee(r));
        }
    }
}

public Employee(Row row) {
    this.phno = new ArrayList<>();
    this.firstName = (String) row.getAs("firstName");
    List<Long> phnoList = row.getList(row.fieldIndex("phno"));
    if (phnoList != null) {
        // phno elements are plain values, not nested Rows
        phno.addAll(phnoList);
    }
}
JavaRDD<Department> departmentRdd = ds.toJavaRDD().map(Department::new);
JavaRDD<String> rdd = departmentRdd.map((Function<Department, String>) v -> {
    StringBuilder sb = new StringBuilder();
    sb.append("deptId").append(v.getDeptId());
    if (!CollectionUtil.isListNullOrEmpty(v.getEmployee())) {
        Employee first = v.getEmployee().get(0);
        sb.append("FirstName").append(first.getFirstName());
        // Read the phone numbers only once we know an employee exists.
        if (!CollectionUtil.isListNullOrEmpty(first.getPhno())) {
            sb.append("Ph number").append(first.getPhno().get(0));
        }
    }
    return sb.toString();
});
This approach works, but it involves a lot of hard-coded schema names, so I am looking for a more elegant solution.
Please suggest the best way to solve this.