I'm trying to create a new dataset in Apache Spark with this schema:
root
|-- date: date (nullable = true)
|-- id: string (nullable = true)
|-- data: struct (nullable = true)
| |-- title: string (nullable = true)
| |-- label: string (nullable = true)
|-- errors: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- type: string (nullable = true)
| | |-- message: string (nullable = true)
Because of some custom transformations, I'm using spark.sqlContext().createDataFrame together with RowFactory.create, which requires something like:
Dataset&lt;Row&gt; newDataset = spark.sqlContext().createDataFrame(dataset.map(
    (Function&lt;Row, Row&gt;) row -> {
        Date date = row.getAs("date");
        String id = row.getAs("id");
        // SOME STUFF
        Object[] data = new Object[??];
        Object[] errors = new Object[??];
        return RowFactory.create(date, id, ???);
    }
), schema);
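To make the question concrete, here is a minimal sketch of my current (untested) assumption: a struct column is populated with a nested Row, and an array-of-structs column with a java.util.List of Rows. All values ("My title", the error types, the id, the date) are placeholders I made up for illustration:

```java
import java.sql.Date;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

public class NestedRowSketch {

    // Build one Row matching the schema above. The assumption being tested:
    // a nested struct is itself a Row, and an array of structs is a List of Rows.
    public static Row buildRow() {
        // data: struct<title: string, label: string> -> a nested Row
        Row data = RowFactory.create("My title", "My label");

        // errors: array<struct<type: string, message: string>> -> a List of Rows
        List<Row> errors = Arrays.asList(
                RowFactory.create("validation", "title too long"),
                RowFactory.create("parse", "bad date format"));

        // The outer Row takes the nested Row and the List directly,
        // in the same order as the schema fields.
        return RowFactory.create(Date.valueOf("2024-01-01"), "id-1", data, errors);
    }

    public static void main(String[] args) {
        System.out.println(buildRow());
    }
}
```

Is this the right way to shape the values, and will createDataFrame accept it against the schema?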
My question is: how do I create the nested data struct and the nested errors array with RowFactory.create?
Thanks for your help.