Apache Spark RowFactory with nested data

Time: 2018-07-06 10:08:54

Tags: java apache-spark apache-spark-sql

I am trying to create a new dataset in Apache Spark with this schema:

root
 |-- date: date (nullable = true)
 |-- id: string (nullable = true)
 |-- data: struct (nullable = true)
 |    |-- title: string (nullable = true)
 |    |-- label: string (nullable = true)
 |-- errors: array (nullable = true)
 |    |-- element: struct (containsNull = true)
 |    |    |-- type: string (nullable = true)
 |    |    |-- message: string (nullable = true)

Because of a custom transformation, I am using spark.sqlContext().createDataFrame, which needs rows built with RowFactory.create:

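// createDataFrame(JavaRDD<Row>, StructType) expects a JavaRDD, hence javaRDD()
Dataset<Row> newDataset = spark.sqlContext().createDataFrame(dataset.javaRDD().map(
    (Function<Row, Row>) row -> {
        // Row has no get(String); getAs(fieldName) looks a column up by name
        Date date = row.getAs("date");
        String id = row.getAs("id");
        // SOME STUFF
        Object[] data = new Object[??];
        Object[] errors = new Object[??];
        return RowFactory.create(date, id, ???);
    }
), schema);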

My question is: how can I create the nested data struct and the nested errors array with RowFactory.create?

Thanks for your help.
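
For reference, here is a minimal sketch of one way the nested values might be built. It assumes that a struct column takes a nested Row and that an array-of-struct column takes a Java array of Row; both assumptions should be checked against the Spark version in use. The class name NestedRowSketch, the helpers schema() and buildRow(), and the sample strings are hypothetical and exist only for illustration. Plain arrays are used rather than java.util.List because, as far as I know, List support for array columns depends on the Spark version.

```java
import java.sql.Date;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructType;

public class NestedRowSketch {

    // Hypothetical helper: rebuilds the schema printed in the question.
    // StructType.add(name, type) creates a nullable field.
    static StructType schema() {
        StructType dataType = new StructType()
            .add("title", DataTypes.StringType)
            .add("label", DataTypes.StringType);
        StructType errorType = new StructType()
            .add("type", DataTypes.StringType)
            .add("message", DataTypes.StringType);
        return new StructType()
            .add("date", DataTypes.DateType)
            .add("id", DataTypes.StringType)
            .add("data", dataType)
            .add("errors", DataTypes.createArrayType(errorType));
    }

    // Hypothetical helper: builds one output row with placeholder values.
    static Row buildRow(Date date, String id) {
        // A struct column is represented by a nested Row.
        Row data = RowFactory.create("some title", "some label");

        // An array<struct> column is represented here by an array of Rows.
        Row[] errors = new Row[] {
            RowFactory.create("warning", "first message"),
            RowFactory.create("error", "second message")
        };

        // With several arguments, the Row[] is passed as a single value
        // and is not expanded by the varargs parameter.
        return RowFactory.create(date, id, data, errors);
    }
}
```

Inside the map function from the question, the return statement would then become something like `return NestedRowSketch.buildRow(date, id);`, with the real title, label and error values substituted for the placeholders.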

0 answers:

No answers