I'm trying to create a new dataset in Apache Spark with this schema:
root
|-- date: date (nullable = true)
|-- id: string (nullable = true)
|-- data: struct (nullable = true)
| |-- title: string (nullable = true)
| |-- label: string (nullable = true)
|-- errors: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- type: string (nullable = true)
| | |-- message: string (nullable = true)
Because of some custom transformations, I'm using spark.sqlContext().createDataFrame together with RowFactory.create, which requires something like:
Dataset&lt;Row&gt; newDataset = spark.sqlContext().createDataFrame(dataset.map(
    (Function&lt;Row, Row&gt;) row -> {
        Date date = row.getAs("date");
        String id = row.getAs("id");
        // SOME STUFF
        Object[] data = new Object[??];
        Object[] errors = new Object[??];
        return RowFactory.create(date, id, ???);
    }
), schema);
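To make the question concrete, here is a minimal sketch of my current (untested) assumption: a struct column is populated with a nested Row, and an array-of-structs column with a java.util.List of Rows. All values ("My title", the error types, the id, the date) are placeholders I made up for illustration:

```java
import java.sql.Date;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;

public class NestedRowSketch {

    // Build one Row matching the schema above. The assumption being tested:
    // a nested struct is itself a Row, and an array of structs is a List of Rows.
    public static Row buildRow() {
        // data: struct<title: string, label: string> -> a nested Row
        Row data = RowFactory.create("My title", "My label");

        // errors: array<struct<type: string, message: string>> -> a List of Rows
        List<Row> errors = Arrays.asList(
                RowFactory.create("validation", "title too long"),
                RowFactory.create("parse", "bad date format"));

        // The outer Row takes the nested Row and the List directly,
        // in the same order as the schema fields.
        return RowFactory.create(Date.valueOf("2024-01-01"), "id-1", data, errors);
    }

    public static void main(String[] args) {
        System.out.println(buildRow());
    }
}
```

Is this the right way to shape the values, and will createDataFrame accept it against the schema?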
My question is: how do I create the nested data struct and the nested errors array with RowFactory.create?
Thanks for your help.