Converting Row[] to a two-dimensional array in Java 8 Spark

Posted: 2019-10-02 05:02:06

Tags: apache-spark java-8 apache-spark-sql

I want to convert a Row[] into a two-dimensional String[][] array using Java 8 and Spark.

Input DataFrame:

+-------------------+----+-----+
|          attribute|city|cntry|
+-------------------+----+-----+
|LOC1,LOC2,LOC3,LOC4| chn|   AU|
|          LOC1,LOC4| mdu|   PE|
|          LOC9,LOC7| sdu|   US|
|          LOC5,LOC6| fdu|  CAN|
+-------------------+----+-----+

I can't get the expected output: only the data from the last row ends up stored. Please help me get the expected output.

My code (Java 8 with Spark):

Dataset<Row> df1 = ss.read().option("inferSchema", true).format("json").load("src/main/resources/input.json");

String[][] outputList = new String[100][100];
Row[] colList = (Row[]) df1.collect();
int rowCount = (int) df1.count();

for (Row rw : colList) {
    for (int i = 0; i < rowCount; i++) {
        for (int j = 0; j < rw.size(); j++) {
            outputList[i][j] = rw.get(j).toString();
        }
    }
}

for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 3; j++) {
        System.out.println("outputList[" + i + "][" + j + "]" + outputList[i][j]);
    }
}
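The "only the last row is stored" symptom follows from the loop nesting above: for every row rw, the inner i loop writes rw into all rowCount slots, so each pass overwrites the previous one. The sketch below reproduces this without Spark. It is an assumption-laden stand-in, not the asker's code: each Row is modeled as an Object[], and the class and method names (LastRowDemo, buggyFill, fixedFill) are hypothetical. It contrasts the question's nesting with a version where a single index i selects both the source row and the target row.

```java
import java.util.Arrays;

// Hypothetical stand-in for the question's Row[]: each Object[] plays the role of one Row.
class LastRowDemo {

    // Mirrors the question's loops: for EVERY row, the i-loop writes that row
    // into all rowCount output rows, so only the last row survives.
    static String[][] buggyFill(Object[][] rows) {
        int rowCount = rows.length;
        int colCount = rows[0].length;
        String[][] out = new String[rowCount][colCount];
        for (Object[] rw : rows) {
            for (int i = 0; i < rowCount; i++) {
                for (int j = 0; j < rw.length; j++) {
                    out[i][j] = rw[j].toString();
                }
            }
        }
        return out;
    }

    // One index i drives both the source row and the target row.
    static String[][] fixedFill(Object[][] rows) {
        int rowCount = rows.length;
        int colCount = rows[0].length;
        String[][] out = new String[rowCount][colCount];
        for (int i = 0; i < rowCount; i++) {
            for (int j = 0; j < colCount; j++) {
                out[i][j] = rows[i][j].toString();
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Sample values copied from the question's input table.
        Object[][] rows = {
            {"LOC1,LOC2,LOC3,LOC4", "chn", "AU"},
            {"LOC1,LOC4", "mdu", "PE"},
            {"LOC9,LOC7", "sdu", "US"},
            {"LOC5,LOC6", "fdu", "CAN"}
        };
        System.out.println(Arrays.deepToString(buggyFill(rows)));
        System.out.println(Arrays.deepToString(fixedFill(rows)));
    }
}
```

The first printed line repeats the LOC5,LOC6 / fdu / CAN row four times; the second has the four rows in their original order.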

The expected output is:

    outputList[0][0]:LOC1,LOC2,LOC3,LOC4
    outputList[0][1]:chn
    outputList[0][2]:AU
    outputList[1][0]:LOC1,LOC4
    outputList[1][1]:mdu
    outputList[1][2]:PE
    outputList[2][0]:LOC9,LOC7
    outputList[2][1]:sdu
    outputList[2][2]:US
    outputList[3][0]:LOC5,LOC6
    outputList[3][1]:fdu
    outputList[3][2]:CAN
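The expected output lists each cell as outputList[i][j]:value, with a colon that the question's println omits, and the question's print loop hard-codes the bounds 4 and 3. A minimal sketch of a formatter that produces that exact line format and takes its bounds from the array itself; the class and method names (ExpectedFormat, formatCells) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class ExpectedFormat {

    // Builds one "outputList[i][j]:value" line per cell, using the array's
    // own lengths instead of hard-coded bounds.
    static List<String> formatCells(String[][] outputList) {
        List<String> lines = new ArrayList<>();
        for (int i = 0; i < outputList.length; i++) {
            for (int j = 0; j < outputList[i].length; j++) {
                lines.add("outputList[" + i + "][" + j + "]:" + outputList[i][j]);
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        // First two rows of the expected output, used as sample data.
        String[][] outputList = {
            {"LOC1,LOC2,LOC3,LOC4", "chn", "AU"},
            {"LOC1,LOC4", "mdu", "PE"}
        };
        formatCells(outputList).forEach(System.out::println);
    }
}
```

This relies on the array being sized to the data; with a fixed new String[100][100] it would also print the unused null slots.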

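1 answer:

Answer 0 (score: 0)

Try indexing the collected rows directly, so that one index i selects both the source row and the output row: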

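// From Java, collect() is typed as Object, hence the cast to Row[].
Row[] rows = (Row[]) df1.collect();
// Take the column count from the schema so an empty result does not fail on rows[0].
int cSize = df1.columns().length;
int rSize = rows.length;
// Size the array to the data instead of a fixed new String[100][100].
String[][] outputList = new String[rSize][cSize];
for (int i = 0; i < rSize; i++) {
    Row row = rows[i];
    for (int j = 0; j < cSize; j++) {
        // Keep SQL nulls as null instead of throwing a NullPointerException.
        outputList[i][j] = row.isNullAt(j) ? null : row.get(j).toString();
    }
}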
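Since the question targets Java 8, the same conversion can also be written with streams. The sketch below is generic over the row type so it compiles and runs without Spark: the caller passes a size function and a cell getter. The names StreamConvert and toMatrix are hypothetical, and the Spark usage mentioned after the block is an untested assumption based on Row's size() and get(int) methods.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;
import java.util.function.ToIntFunction;
import java.util.stream.IntStream;

class StreamConvert {

    // Maps each row to a String[] of its cells, then collects into String[][].
    static <T> String[][] toMatrix(List<T> rows,
                                   ToIntFunction<T> size,
                                   BiFunction<T, Integer, Object> get) {
        return rows.stream()
                .map(row -> IntStream.range(0, size.applyAsInt(row))
                        .mapToObj(j -> {
                            Object v = get.apply(row, j);
                            // Null cells stay null instead of throwing.
                            return v == null ? null : v.toString();
                        })
                        .toArray(String[]::new))
                .toArray(String[][]::new);
    }

    public static void main(String[] args) {
        // Stand-in rows (List<Object>) with values from the question's input table.
        List<List<Object>> rows = Arrays.asList(
                Arrays.<Object>asList("LOC1,LOC2,LOC3,LOC4", "chn", "AU"),
                Arrays.<Object>asList("LOC5,LOC6", "fdu", "CAN"));
        String[][] outputList = toMatrix(rows, List::size, List::get);
        System.out.println(Arrays.deepToString(outputList));
    }
}
```

With Spark on the classpath, the call would presumably be toMatrix(df1.collectAsList(), Row::size, Row::get), which also avoids the Row[] cast. Both versions pull every row to the driver, so they are only suitable for small results.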