I want to convert a list of Row[] into a two-dimensional String[][] array using Java 8 and Spark
Input DataFrame
+-------------------+----+-----+
|          attribute|city|cntry|
+-------------------+----+-----+
|LOC1,LOC2,LOC3,LOC4| chn|   AU|
|          LOC1,LOC4| mdu|   PE|
|          LOC9,LOC7| sdu|   US|
|          LOC5,LOC6| fdu|  CAN|
+-------------------+----+-----+
Please help me get the expected output.
I cannot get the expected output: only the data from the last row ends up stored.
Using Java 8 with Spark
Dataset<Row> df1 = ss.read().option("inferSchema", true).format("json").load("src/main/resources/input.json");
String[][] outputList = new String[100][100];
Row[] colList = (Row[]) df1.collect();
int rowCount = (int) df1.count();
for (Row rw : colList) {
    for (int i = 0; i < rowCount; i++) {
        for (int j = 0; j < rw.size(); j++) {
            outputList[i][j] = rw.get(j).toString();
        }
    }
}
for (int i = 0; i < 4; i++) {
    for (int j = 0; j < 3; j++) {
        System.out.println("outputList[" + i + "][" + j + "]" + outputList[i][j]);
    }
}
The expected output should be as follows
outputList[0][0]:LOC1,LOC2,LOC3,LOC4
outputList[0][1]:chn
outputList[0][2]:AU
outputList[1][0]:LOC1,LOC4
outputList[1][1]:mdu
outputList[1][2]:PE
outputList[2][0]:LOC9,LOC7
outputList[2][1]:sdu
outputList[2][2]:US
outputList[3][0]:LOC5,LOC6
outputList[3][1]:fdu
outputList[3][2]:CAN
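Answer 0 (score: 0)
Try this. In your code, the loop over i is nested inside the loop over the rows, so every row overwrites all row indices and only the last row's values remain. Index the collected rows directly instead: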
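// Collect all rows to the driver; the result is cast to Row[] as in the question.
Row[] rows = (Row[]) df1.collect();
int rSize = rows.length;
// Take the column count from the schema so an empty result does not fail on rows[0].
int cSize = df1.columns().length;
String[][] outputList = new String[rSize][cSize];
for (int i = 0; i < rSize; i++) {
    Row row = rows[i];
    for (int j = 0; j < cSize; j++) {
        // Keep null cells as null instead of throwing a NullPointerException.
        outputList[i][j] = row.isNullAt(j) ? null : row.get(j).toString();
    }
}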
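To see why the original loops keep only the last row, here is a rough plain-Java sketch that runs without Spark. It assumes an Object[][] as a stand-in for the collected Row[] (each inner array plays the role of one Row); the class name RowsToArrayDemo and the methods buggy and convert are hypothetical names made up for this illustration. The convert method is one possible Java 8 streams variant of the loop above, not the only way to write it.

```java
import java.util.Arrays;

class RowsToArrayDemo {
    // Stand-in for the collected rows (assumption: each inner array mimics one Row).
    static final Object[][] SAMPLE = {
        {"LOC1,LOC2,LOC3,LOC4", "chn", "AU"},
        {"LOC1,LOC4", "mdu", "PE"},
        {"LOC9,LOC7", "sdu", "US"},
        {"LOC5,LOC6", "fdu", "CAN"}
    };

    // Mirrors the question's nesting: each row is written to every index i,
    // so the last row processed overwrites all earlier ones.
    static String[][] buggy(Object[][] rows) {
        String[][] out = new String[rows.length][rows[0].length];
        for (Object[] rw : rows) {
            for (int i = 0; i < rows.length; i++) {
                for (int j = 0; j < rw.length; j++) {
                    out[i][j] = rw[j].toString();
                }
            }
        }
        return out;
    }

    // Maps each row to its own String[]; null cells stay null.
    static String[][] convert(Object[][] rows) {
        return Arrays.stream(rows)
                .map(r -> Arrays.stream(r)
                        .map(v -> v == null ? null : v.toString())
                        .toArray(String[]::new))
                .toArray(String[][]::new);
    }

    public static void main(String[] args) {
        System.out.println("buggy:   " + Arrays.deepToString(buggy(SAMPLE)));
        System.out.println("convert: " + Arrays.deepToString(convert(SAMPLE)));
    }
}
```

With the sample data, buggy fills all four rows with the values of the last row (LOC5,LOC6 / fdu / CAN), which matches the symptom described in the question, while convert keeps each row at its own index.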