I want to convert the rows of a fixed-width DataFrame into delimited data.
How can I do this in Java, using a JavaRDD?
Input DataFrame, as printed by df.show():
| c0          |
| WAAAAAAWone |
| QBAAAAAWtwo |
Expected output, delimited by pipes (|):
c0 | c1 | c2
W | AAAAAA | Wone
Q | BAAAAA | Wtwo
Answer 0 (score: 1)
You can do this easily with String.substring(int start, int end). Here is a working implementation of the method you need:
public static String parseData(String data) {
    StringBuilder ret = new StringBuilder("c0|c1|c2");
    // Remove the edge delimiters (every '|' in the input)
    data = data.replace("|", "");
    // Split into rows
    String[] rows = data.split("\n");
    // Iterate through each row
    for (String row : rows) {
        // Skip blank lines, e.g. produced by a leading or doubled newline
        if (row.isEmpty()) continue;
        // Each row must be exactly 11 characters (1 + 6 + 4); otherwise throw
        if (row.length() != 11) {
            String message = String.format(
                "Row passed to parseData() was the wrong length! Expected 11, got %d", row.length());
            throw new IllegalArgumentException(message);
        }
        String col1 = row.substring(0, 1);  // column c0, width 1
        String col2 = row.substring(1, 7);  // column c1, width 6
        String col3 = row.substring(7, 11); // column c2, width 4
        // Append the delimited row to the result
        ret.append('\n').append(col1).append('|').append(col2).append('|').append(col3);
    }
    return ret.toString();
}
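The same substring approach generalizes to any fixed-width layout if the column widths are passed in: each field starts at the running sum of the widths before it. The sketch below is an illustration, not part of the original answer. It assumes a hypothetical class FixedWidth with a hypothetical helper toDelimited, and the widths {1, 6, 4} read off the question's example. It processes one row at a time, which is the shape a per-record map function needs.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, not from the original answer.
public class FixedWidth {

    // Splits one fixed-width record into fields of the given widths
    // and joins them with the delimiter.
    public static String toDelimited(String row, int[] widths, String delimiter) {
        int expected = 0;
        for (int w : widths) {
            expected += w; // total record length implied by the widths
        }
        if (row.length() != expected) {
            throw new IllegalArgumentException(
                "Expected " + expected + " characters, got " + row.length());
        }
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int w : widths) {
            fields.add(row.substring(start, start + w)); // field covers [start, start + w)
            start += w;                                  // next field begins where this one ends
        }
        return String.join(delimiter, fields);
    }

    public static void main(String[] args) {
        int[] widths = {1, 6, 4}; // c0, c1, c2 as in the question (assumed layout)
        for (String row : Arrays.asList("WAAAAAAWone", "QBAAAAAWtwo")) {
            System.out.println(toDelimited(row, widths, "|"));
        }
    }
}
```

Run as-is, this prints W|AAAAAA|Wone and Q|BAAAAA|Wtwo. With Spark on the classpath, the per-row function could probably be applied as df.javaRDD().map(r -> FixedWidth.toDelimited(r.getString(0), widths, "|")), assuming the fixed-width string is in column 0; that Spark line is untested here.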
I tested it: parseData("|WAAAAAAWone|\n|QBAAAAAWtwo|")
returns:
c0|c1|c2
W|AAAAAA|Wone
Q|BAAAAA|Wtwo