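Saving an RDD as a text file

Date: 2016-04-04 20:54:19

Tags: java csv apache-spark rdd

How can I use RDD.saveAsTextFile to save a delimited text file? I also need to write the DataFrame column names as a header row. How can I achieve this?

For a large RDD, is there a simpler approach than the one below?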

// collect() pulls every row to the driver, so the data must fit in driver memory.
List<Row> data = resultFrame.toJavaRDD().collect();
// try-with-resources closes the writer even if a write fails;
// FileWriter creates the file when it does not exist yet.
try (BufferedWriter bufferedWriter = new BufferedWriter(new FileWriter(fileName))) {
  for (Row dataRow : data) {
    StringBuilder row = new StringBuilder();
    for (int i = 0; i < dataRow.size(); i++) {
      row.append(dataRow.get(i));
      // Put the delimiter between values, not after the last one.
      if (i != dataRow.size() - 1) {
        row.append("~");
      }
    }
    bufferedWriter.write(row.toString());
    bufferedWriter.write("\n");
  }
} catch (IOException e) {
  // Pass the exception along so the cause is not lost.
  LOGGER.error("Error in writing to the ruf file", e);
}

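2 answers:

Answer 0 (score: 0)

Just as you read data with SQLContext.read (Java API), you need to write it with DataFrame.write (Java API).

The other approaches are deprecated (for example SQLContext.parquetFile and SQLContext.jsonFile).

Answer 1 (score: 0)

Thanks for the reply. The following works: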

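import org.apache.spark.api.java.function.Function;
import org.apache.spark.sql.Row;

// Maps each Row to a single line with its values joined by "~".
public class TildaDelimiter implements Function<Row, String> {

  @Override
  public String call(Row r) {
    return r.mkString("~");
  }
}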

In my save step, I did the following to write a ~-delimited file:

 resultFrame.toJavaRDD().map(new TildaDelimiter()).coalesce(1, true)
            .saveAsTextFile(folderName);
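
The snippet above writes the data rows but not the header row that the question asked for. As a minimal sketch in plain Java, with no Spark dependency, the header and the data lines can be built with the same delimiter so they stay aligned. The class DelimitedLines and its methods join and withHeader are hypothetical names made up for this illustration and are not part of Spark.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark: builds delimited lines in plain Java.
public class DelimitedLines {

  // Joins the values of one record with the given delimiter.
  // Null values are rendered as "null", as StringBuilder.append(Object) does.
  public static String join(List<?> values, String delimiter) {
    StringBuilder line = new StringBuilder();
    for (int i = 0; i < values.size(); i++) {
      if (i > 0) {
        line.append(delimiter);
      }
      line.append(values.get(i));
    }
    return line.toString();
  }

  // Returns the header line followed by one line per record.
  public static List<String> withHeader(
      List<String> columns, List<List<Object>> rows, String delimiter) {
    List<String> lines = new ArrayList<>();
    lines.add(join(columns, delimiter));
    for (List<Object> row : rows) {
      lines.add(join(row, delimiter));
    }
    return lines;
  }

  public static void main(String[] args) {
    // Made-up sample data, only to show the shape of the output.
    List<String> columns = Arrays.asList("id", "name");
    List<List<Object>> rows = Arrays.asList(
        Arrays.<Object>asList(1, "alice"),
        Arrays.<Object>asList(2, "bob"));
    for (String line : withHeader(columns, rows, "~")) {
      System.out.println(line);
    }
  }
}
```

In the Spark job itself, the column names would presumably come from resultFrame.columns(), which returns a String[]. Spark's CSV writer (the spark-csv package on Spark 1.x, built in from 2.0) also accepts header and delimiter options through DataFrame.write, which may avoid hand-built lines altogether.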