How to merge two text files and convert them to a CSV file in Scala

Date: 2015-09-14 16:01:31

Tags: scala csv

I use the following code to export a DataFrame:

df.select("A", "b", "C", "D","E")
  .write.format("com.databricks.spark.csv")
  .save("newiris.csv")

I get two text files, as follows:

part-00000

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa

part-00001

6.7,3,5,1.7,Iris-versicolor
6,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor

Now I want to combine them into one file, like this:

5.1,3.5,1.4,0.2,Iris-setosa
4.9,3,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
6.7,3,5,1.7,Iris-versicolor
6,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor

And then convert it to CSV. How can I do this in Scala?

1 answer:

Answer 0 (score: 2)

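The Scala pieces needed here are scala.io.Source to read the files and get their lines, ++ to append the lines of part-00001 to those of part-00000, and foreach to loop over the combined data and write it to a file. The file I/O itself is the same as in Java.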

scala> import java.io._

scala> import scala.io.Source

scala> val part0 = Source.fromFile("part-00000.txt").getLines
part0: Iterator[String] = non-empty iterator

scala> val part1 = Source.fromFile("part-00001.txt").getLines
part1: Iterator[String] = non-empty iterator

scala> val part2 = part0.toList ++ part1.toList
part2: List[String] = List(5.1,3.5,1.4,0.2,Iris-setosa, 4.9,3,1.4,0.2,Iris-setosa, 4.7,3.2,1.3,0.2,Iris-setosa, 4.6,3.1,1.5,0.2,Iris-setosa, 5,3.6,1.4,0.2,Iris-setosa, 5.4,3.9,1.7,0.4,Iris-setosa, 6.7,3,5,1.7,Iris-versicolor, 6,2.9,4.5,1.5,Iris-versicolor, 5.7,2.6,3.5,1,Iris-versicolor, 5.5,2.4,3.8,1.1,Iris-versicolor, 5.5,2.4,3.7,1,Iris-versicolor, 5.8,2.7,3.9,1.2,Iris-versicolor)

scala> val part00002 = new File("part-00002")
part00002: java.io.File = part-00002

scala> val bw = new BufferedWriter(new FileWriter(part00002))
bw: java.io.BufferedWriter = java.io.BufferedWriter@56826a75

scala> part2.foreach(p => bw.write(p + "\n"))


scala> bw.close

Check the file:

brian:/tmp/ $ cat part-00002                                                            
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
6.7,3,5,1.7,Iris-versicolor
6,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
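
The REPL session above leaves both Source handles open and loads every line into a List before writing. As a rough sketch of the same approach outside the REPL, the version below streams each input file into the output and closes everything when done. The names MergeParts, mergeFiles and merged.csv are made up for illustration, and the input file names are assumed from the question; none of them come from the original answer.

```scala
import java.io.{BufferedWriter, File, FileWriter}
import scala.io.Source

object MergeParts {
  // Hypothetical helper (not from the original answer): writes the lines of
  // each input file, in the order given, to a single output file.
  def mergeFiles(inputs: Seq[File], output: File): Unit = {
    val writer = new BufferedWriter(new FileWriter(output))
    try {
      inputs.foreach { in =>
        val source = Source.fromFile(in)
        try {
          // Stream line by line instead of building a List in memory.
          source.getLines().foreach { line =>
            writer.write(line)
            writer.newLine()
          }
        } finally source.close()
      }
    } finally writer.close()
  }

  def main(args: Array[String]): Unit = {
    // File names assumed from the question; adjust to your own paths.
    val parts = Seq(new File("part-00000"), new File("part-00001"))
    mergeFiles(parts, new File("merged.csv"))
  }
}
```

Because the lines in the part files are already comma-separated, writing them to a file with a .csv name should be all the "conversion" that is needed. If there are more than two part files, passing them to mergeFiles in sorted order keeps the rows in their original sequence.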