Scala: how to reformat scrambled data into the correct order

Time: 2018-06-29 23:32:18

Tags: scala

My data is stored in a text file and looks like this:

a,b,c,"d
ee"
1,2,3,"fo
ur"
p,o,t,"lu
ck"
o,n,e,"m
o
re"

I want the final output to look like this:

a,b,c,"dee"
1,2,3,"four"
p,o,t,"luck"
o,n,e,"more"

This is what I tried, but it does not give me the result I expect:

val clean = Source.fromFile("my/path/csv/file.csv")
  .getLines
  .drop(1)
  .mkString
  .split("\"")
  .array

Can someone help me with how to do this?

2 answers:

Answer 0: (score: 1)

If the file is not too large:

import scala.io.Source

Source.fromFile("my/path/csv/file.csv")
  .mkString                               // read the whole file into a single String
  .trim                                   // drop a trailing newline, if the file ends with one
  .init                                   // remove the last " as we're going to split on \"\n and the last one won't be removed
  .split("\"\n")                          // "a,b,c,\"d\nee\"\n1,2,3,\"fo becomes Array("a,b,c,\"d\nee", "1,2,3,\"fo")
  .map(_.replace("\n", "") + "\"")        // remove the wrongly placed \n and restore the closing "
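
To try this approach without touching the file system, here is a minimal sketch that applies the same split-and-rejoin steps to an in-memory string. The names `JoinQuotedDemo`, `joinRecords` and `sample` are made up for illustration and are not part of the answer. It is slightly more defensive than the snippet above: it trims a trailing newline and uses `stripSuffix` rather than `init`. It still assumes Unix line endings (`\n`) and that every record ends with a closing quote.

```scala
object JoinQuotedDemo {
  // Hypothetical helper: the answer's steps applied to a String instead of a file.
  def joinRecords(raw: String): Array[String] =
    raw.trim                              // drop a trailing newline, if present
      .stripSuffix("\"")                  // remove the final closing quote
      .split("\"\n")                      // a closing quote followed by a newline ends a record
      .map(_.replace("\n", "") + "\"")    // remove stray newlines, restore the closing quote

  // Sample text modelled on the question's data (an assumption, not the real file).
  val sample: String = "a,b,c,\"d\nee\"\n1,2,3,\"fo\nur\"\no,n,e,\"m\no\nre\"\n"

  def main(args: Array[String]): Unit =
    joinRecords(sample).foreach(println)
}
```

Because the whole input is held in one string, this is only reasonable for files that fit comfortably in memory, as the answer already notes.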

Answer 1: (score: 0)

You can do something like this:

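import scala.io.Source

val clean = Source.fromFile("my/path/csv/file.csv")
  .getLines()
  // Accumulator: (completed records in reverse order, text of the record still being assembled)
  .foldLeft((List[String](), "")) {
    case ((result, partial), line) =>
      val combined = partial + line
      // A record is complete once it contains both its opening and closing quote
      if (combined.count(_ == '"') == 2)
        (combined :: result, "")
      else
        (result, combined)
  }._1.reverse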
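
The fold above treats a record as complete when it contains exactly two quote characters, which matches the sample data. As a hedged variation (not part of the answer), one could instead treat a record as complete whenever the accumulated text contains an even number of quotes, so that rows with no quoted field, or with several, also pass through. `QuoteParityDemo` and `mergeLines` are hypothetical names, and this is a sketch rather than a full CSV parser.

```scala
object QuoteParityDemo {
  // Hypothetical helper: merge physical lines until the quote characters are balanced.
  def mergeLines(lines: Iterator[String]): List[String] = {
    val (done, leftover) = lines.foldLeft((List.empty[String], "")) {
      case ((result, partial), line) =>
        val combined = partial + line
        // An even number of quotes means every opened quoted field has been closed.
        if (combined.count(_ == '"') % 2 == 0) (combined :: result, "")
        else (result, combined)
    }
    // Keep an unterminated trailing record instead of silently dropping it.
    val all = if (leftover.isEmpty) done else leftover :: done
    all.reverse
  }

  def main(args: Array[String]): Unit = {
    val sample = Iterator("a,b,c,\"d", "ee\"", "x,y,z,w", "o,n,e,\"m", "o", "re\"")
    mergeLines(sample).foreach(println)
  }
}
```

Since the helper takes an `Iterator[String]`, it could be fed the result of `getLines()` directly and would process the file line by line instead of loading it as one string.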