How to remove a newline if the line does not end with a double quote (")

Date: 2015-01-16 12:49:27

Tags: regex sed

Sample data:

"data","123"
"data2","qwer"
"false","234
And i'm the culprit"
"data5","234567"

The output should be:

"data","123"
"data2","qwer"
"false","234And i'm the culprit"
"data5","234567"

Essentially, I want to repair my CSV file, which is very large.

I am using sed, so an answer in sed would help a lot :)

3 answers:

Answer 0 (score: 0)

sed is always the wrong choice for any problem that involves multiple lines. Just use awk:

$ awk '{printf "%s%s", (prev~/"$/?RS:""), $0; prev=$0} END{print ""}' file
"data","123"
"data2","qwer"
"false","234And i'm the culprit"
"data5","234567"

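The above simply checks whether the previous line ended with ". If it did, it prints the default record separator (a newline; you can replace RS with ORS or hard-code "\n" if you prefer); if it did not, it prints nothing. It then prints the current record without a trailing newline. At the very end, it prints a final newline.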
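For readers who find the one-liner dense, the same logic can be spelled out over several lines. This is only a sketch: the function name join_unquoted is made up for illustration, and the NR > 0 guard (which avoids printing a blank line for empty input) is an addition that is not in the answer above.

```shell
# join_unquoted: hypothetical helper name, not part of the original answer.
join_unquoted() {
  awk '
    {
      # Start a new output line only if the previous record was complete,
      # that is, it ended with a double quote.
      if (prev ~ /"$/) printf "\n"
      # Print the current record without a trailing newline, so that a
      # continuation line can be glued onto it.
      printf "%s", $0
      prev = $0
    }
    # Terminate the last output line (skipped for empty input; this guard
    # is an assumption added for the sketch).
    END { if (NR > 0) print "" }
  ' "$@"
}

# Usage with a shortened version of the sample data:
printf '%s\n' '"data","123"' '"false","234' "And i'm the culprit\"" '"data5","234567"' |
  join_unquoted
# "data","123"
# "false","234And i'm the culprit"
# "data5","234567"
```

Because awk reads one record at a time and only remembers the previous line, memory use stays constant, which matters for a very large file.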

Answer 1 (score: 0)

For completeness, this is how it can be done with sed:

sed '/"\s*$/! { :loop; N; //! { $! b loop }; s/\n//g }'

It works as follows:

/"\s*$/! {    # if a line does not end with a double quote (possibly followed
              # by whitespace)
  :loop       # jump label "loop"
  N           # fetch the next line
  //! {       # unless the content of the pattern space matches the
              # previously attempted pattern (that is: unless it ends with a
              # double quote, which is the case iff the last fetched line does)
    $! b loop # and unless we reached the end of the input ($!),
              # go back to "loop"
  }
  s/\n//g     # remove all newlines from the accumulated lines in the
              # pattern space
}

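So this accumulates, in the pattern space, consecutive lines that do not end with a double quote, and then joins them into a single line before printing it.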
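The one-liner above uses \s, which is a GNU extension rather than POSIX, and it places a semicolon directly after a label, which some non-GNU sed implementations read as part of the label name. A sketch of the same logic that avoids both, assuming the rest of the script behaves the same way on your sed, splits the commands into separate -e fragments and uses [[:space:]]. The function name join_with_sed is hypothetical.

```shell
# join_with_sed: hypothetical helper name, not part of the original answer.
join_with_sed() {
  # Each -e fragment is one line of the script from the answer above:
  #   if the line does not end with a double quote (plus optional whitespace):
  #     loop: append the next line with N
  #     unless the pattern space now ends with a quote, and unless this is
  #     the last input line, go back to loop
  #     finally delete all embedded newlines
  sed -e '/"[[:space:]]*$/!{' \
      -e ':loop' \
      -e 'N' \
      -e '//!{' \
      -e '$!b loop' \
      -e '}' \
      -e 's/\n//g' \
      -e '}' "$@"
}

# Usage: a record broken across three lines is joined back into one.
printf '%s\n' '"data","123"' '"false","234' 'And' "i'm the culprit\"" |
  join_with_sed
# "data","123"
# "false","234Andi'm the culprit"
```

One caveat: if the very last line of the input is itself unterminated, N runs with no next line available, and sed implementations differ on whether the pattern space is still printed in that case.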

Answer 2 (score: 0)

sed ':cycle
$ b
/"$/ !N;s/\n//;t cycle' YourFile

A sed version, although sed is not the best tool for this kind of operation.