Question

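I have a CSV file that contains both CRLF and LF line endings. In some places there is a stray LF, and the text after it actually belongs to the previous line.

Example: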

smith;pete;he is very nice;1990CRLF
brown;mark;he is very nice;2010CRLF
taylor;sam;he isLF
very nice;2009CRLF

In my script I want to remove every standalone LF. I tried sed:

sed -e ':a' -e 'N' -e '$!ba' -e 's/\n/ /g' $my_file

The problem with this solution is that the LF that is part of each CRLF is also replaced with a space.

Answer 1

perl does not strip the record separator by default, so the line ending is easy to manipulate:

$ cat -A ip.txt
smith;pete;he is very nice;1990^M$
brown;mark;he is very nice;2010^M$
taylor;sam;he is$
very nice;2009^M$

$ perl -pe 's/(?<!\r)\n/ /' ip.txt
smith;pete;he is very nice;1990
brown;mark;he is very nice;2010
taylor;sam;he is very nice;2009

$ perl -pe 's/(?<!\r)\n/ /' ip.txt | cat -A
smith;pete;he is very nice;1990^M$
brown;mark;he is very nice;2010^M$
taylor;sam;he is very nice;2009^M$

(?<!\r)\n uses a negative lookbehind to make sure that \n is replaced only when it is not preceded by \r

Modifying the OP's attempt:

$ sed -e ':a' -e 'N' -e '$!ba' -e 's/\([^\r]\)\n/\1 /g' ip.txt
smith;pete;he is very nice;1990
brown;mark;he is very nice;2010
taylor;sam;he is very nice;2009

\([^\r]\) makes sure that the character before \n is not \r

Answer 2

Using awk:

$ awk 'BEGIN{RS=ORS="\r\n"}/\n/{sub(/\n/,"")}1' file
smith;pete;he is very nice;1990
brown;mark;he is very nice;2010
taylor;sam;he isvery nice;2009

Explained:

$ awk '
BEGIN { RS=ORS="\r\n" }  # set the record separators to CRLF
/\n/ {                   # if there is stray LF in the record
    sub(/\n/,"")         # remove it (maybe " " to replace it with a space)
}1' file                 # output it
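Note that sub() only touches the first stray LF in a record and, as written, joins the two pieces without a separator ("he isvery nice" in the output above). A possible variant, assuming an awk that accepts a multi-character RS (gawk, mawk, Busybox awk), uses gsub() and a space instead. The trailing tr is only there to make the result easy to compare.

```shell
# First record contains two stray LFs; both become spaces.
out=$(printf 'taylor;sam;he\nis\nvery nice;2009\r\nbrown;mark;ok;2010\r\n' |
  awk 'BEGIN { RS = ORS = "\r\n" } { gsub(/\n/, " ") } 1' |
  tr -d '\r')
printf '%s\n' "$out"
```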

Tested successfully on gawk, mawk and Busybox awk. It fails with BSD awk, for which this should work instead:

awk '!/\r$/{printf "%s",$0;next}1' file
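This last one-liner keeps the default LF record separator: a line that does not end in CR is printed without a newline, so the following line gets appended to it. Like the sub() version it joins the pieces with nothing in between. A sketch of the same idea with a space at the join (the extra space in the printf format is my addition):

```shell
# Lines without a trailing CR are printed followed by a space and no newline;
# lines ending in CR are printed normally by the final "1".
out=$(printf 'smith;pete;he is very nice;1990\r\ntaylor;sam;he is\nvery nice;2009\r\n' |
  awk '!/\r$/ { printf "%s ", $0; next } 1' |
  tr -d '\r')
printf '%s\n' "$out"
```

The tr at the end only strips the CRs for display; drop it to keep the CRLF endings in the output.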

How can I use sed to replace LF with a space, but not CRLF?
