Question

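I have CSV files with newlines inside fields. I want to remove those newlines without removing the newline at the end of each record.

Each record ends with a double quote, like this: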

...;"25.33"\n

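So, to remove the newlines inside fields, I tried to remove every newline that is not preceded by a double quote. The regex for that is: [^"]\n

In sed:

sed -i -E "s/[^"]\n/ /g" *.csv  # newlines not preceded by a double quote

bash complains: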

➜ sed -i -E "s/[^"]\n/ /g" *.csv
dquote>

Obviously, I have to escape the quote inside the brackets:

sed -i -E "s/[^\"]\n/ /g" *.csv

But that does not work either:

➜  csv_working_copy1 sed -i -E "s/[^\"]\n/ /g" *.csv
sed: RE error: illegal byte sequence

What am I missing?

Example

This is a sample record:

"2019-03-17";"Comment \n
with newline within it";"23.88"\n

I want this output:

"2019-03-17";"Comment with newline within it";"23.88"\n

Answer 1

Use single quotes instead of double quotes as the outermost quotes:

sed -i -E 's/[^"]\n/ /g' *.csv
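Single quotes fix the shell quoting, but sed still reads its input one line at a time with the trailing newline stripped, so a pattern containing \n has nothing to match in the default cycle. The following is a minimal sketch of one way around that, not necessarily what this answer intended: it joins a line with the next one whenever the line does not end in a double quote. The file name sample.csv is hypothetical, and the LC_ALL=C prefix is an assumption that the "illegal byte sequence" error comes from bytes that are invalid in the current locale.

```shell
# Hypothetical sample file reproducing the record from the question.
printf '%s\n' '"2019-03-17";"Comment ' 'with newline within it";"23.88"' > sample.csv

# The single-quoted script reaches sed with its double quote intact.
# /[^"]$/ selects lines that do not end in a double quote; N appends the next
# line, s/\n// removes the embedded newline, and ba loops back to re-check.
# LC_ALL=C is an assumption: it makes sed treat the input as raw bytes.
joined=$(LC_ALL=C sed -e ':a' -e '/[^"]$/{' -e 'N' -e 's/\n//' -e 'ba' -e '}' sample.csv)
printf '%s\n' "$joined"
```

The sketch deletes the newline instead of replacing it with a space because the sample field already has a space before the break. It relies on the same heuristic as the question, so a line break inside a field that happens to follow a double quote would be mistaken for the end of a record.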

Answer 2

Here is an awk that should handle it:

$ awk -v RS="^$" '{            # read the whole file in at the beginning
    for(i=1;i<=length;i++) {   # iterate over the file one char at a time
        c=substr($0,i,1)       # read char
        if(c=="\"")            # if it is a quote
            f=!f               # ... raise the flag, or lower it if already up
        if(c=="\n" && f)       # if it is a newline and the flag is up, i.e. within quotes
            c=""               # replace the newline with an empty string
        printf "%s",c          # print char
    }
}' file

Sample output:

"2019-03-17";"Comment \nwith newline within it";"23.88"\n

With more records:

$ awk ... file file file
"2019-03-17";"Comment \nwith newline within it";"23.88"\n
"2019-03-17";"Comment \nwith newline within it";"23.88"\n
"2019-03-17";"Comment \nwith newline within it";"23.88"\n

Naturally, it does not tolerate any quoting problems, such as unbalanced quotes.

Update: another, shorter solution:
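Setting RS="^$" to read the whole file as one record is, as far as I know, a GNU awk idiom and may behave differently in other awk implementations. As a hedged alternative, the same flag-toggling idea can be sketched line by line with the default record separator. The file name sample_flag.csv and the variable name inq are hypothetical.

```shell
# Hypothetical sample: one record split by a newline inside a quoted field.
printf '%s\n' '"a";"b ' 'c";"d"' > sample_flag.csv

flag_join=$(awk '{
    n = length($0)
    for (i = 1; i <= n; i++)            # scan the line one character at a time
        if (substr($0, i, 1) == "\"")   # every quote toggles the in-quotes flag
            inq = !inq
    printf "%s", $0                     # print the line without its newline
    if (!inq) printf "\n"               # restore the newline only outside quotes
}' sample_flag.csv)
printf '%s\n' "$flag_join"
```

Like the original, this assumes quotes are balanced and never escaped inside a field.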

$ awk '{if((c+=gsub(/"/,"&"))%2==0)print;else printf "%s",$0}' file

Explanation:

$ awk '{
    if((c+=gsub(/"/,"&"))%2==0)  # keep count of quotes, if count is even:
        print                    # print with newline
    else                         # else
        printf "%s",$0           # omit newline
}'
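To see the quote counting at work, here is a small run against a hypothetical file, sample2.csv, whose first record is broken across two lines. gsub(/"/,"&") replaces every quote with itself, so its only effect is to return the number of quotes on the line. A running total that is odd means the line ended inside a quoted field.

```shell
# Hypothetical sample: the first record has a newline inside a quoted field.
printf '%s\n' '"2019-03-17";"Comment ' 'with newline within it";"23.88"' \
    '"2019-03-18";"Plain";"1.00"' > sample2.csv

# Lines 1 and 2 carry 3 quotes each (running total 3, then 6); line 3 adds 6 more.
result=$(awk '{ if ((c += gsub(/"/, "&")) % 2 == 0) print; else printf "%s", $0 }' sample2.csv)
printf '%s\n' "$result"
```

The first two lines come out joined and the third is printed unchanged.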

Answer 3

Another awk:

awk '!($0~"\"$"){a=a$0;next}{$0=a $0;a=""}1' infile
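The one-liner above is dense, so here is an expanded sketch of what I understand it to do: lines that do not end in a double quote are collected in a buffer, and the buffer is prepended once a line ending in a quote arrives. The file name sample3.csv and the variable name buf are hypothetical.

```shell
# Hypothetical sample file reproducing the record from the question.
printf '%s\n' '"2019-03-17";"Comment ' 'with newline within it";"23.88"' > sample3.csv

buffered=$(awk '
    $0 !~ /"$/ { buf = buf $0; next }   # no closing quote at end of line: stash it
    { $0 = buf $0; buf = "" }           # record complete: prepend the stash
    1                                   # print the completed record
' sample3.csv)
printf '%s\n' "$buffered"
```

Anything still in the buffer when the input ends is never printed, so a final line without a closing quote would be dropped.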

sed: replacing double quotes does not work in the terminal
