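I have a CSV file like the one below. It is sent by a source system that has no mechanism for handling embedded newlines, other than adding a column: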
1,"Bob Smith
531 Pennsylvania Avenue
Washington, DC",3,4,"qqqqzzzz"
5,"Bob Smith
531 Pennsylvania Avenue
Washington, DC",6,7,"qqqqzzzz"
Expected output:
1,"Bob Smith 531 Pennsylvania Avenue Washington, DC",3,4
5,"Bob Smith 531 Pennsylvania Avenue Washington, DC",6,7
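I have tried the following:
Asked the source system to append the identifier "qqqqzzzz" to the end of each record
Tried replacing all newlines with spaces, then replacing every qqqqzzzz with a newline
However, after the final qqqqzzzz replacement the next record is still broken across lines, as shown below: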
1,"Bob Smith 531 Pennsylvania Avenue Washington, DC",3,4,""
5,"Bob Smith
sed '/^$/d' all.csv|tr '\n' ' '|sed 's/qqqqzzzz/\n/g' >results.csv
Update after trying the suggested command:
$ sed 'N;N;s/\n//g;s/,"qqqqzzzz"$//' quotetest.csv
1,"Bob Smith 531 Pennsylvania Avenue Washington, DC",3,4,"qqqqzzzz"
5,"Bob Smith 531 Pennsylvania Avenue Washington, DC",6,7
Answer 0 (score: 3)
Using GNU awk:
$ awk 'BEGIN{RS=",\"qqqqzzzz\" ?\r?\n"}{$1=$1}1' file
1,"Bob Smith 531 Pennsylvania Avenue Washington, DC",3,4
5,"Bob Smith 531 Pennsylvania Avenue Washington, DC",6,7
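Tested with both DOS and Unix line endings. The key is to use the identifier together with the extra characters around it (the comma, an optional space, and the line-ending characters) as the record separator (RS). With the default field separator, the embedded newlines count as field separators, so $1=$1 rebuilds each record with single spaces between fields. The catch in your data is that there is a space after the first identifier but not after the second.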
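A regular-expression RS is an extension; POSIX leaves a multi-character RS unspecified. Where only a plain awk is available, the same idea can be sketched by accumulating lines until one ends with the identifier. This is a minimal sketch, not a definitive implementation. It assumes the identifier is always the last column, may be followed by spaces and a carriage return, and never occurs inside the data. The function name join_records and the sample variable are hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: join physical lines into one record per marker.
# Reads the files given as arguments, or standard input if there are none.
join_records() {
    awk '
    { sub(/\r$/, "") }                       # strip a DOS carriage return, if present
    $0 == "" { next }                        # skip blank lines
    { rec = (rec == "" ? $0 : rec " " $0) }  # join continuation lines with one space
    /,"qqqqzzzz" *$/ {                       # marker (plus optional spaces) ends a record
        sub(/,"qqqqzzzz" *$/, "", rec)       # drop the marker column
        print rec
        rec = ""
    }
    ' "$@"
}

# Sample input modelled on the question: the first marker is followed
# by a space, the second is not.
sample=$(printf '%s\n' \
    '1,"Bob Smith' \
    '531 Pennsylvania Avenue' \
    'Washington, DC",3,4,"qqqqzzzz" ' \
    '5,"Bob Smith' \
    '531 Pennsylvania Avenue' \
    'Washington, DC",6,7,"qqqqzzzz"')

result=$(printf '%s\n' "$sample" | join_records)
printf '%s\n' "$result"
```

On the sample this prints the two records from the expected output in the question. Unlike the $1=$1 approach, it leaves runs of spaces inside a field untouched, since it only inserts one space at each joined line break.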