Removing multi-line strings from a field in a file

Time: 2018-01-27 21:42:18

Tags: linux bash csv awk sed

I have a CSV file like the one below, sent by a source system that has no way of handling this other than adding columns:

1,"Bob Smith
531 Pennsylvania Avenue
Washington, DC",3,4,"qqqqzzzz" 
5,"Bob Smith
531 Pennsylvania Avenue
Washington, DC",6,7,"qqqqzzzz"

Expected output:

1,"Bob Smith 531 Pennsylvania Avenue Washington, DC",3,4
5,"Bob Smith 531 Pennsylvania Avenue Washington, DC",6,7

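I have tried the following:

  1. Asked the source system to add the identifier "qqqqzzzz" at the end of each line.

  2. Tried replacing every newline with a space, then replacing every qqqqzzzz with a newline.

  3. But after the final replacement of qqqqzzzz, the next record is still split across lines, as shown below: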

    1,"Bob Smith 531 Pennsylvania Avenue Washington, DC",3,4,"" 
    5,"Bob Smith
    
    sed '/^$/d' all.csv|tr '\n' ' '|sed 's/qqqqzzzz/\n/g' >results.csv
    

    Tried the solutions for replacing quoted text here, here and here.

    Update after trying the command:

    $ sed 'N;N;s/\n//g;s/,"qqqqzzzz"$//' quotetest.csv
    1,"Bob Smith 531 Pennsylvania Avenue Washington, DC",3,4,"qqqqzzzz"
    5,"Bob Smith 531 Pennsylvania Avenue Washington, DC",6,7
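The update above leaves `,"qqqqzzzz"` on the first record because, in the input, that line ends with `"qqqqzzzz" ` (a trailing space), so the `$`-anchored `s/,"qqqqzzzz"$//` never matches it. A minimal sketch of a corrected command, assuming GNU sed (for `\r` and `\?`) and that every record spans exactly three physical lines as in the sample; the function name `join_three_line_records` is hypothetical. It also turns each line break into a space rather than deleting it with `s/\n//g`, and tolerates DOS `\r\n` line endings.

```shell
# Hypothetical helper: joins each 3-line record and strips the sentinel column.
# Assumes GNU sed (\r and \? are GNU extensions) and exactly three physical
# lines per record, as in the sample input.
join_three_line_records() {
  # N;N           append the next two lines to the pattern space
  # s/\r\?\n/ /g  replace each (optionally CR-prefixed) line break with a space
  # last s///     drop ,"qqqqzzzz" plus any trailing spaces and a final CR
  sed 'N;N;s/\r\?\n/ /g;s/,"qqqqzzzz" *\r\?$//' "$@"
}
```

Run it as `join_three_line_records quotetest.csv > results.csv`. Because `N;N` hard-codes the line count, a record with a different number of embedded line breaks will be misaligned; the sentinel-based approach in the answer does not have that limitation.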
    

1 answer:

Answer 0 (score: 3)

Using GNU awk:

$ awk 'BEGIN{RS=",\"qqqqzzzz\" ?\r?\n"}{$1=$1}1' file
1,"Bob Smith 531 Pennsylvania Avenue Washington, DC",3,4
5,"Bob Smith 531 Pennsylvania Avenue Washington, DC",6,7

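Tested with both DOS and Unix line endings. The key is to use the identifier together with its extra characters (the comma, the optional space and the line-ending characters) as the record separator (RS). The catch is that there is a space after the first identifier but not after the second.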
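For reference, the gawk command relies on a regular-expression `RS`, which POSIX leaves unspecified (only the first character of `RS` is guaranteed to count), and on `{$1=$1}`, which rebuilds each record with `OFS` (a single space) and so turns the embedded newlines into spaces. Below is a minimal sketch of the same sentinel idea for awks without regex `RS`: it accumulates physical lines until one ends with the identifier. The function name `join_sentinel_records` is hypothetical, and unlike `$1=$1` it does not collapse runs of spaces inside a field.

```shell
# Hypothetical helper: portable-awk variant of the sentinel-as-terminator idea.
# Assumes every record ends with ,"qqqqzzzz" optionally followed by spaces.
join_sentinel_records() {
  awk '
    { sub(/\r$/, "") }                        # strip a DOS carriage return, if any
    { rec = (rec == "" ? $0 : rec " " $0) }   # join physical lines with a single space
    rec ~ /,"qqqqzzzz" *$/ {                 # sentinel (plus trailing spaces) ends the record
      sub(/,"qqqqzzzz" *$/, "", rec)
      print rec
      rec = ""
    }
    END { if (rec != "") print rec }          # flush a final record that lacks the sentinel
  ' "$@"
}
```

Usage is the same as with the gawk one-liner, for example `join_sentinel_records file > results.csv`. Since records are delimited only by the identifier, a record can span any number of physical lines.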