Remove line breaks from a CSV while preserving rows

Time: 2012-09-25 01:00:41

Tags: bash csv awk

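I have an exported CSV in which some rows contain a line feed (ASCII 012) in the middle of a record. I need to replace it with a space, but keep the newline at the end of each record so the file can still be loaded.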

Most lines are fine, but a few look like this:

Input:

10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :
Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"

Output:

10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"

I've been looking into awk, but I really can't work out how to preserve the actual rows.

Another example:

Input:

9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working
Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"

Output:

9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
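A quick way to confirm which lines carry an embedded line feed is to count double quotes: a line that breaks inside a quoted field has an odd number of them. The sketch below assumes, as in both samples, that " is the only quoting character; the function name find_broken_lines is hypothetical.

```shell
#!/bin/sh
# Print "line number: text" for every line with an odd number of double quotes.
# With -F'"', a line containing k quote characters splits into k+1 fields,
# so an odd k gives an even, non-zero NF. Doubled quotes ("") used as CSV
# escapes add two quotes and therefore do not change the parity.
find_broken_lines() {
  awk -F'"' 'NF > 0 && NF % 2 == 0 { print NR ": " $0 }' "$@"
}
```

For example, find_broken_lines file.txt on the second sample above reports lines 1 and 2, the two halves of the broken record.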

3 answers:

Answer 0 (score: 4)

One way, using GNU awk:

awk -f script.awk file.txt

Contents of script.awk:

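BEGIN {
    # Fields are separated by either a comma or a tilde
    FS = "[,~]"
}

# Incomplete line: append it to the pending record (joined with OFS,
# a space by default) and add its fields to the running total
NF < 21 {
    line = (line ? line OFS : line) $0
    fields = fields + NF
}

# The pending record now has enough fields: print it and reset
fields >= 21 {
    print line
    line=""
    fields=0
}

# Complete line: print it as-is
NF == 21 {
    print
}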

Alternatively, you can use this one-liner:

awk -F "[,~]" 'NF < 21 { line = (line ? line OFS : line) $0; fields = fields + NF } fields >= 21 { print line; line=""; fields=0 } NF == 21 { print }' file.txt

Explanation:

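I noticed something about your expected output: every line seems to contain exactly 21 fields. So if a line contains fewer than 21 fields, store the line and keep a running total of its fields. When we reach the next line, it is appended to the stored line with a space and its fields are added to the total. Once the total is greater than or equal to 21, print the stored line (for a broken record the total comes to 22, because the field that was split by the line feed is counted once on each line). Otherwise, if a line contains exactly 21 fields (NF == 21), print it as-is. HTH.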
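Since the field that was split by the line feed is counted on both lines, one option is to subtract that duplicate on every join, so a rejoined record totals exactly the expected count. The sketch below does that and also prints a partial record left at end of file. The function name join_by_fields and its first argument (the expected field count) are hypothetical; the [,~] separator comes from the answer and assumes no comma or tilde appears inside a quoted field.

```shell
#!/bin/sh
# Usage (hypothetical helper): join_by_fields WANT [FILE...]
# Joins physical lines until the accumulated field count reaches WANT.
join_by_fields() {
  want=$1; shift
  awk -F '[,~]' -v want="$want" '
  {
    if (have) {
      rec = rec " " $0     # join with a space, as the question asks
      have += NF - 1       # the split field was counted on both lines
    } else {
      rec = $0
      have = NF
    }
    if (have >= want) { print rec; have = 0 }
  }
  END { if (have) print rec }   # flush a trailing partial record, if any
  ' "$@"
}
```

For the data in the question this would be run as join_by_fields 21 file.txt.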

Answer 1 (score: 2)

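I think sed is the tool for this. I assume every record ends with a double-quote character, so a line that ends with any other character is treated as an exception and joined with the line that follows it.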

Here is the code:

cat data | sed -e '/[^"]$/N' -e 's/\n//g'

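The first expression, -e '/[^"]$/N', matches the exceptional lines and appends the next input line to the pattern space instead of printing and clearing it. The second expression, -e 's/\n//g', then deletes the embedded newline, so the two pieces are joined with no space between them.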
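Because N runs only once per cycle, that command joins at most two lines, and it drops the newline rather than replacing it with the space the question asks for. A looping variant, under the same assumption that every complete record ends with a double quote (join_until_quote is a hypothetical name):

```shell
#!/bin/sh
# Keep appending the next line while the pattern space does not end in '"',
# replacing each embedded newline with a space, then print the whole record.
# If the file ends with a partial record, N finds no next line: GNU sed still
# prints the pattern space, but other seds may drop it.
join_until_quote() {
  sed '
:a
/[^"]$/{
N
s/\n/ /
ba
}' "$@"
}
```

Usage: join_until_quote data > fixed.csv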

Answer 2 (score: 2)

Try this one-liner:

awk '{if(t){print;t=0;next;}x=$0;n=gsub(/"/,"",x);if(n%2){printf "%s ",$0;t=1;}else print $0}' file

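Idea: count the " characters in a line. If the count is odd, join the following line to it; otherwise treat the current line as a complete record.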
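That one-liner joins exactly one following line. A sketch of the same quote-parity idea with a running count, so a record split over three or more lines is also rejoined (join_by_quotes is a hypothetical name; like the answer, it assumes " is the only quoting character):

```shell
#!/bin/sh
# Accumulate lines until the total number of double quotes is even,
# i.e. until we are no longer inside a quoted field, then print the record.
join_by_quotes() {
  awk '
  {
    rec = (pending ? rec " " $0 : $0)   # join continuation lines with a space
    q += gsub(/"/, "&")                 # gsub returns the match count; "&" leaves the text unchanged
    pending = (q % 2 == 1)              # odd total: still inside a quoted field
    if (!pending) { print rec; q = 0 }
  }
  END { if (pending) print rec }        # flush an unterminated record, if any
  ' "$@"
}
```

Usage: join_by_quotes file > fixed.csv. Only print is used to emit records, so a % in the data is never interpreted as a format.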