Question

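I have an exported CSV in which some rows contain a newline character (ASCII 012) in the middle of a record. I need to replace it with a space, but keep the newline that ends each record so the file can be loaded.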

Most of the lines are fine, but a few look like this:

Input:

10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :
Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"

Output:

10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"

I have been looking into Awk, but I can't work out how to keep the real record-ending newlines.

Another example:

Input:

9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working
Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"

Output:

9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"

Answer 1

One way, using GNU awk:

awk -f script.awk file.txt

Contents of script.awk:

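BEGIN {
    # Split fields on either delimiter seen in the samples.
    FS = "[,~]"
}

# A piece of a broken record: append it to the buffer and add up the fields.
NF < 21 {
    line = (line ? line OFS : line) $0
    fields = fields + NF
}

# The buffered pieces now make up a whole record: print it and reset.
fields >= 21 {
    print line
    line=""
    fields=0
}

# An intact record: print it unchanged.
NF == 21 {
    print
}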

Alternatively, you can use this one-liner:

awk -F "[,~]" 'NF < 21 { line = (line ? line OFS : line) $0; fields = fields + NF } fields >= 21 { print line; line=""; fields=0 } NF == 21 { print }' file.txt

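Explanation:

I made one observation about your expected output: every line appears to contain exactly 21 fields. So if a line contains fewer than 21 fields, the script stores the line and its field count. When it moves on to the next line, that line is joined to the stored one with a space and its field count is added to the running total. If the total is greater than or equal to 21 (the two pieces of a broken line add up to 22, because the field that was cut is counted once in each piece), the stored line is printed. Otherwise, if the line contains exactly 21 fields (NF == 21), it is printed as-is. HTH.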
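The field-counting idea above can also be sketched in a form that takes the expected field count as a parameter and copes with records broken across more than two lines. This is only a sketch under assumptions: the function name join_by_field_count and the variables want, have, rec and pending are made up for illustration, and it assumes that no quoted field contains a delimiter and that there are no blank lines between records.

```shell
# Hypothetical helper: join physical lines until a record has the expected
# number of fields. $1 is the expected field count; data is read from stdin.
join_by_field_count() {
    awk -F '[,~]' -v want="$1" '
    {
        if (pending) {
            # The field cut by the newline was already counted on the
            # previous piece, so a continuation adds NF - 1 new fields.
            rec = rec " " $0
            if (NF > 0) have += NF - 1
        } else {
            rec = $0
            have = NF
        }
        if (have >= want) { print rec; pending = 0 }
        else              { pending = 1 }
    }
    # Flush a trailing record that never reached the expected count.
    END { if (pending) print rec }'
}
```

For the sample data it would be called as: join_by_field_count 21 < file.txt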

Answer 2

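I think sed is a good fit here. I assume every complete record ends with a double quote ("), so a line that ends with any other character is treated as broken and is joined with the line that follows it.

Here is the code: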

cat data | sed -e '/[^"]$/N' -e 's/\n//g'

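The first expression, -e '/[^"]$/N', matches the broken lines and appends the next line to the pattern space without clearing it. The second, -e 's/\n//g', then removes the embedded newline. Note that this deletes the newline outright; to substitute a space, as the question asks, use 's/\n/ /g' as the second expression.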
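A single N only joins two lines. If a record could be broken across three or more lines, a loop is one possible extension. The sketch below keeps the same assumption (every complete record ends with a double quote), replaces each embedded newline with a space, and uses a made-up function name, join_until_closing_quote, purely for illustration.

```shell
# Hypothetical helper: keep appending lines while the current text does not
# end in a double quote. Assumes every complete record ends with a quote.
join_until_closing_quote() {
    # :a       label marking the start of the loop
    # /[^"]$/  the pattern space does not end in a double quote
    # $!{...}  only when this is not the last input line:
    #   N         append the next line to the pattern space
    #   s/\n/ /   turn the embedded newline into a space
    #   ba        jump back to :a and test again
    sed -e ':a' -e '/[^"]$/{' -e '$!{' -e 'N' -e 's/\n/ /' -e 'ba' -e '}' -e '}'
}
```

Usage would look like: join_until_closing_quote < data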

Answer 3

Try this one-liner:

awk '{if(t){print;t=0;next;}x=$0;n=gsub(/"/,"",x);if(n%2){printf "%s ",$0;t=1;}else print $0}' file

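The idea: count the number of " characters in a line. If the count is odd, join the following line to it; otherwise the current line is treated as a complete record.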
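The same quote-parity idea can be carried across several lines by keeping a running count of quotes until it becomes even. The following is a sketch rather than a tested replacement: join_by_quote_parity, rec, quotes and pending are hypothetical names, and it assumes that double quotes only ever appear in pairs within a complete record (a doubled "" escape adds two, so it does not change the parity).

```shell
# Hypothetical helper: a record is complete once the running number of
# double quotes seen since it started is even.
join_by_quote_parity() {
    awk '
    {
        # Join a continuation line to the buffered text with a space.
        rec = pending ? (rec " " $0) : $0
        # gsub returns the number of replacements, i.e. the quote count;
        # it runs on a copy so the line itself is left untouched.
        tmp = $0
        quotes += gsub(/"/, "", tmp)
        if (quotes % 2 == 0) { print rec; pending = 0; quotes = 0 }
        else                 { pending = 1 }
    }
    # Flush whatever is left if the input ends inside an open quote.
    END { if (pending) print rec }'
}
```

Usage would look like: join_by_quote_parity < file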
