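I have an exported CSV in which some rows contain a line feed (ASCII 012) in the middle of a record. I need to replace that line feed with a space, but keep the newline at the end of each record so the file can be loaded.
Most of the lines are fine, but a few look like this:
Input: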
10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :
Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"
Output:
10 , ,"2007-07-30 13.26.21.598000" ,1922 ,0 , , , ,"Special Needs List Rows updated :Row 1 : Instruction: other :Comment: pump runs all of the water for the insd's home" ,10003 ,524 ,"cc:2023" , , ,2023 , , ,"CCR" ,"INSERT" ,"2011-12-03 01.25.39.759555" ,"2011-12-03 01.25.39.759555"
I have been looking at Awk, but I can't work out how to keep the real record boundaries intact.
Another example:
Input:
9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working
Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
Output:
9~~"2007-08-01 16.14.45.099000"~2215~0~~~~"Exposure closed (Unnecessary) : Garage door working Claim Withdrawn"~~701~"cc:6007"~~564~6007~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
4~~"2007-08-01 16.14.49.333000"~1923~0~~~~"Assigned to user Leanne Hamshere in group GIO Home Processing (Team 3)"~~912~"cc:6008"~~~6008~~~"CCR"~"INSERT"~"2011-12-03 01.25.39.759555"~"2011-12-03 01.25.39.759555"
Answer 0 (score: 4)
One way using GNU awk:
awk -f script.awk file.txt
Contents of script.awk:
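BEGIN {
    # Fields may be separated by either a comma or a tilde
    FS = "[,~]"
}
# Fragment of a broken record: append it to the buffer and add up its fields
NF < 21 {
    line = (line ? line OFS : line) $0
    fields = fields + NF
}
# Enough fields collected: the buffered record is complete
fields >= 21 {
    print line
    line = ""
    fields = 0
}
# Complete record on a single line
NF == 21 {
    print
}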
Alternatively, you can use this one-liner:
awk -F "[,~]" 'NF < 21 { line = (line ? line OFS : line) $0; fields = fields + NF } fields >= 21 { print line; line=""; fields=0 } NF == 21 { print }' file.txt
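Explanation:
I made an observation about your expected output: every record should contain exactly 21 fields. So, if a line contains fewer than 21 fields, store the line and add its field count to a running total. When the next line is read, it is joined to the stored line with a space and its fields are added to the total. If the total is greater than or equal to 21 (the fields of a broken line sum to 22, because the field that contains the line break is counted once on each of the two lines), print the stored line. Otherwise, if the line contains exactly 21 fields (NF == 21), print it as it is. HTH.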
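Before relying on the 21-field assumption, it may be worth checking how many fields each line of the file really has. The following is only a sketch: field_histogram is a hypothetical helper name, and it splits on every comma or tilde exactly as the script above does, so a separator inside quoted text is counted as well.

```shell
# Hypothetical helper: print "<field count> <number of lines>" pairs,
# sorted by field count, using the same separators as script.awk.
field_histogram() {
  awk -F '[,~]' '{ count[NF]++ } END { for (n in count) print n, count[n] }' "$@" | sort -n
}
```

Run as `field_histogram file.txt`. If nearly all lines report 21 fields and the remaining counts come in pairs that add up to 22, the assumption probably holds for that file.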
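Answer 1 (score: 2)
I think sed is the tool for this. I assume that every complete record ends with a double quote, so a line that ends with any other character is treated as broken and is joined with the line that follows it.
Here is the code: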
cat data | sed -e '/[^"]$/N' -e 's/\n//g'
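The first expression, -e '/[^"]$/N', matches the broken lines and appends the next line to the pattern space without clearing it. The second, -e 's/\n//g', then deletes the embedded newline; to replace it with a space instead, as the question asks, use 's/\n/ /g'.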
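The command above joins at most one following line per record. If a record could be broken across three or more lines, a loop is needed. Here is a minimal sketch under the same assumption (every complete record ends with a double quote); join_broken_lines is a hypothetical name, and this variant inserts a space where the line break was.

```shell
# Hypothetical helper: keep appending the next line while the pattern
# space does not end with a double quote, replacing each embedded
# newline with a space. "$!" stops the loop on the last input line.
join_broken_lines() {
  sed -e ':a' -e '/[^"]$/{' -e '$!{' -e 'N' -e 's/\n/ /' -e 'ba' -e '}' -e '}' "$@"
}
```

Run as `join_broken_lines data`. Each command is passed in its own -e so that the label and the closing braces end at a line boundary, which some sed implementations require.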
Answer 2 (score: 2)
Try this one-liner:
awk '{if(t){print;t=0;next;}x=$0;n=gsub(/"/,"",x);if(n%2){printf "%s ",$0;t=1;}else print $0}' file
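The idea:
Count the number of " characters in a line. If the count is odd, join the line with the following one; otherwise the current line is treated as a complete record.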
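The same quote-counting idea can be extended to records that are broken across more than two lines by keeping a running quote count until it becomes even. This is only a sketch: join_by_quotes is a hypothetical name, and it assumes that a literal quote inside a field, if present at all, is escaped by doubling it, so the parity is not disturbed.

```shell
# Hypothetical helper: buffer lines until the total number of double
# quotes seen in the buffered record is even, then print the record.
join_by_quotes() {
  awk '
    {
      buf = (buf == "" ? $0 : buf " " $0)   # join fragments with a space
      tmp = $0
      quotes += gsub(/"/, "", tmp)          # gsub returns the number of quotes
      if (quotes % 2 == 0) { print buf; buf = ""; quotes = 0 }
    }
    END { if (buf != "") print buf }        # flush an unterminated record
  ' "$@"
}
```

Run as `join_by_quotes file`. Unlike the field-count approach, this does not depend on the number of columns or on the separator character.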