Ubuntu 16.04
GNU bash, version 4.3.48
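I have some CSV files that fail to parse correctly because of the "" sequences that appear inside fields to denote inches.
In our CSV files, a column that holds multiple values must separate them with commas, and the whole column must then be wrapped in double quotes, like this: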
"one","two","three, three, three, three","four","five"
The offending "":
... star","Radio data system,Radio: AM/FM 8"" Diagonal Color Touch Screen,Single Slot CD/MP3 Player, Nicer","Siera ...
... star","Rear Wheelhouse Liners,Thin Profile LED Fog Lamps,4.2"" Diagonal Color Display Driver Info Center,Chevrolet Connected Access","Chevrolet ...
I know I can replace the "" quotes with sed:
sed -i 's/""/inch/g' filename.csv
But this causes problems when a column contains no data, because an empty field is also written as "". For example:
... star","Program. Exp. 10/01/2018","","All Star Edition,LT Plus Package, somemore ...","Felix ...
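A minimal sketch of the failure, using a shortened version of the line above (replace_all_quote_pairs is just an illustrative name for the sed command already shown):

```shell
# Blanket replacement: every "" becomes inch, including empty fields.
replace_all_quote_pairs() {
    sed 's/""/inch/g'
}

printf '%s\n' 'Program. Exp. 10/01/2018","","All Star Edition' | replace_all_quote_pairs
# prints: Program. Exp. 10/01/2018",inch,"All Star Edition
```

The empty field loses its quotes and now contains the word inch.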
So I am looking for a way to match only double quotes that are preceded by a digit.
Answer 0 (score: 1)
Do it like this:
line1='... star","Radio data system,Radio: AM/FM 8"" Diagonal Color Touch Screen,Single Slot CD/MP3 Player, Nicer","Siera ...'
line2='... star","Rear Wheelhouse Liners,Thin Profile LED Fog Lamps,4.2"" Diagonal Color Display Driver Info Center,Chevrolet Connected Access","Chevrolet ...'
line3='... star","Program. Exp. 10/01/2018","","All Star Edition,LT Plus Package, somemore ...","Felix ...'
echo "$line1" | sed 's/\([0-9]\)""/\1inch/g'
echo "$line2" | sed 's/\([0-9]\)""/\1inch/g'
echo "$line3" | sed 's/\([0-9]\)""/\1inch/g'
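\([0-9]\): any single digit from 0 to 9. The parentheses capture the digit, because we need to keep it in the replacement.
\1inch: \1 is replaced by the digit captured in the match, followed by the literal text inch, which is self-explanatory ;-)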
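As a quick check, here is a sketch that wraps the same substitution in a function and runs it on shortened sample lines. The name fix_inches and the abbreviated samples are only for illustration:

```shell
# Replace "" with inch only when a digit comes right before it.
fix_inches() {
    sed 's/\([0-9]\)""/\1inch/g'
}

printf '%s\n' 'star","Radio: AM/FM 8"" Diagonal Color Touch Screen","Siera' | fix_inches
# prints: star","Radio: AM/FM 8inch Diagonal Color Touch Screen","Siera

printf '%s\n' 'star","Program. Exp. 10/01/2018","","All Star Edition' | fix_inches
# prints the line unchanged: the empty field "" has no digit in front of it
```

Note that in 10/01/2018","" the digit is followed by a single quote and a comma, not by "", so the pattern does not match there.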
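Answer 1 (score: 0)
You don't have to (and shouldn't!) replace or remove these embedded quotes. The second quote is there to escape the double quote inside the field.
Take your first example: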
"one","two","three, three, three, three","four","five"
Suppose we want to insert "test" into the third field, including the quotes:
"one","two","three, "test", three, three, three","four","five"
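This is a problem for the parser, which would take the first embedded quote as the end of the field. So each of these quotes has to be escaped with another quote: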
"one","two","three, ""test"", three, three, three","four","five"
See RFC 4180 for more details on the format.
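The escaping rule can be sketched in shell. csv_quote is a hypothetical helper name used only for this illustration, and it assumes the value contains no line breaks:

```shell
# Turn a raw value into a quoted CSV field:
# double every embedded quote, then wrap the result in quotes.
csv_quote() {
    printf '"%s"\n' "$(printf '%s' "$1" | sed 's/"/""/g')"
}

csv_quote 'three, "test", three, three, three'
# prints: "three, ""test"", three, three, three"

csv_quote 'Radio: AM/FM 8" Diagonal'
# prints: "Radio: AM/FM 8"" Diagonal"
```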
So the data in your CSV file is correct (the quotes are properly escaped):
,"Radio data system,Radio: AM/FM 8"" Diagonal",
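All you have to do is tell the parser that fields are quoted and (optionally) that embedded quotes are escaped by another quote (some systems use \ to escape those quotes instead).
Removing or replacing these quote pairs before parsing can cause all kinds of problems and bugs.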
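To show what a quote-aware parser does with such a field, here is a rough sketch in shell and awk. split_csv_line is a hypothetical helper, it handles only records that fit on one line, and a real CSV library is the safer choice for production data:

```shell
# Print each field of one CSV record on its own line.
# Inside a quoted field, "" is read as one literal double quote.
split_csv_line() {
    printf '%s\n' "$1" | awk '{
        field = ""; in_quotes = 0
        n = length($0)
        for (i = 1; i <= n; i++) {
            ch = substr($0, i, 1)
            if (in_quotes) {
                if (ch == "\"" && substr($0, i + 1, 1) == "\"") {
                    field = field "\""; i++      # escaped quote: keep one
                } else if (ch == "\"") {
                    in_quotes = 0                # closing quote
                } else {
                    field = field ch             # commas stay in the field
                }
            } else if (ch == "\"") {
                in_quotes = 1                    # opening quote
            } else if (ch == ",") {
                print field; field = ""          # field separator
            } else {
                field = field ch
            }
        }
        print field                              # last field
    }'
}

split_csv_line '"star","Radio: AM/FM 8"" Diagonal Color Touch Screen, Nicer","","Siera"'
```

This prints four lines: star, then Radio: AM/FM 8" Diagonal Color Touch Screen, Nicer, then an empty line for the empty field, then Siera. The inch mark survives as a single quote and the empty field stays empty, with no change to the file.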