How do I parse a CSV file in awk without getting two blank fields?

Time: 2017-02-21 12:20:35

Tags: csv awk

The CSV structure looks like this.

"field1","field2","field3,with,commas","field4",          

There are four fields in the CSV file:
First: field1
Second: field2
Third: field3,with,commas
Fourth: field4

This is the regular expression I use as the awk field separator.

 '^"|","|",$' 

debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print NF}'
6
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $1}'

debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $2}'
field1
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $3}'
field2
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $4}'
field3,with,commas
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $5}'
field4
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $6}'

There are two problems with my regular expression '^"|","|",$':

1. The 4 fields are parsed as 6 fields by '^"|","|",$'.
2. $1 and $6 are parsed as blank.

How can I write a regular expression (shown as format below) that produces this output:

debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print NF}'
4
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $1}'
field1
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $2}'
field2
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $3}'
field3,with,commas
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $4}'
field4

2 answers:

Answer 0: (score: 2)

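A workaround might be to set FS to '","' and use gsub to strip the leftover characters at the start and end of each record. Because gsub modifies $0, awk re-splits the record afterwards, so no empty leading or trailing field remains: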

echo '"field1","field2","field3,with,commas","field4",' | awk -v FS='","' '{gsub(/^"|",$/, ""); print NF, $1, $2, $3, $4}'
4 field1 field2 field3,with,commas field4
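
The same FS-plus-gsub idea can be sketched for input with several records, printing one field per line. This is only an illustration: the function name split_quoted_csv and the second sample record are made up here, and it assumes every field is quoted, every record ends in '",', and no field contains an embedded double quote.

```shell
# Hypothetical helper name; wraps the FS + gsub approach from this answer.
split_quoted_csv() {
    awk -v FS='","' '{
        gsub(/^"|",$/, "")      # edits $0, so awk re-splits the record on FS
        for (i = 1; i <= NF; i++)
            printf "%d.%d: %s\n", NR, i, $i   # record number, field number, value
    }' "$@"
}

# Two sample records; the second one is invented for illustration.
printf '%s\n' \
    '"field1","field2","field3,with,commas","field4",' \
    '"a","b,c","d","e",' | split_quoted_csv
```

Passing file names to the function instead of piping should also work, since "$@" is handed straight to awk.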

Answer 1: (score: 0)

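I think the FPAT variable might be what you want; note that it is a GNU awk (gawk) extension. Please see the documentation and examples in the Users Guide.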
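
A minimal sketch of what an FPAT-based version might look like, assuming gawk is installed, every field is wrapped in double quotes, and no field contains an embedded double quote. The pattern '"[^"]*"' and the function name fpat_quoted_csv are illustrative choices made here, not taken from the guide. FPAT describes what a field looks like rather than what separates fields, so the trailing comma is simply not matched.

```shell
# Hypothetical helper name. Requires GNU awk: FPAT is a gawk extension.
fpat_quoted_csv() {
    gawk -v FPAT='"[^"]*"' '{
        for (i = 1; i <= NF; i++)
            gsub(/^"|"$/, "", $i)   # strip the surrounding quotes from each field
        print NF, $1, $2, $3, $4
    }' "$@"
}

echo '"field1","field2","field3,with,commas","field4",' | fpat_quoted_csv
```

With mawk or another non-GNU awk, FPAT is treated as an ordinary variable and has no effect on field splitting, so the FS-based answer is the more portable choice.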