The CSV structure is as follows.
"field1","field2","field3,with,commas","field4",
There are four fields in the CSV file:
First: field1
Second: field2
Third: field3,with,commas
Fourth: field4
This is my regular expression for awk.
'^"|","|",$'
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print NF}'
6
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $1}'
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $2}'
field1
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $3}'
field2
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $4}'
field3,with,commas
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $5}'
field4
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F '^"|","|",$' '{print $6}'
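There are two problems with my regular expression '^"|","|",$'.
1. The 4 fields are parsed as 6 fields by '^"|","|",$'.
2. $1 and $6 are parsed as empty.
How should I write the regular expression (shown as format below) so that I get: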
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print NF}'
4
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $1}'
field1
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $2}'
field2
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $3}'
field3,with,commas
debian8@hwy:~$ echo '"field1","field2","field3,with,commas","field4",' |awk -F format '{print $4}'
field4
Answer 0 (score: 2)
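A workaround could be to set FS to the three-character string "," (quote, comma, quote) and use gsub to strip the leading " and the trailing ", from each record: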
echo '"field1","field2","field3,with,commas","field4",' | awk -v FS='","' '{gsub(/^"|",$/, ""); print NF, $1, $2, $3, $4}'
4 field1 field2 field3,with,commas field4
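The question's output suggests why the original separator gives 6 fields: '^"' matches at the very start of the record and '",$' at the very end, so awk sees an empty field before the first separator and another after the last one. Removing those two pieces with gsub before the record is split avoids the empty fields. Here is a minimal sketch of the same approach wrapped in a shell function; the name split_quoted_csv is only a placeholder for illustration, and it assumes every field is quoted and contains no embedded double quotes.

```shell
# Hypothetical helper (name chosen for illustration): print each field of
# every quoted CSV record on its own line, prefixed with its index.
split_quoted_csv() {
  awk -v FS='","' '{
    # Remove the leading quote and the trailing quote-plus-comma.
    # Modifying $0 makes awk re-split the record using FS.
    gsub(/^"|",$/, "")
    for (i = 1; i <= NF; i++) printf "%d:%s\n", i, $i
  }'
}

echo '"field1","field2","field3,with,commas","field4",' | split_quoted_csv
```

With the sample record from the question this should print four numbered lines, with 3:field3,with,commas as the third one.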
Answer 1 (score: 0)
I think the FPAT variable (a GNU awk feature) may be what you want. Have a look at the documentation and examples in the Users Guide.
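For illustration, here is a rough sketch of how FPAT could be applied to the record from the question. Instead of describing the separators, FPAT describes what a field looks like, here "a double quote, any run of non-quote characters, a double quote". This needs GNU awk (gawk); mawk and other awks do not support FPAT. The pattern and the function name fpat_fields are my own assumptions, not taken from the guide, and the pattern does not handle quotes embedded inside a field.

```shell
# Sketch only: FPAT is a gawk extension. fpat_fields is a hypothetical name.
fpat_fields() {
  gawk -v FPAT='"[^"]*"' '{
    print NF                    # number of quoted fields found in the record
    for (i = 1; i <= NF; i++) {
      f = $i
      gsub(/^"|"$/, "", f)      # strip the surrounding quotes from a copy of the field
      print f
    }
  }'
}

# Only run the demo when gawk is installed.
if command -v gawk >/dev/null 2>&1; then
  echo '"field1","field2","field3,with,commas","field4",' | fpat_fields
fi
```

The trailing comma in the record matches nothing in the pattern, so it is simply ignored and NF comes out as 4.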