I have been happily using gawk with FPAT. Here is the script I use for my examples:
#!/usr/bin/gawk -f
BEGIN {
    FPAT="([^,]*)|(\"[^\"]+\")"
}
{
    for (i=1; i<=NF; i++) {
        printf "Record #%s, field #%s: %s\n", NR, i, $i
    }
}
Works well:
$ echo 'a,b,c,d' | ./test.awk
Record #1, field #1: a
Record #1, field #2: b
Record #1, field #3: c
Record #1, field #4: d
Works well:
$ echo '"a","b",c,d' | ./test.awk
Record #1, field #1: "a"
Record #1, field #2: "b"
Record #1, field #3: c
Record #1, field #4: d
Works well:
$ echo '"a","b",,d' | ./test.awk
Record #1, field #1: "a"
Record #1, field #2: "b"
Record #1, field #3:
Record #1, field #4: d
Works well:
$ echo '"""a"": aaa","b",,d' | ./test.awk
Record #1, field #1: """a"": aaa"
Record #1, field #2: "b"
Record #1, field #3:
Record #1, field #4: d
Fails:
$ echo '"""a"": aaa,","b",,d' | ./test.awk
Record #1, field #1: """a"": aaa
Record #1, field #2: ","
Record #1, field #3: b"
Record #1, field #4:
Record #1, field #5: d
Expected output:
$ echo '"""a"": aaa,","b",,d' | ./test_that_would_be_working.awk
Record #1, field #1: """a"": aaa,"
Record #1, field #2: "b"
Record #1, field #3:
Record #1, field #4: d
Is there a regex for FPAT that would make this work, or is this kind of pattern simply not supported by awk?
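The pattern would be a " followed by anything except a lone ". A regex character class only looks at one character at a time, so on its own it cannot tell a lone " apart from an escaped "".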
I think there may be a way to do it, but I have not been able to make it work.
Answer 0 (score: 4)
awk's FPAT regexes do not support lookarounds, so you need to spell the pattern out explicitly. This one will do:
FPAT="[^,\"]*|\"([^\"]|\"\")*\""
Explanation:
[^,\"]*    # match 0 or more characters other than , and "
|          # OR
\"         # an opening '"'
([^\"]     #   followed by any character other than '"'
|          #   OR
\"\"       #   an escaped quote '""'
)*         # with that group repeated 0 or more times
\"         # and a closing '"'
Now test it:
$ cat tst.awk
BEGIN {
    FPAT="[^,\"]*|\"([^\"]|\"\")*\""
}
{
    for (i=1; i<=NF; i++) { printf "Record #%s, field #%s: %s\n", NR, i, $i }
}
$ echo '"""a"": aaa,","b",,d' | awk -f tst.awk
Record #1, field #1: """a"": aaa,"
Record #1, field #2: "b"
Record #1, field #3:
Record #1, field #4: d