Question

I have the following strings:

 new Field("count").del("query_then_fetch");
 new Field("scan").del("query_then_fetch sorting on `_doc`");
 new Field("compress").del("no replacement, implemented at the codec level");
 new Field("compress_threshold").del("no replacement");
 new Field("filter").del("query");

I run the following script on the command line, where the regular expression matches the string inside double quotes:

awk -F '.del' '{match($1, "\".*\"", a); match($2, "\".*\"", b)}END{print a[0]; print b[0]}'

I expect this output:

"count" "query_then_fetch"
"scan" "query_then_fetch sorting on `_doc`"
"compress" "no replacement, implemented at the codec level"
"compress_threshold" "no replacement"
"filter" "query"

But I get this output instead:

"filter"
"query"

How can I fix this?

Answer 1

$ cat sample.csv
 new Field("count").del("query_then_fetch");
 new Field("scan").del("query_then_fetch sorting on `_doc`");
 new Field("compress").del("no replacement, implemented at the codec level");
 new Field("compress_threshold").del("no replacement");
 new Field("filter").del("query");

$ awk -F'"' -v q="\"" '{print q $2 q,q $4 q}' sample.csv
"count" "query_then_fetch"
"scan" "query_then_fetch sorting on `_doc`"
"compress" "no replacement, implemented at the codec level"
"compress_threshold" "no replacement"
"filter" "query"

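I use the double quote as the field separator and print the 2nd and 4th fields, wrapping each one in quotes again through the variable q.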
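To see why the 2nd and 4th fields are the ones wanted, it can help to dump every field of one sample line. This is only an illustrative sketch; the variable names line and fields are made up for the example.

```shell
# One line taken from the sample input.
line=' new Field("count").del("query_then_fetch");'

# Split on the double quote and print every field with its index.
fields=$(printf '%s\n' "$line" |
  awk -F'"' '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }')

printf '%s\n' "$fields"
# 1=[ new Field(]
# 2=[count]
# 3=[).del(]
# 4=[query_then_fetch]
# 5=[);]
```

The quoted text always lands in the even-numbered fields, and the separator itself is dropped, which is why the answer puts the quotes back with q.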

Answer 2

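Your awk script prints only once, in the END block, after all input has been processed. By then a and b hold only the matches from the last line, which is why you see just "filter" and "query".

Those two values also end up on separate lines, because you print a[0] and b[0] with two print statements.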
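A minimal sketch of that behaviour, using a made-up three-line input instead of the sample file: the main block runs once per line and overwrites the variable each time, while END runs once after the last line. The names input, seen, last_only and every_line are hypothetical.

```shell
# Hypothetical three-line input, used only to compare END with per-line printing.
input=$(printf 'one\ntwo\nthree\n')

# Printing in END: the variable is overwritten on every line,
# so only the value from the last line is left.
last_only=$(printf '%s\n' "$input" | awk '{ seen = $0 } END { print seen }')

# Printing in the main block: one output line per input line.
every_line=$(printf '%s\n' "$input" | awk '{ seen = $0; print seen }')

printf 'END only:\n%s\n' "$last_only"
printf 'per line:\n%s\n' "$every_line"
```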

To keep your current awk script, print a[0] and b[0] with a single printf statement inside the main block, so that it runs for every line:

awk -F '.del' '{match($1, "\".*\"", a); match($2, "\".*\"", b); printf "%s %s\n",a[0], b[0]}' sample.csv

Alternatively, you can use the simpler awk script below, which splits each input line on the ( and ) characters:

awk -F '[()]' '{print $2,$4}' sample.csv
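One caveat, offered as a sketch and not as a definitive fix: the three-argument match() used in the printf version is a gawk extension, so that command may fail on other awk implementations. A possible portable equivalent relies on RSTART and RLENGTH, which awk sets after a successful match(). The function name extract_quoted is made up for illustration, and the pattern assumes the quoted strings contain no escaped double quotes.

```shell
# Hypothetical helper: print the quoted string on each side of ".del".
extract_quoted() {
  awk -F '[.]del' '{
    first = second = ""
    # match() sets RSTART and RLENGTH; substr() then cuts out the match.
    if (match($1, /"[^"]*"/)) first = substr($1, RSTART, RLENGTH)
    if (match($2, /"[^"]*"/)) second = substr($2, RSTART, RLENGTH)
    print first, second
  }'
}

result=$(printf '%s\n' \
  ' new Field("count").del("query_then_fetch");' \
  ' new Field("filter").del("query");' | extract_quoted)

printf '%s\n' "$result"
# "count" "query_then_fetch"
# "filter" "query"
```

Writing the separator as [.]del makes the dot literal; in the original -F '.del' the dot matches any character, which happens to work on this input.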

Answer 3

Given:

$ echo "$tgt" 
 new Field("count").del("query_then_fetch");
 new Field("scan").del("query_then_fetch sorting on `_doc`");
 new Field("compress").del("no replacement, implemented at the codec level");
 new Field("compress_threshold").del("no replacement");
 new Field("filter").del("query");

You can do this:

$ echo "$tgt" | awk  '{split($0, a, "\""); print a[2]"\t"a[4]}'
count   query_then_fetch
scan    query_then_fetch sorting on `_doc`
compress    no replacement, implemented at the codec level
compress_threshold  no replacement
filter  query

Add quotes around the fields as needed.
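For instance, a minimal sketch that puts the quotes back with printf and separates the two fields with a space instead of a tab; the function name quote_fields is hypothetical.

```shell
# Hypothetical helper: split on the double quote, then re-add the quotes on output.
quote_fields() {
  awk '{ split($0, a, "\""); printf "\"%s\" \"%s\"\n", a[2], a[4] }'
}

quoted=$(printf '%s\n' ' new Field("compress_threshold").del("no replacement");' | quote_fields)

printf '%s\n' "$quoted"
# "compress_threshold" "no replacement"
```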

Or, you can do this, which splits on the parentheses and therefore keeps the quotes:

$ echo "$tgt" | awk  '{split($0, a, /[()]/); print a[2],a[4]}'
"count" "query_then_fetch"
"scan" "query_then_fetch sorting on `_doc`"
"compress" "no replacement, implemented at the codec level"
"compress_threshold" "no replacement"
"filter" "query"

Applying a regex to columns with awk
