Question

美好的一天，

我一直试图找到comparing 2 files using awk中提出的类似问题的解决方案，但我似乎无法理解它。寻求一些帮助。

我有两个我想要比较的文件。 file1和file2的模拟内容如下所示：

文件1：

50       0004312805201        06740         2310821                                                                                                                                
50      0004986504201        00845         2310837                                                                                                                                
50      0003913155201        47679         2310762                                                                                                                                
50      0004997395201        2035          2311180                                                                                                                                
50      0001147242201        15000         23108723                                                                                                                                
50      0005771878201        13545         I3840000

file2的：

0003913155 A

0005771878 A

0004312805 A

0000000015 B

0000000012 B

1111111111 E

我需要在file1上对field2执行substring以生成10 character length searchable key value，并在file2的field1中找到匹配值。

如果找到匹配项，则打印出整个file1行，其中file2中的field2作为新字段附加。

如果不匹配，则打印出整个file1行，并附加字符串“NO”作为新字段。输出最好重定向到文件。

示例输出如下所示。

输出：

50       0004312805201        06740         2310821 A                                                                                                                               
50      0004986504201        00845         2310837 NO                                                                                                                             
50      0003913155201        47679         2310762 A                                                                                                                               
50      0004997395201        2035          2311180 NO                                                                                                                             
50      0001147242201        15000         23108723 NO                                                                                                                               
50      0005771878201        13545         I3840000 A

你们怎么建议我通过awk或GNU-awk解决这个问题？在准备可搜索的键子字符串时遇到问题，并在awk/GNU-awk中使用它来构建数组。

非常感谢任何帮助。我现在正在转动轮子。

感谢。

Answer 1

awk '
     FNR==NR{ a[$1]=$2; next }
     { s=substr($2,1,10); print $0,(s in a ?a[s]:"No") }
    ' file2 file1 > your_output_file

输入：

$ cat file1
50 0004312805201 06740 2310821
50 0004986504201 00845 2310837
50 0003913155201 47679 2310762
50 0004997395201 2035 2311180
50 0001147242201 15000 23108723
50 0005771878201 13545 I3840000 

$ cat file2
0003913155 A
0005771878 A
0004312805 A
0000000015 B
0000000012 B
1111111111 E

输出

$ awk 'FNR==NR{a[$1]=$2;next}{s=substr($2,1,10);print $0, (s in a ? a[s] : "No") }' file2 file1
50 0004312805201 06740 2310821 A
50 0004986504201 00845 2310837 No
50 0003913155201 47679 2310762 A
50 0004997395201 2035 2311180 No
50 0001147242201 15000 23108723 No
50 0005771878201 13545 I3840000  A

Answer 2

不确定OP对produce a 10 character length searchable key value的意义。我将其解释为： file2的字段1中的值必须是file1 中字段2的子字符串。

$ cat tst.awk
/^[0-9]/ && NR==FNR { a[$1]=$2; next }   # read values from file2 in array
/^[0-9]/{
   f=0;
   for (i in a){                         # loop over field 1 of file2
      if (index($2, i)){                 # if i can be found in field 2 of file1
         print $0, a[i];                 # print $0 with $2 from file2
         f++;
         break;
      }
   }
}
/^[0-9]/ && !f{ print $0, "NO" }         # if no match, print "NO" line

输入

$ cat file1
50 0004312805201 06740 2310821
50 0004986504201 00845 2310837
50 0003913155201 47679 2310762
50 0004997395201 2035 2311180
50 0001147242201 15000 23108723
50 0005771878201 13545 I3840000

和

$ cat file2
0003913155 A

0005771878 A

0004312805 A

0000000015 B

0000000012 B

1111111111 E

调用tst.awk将生成输出：

$ awk -f tst.awk file2 file1
50 0004312805201 06740 2310821 A
50 0004986504201 00845 2310837 NO
50 0003913155201 47679 2310762 A
50 0004997395201 2035 2311180 NO
50 0001147242201 15000 23108723 NO
50 0005771878201 13545 I3840000 A

或者，使用oneliner：

$ awk '/^[0-9]/ && NR==FNR { a[$1]=$2; next } /^[0-9]/{f=0;for (i in a){if (index($2, i)){print $0, a[i];f++;break;}}}/^[0-9]/ && !f{ print $0, "NO" }' file2 file1

将2个文件与可搜索的密钥进行比较

2 个答案: