Question

I want to compare the first two columns of two files and print yes if they match, otherwise no.

input.txt

123,apple,type1
123,apple,type2
456,orange,type1
6567,kiwi,type2
333,banana,type1
123,apple,type2

qualified.txt

123,apple,type4
6567,kiwi,type2

output.txt

123,apple,type1,yes
123,apple,type2,yes
456,orange,type1,no
6567,kiwi,type2,yes
333,banana,type1,no
123,apple,type2,yes

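I used the command below to split the data, and planned to add the extra column afterwards based on the result.

However, input.txt contains duplicates (in column 1), so that approach does not work, and the files are also very large.

Can output.txt be produced with an awk one-liner?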

comm -2 -3 input.txt qualified.txt

Answer 1

You can use awk logic as shown below. I'm not sure why you specifically mention a one-line awk command.

awk -v FS="," -v OFS="," 'FNR==NR{map[$1]=$2;next} {if($1 in map == 0) {$0=$0FS"no"} else {$0=$0FS"yes"}}1' qualified.txt input.txt

123,apple,type1,yes
123,apple,type2,yes
456,orange,type1,no
6567,kiwi,type2,yes
333,banana,type1,no
123,apple,type2,yes

The logic is:

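The condition FNR==NR is true only while the first file, qualified.txt, is being read. For each of its lines, column 2 is stored in the array map, indexed by column 1.
For each line of the second file, {if($1 in map == 0) {$0=$0FS"no"} else {$0=$0FS"yes"}}1 appends the string no if the column 1 entry is not in the array, and yes otherwise. The trailing 1 prints the resulting line. Note that only column 1 is used as the lookup key; column 2 is stored but never compared.
-v FS="," -v OFS="," sets the input and output field separators.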
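To try the logic described above without touching real files, here is a minimal sketch that recreates the sample data from the question in a scratch directory and runs an equivalent program written over several lines. The temporary directory and the shell variable names (tmpdir, result) are assumptions made for illustration; they are not part of the original answer.

```shell
#!/bin/sh
# Recreate the question's sample files in a scratch directory (assumed location).
tmpdir=$(mktemp -d)
cat > "$tmpdir/input.txt" <<'EOF'
123,apple,type1
123,apple,type2
456,orange,type1
6567,kiwi,type2
333,banana,type1
123,apple,type2
EOF
cat > "$tmpdir/qualified.txt" <<'EOF'
123,apple,type4
6567,kiwi,type2
EOF

# Equivalent logic to the one-liner: the first file fills map, the second is tagged.
result=$(awk -F, '
    FNR == NR { map[$1] = $2; next }               # first file: index by column 1
    { print $0 FS (($1 in map) ? "yes" : "no") }   # second file: append yes or no
' "$tmpdir/qualified.txt" "$tmpdir/input.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

Duplicate rows in input.txt are not a problem here, because each input line is looked up independently and neither file needs to be sorted.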

Answer 2


Explanation:

$ awk -F, 'NR==FNR {a[$1 FS $2];next} {print $0 FS (($1 FS $2) in a?"yes":"no")}' qual input
123,apple,type1,yes
123,apple,type2,yes
456,orange,type1,no
6567,kiwi,type2,yes
333,banana,type1,no
123,apple,type2,yes

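The same program, expanded and commented:

NR==FNR {                   # for the first file
    a[$1 FS $2]; next       # record the qualified 1st and 2nd field pairs
}
{
    # output the input row followed by "yes" or "no",
    # depending on whether the key is found in array a
    print $0 FS (($1 FS $2) in a ? "yes" : "no")
}

There is no need to redefine OFS, because $0 is not modified and therefore the record is never rebuilt.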
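The remark about OFS can be checked directly. In this small sketch, the sample record a,b,c and the shell variable names are made up for illustration. awk rejoins the fields with OFS only when a field is assigned, so printing an untouched $0 keeps the original commas, while assigning a field rebuilds the record with the default OFS, a single space.

```shell
#!/bin/sh
# $0 untouched: the record is printed exactly as read, commas intact.
untouched=$(printf 'a,b,c\n' | awk -F, '{ print $0 FS "yes" }')

# Assigning a field (even to itself) rebuilds $0 using OFS, which defaults to a space.
rebuilt=$(printf 'a,b,c\n' | awk -F, '{ $1 = $1; print }')

# With OFS set to a comma, the rebuilt record keeps its commas.
rebuilt_ofs=$(printf 'a,b,c\n' | awk -F, -v OFS=, '{ $1 = $1; print }')

printf '%s\n%s\n%s\n' "$untouched" "$rebuilt" "$rebuilt_ofs"
```

This is why a program that only appends to $0 by string concatenation can get away with setting FS alone.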

Answer 3

It looks like all you need is:

awk 'BEGIN{FS=OFS=","} NR==FNR{a[$1];next} {print $0, ($1 in a ? "yes" : "no")}' qualified.txt input.txt
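Since the question asks for a match on the first two columns, here is a hedged sketch of a variant that keys the array on both fields using awk's comma subscript, which joins the parts internally with SUBSEP. The row 123,pear,type1 is invented here purely to show a case where keying on column 1 alone would give a different answer; it is not in the original data. The names tmpdir, result and seen are likewise illustrative.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf '123,apple,type4\n6567,kiwi,type2\n' > "$tmpdir/qualified.txt"
# 123,pear,type1 is a made-up row: column 1 matches, column 2 does not.
printf '123,apple,type1\n123,pear,type1\n456,orange,type1\n' > "$tmpdir/input.txt"

result=$(awk 'BEGIN { FS = OFS = "," }
    NR == FNR { seen[$1, $2]; next }                  # remember each (col1, col2) pair
    { print $0, (($1, $2) in seen ? "yes" : "no") }   # both columns must match
' "$tmpdir/qualified.txt" "$tmpdir/input.txt")

printf '%s\n' "$result"
rm -rf "$tmpdir"
```

With this data the pear row is tagged no, whereas a lookup on column 1 alone would tag it yes.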

Compare two columns of different files and add a new column if they match
