Compare two delimited files field by field and find missing and mismatching records

Time: 2017-07-17 13:47:18

Tags: shell awk

I have two input files, each with 3 fields. The first two fields must match between the two files, and the third field must then be compared.

File1:
A ; 1 ; a1
B ; 2 ; b2
C ; 3 ; c3
A ; 4 ; a4


File2:
B ; 2 ; b2
C ; 3 ; c5
E ; 5 ; e5

I want output like the following:

Mismatching: 
C ; 3 ; c3

Lines missing in file1:
E ; 5 ; e5

Lines missing in file2: 
A ; 1 ; a1
A ; 4 ; a4

That is, I also want the records that are missing from file1 and the records that are missing from file2.

I tried:

awk 'BEGIN {FS = ";"} NR==FNR{a[$1,$2] = $3; next} (a[$1,$2] != $3)' file1 file2

But this only gives me the lines of file2 that do not exist in file1.
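To see what the attempt reports, here is a minimal sketch that runs it on the sample data. The temporary directory and the `out` variable are assumptions made for illustration only. The array `a` is filled from file1 and consulted only while file2 is read, so a key that exists only in file1 is never visited.

```shell
# Hypothetical scratch directory; any writable location works.
tmp=$(mktemp -d)

# Sample data copied from the question.
printf '%s\n' 'A ; 1 ; a1' 'B ; 2 ; b2' 'C ; 3 ; c3' 'A ; 4 ; a4' > "$tmp/file1"
printf '%s\n' 'B ; 2 ; b2' 'C ; 3 ; c5' 'E ; 5 ; e5' > "$tmp/file2"

# The attempt from the question, unchanged.
out=$(awk 'BEGIN {FS = ";"} NR==FNR{a[$1,$2] = $3; next} (a[$1,$2] != $3)' \
      "$tmp/file1" "$tmp/file2")

printf '%s\n' "$out"
# prints:
# C ; 3 ; c5
# E ; 5 ; e5

rm -rf "$tmp"
```

The C line is printed because its third field differs, and the E line because its key is absent from file1. The two A lines from file1 never appear.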

1 Answer:

Answer 0 (score: 0)

$ awk -F';' '
   NR==FNR{a[$1","$2]=$0; next}

   $1","$2 in a{if(a[$1","$2] != $0)mm=mm $0 RS; delete a[$1","$2]; next}
   {nf=nf $0 RS}

   END{print "Mismatching:\n" mm;
       print "Lines missing in file1:"; for(i in a)print a[i];
       print "\nLines missing in file2:\n" nf}
   ' file2 file1
Mismatching:
C ; 3 ; c3

Lines missing in file1:
E ; 5 ; e5

Lines missing in file2:
A ; 1 ; a1
A ; 4 ; a4
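  • $1","$2 in a: true if the key built from the first two fields is found in a
    • if the line stored in a does not match the current line, append the current line to the variable mm (mismatching lines)
    • delete the key from a, so that the keys still left at the end are the ones that were never matched, which gives the missing lines
  • nf=nf $0 RS: runs when the key is not found in a, so it collects the lines that were not found in the first file argument passed to awk
  • END{...}: print the results as required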
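The steps above can be tried end to end with the following sketch. It recreates the sample files from the question in a temporary directory and runs the same program with one comment per rule. The directory and the `result` variable are assumptions for illustration, and the parentheses around the key are added only for readability.

```shell
# Hypothetical scratch directory; any writable location works.
tmp=$(mktemp -d)

# Sample data copied from the question.
printf '%s\n' 'A ; 1 ; a1' 'B ; 2 ; b2' 'C ; 3 ; c3' 'A ; 4 ; a4' > "$tmp/file1"
printf '%s\n' 'B ; 2 ; b2' 'C ; 3 ; c5' 'E ; 5 ; e5' > "$tmp/file2"

result=$(awk -F';' '
  # First file argument (file2): remember each whole line under its key.
  NR==FNR { a[$1","$2] = $0; next }

  # Second file argument (file1): key already seen in file2.
  ($1","$2) in a {
    if (a[$1","$2] != $0) mm = mm $0 RS   # same key, different line
    delete a[$1","$2]                     # key has been matched
    next
  }

  # Key not seen in file2: this file1 line is missing from file2.
  { nf = nf $0 RS }

  END {
    print "Mismatching:\n" mm
    print "Lines missing in file1:"
    for (i in a) print a[i]               # file2 keys that were never matched
    print "\nLines missing in file2:\n" nf
  }' "$tmp/file2" "$tmp/file1")

printf '%s\n' "$result"
rm -rf "$tmp"
```

Note that the order of `for (i in a)` is unspecified in awk, so with more than one leftover key the "missing in file1" lines may not come out in input order.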


It is better to save the code in a file and invoke it with -f:
$ cat cmp.awk 
NR==FNR{a[$1","$2]=$0; next}

$1","$2 in a{if(a[$1","$2] != $0)mm=mm $0 RS; delete a[$1","$2]; next}
{nf=nf $0 RS}

END{print "Mismatching:\n" mm;
    print "Lines missing in file1:"; for(i in a)print a[i];
    print "\nLines missing in file2:\n" nf}

$ awk -F';' -f cmp.awk file2 file1