awk: match/compare 2 columns in each file (for 2 files)

Time: 2018-08-06 13:44:15

Tags: join awk compare match multiple-columns

After several searches on the forum, I am still stuck on printing the desired output from 2 files.

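I want to match file1 against file2 and combine them into 1 file, based on the first and second columns of each file. (The lines in both files are unsorted.) Column 5 in file1 and column 3 in file2 are not meant to be part of the matching key, but they can be used as an additional key if necessary.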

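I am not sure whether the best approach would be an awk loop over 2 different folders, where the files in one folder are named SITEA, SITEB, SITEC, etc. (the names come from file1) and the files in the second folder carry the FILENAME information in their names, as in file2.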

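I am also not sure whether it is possible to print the word EMPTY for lines that have no match in the other file and add those lines to the desired output file.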

file1

SITEA 222 dummy dummy x8a7sdf dummyvalues dummyvalues
SITEA 357 dummy dummy x11x683 dummyvalues dummyvalues
SITEA 357 dummy dummy x11x69b dummyvalues dummyvalues
SITEA 357 dummy dummy x11x69d dummyvalues dummyvalues
SITEC 200 dummy dummy x11xdc1 dummyvalues dummyvalues
SITEA 357 dummy dummy x11x6bc dummyvalues dummyvalues
SITEA 200 dummy dummy x11x305 dummyvalues dummyvalues
SITEA 200 dummy dummy x11x323 dummyvalues dummyvalues
SITEA 357 dummy dummy x11x693 dummyvalues dummyvalues
SITEA 200 dummy dummy x11x300 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x680 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x688 dummyvalues dummyvalues
SITEA 151 dummy dummy x87f777 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x68c dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33b dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf37 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x68e dummyvalues dummyvalues
SITEB 357 dummy dummy x11x694 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x6a5 dummyvalues dummyvalues
SITED 200 dummy dummy x11xdc0 dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf36 dummyvalues dummyvalues
SITEB 200 dummy dummy x11xffd dummyvalues dummyvalues
SITEA 200 dummy dummy x11x306 dummyvalues dummyvalues
SITEA 200 dummy dummy x11x307 dummyvalues dummyvalues
SITEA 200 dummy dummy x11x325 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x686 dummyvalues dummyvalues
SITEA 357 dummy dummy x11x680 dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33c dummyvalues dummyvalues
SITEA 357 dummy dummy x11x6be dummyvalues dummyvalues
SITEA 357 dummy dummy x11x6ba dummyvalues dummyvalues
SITEB 200 dummy dummy x11xffe dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33e dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf00 dummyvalues dummyvalues
SITEB 357 dummy dummy x11x696 dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf1c dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf1e dummyvalues dummyvalues
SITEB 357 dummy dummy x11x69a dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf34 dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf35 dummyvalues dummyvalues
SITEB 200 dummy dummy x11xfff dummyvalues dummyvalues
SITEA 357 dummy dummy x11x681 dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33d dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33d dummyvalues dummyvalues
SITEA 100 dummy dummy x11x33d dummyvalues dummyvalues

file2

SITEA 357 x11x683 dummyvalues dummyvalues dummyvalues
SITEA 200 x11x33b dummyvalues dummyvalues dummyvalues
SITEA 357 x11x693 dummyvalues dummyvalues dummyvalues
SITEA 357 x11x680 dummyvalues dummyvalues dummyvalues
SITEA 357 x11x69b dummyvalues dummyvalues dummyvalues
SITEB 357 x11x686 dummyvalues dummyvalues dummyvalues
SITEB 357 x11x6a5 dummyvalues dummyvalues dummyvalues
SITEA 357 x11x69d dummyvalues dummyvalues dummyvalues
SITEB 200 x11xffd dummyvalues dummyvalues dummyvalues
SITEA 357 x11x6ba dummyvalues dummyvalues dummyvalues
SITEB 357 x11x680 dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf1c dummyvalues dummyvalues dummyvalues
SITEB 357 x11x68e dummyvalues dummyvalues dummyvalues
SITEB 357 x11x69a dummyvalues dummyvalues dummyvalues
SITEA 357 x11x681 dummyvalues dummyvalues dummyvalues
SITEA 200 x11x33c dummyvalues dummyvalues dummyvalues
SITEB 357 x11x694 dummyvalues dummyvalues dummyvalues
SITEB 357 x11x696 dummyvalues dummyvalues dummyvalues
SITEC 200 x11xdc1 dummyvalues dummyvalues dummyvalues
SITEA 357 x11x6bc dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf37 dummyvalues dummyvalues dummyvalues
SITEA 200 x11x325 dummyvalues dummyvalues dummyvalues
SITED 200 x11xdc0 dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf00 dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf36 dummyvalues dummyvalues dummyvalues
SITEA 357 x11x6be dummyvalues dummyvalues dummyvalues
SITEA 200 x11x33d dummyvalues dummyvalues dummyvalues
SITEA 200 x11x305 dummyvalues dummyvalues dummyvalues
SITEB 357 x11x688 dummyvalues dummyvalues dummyvalues
SITEA 200 x11x33e dummyvalues dummyvalues dummyvalues
SITEB 200 x11xffe dummyvalues dummyvalues dummyvalues
SITEA 200 x11x300 dummyvalues dummyvalues dummyvalues
SITEB 200 x11xfff dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf1e dummyvalues dummyvalues dummyvalues
SITEA 200 x11x306 dummyvalues dummyvalues dummyvalues
SITEA 200 x11x307 dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf35 dummyvalues dummyvalues dummyvalues
SITEB 357 x11x68c dummyvalues dummyvalues dummyvalues
SITEB 200 x11xf34 dummyvalues dummyvalues dummyvalues
SITEA 200 x11x323 dummyvalues dummyvalues dummyvalues
SITEB 45 a8d7f99 dummyvalues dummyvalues dummyvalues
SITEB 008 8sd7f77 dummyvalues dummyvalues dummyvalues

Desired output:

SITEA 357 dummy dummy x11x683 dummyvalues dummyvalues dummyvalues SITEA 357 x11x683 dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x69b dummyvalues dummyvalues dummyvalues SITEA 357 x11x69b dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x69d dummyvalues dummyvalues dummyvalues SITEA 357 x11x69d dummyvalues dummyvalues dummyvalues
SITEC 200 dummy dummy x11xdc1 dummyvalues dummyvalues dummyvalues SITEC 200 x11xdc1 dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x6bc dummyvalues dummyvalues dummyvalues SITEA 357 x11x6bc dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x305 dummyvalues dummyvalues dummyvalues SITEA 200 x11x305 dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x323 dummyvalues dummyvalues dummyvalues SITEA 200 x11x323 dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x693 dummyvalues dummyvalues dummyvalues SITEA 357 x11x693 dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x300 dummyvalues dummyvalues dummyvalues SITEA 200 x11x300 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x680 dummyvalues dummyvalues dummyvalues SITEB 357 x11x680 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x688 dummyvalues dummyvalues dummyvalues SITEB 357 x11x688 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x68c dummyvalues dummyvalues dummyvalues SITEB 357 x11x68c dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33b dummyvalues dummyvalues dummyvalues SITEA 200 x11x33b dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf37 dummyvalues dummyvalues dummyvalues SITEB 200 x11xf37 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x68e dummyvalues dummyvalues dummyvalues SITEB 357 x11x68e dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x694 dummyvalues dummyvalues dummyvalues SITEB 357 x11x694 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x6a5 dummyvalues dummyvalues dummyvalues SITEB 357 x11x6a5 dummyvalues dummyvalues dummyvalues
SITED 200 dummy dummy x11xdc0 dummyvalues dummyvalues dummyvalues SITED 200 x11xdc0 dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf36 dummyvalues dummyvalues dummyvalues SITEB 200 x11xf36 dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xffd dummyvalues dummyvalues dummyvalues SITEB 200 x11xffd dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x306 dummyvalues dummyvalues dummyvalues SITEA 200 x11x306 dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x307 dummyvalues dummyvalues dummyvalues SITEA 200 x11x307 dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x325 dummyvalues dummyvalues dummyvalues SITEA 200 x11x325 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x686 dummyvalues dummyvalues dummyvalues SITEB 357 x11x686 dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x680 dummyvalues dummyvalues dummyvalues SITEA 357 x11x680 dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33c dummyvalues dummyvalues dummyvalues SITEA 200 x11x33c dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x6be dummyvalues dummyvalues dummyvalues SITEA 357 x11x6be dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x6ba dummyvalues dummyvalues dummyvalues SITEA 357 x11x6ba dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xffe dummyvalues dummyvalues dummyvalues SITEB 200 x11xffe dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33e dummyvalues dummyvalues dummyvalues SITEA 200 x11x33e dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf00 dummyvalues dummyvalues dummyvalues SITEB 200 x11xf00 dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x696 dummyvalues dummyvalues dummyvalues SITEB 357 x11x696 dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf1c dummyvalues dummyvalues dummyvalues SITEB 200 x11xf1c dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf1e dummyvalues dummyvalues dummyvalues SITEB 200 x11xf1e dummyvalues dummyvalues dummyvalues
SITEB 357 dummy dummy x11x69a dummyvalues dummyvalues dummyvalues SITEB 357 x11x69a dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf34 dummyvalues dummyvalues dummyvalues SITEB 200 x11xf34 dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xf35 dummyvalues dummyvalues dummyvalues SITEB 200 x11xf35 dummyvalues dummyvalues dummyvalues
SITEB 200 dummy dummy x11xfff dummyvalues dummyvalues dummyvalues SITEB 200 x11xfff dummyvalues dummyvalues dummyvalues
SITEA 357 dummy dummy x11x681 dummyvalues dummyvalues dummyvalues SITEA 357 x11x681 dummyvalues dummyvalues dummyvalues
SITEA 200 dummy dummy x11x33d dummyvalues dummyvalues dummyvalues SITEA 200 x11x33d dummyvalues dummyvalues dummyvalues
EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY SITEB 45 a8d7f99 dummyvalues dummyvalues dummyvalues
EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY SITEB 008 8sd7f77 dummyvalues dummyvalues dummyvalues
SITEA 151 dummy dummy x87f777 dummyvalues dummyvalues EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
SITEA 222 dummy dummy x8a7sdf dummyvalues dummyvalues EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
SITEA 200 dummy dummy x11x33d dummyvalues dummyvalues  EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY
SITEA 100 dummy dummy x11x33d dummyvalues dummyvalues  EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY EMPTY

Thanks

I have this code partially working, but only if I swap the columns in file1 and move column 5 into the column 3 position.

awk 'NR==FNR{
  if(FNR==1){print}
  a[$1 $2 $3]=$0
  next
}
a[$1 $2 $3]!=$0 && a[$1 $2 $3]!=""{
  print a[$1 $2 $3],$0
}'  

But it also does not show the unmatched lines.

1 answer:

Answer 0: (score: 0)

Something like this?

awk 'NR==FNR {a[$1,$2,$5]=$0; next}
             {if(($1,$2,$3) in a)
                {print a[$1,$2,$3],$0; delete a[$1,$2,$3]}
              else print "EMPTY",$0}
     END     {for(k in a) print a[k], "EMPTY"}' file1 file2

In the first file the third part of the key is in $5, so that field is used when storing the line.
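As a side note on the key itself: the comma form a[$1,$2,$5] joins the fields with awk's SUBSEP character, whereas the a[$1 $2 $3] form from the question simply concatenates them, so different field combinations can end up with the same key. A small sketch of the difference (the function name key_demo and the sample strings are made up for illustration):

```shell
# key_demo: hypothetical helper, only here to illustrate the two key styles
key_demo() {
  awk 'BEGIN {
    # plain concatenation: "ab" "c" and "a" "bc" both become "abc"
    plain["ab" "c"] = 1
    msg = (("a" "bc") in plain) ? "concatenated keys collide" : "concatenated keys differ"
    print msg

    # comma subscripts are joined with SUBSEP, so the parts stay distinct
    safe["ab", "c"] = 1
    msg = (("a", "bc") in safe) ? "SUBSEP keys collide" : "SUBSEP keys differ"
    print msg
  }'
}
```

With the sample data in the question a collision is unlikely, but the comma form costs nothing and avoids the problem.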

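If you know the number of fields in both files, add the correct number of "EMPTY" fillers; otherwise that can be coded as well. I omitted it here for simplicity.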
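One possible way to code the padding, as a rough sketch rather than part of the original answer: remember NF from each file and print that many "EMPTY" words. It assumes every line within a file has the same number of fields and that file1 is not empty; the names join_pad and pad are invented for this example.

```shell
# join_pad FILE1 FILE2: hypothetical wrapper around the awk program
join_pad() {
  awk '
    # pad(n): return n copies of "EMPTY" joined by OFS (s and i are locals)
    function pad(n,   s, i) {
      s = ""
      for (i = 1; i <= n; i++) s = s (i > 1 ? OFS : "") "EMPTY"
      return s
    }
    # file1: store each line under its key and remember the field count
    NR == FNR { a[$1,$2,$5] = $0; n1 = NF; next }
    # file2: remember its field count as well
    { n2 = NF }
    # matched: print both lines side by side and drop the stored one
    ($1,$2,$3) in a { print a[$1,$2,$3], $0; delete a[$1,$2,$3]; next }
    # only in file2: pad the left side to the width of file1
    { print pad(n1), $0 }
    # only in file1: pad the right side to the width of file2
    END { for (k in a) print a[k], pad(n2) }
  ' "$1" "$2"
}
```

It would be called as join_pad file1 file2. As in the original one-liner, the leftover file1 lines come out in whatever order the for (k in a) loop happens to visit them.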

file2 dictates the order of the output. Entries that are in file1 but missing from file2 will not keep their order; preserving it requires additional logic.
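The additional logic could look roughly like the following sketch, which is one possible variant and not the original answer: store the file1 lines by line number, mark the ones that found a partner, and walk the line numbers in the END block. It also lets only the first of several identical keys in file1 match, so later duplicates are reported as unmatched instead of overwriting each other; that behaviour is an assumption about what is wanted. The name join_keep_order is made up, and a single "EMPTY" is printed per missing side, as in the answer above.

```shell
# join_keep_order FILE1 FILE2: hypothetical wrapper around the awk program
join_keep_order() {
  awk '
    NR == FNR {                          # file1: keep every line, in order
      line[FNR] = $0
      k = $1 SUBSEP $2 SUBSEP $5         # same key as the subscript ($1,$2,$5)
      if (!(k in first)) first[k] = FNR  # only the first duplicate may match
      n = FNR
      next
    }
    ($1,$2,$3) in first {                # file2 line with a partner in file1
      i = first[$1,$2,$3]
      print line[i], $0
      used[i] = 1
      delete first[$1,$2,$3]
      next
    }
    { print "EMPTY", $0 }                # file2 line without a partner
    END {                                # file1 leftovers, in file1 order
      for (i = 1; i <= n; i++)
        if (!(i in used)) print line[i], "EMPTY"
    }
  ' "$1" "$2"
}
```

Called as join_keep_order file1 file2, it prints the file2-driven lines first and then the unmatched file1 lines in the order they appear in file1.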