Compare two files using awk and print if they match

Time: 2019-07-01 08:46:48

Tags: awk

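I have two files, and I want to compare column 1 of File1 with column 10 of File2 and print the lines when they match. I used this command, but it only shows the last line of File2.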

awk 'BEGIN{FS=OFS="|"}NR==FNR{a[$10]=$0;next}$1 in a {print a[$1],$0}' File2 File1

File1:

003502|COMMUNICATE|Chat|MEGAMOBILE
003502|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|Entertainment|Promos|MEGAMOBILE
003502|ENTERTAINMENT|Promos|MEGAMOBILE
003502|INFORMATION||MEGAMOBILE

File2:

1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1

Desired output:

1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|COMMUNICATE|Chat|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|COMMUNICATE|News - Headlines|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|Entertainment|Promos|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|ENTERTAINMENT|Promos|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|INFORMATION||MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|COMMUNICATE|Chat|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|COMMUNICATE|News - Headlines|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|Entertainment|Promos|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|ENTERTAINMENT|Promos|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|INFORMATION||MEGAMOBILE

2 answers:

Answer 0 (score: 3)

There is a dedicated command-line utility for this job: [join][1]

I recommend using it instead of awk, since it is more memory-efficient.

As @EdMorton points out:

join requires both input files to be sorted on the join field

join -t"|" -1 10 -2 1 <(sort -t"|" -k10 -n file2) <(sort -t"|" -k1 -n file1)

which gives

003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|COMMUNICATE|Chat|MEGAMOBILE
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|ENTERTAINMENT|Promos|MEGAMOBILE
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|Entertainment|Promos|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|COMMUNICATE|Chat|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|ENTERTAINMENT|Promos|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|Entertainment|Promos|MEGAMOBILE
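
Note that join prints the join field first by default, so the column order above differs from the desired output. One possible way around that is join's -o option, which lists the output fields explicitly. The following is only a sketch under some assumptions: it uses the sample data from the question, written to temporary files (the paths and the variable name out are illustrative), it restricts each sort to the join field with -k10,10 and -k1,1, and it pins LC_ALL=C so that sort and join agree on ordering.

```shell
#!/bin/sh
# Sketch: join File2 column 10 with File1 column 1, keeping File2's columns first.
# Sample data is copied from the question; the temporary paths are illustrative.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
003502|COMMUNICATE|Chat|MEGAMOBILE
003502|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|Entertainment|Promos|MEGAMOBILE
003502|ENTERTAINMENT|Promos|MEGAMOBILE
003502|INFORMATION||MEGAMOBILE
EOF

cat > "$tmp/file2" <<'EOF'
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1
EOF

# Sort each file on the join field only, in the same locale that join will use.
LC_ALL=C sort -t'|' -k10,10 "$tmp/file2" > "$tmp/file2.sorted"
LC_ALL=C sort -t'|' -k1,1 "$tmp/file1" > "$tmp/file1.sorted"

# -o names every output field: the 12 fields of the first input (File2),
# followed by the 4 fields of the second input (File1).
out=$(LC_ALL=C join -t'|' -1 10 -2 1 \
  -o 1.1,1.2,1.3,1.4,1.5,1.6,1.7,1.8,1.9,1.10,1.11,1.12,2.1,2.2,2.3,2.4 \
  "$tmp/file2.sorted" "$tmp/file1.sorted")

printf '%s\n' "$out"
rm -rf "$tmp"
```

With the sample data this should print ten lines in the same column layout as the desired output, with the HULA Aries rows first because of the sort.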

Answer 1 (score: 2)

Since you have duplicate keys, you need to keep track of them.
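
To see why duplicates matter, here is a minimal sketch of what happens when lines are stored in a plain array keyed on $10, as in the command from the question. Both File2 lines share the key 003502, so the second assignment replaces the first. The variable names file2_lines and kept are made up for this illustration.

```shell
#!/bin/sh
# Sketch: a plain array keyed on column 10 keeps only the last line per key.
# The two lines are the File2 sample from the question.
file2_lines='1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1'

kept=$(printf '%s\n' "$file2_lines" | awk -F'|' '
  { a[$10] = $0 }                  # same key twice: the later line wins
  END { for (k in a) print a[k] }  # a single entry remains for 003502
')

printf '%s\n' "$kept"
```

Only the HULA Aries line survives, which matches the symptom described in the question.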


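awk 'BEGIN{FS=OFS="|"} (NR==FNR) { c[$1]++; a[$1,c[$1]]=$0; next } ($10 in c) { for(i=1;i<=c[$10];++i) print $0,a[$10,i] }' file1 file2

In the command above, the array c keeps track of how many times we have encountered the key $1. Each line is then stored in the array a, indexed by $1 and the sequence number c[$1]. When reading file2, we check whether the key $10 is in the array c and, if so, print all the stored values in order.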
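
For anyone who wants to try this out, the following sketch runs the same awk logic against the sample data from the question. The temporary paths and the variable name out are assumptions made for the example, and the awk program is only spread over several lines with comments; its logic is unchanged.

```shell
#!/bin/sh
# Sketch: the counted-array approach, run against the question's sample data.
# The temporary paths are illustrative.
tmp=$(mktemp -d)

cat > "$tmp/file1" <<'EOF'
003502|COMMUNICATE|Chat|MEGAMOBILE
003502|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|Entertainment|Promos|MEGAMOBILE
003502|ENTERTAINMENT|Promos|MEGAMOBILE
003502|INFORMATION||MEGAMOBILE
EOF

cat > "$tmp/file2" <<'EOF'
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1
EOF

out=$(awk '
  BEGIN { FS = OFS = "|" }
  # First file (file1): count each key and store every line under (key, n).
  NR == FNR { c[$1]++; a[$1, c[$1]] = $0; next }
  # Second file (file2): for a known key, print this line once per stored file1 line.
  $10 in c { for (i = 1; i <= c[$10]; ++i) print $0, a[$10, i] }
' "$tmp/file1" "$tmp/file2")

printf '%s\n' "$out"
rm -rf "$tmp"
```

This should print ten lines: each File2 line, in File2's original order, paired with each of the five File1 lines. The rows therefore come out with Libre Aquarius first, so an extra sort would be needed if the exact row order of the desired output matters.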

Also, because of the expected output, we had to reverse the order of the files: file1 is now read first, then file2.