我有两个文件,我想比较File1的第1列和File2的第10列,如果匹配则应打印。我使用了此命令,但它仅显示File2的最后一行。
awk 'BEGIN{FS=OFS="|"}NR==FNR{a[$10]=$0;next}$1 in a {print a[$1],$0}' File2 File1
文件1:
003502|COMMUNICATE|Chat|MEGAMOBILE
003502|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|Entertainment|Promos|MEGAMOBILE
003502|ENTERTAINMENT|Promos|MEGAMOBILE
003502|INFORMATION||MEGAMOBILE
文件2:
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1
所需的输出:
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|COMMUNICATE|Chat|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|COMMUNICATE|News - Headlines|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|Entertainment|Promos|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|ENTERTAINMENT|Promos|MEGAMOBILE
1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|003502|0|1|003502|INFORMATION||MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|COMMUNICATE|Chat|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|COMMUNICATE|News - Headlines|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|Entertainment|Promos|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|ENTERTAINMENT|Promos|MEGAMOBILE
1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|003502|0|1|003502|INFORMATION||MEGAMOBILE
答案 0 :(得分:3)
有一个特殊的bash命令可以完成这项工作:[join] [1]
我建议您使用它而不是awk,因为这样可以提高内存效率。
正如@EdMorton所说:
join要求两个输入文件都必须在join字段中排序
join -t"|" -1 10 -2 1 <(sort -t"|" -k10 -n file2) <(sort -t"|" -k1 -n file1)
给予
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|COMMUNICATE|Chat|MEGAMOBILE
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|ENTERTAINMENT|Promos|MEGAMO
003502|1000012587|HULA Aries||||By Time||HULA Aries subs|1000012587|0|1|Entertainment|Promos|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|COMMUNICATE|Chat|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|COMMUNICATE|News - Headlines|MEGAMOBILE
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|ENTERTAINMENT|Promos|MEGAMO
003502|1000012640|Libre Aquarius||||By Time||Libre Aquarius subs|1000012640|0|1|Entertainment|Promos|MEGAMOBILE
答案 1 :(得分:2)
由于您有重复的密钥,因此您应该跟踪这些密钥。
<div class="landscape gatsby-image-wrapper"></div>
<div class="portrait gatsby-image-wrapper"></div>
<div class="portrait gatsby-image-wrapper"></div>
<div class="square gatsby-image-wrapper"></div>
<div class="landscape gatsby-image-wrapper"></div>
在上面,数组awk 'BEGIN{FS=OFS="|"}
(NR==FNR) { c[$1]++; a[$1,c[$1]]=$0; next }
($10 in c) { for(i=1;i<=c[$10];++i) print $0,a[$10,i] }' file1 file2
跟踪我们遇到键c
的次数。然后,将这些条目存储在由$1
和序列号a
索引的数组$1
中。在读取c[$1]
时,我们检查键file2
是否在原始数组$10
中,如果是,则按顺序处理所有存储的值。
此外,由于预期的输出,我们不得不还原文件顺序。