AWK: compare two files by matching two columns, without using arrays

Date: 2015-10-24 20:53:16

Tags: awk

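I have two files and want to compare them by matching two columns. Each line of the second file has two columns; when they match columns 2 and 3 of a line in the first file, I want to print that line of the first file. I can't use arrays because the files are very large.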

My rough idea is to compare the files line by line with getline, but I can't work out how to do it...

Can you help me?

First file:

ut  Adélaïde    Male    Latvian Chippewa
proin   Åke Male    Zulu    Eskimo
scelerisque Åke Female  Maltese Central American
sit Åke Male    Northern Sotho  Yaqui
sagittis    Alizée  Male    Northern Sotho  Paiute
dictumst    Almérinda   Female  Romanian    Honduran
sed Almérinda   Male    Hungarian   Navajo
volutpat    Almérinda   Male    Georgian    Honduran

Second file:

Adélaïde    Male
Åke Female
Alizée  Male
Almérinda   Male

Expected output:

ut  Adélaïde    Male    Latvian Chippewa
scelerisque Åke Female  Maltese Central American
sagittis    Alizée  Male    Northern Sotho  Paiute
sed Almérinda   Male    Hungarian   Navajo
volutpat    Almérinda   Male    Georgian    Honduran

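My attempt:

BEGIN {
    FS = "\t"
    # ll is the second (key) file, presumably passed with -v ll=file2;
    # read its first line as the current key V0
    n = getline V0 < ll
}
{
    wrd = $2 "\t" $3        # key of the current line: fields 2 and 3
    # advance through ll while its key sorts before wrd
    # (this only works if both files are sorted on the key)
    while (wrd > V0)
    {
        if (n > 0)
        {
            n = getline V0 < ll
        }
        else
        {
            # ll is exhausted: read the rest of the input and stop
            n = getline
            while (n > 0)
            {
                n = getline
            }
            exit
        }
    }
    if (wrd == V0)
    {
        print $0; next
    }
    else
    {
        next
    }
}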

1 answer:

Answer 0 (score: 0)

This does what you asked for (using GNU awk for gensub(); with other awks, use two sub() calls or similar):

$ cat tst.awk
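BEGIN { db=ARGV[2]; delete ARGV[2]; ARGC-- }   # read file2 with getline, not as main input
{
    # key = fields 2 and 3 with the whitespace between them
    key = gensub(/^\S+\s+(\S+\s+\S+).*/,"\\1",1)
    # scan file2 from the top for a line equal to the key
    while ( (getline line < db) > 0 ) {
        if (key == line) {
            print
            break
        }
    }
    close(db)   # rewind file2 for the next line of file1
}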

$ awk -f tst.awk file1 file2
ut  Adélaïde    Male    Latvian Chippewa
scelerisque Åke Female  Maltese Central American
sagittis    Alizée  Male    Northern Sotho  Paiute
sed Almérinda   Male    Hungarian   Navajo
volutpat    Almérinda   Male    Georgian    Honduran
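For awks without gensub() (mawk, BWK awk), the key can be built with standard functions instead. The sketch below is an illustration under assumptions, not the answerer's code: it replaces gensub() with match()/substr() plus one sub(), assumes fields are separated by tabs or spaces, and writes the question's sample data (assumed tab-separated) into a temporary directory so it runs on its own. The variable names tmp and matches are hypothetical.

```shell
#!/bin/sh
# Sketch (assumption): the answer's rescan approach using only POSIX awk
# functions, so it should also run under mawk or BWK awk.
tmp=$(mktemp -d)

# The question's sample data, assumed to be tab-separated.
printf '%s\t%s\t%s\t%s\t%s\n' \
    ut Adélaïde Male Latvian Chippewa \
    proin Åke Male Zulu Eskimo \
    scelerisque Åke Female Maltese 'Central American' \
    sit Åke Male 'Northern Sotho' Yaqui \
    sagittis Alizée Male 'Northern Sotho' Paiute \
    dictumst Almérinda Female Romanian Honduran \
    sed Almérinda Male Hungarian Navajo \
    volutpat Almérinda Male Georgian Honduran > "$tmp/file1"
printf '%s\t%s\n' Adélaïde Male Åke Female Alizée Male Almérinda Male > "$tmp/file2"

cat > "$tmp/tst.awk" <<'EOF'
BEGIN { db = ARGV[2]; delete ARGV[2]; ARGC-- }
{
    # fields 1-3 plus the whitespace between them; skip shorter lines
    if (!match($0, /^[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+/)) next
    key = substr($0, 1, RLENGTH)
    sub(/^[^ \t]+[ \t]+/, "", key)   # drop field 1: key is "field2<ws>field3"
    # rescan file2 from the top until a line equals the key
    while ((getline line < db) > 0) {
        if (line == key) {
            print
            break
        }
    }
    close(db)                        # rewind file2 for the next line
}
EOF

matches=$(awk -f "$tmp/tst.awk" "$tmp/file1" "$tmp/file2")
rm -rf "$tmp"
printf '%s\n' "$matches"
```

On the sample data this should print the same five lines as the answer's output, in file1's order. It still rereads file2 once per line of file1.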

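But if file2 really is so large that it can't fit in an array, the approach above will probably be far too slow to finish in any reasonable time, because file2 is reread from the beginning for every line of file1.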
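The question's own getline attempt is a merge join: it only works if both files are sorted on the key in the same order. The sketch below applies that idea and is a swapped-in technique, not the answerer's method. It assumes tab-separated files, sorts both inputs with sort(1) in byte order (LC_ALL=C) so that sort and awk agree on the ordering, and uses hypothetical names (merge.awk, db, want, more). No array is used and each file is read only once, but the matching lines come out in key order rather than in file1's original order.

```shell
#!/bin/sh
# Sketch (assumption): sort both files on the key, then walk them in
# lockstep with getline, as the question's attempt tries to do.
tmp=$(mktemp -d)
tab=$(printf '\t')

# The question's sample data, assumed to be tab-separated.
printf '%s\t%s\t%s\t%s\t%s\n' \
    ut Adélaïde Male Latvian Chippewa \
    proin Åke Male Zulu Eskimo \
    scelerisque Åke Female Maltese 'Central American' \
    sit Åke Male 'Northern Sotho' Yaqui \
    sagittis Alizée Male 'Northern Sotho' Paiute \
    dictumst Almérinda Female Romanian Honduran \
    sed Almérinda Male Hungarian Navajo \
    volutpat Almérinda Male Georgian Honduran > "$tmp/file1"
printf '%s\t%s\n' Adélaïde Male Åke Female Alizée Male Almérinda Male > "$tmp/file2"

cat > "$tmp/merge.awk" <<'EOF'
# db = sorted key file (set with -v); want = its current line;
# more = 1 while db still has lines
BEGIN { FS = "\t"; more = ((getline want < db) > 0) }
{
    key = $2 FS $3
    # advance db while its key sorts before this line's key
    while (more && want < key)
        more = ((getline want < db) > 0)
    if (!more) exit          # db exhausted: no later line can match
    if (want == key) print
}
EOF

# Byte-order sorting and comparison everywhere, so sort and awk agree.
LC_ALL=C sort "$tmp/file2" > "$tmp/file2.sorted"
matches=$(LC_ALL=C sort -t "$tab" -k2,3 "$tmp/file1" |
    LC_ALL=C awk -v db="$tmp/file2.sorted" -f "$tmp/merge.awk")
rm -rf "$tmp"
printf '%s\n' "$matches"
```

On the sample data it should print the same five lines as the answer, reordered by key: ut, sagittis, sed, volutpat, scelerisque. Sorting is the expensive step, but sort implementations such as GNU sort fall back to temporary files, so inputs larger than memory can still be sorted.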