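I have two files containing two columns. Some lines in the second file match lines in the first file; when they match, I want to print the matching lines. I can't use arrays because the files are very large.
I have a rough idea: compare the files line by line with getline. But I can't figure it out...
Can you help me?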
First file:
ut Adélaïde Male Latvian Chippewa
proin Åke Male Zulu Eskimo
scelerisque Åke Female Maltese Central American
sit Åke Male Northern Sotho Yaqui
sagittis Alizée Male Northern Sotho Paiute
dictumst Almérinda Female Romanian Honduran
sed Almérinda Male Hungarian Navajo
volutpat Almérinda Male Georgian Honduran
Second file:
Adélaïde Male
Åke Female
Alizée Male
Almérinda Male
Output:
ut Adélaïde Male Latvian Chippewa
sit Åke Female Northern Sotho Yaqui
sagittis Alizée Male Northern Sotho Paiute
sed Almérinda Male Hungarian Navajo
volutpat Almérinda Male Georgian Honduran
My attempt:
BEGIN {
    FS="\t";
    n=getline V0 <ll;
}
{
    wrd=$2"\t"$3
    while (wrd>V0)
    {
        if (n>0)
        {
            n=getline V0 < ll;
        }
        else
        {
            n=getline;
            while (n>0)
            {
                n=getline;
            }
            exit;
        }
    }
    if (wrd==V0)
    {
        print $0;next;
    }
    else
    {
        next;
    }
}
Answer 0 (score: 0)
This is what you asked for (it uses GNU awk for gensub(); with other awks, use 2 sub()s or similar):
$ cat tst.awk
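BEGIN { db=ARGV[2]; delete ARGV[2]; ARGC-- }    # treat file2 as a lookup file, not as input
{
    key = gensub(/^\S+\s+(\S+\s+\S+).*/,"\\1",1)    # fields 2 and 3 of the current file1 line
    while ( (getline line < db) > 0 ) {             # scan file2 from the top for this key
        if (key == line) {
            print
            break
        }
    }
    close(db)                                       # rewind file2 for the next file1 line
}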
$ awk -f tst.awk file1 file2
ut Adélaïde Male Latvian Chippewa
scelerisque Åke Female Maltese Central American
sagittis Alizée Male Northern Sotho Paiute
sed Almérinda Male Hungarian Navajo
volutpat Almérinda Male Georgian Honduran
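For awks without gensub(), the key extraction could be sketched with one sub() plus match()/substr(), which falls under the "or similar" above. This is only an illustration: the script name tst_posix.awk, the temporary directory, and the tab-separated sample files rebuilt from the question are assumptions, not part of the original answer.

```shell
# Sample data from the question, rebuilt with tab separators (an assumption).
tmp=$(mktemp -d)
printf '%s\t%s\t%s\t%s\t%s\n' \
    ut Adélaïde Male Latvian Chippewa \
    proin Åke Male Zulu Eskimo \
    scelerisque Åke Female Maltese 'Central American' \
    sit Åke Male 'Northern Sotho' Yaqui \
    sagittis Alizée Male 'Northern Sotho' Paiute \
    dictumst Almérinda Female Romanian Honduran \
    sed Almérinda Male Hungarian Navajo \
    volutpat Almérinda Male Georgian Honduran > "$tmp/file1"
printf '%s\t%s\n' Adélaïde Male Åke Female Alizée Male Almérinda Male > "$tmp/file2"

# Hypothetical portable variant of tst.awk: same logic, no gensub().
cat > "$tmp/tst_posix.awk" <<'EOF'
BEGIN { db = ARGV[2]; delete ARGV[2]; ARGC-- }
{
    key = $0
    sub(/^[^ \t]+[ \t]+/, "", key)              # drop field 1 and the blanks after it
    if (match(key, /^[^ \t]+[ \t]+[^ \t]+/))    # what is now fields 2 and 3
        key = substr(key, 1, RLENGTH)           # cut off everything after field 3
    while ((getline line < db) > 0) {           # scan file2 from the top for this key
        if (key == line) { print; break }
    }
    close(db)                                   # rewind file2 for the next line
}
EOF

result=$(awk -f "$tmp/tst_posix.awk" "$tmp/file1" "$tmp/file2")
printf '%s\n' "$result"
```

Like the gensub() version, this keeps the original whitespace between fields 2 and 3, so the key only equals a file2 line if both files use the same separator.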
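But if file2 is so huge that it can't fit in an array, then the above will probably be too slow to finish in any reasonable time, since it rescans file2 for every line of file1.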
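One possible way around the repeated scan, in the spirit of the getline idea in the question, is a merge join: sort both files on the key and walk them in step, so each file is read only once. The sketch below is not from the original answer and rests on several assumptions: the fields are tab-separated, both files are sorted byte-wise (LC_ALL=C) on the name and gender columns, and it is acceptable for the output to come out in sorted order rather than in file1's original order. The names merge.awk, file1.sorted and file2.sorted are hypothetical.

```shell
# Sample data from the question, rebuilt with tab separators (an assumption).
tmp=$(mktemp -d)
printf '%s\t%s\t%s\t%s\t%s\n' \
    ut Adélaïde Male Latvian Chippewa \
    proin Åke Male Zulu Eskimo \
    scelerisque Åke Female Maltese 'Central American' \
    sit Åke Male 'Northern Sotho' Yaqui \
    sagittis Alizée Male 'Northern Sotho' Paiute \
    dictumst Almérinda Female Romanian Honduran \
    sed Almérinda Male Hungarian Navajo \
    volutpat Almérinda Male Georgian Honduran > "$tmp/file1"
printf '%s\t%s\n' Adélaïde Male Åke Female Alizée Male Almérinda Male > "$tmp/file2"

# Sort both files on the key with the same byte-wise collation awk will use.
LC_ALL=C sort -t "$(printf '\t')" -k2,3 "$tmp/file1" > "$tmp/file1.sorted"
LC_ALL=C sort "$tmp/file2" > "$tmp/file2.sorted"

cat > "$tmp/merge.awk" <<'EOF'
BEGIN {
    FS = "\t"
    db = ARGV[2]; delete ARGV[2]; ARGC--     # file2 is read with getline only
    more = ((getline cur < db) > 0)          # cur holds the current file2 key
}
{
    key = $2 FS $3
    while (more && cur < key)                # advance file2 until it catches up
        more = ((getline cur < db) > 0)
    if (!more) exit                          # file2 exhausted: nothing else can match
    if (cur == key) print
}
EOF

result=$(LC_ALL=C awk -f "$tmp/merge.awk" "$tmp/file1.sorted" "$tmp/file2.sorted")
printf '%s\n' "$result"
```

Only one line of each file is held in memory at a time, and the sorting is left to sort, which is built for large inputs. If the original line order matters, the lines would need to be numbered before sorting and restored afterwards, which this sketch does not attempt.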