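I am trying to use awk to match file1 against file2 and print the matching lines to a separate file. File1 is ~4MB, and I get the error below, which I can't seem to resolve. Thank you :).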
awk 'NR==FNR{c[$0]; next} ($0 in c)' RS="," file1.txt RS="\n" file2.txt > match.txt
awk: program limit exceeded: maximum number of fields size=32767 FILENAME="sort.2.txt" FNR=1 NR=1
File1
chr1:3063265-3063458 AVP:exon.3 8.55959
chr1:947806-947967 RSPO4:exon.3 246.54
chr2:12758246-12758422 CTD-2192J16.22:exon.2;MAN2B1:exon.1;MAN2B1:exon.20;MAN2B1:exon.22 221.483
chr2:57975642-57975745 KIF5A:exon.1;KIF5A:exon.23;KIF5A:exon.26 222.932
File2
AVP
KIF5A
Desired output
chr1:3063265-3063458 AVP:exon.3 8.55959
chr2:57975642-57975745 KIF5A:exon.1;KIF5A:exon.23;KIF5A:exon.26 222.932
Answer 0 (score: 3)
You can try:
awk '
FNR==NR{d[$0]; next;}            #Store each key to find, from file2
{                                #for each line in file1
    for(k in d){                 #for each key in d (file2)
        pat="(^|;)"k":";         #pattern to search (regular expression)
        if($2 ~ pat){
            print;               #print if match with RE
            break;
        }
    }
}' file2 file1
You get:
chr1:3063265-3063458 AVP:exon.3 8.55959
chr2:57975642-57975745 KIF5A:exon.1;KIF5A:exon.23;KIF5A:exon.26 222.932
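As for the error itself: the sample of file1 contains no commas, so with RS="," the original command most likely reads the whole file as a single record, and splitting that record into whitespace-separated fields exceeds the 32767-field limit reported in the message (the FNR=1 NR=1 in the error is consistent with this). If file2 is large, looping over every key with a regular expression for each line can get slow. A possible variant, offered only as a sketch, splits the second column on ";" and looks each gene name up in the array directly. It assumes the second column always has the form NAME:exon.N;NAME:exon.N;... as in the sample; the file names file1.txt, file2.txt, and match.txt and the array names want and parts are just illustrative.

```shell
# Sample data copied from the question (file names are illustrative).
cat > file1.txt <<'EOF'
chr1:3063265-3063458 AVP:exon.3 8.55959
chr1:947806-947967 RSPO4:exon.3 246.54
chr2:12758246-12758422 CTD-2192J16.22:exon.2;MAN2B1:exon.1;MAN2B1:exon.20;MAN2B1:exon.22 221.483
chr2:57975642-57975745 KIF5A:exon.1;KIF5A:exon.23;KIF5A:exon.26 222.932
EOF
printf 'AVP\nKIF5A\n' > file2.txt

awk '
FNR == NR { want[$0]; next }        # first file (file2): remember each gene name
{
    n = split($2, parts, ";")       # second column: NAME:exon.N entries
    for (i = 1; i <= n; i++) {
        name = parts[i]
        sub(/:.*/, "", name)        # keep only the text before the first ":"
        if (name in want) {         # exact lookup, no regular expression needed
            print                   # print the whole line once
            break
        }
    }
}' file2.txt file1.txt > match.txt

cat match.txt
```

Because the lookup is an exact string comparison, gene names containing regex metacharacters (such as the "." in CTD-2192J16.22) are compared literally, which is not the case when the name is pasted into a pattern.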