I have data in two text files, like this:
FILE1.TXT:
contig postion majorallele minorallele highqualty reliable defin highqualty
Contig1 479 * C 0 0 0 0
Contig1 617 T A 0 0 0 0
Contig15 243 T C 0 0 0 0
Contig15 471 T C 0 0 0 0
FILE2.TXT:
contig 1 chromosome 0 000000476-044111330
contig 1 chromosome 0 000000477-044111331
contig 1 chromosome 0 000000478-044111332
contig 1 chromosome 0 000000479-044111333
contig 1 chromosome 0 000000480-044111334
contig 1 chromosome 0 000000481-044111335
contig 1 chromosome 0 000000482-044111336
contig 15 chromosome 3 000000242-018378247
contig 15 chromosome 3 000000243-018378248
contig 15 chromosome 3 000000244-018378249
contig 15 chromosome 3 000000245-018378250
contig 15 chromosome 3 000000468-018377016
contig 15 chromosome 3 000000469-018377017
contig 15 chromosome 3 000000470-018377018
contig 15 chromosome 3 000000471-018377019
contig 15 chromosome 3 000000472-018377020
contig 15 chromosome 3 000000473-018377021
What I want to do is compare the first two columns of file1.txt with the first and fifth columns of file2.txt, and get output like this:
contig 1 chromosome 0 000000479-044111333 * C 0 0 0 0
contig 15 chromosome 3 000000243-018378248 T C 0 0 0 0
contig 15 chromosome 3 000000471-018377019 T C 0 0 0 0
That is, the matching lines of the two files should be merged in the output.
Answer 0: (score: 0)
You can simply use awk instead of perl.
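awk 'FNR==NR && NR!=1 {      # first file, skipping its header line
    x=tolower($1);           # "Contig1" -> "contig1"
    y=$2;                    # position
    $1=$2="";                # blank the key columns, keep the rest in $0
    a[x""y]=$0;              # store the remaining columns under contig+position
    next
}
{
    b=$5;
    gsub(/^0*/,"",b);        # strip leading zeros from "000000479-044111333"
    split(b,c,"-");          # c[1] is the position before the hyphen
    if($1$2c[1] in a)print $0,a[$1$2c[1]]
}' file1.txt file2.txt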
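One caveat with a key built by plain concatenation: "contig1" followed by "479" and "contig14" followed by "79" both give "contig1479". Below is a minimal sketch of a variant that keeps contig and position apart by using awk's comma subscript (the two parts are joined with SUBSEP). The function name merge_contigs is hypothetical, and the sketch assumes the same file layouts as in the question: a header line in the first file, and start-end ranges in the fifth column of the second.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original answer.
# Usage: merge_contigs file1.txt file2.txt
merge_contigs() {
    awk '
    FNR == NR {                        # lines of the first file
        if (FNR > 1) {                 # skip its header line
            key = tolower($1)          # "Contig1" -> "contig1"
            pos = $2 + 0               # position as a number
            rest = $3                  # rebuild the remaining columns
            for (i = 4; i <= NF; i++) rest = rest " " $i
            a[key, pos] = rest         # comma subscript joins with SUBSEP
        }
        next
    }
    {
        split($5, c, "-")              # c[1] is the part before the hyphen
        pos = c[1] + 0                 # numeric conversion drops leading zeros
        k = $1 $2                      # "contig" "1" -> "contig1"
        if ((k, pos) in a) print $0, a[k, pos]
    }' "$1" "$2"
}
```

Because the remaining columns are rebuilt field by field instead of blanking $1 and $2, the appended part starts directly with the third column.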
Tested below:
> cat temp1
contig postion majorallele minorallele highqualty reliable defin highqualty
Contig1 479 * C 0 0 0 0
Contig1 617 T A 0 0 0 0
Contig15 243 T C 0 0 0 0
Contig15 471 T C 0 0 0 0
>
> cat temp2
contig 1 chromosome 0 000000476-044111330
contig 1 chromosome 0 000000477-044111331
contig 1 chromosome 0 000000478-044111332
contig 1 chromosome 0 000000479-044111333
contig 1 chromosome 0 000000480-044111334
contig 1 chromosome 0 000000481-044111335
contig 1 chromosome 0 000000482-044111336
contig 15 chromosome 3 000000242-018378247
contig 15 chromosome 3 000000243-018378248
contig 15 chromosome 3 000000244-018378249
contig 15 chromosome 3 000000245-018378250
contig 15 chromosome 3 000000468-018377016
contig 15 chromosome 3 000000469-018377017
contig 15 chromosome 3 000000470-018377018
contig 15 chromosome 3 000000471-018377019
contig 15 chromosome 3 000000472-018377020
contig 15 chromosome 3 000000473-018377021
>
> nawk 'FNR==NR && NR!=1{x=tolower($1);y=$2;$1=$2="";a[x""y]=$0;next}{b=$5;gsub(/^0*/,"",b);split(b,c,"-");if($1$2c[1] in a)print $0,a[$1$2c[1]]}' temp1 temp2
contig 1 chromosome 0 000000479-044111333 * C 0 0 0 0
contig 15 chromosome 3 000000243-018378248 T C 0 0 0 0
contig 15 chromosome 3 000000471-018377019 T C 0 0 0 0
>
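A small detail about the spacing: assigning an empty string to $1 and $2 makes awk rebuild $0 with the two now-empty fields still separated by OFS, so the stored value begins with blanks, and the appended columns can end up preceded by more than one space. If that matters, the leading blanks can be removed before storing the line. A minimal sketch, with a hypothetical function name, reading lines from standard input:

```shell
#!/bin/sh
# Hypothetical helper: drop the first two columns of each input line
# and remove the blanks that the emptied fields leave behind.
strip_key_columns() {
    awk '{
        $1 = $2 = ""            # empty the key columns; $0 is rebuilt with OFS
        sub(/^[ \t]+/, "")      # remove the leading blanks from $0
        print
    }'
}

# Example call:
#   printf "Contig1 479 * C 0 0 0 0\n" | strip_key_columns
```

The same sub() call could be placed right after $1=$2="" in the one-liner above.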