I have a file like this:
#Chr or contig Name #Source #Type #Start #End #Score #Strand #Phase #Attributes
313-9640000-9660000:19634:fwd maker gene 1978 7195 . + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10;Name=maker-313-9640000-9660000%253A19634%253Afwd-augustus-gene-0.10
313-9640000-9660000:19634:fwd maker mRNA 1978 7195 . + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1;Name=maker-313-9640000-9660000%253A19634%253Afwd-augustus-gene-0.10-mRNA-1;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10
313-9640000-9660000:19634:fwd maker exon 1978 2207 0.48 + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd maker exon 3081 3457 0.48 + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd maker exon 3535 3700 0.48 + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd maker exon 4247 4391 0.48 + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:exon:2;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd maker exon 6766 7195 0.48 + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd maker CDS 3267 3457 . + 0 ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:cds:0;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd maker CDS 3535 3700 . + . Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd maker CDS 4247 4391 . + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:cds:2;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd maker CDS 6766 7106 . + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:cds:3;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd maker gene 7997 13832 . + . ID=augustus_masked-313-9640000-9660000%3A19634%3Afwd-abinit-gene-0.1;Name=augustus_masked-313-9640000-9660000%253A19634%253Afwd-abinit-gene-0.1
313-9640000-9660000:19634:fwd maker mRNA 7997 13832 . + . ID=augustus_masked-313-9640000-9660000%3A19634%3Afwd-abinit-gene-0.1-mRNA-1;Name=augustus_masked-313-9640000-9660000%253A19634%253Afwd-abinit-gene-0.1-mRNA-1;Parent=augustus_masked-313-9640000-9660000%3A19634%3Afwd-abinit-gene-0.1
313-9640000-9660000:19634:fwd maker exon 7997 8219 0.46 + . Parent=augustus_masked-313-9640000-9660000%3A19634%3Afwd-abinit-gene-0.1-mRNA-1
313-9640000-9660000:19634:fwd maker exon 8284 8942 0.46 + . Parent=augustus_masked-313-9640000-9660000%3A19634%3Afwd-abinit-gene-0.1-mRNA-1
I want to extract the lines whose third column is "gene" and put them into an array:
while (<>) {
chomp;
next if /^\#/;
my @gff_data = split /\t+/;
if ($gff_data[2] eq "gene") {
push(@genes,@gff_data);
}
}
print @genes[1];
However, with this code my output is "wrong". It prints maker
, but I want it to be
313-9640000-9660000:19634:fwd maker gene 1978 7195 . + . ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10;Name=maker-313-9640000-9660000%253A19634%253Afwd-augustus-gene-0.10
Any ideas?
Thank you.
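Answer 0 (score: 0)
First, change '\t+'
to ' +'
if your columns are separated by spaces; the text you pasted contains no tab characters. (A real GFF file is normally tab-delimited, in which case a tab separator is the right choice.)
Use this code: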
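use strict;
use warnings;
use Data::Dumper;

my @genes;
while (my $line = <>) {
    chomp $line;
    next if $line =~ /^#/;    # skip header/comment lines
    # Split on spaces; use /\t/ instead if the file is tab-delimited, as GFF normally is.
    my @gff_data = split / +/, $line;
    if (defined $gff_data[2] && $gff_data[2] eq "gene") {
        push @genes, $line;   # store the whole line, not the individual fields
    }
}
print Dumper(\@genes);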
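In your code, @gff_data is not the line! It is an array of fields, so when you push it onto @genes you append every field as a separate element instead of appending the line.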
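To make the difference concrete, here is a minimal sketch. The two records are made-up, shortened stand-ins for the lines in the question (the name chr1 is only an illustration), and the sketch assumes tab-separated input:

```perl
use strict;
use warnings;

# Two made-up, shortened GFF-style records (tab-separated), for illustration only.
my @lines = (
    "chr1\tmaker\tgene\t1978\t7195",
    "chr1\tmaker\tgene\t7997\t13832",
);

my (@flattened, @whole_lines);
for my $line (@lines) {
    my @fields = split /\t/, $line;
    push @flattened,   @fields;   # appends every field as its own element
    push @whole_lines, $line;     # appends one element: the whole line
}

print scalar(@flattened), "\n";    # 10 elements: 2 lines x 5 fields
print scalar(@whole_lines), "\n";  # 2 elements: one per line
print "$flattened[1]\n";           # maker (second field of the first line)
print "$whole_lines[0]\n";         # the complete first line
```

This is why the original script printed maker: element 1 of the flattened array is the second field of the first gene line, not the second line.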
Answer 1 (score: 0)
As I understand it, your code should look something like this:
my @genes;
while (<>) {
    chomp;
    next if /^\#/;
    my @gff_data = split /\t+/;
    if ($gff_data[2] eq "gene") {
        # Here you want the whole LINE whose third column is 'gene'.
        # Push $_, or rebuild the line with join("\t", @gff_data).
        push(@genes, $_);
    }
}
print "$genes[0]\n";   # first matching line; array indices start at 0
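If you also need the individual columns later, one option is to store a reference to each line's field array rather than the line itself. The following is only a sketch under that assumption: gene_records is a hypothetical helper name, and the sample records are shortened, made-up stand-ins for the real data.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one array ref of fields per "gene" record.
sub gene_records {
    my @lines = @_;
    my @genes;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/;                      # skip header/comment lines
        my @fields = split /\t/, $line;
        next unless defined $fields[2] && $fields[2] eq 'gene';
        push @genes, \@fields;   # one element per line; fields stay grouped
    }
    return @genes;
}

my @genes = gene_records(
    "#header\n",
    "chr1\tmaker\tgene\t1978\t7195\n",
    "chr1\tmaker\tmRNA\t1978\t7195\n",
    "chr1\tmaker\tgene\t7997\t13832\n",
);

print join("\t", @{ $genes[0] }), "\n";   # first gene line, rebuilt from its fields
print "$genes[1][3]\n";                   # start coordinate of the second gene
```

With this layout, $genes[0] is the first gene record and $genes[0][2] is its third column, so nothing is flattened.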
Answer 2 (score: 0)
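use Data::Dumper;

my $filename = 'Filename';   # placeholder: replace with the path to your file
open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
my @genes;
while (<$fh>) {
    next if /^#/;
    # keep the line only if its third whitespace-separated column is exactly "gene"
    push @genes, $_ if /^\S+\s+\S+\s+gene\s/;
}
close $fh;
print Dumper \@genes;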