Incorrect output when pushing onto an array in Perl

Date: 2014-01-19 20:03:15

Tags: perl push

I have a file like this:

#Chr or contig Name             #Source #Type   #Start  #End    #Score  #Strand #Phase  #Attributes
313-9640000-9660000:19634:fwd   maker   gene    1978    7195    .       +       .       ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10;Name=maker-313-9640000-9660000%253A19634%253Afwd-augustus-gene-0.10
313-9640000-9660000:19634:fwd   maker   mRNA    1978    7195    .       +       .       ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1;Name=maker-313-9640000-9660000%253A19634%253Afwd-augustus-gene-0.10-mRNA-1;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10
313-9640000-9660000:19634:fwd   maker   exon    1978    2207    0.48    +       .       Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd   maker   exon    3081    3457    0.48    +       .       Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd   maker   exon    3535    3700    0.48    +       .       Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd   maker   exon    4247    4391    0.48    +       .       ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:exon:2;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd   maker   exon    6766    7195    0.48    +       .       Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd   maker   CDS     3267    3457    .       +       0       ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:cds:0;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd   maker   CDS     3535    3700    .       +       .       Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd   maker   CDS     4247    4391    .       +       .       ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:cds:2;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd   maker   CDS     6766    7106    .       +       .       ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1:cds:3;Parent=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10-mRNA-1
313-9640000-9660000:19634:fwd   maker   gene    7997    13832   .       +       .       ID=augustus_masked-313-9640000-9660000%3A19634%3Afwd-abinit-gene-0.1;Name=augustus_masked-313-9640000-9660000%253A19634%253Afwd-abinit-gene-0.1
313-9640000-9660000:19634:fwd   maker   mRNA    7997    13832   .       +       .       ID=augustus_masked-313-9640000-9660000%3A19634%3Afwd-abinit-gene-0.1-mRNA-1;Name=augustus_masked-313-9640000-9660000%253A19634%253Afwd-abinit-gene-0.1-mRNA-1;Parent=augustus_masked-313-9640000-9660000%3A19634%3Afwd-abinit-gene-0.1
313-9640000-9660000:19634:fwd   maker   exon    7997    8219    0.46    +       .       Parent=augustus_masked-313-9640000-9660000%3A19634%3Afwd-abinit-gene-0.1-mRNA-1
313-9640000-9660000:19634:fwd   maker   exon    8284    8942    0.46    +       .       Parent=augustus_masked-313-9640000-9660000%3A19634%3Afwd-abinit-gene-0.1-mRNA-1

I want to extract the lines whose third column is "gene" and put them into an array:

while (<>) {
  chomp;
  next if /^\#/;
  my @gff_data = split /\t+/;
  if ($gff_data[2] eq "gene") {
    push(@genes,@gff_data);
  }
}

print @genes[1];

However, with this code my output is "wrong". It prints maker, but I expected it to be

313-9640000-9660000:19634:fwd   maker   gene    1978    7195    .       +       .       ID=maker-313-9640000-9660000%3A19634%3Afwd-augustus-gene-0.10;Name=maker-313-9640000-9660000%253A19634%253Afwd-augustus-gene-0.10

Any ideas?

Thank you all.

3 Answers:

Answer 0: (score: 0)

First, change '\t+' to ' +': the text as posted contains spaces, not tab characters.

Then use this code:

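use strict;
use warnings;
use Data::Dumper;

my @genes;
while (my $line = <>) {
  chomp $line;
  next if $line =~ /^#/;
  my @gff_data = split / +/, $line;
  # keep the whole line, not the individual fields
  if (defined $gff_data[2] && $gff_data[2] eq "gene") {
    push @genes, $line;
  }
}

print Dumper(\@genes);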
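In your code, @gff_data is not the line. It is an array of the line's fields, so pushing it appends every field as a separate element instead of appending the line itself.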
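To make the flattening concrete, here is a minimal sketch. The record is a shortened, made-up line (not taken from the question's file), and the variable names @flat and @lines are only for illustration.

```perl
use strict;
use warnings;

# Hypothetical, shortened record; the real file has nine columns per line.
my $line = "chr1\tmaker\tgene\t1978\t7195";
my @gff_data = split /\t+/, $line;

# Pushing the array appends every field as its own element (list flattening).
my @flat;
push @flat, @gff_data;

# Pushing the original string appends exactly one element: the whole line.
my @lines;
push @lines, $line;

print scalar(@flat), " elements; \$flat[1] is '$flat[1]'\n";
print scalar(@lines), " element; \$lines[0] is the full line\n";
```

This is why index 1 of the question's array held "maker": it was the second field of the first gene record, not the second record.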

Answer 1: (score: 0)

As far as I understand, your code should look something like this:

my @genes;
while (<>) {
  chomp;
  next if /^\#/;
  my @gff_data = split /\t+/;
  if ($gff_data[2] eq "gene") {
    # here you want the whole LINE whose third field is 'gene';
    # use $_ or rebuild it with join("\t", @gff_data)
    push(@genes, $_);
  }
}

# the first matching line is element 0; use $genes[0], not the slice @genes[1]
print "$genes[0]\n";
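If the individual fields are needed later as well, one option is to push an array reference per record instead of the raw line. The sketch below is only an illustration under assumptions: collect_gene_fields is a hypothetical helper name, and the sample records are made up and shortened.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one array reference per "gene" record.
sub collect_gene_fields {
    my @input_lines = @_;
    my @genes;
    for my $line (@input_lines) {
        chomp(my $copy = $line);
        next if $copy =~ /^#/;
        my @fields = split /\t+/, $copy;
        next unless defined $fields[2] && $fields[2] eq 'gene';
        push @genes, [@fields];    # anonymous array: one element per record
    }
    return @genes;
}

# Made-up, shortened records for illustration only.
my @sample = (
    "#header\n",
    "chr1\tmaker\tgene\t1978\t7195\n",
    "chr1\tmaker\texon\t1978\t2207\n",
    "chr1\tmaker\tgene\t7997\t13832\n",
);
my @genes = collect_gene_fields(@sample);
print join("\t", @{ $genes[0] }), "\n";          # whole first record
print "second gene starts at $genes[1][3]\n";    # a single field
```

Each element of @genes is then one record, and a field is reached with two indices, for example $genes[1][3].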

Answer 2: (score: 0)

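use strict;
use warnings;
use Data::Dumper;
my $filename = shift @ARGV;    # path to the GFF file, given on the command line
open(my $fh, '<', $filename) or die "Cannot open $filename: $!";
my @genes;
while (<$fh>) {
    next if /^#/;
    push @genes, $_ if /.*?\s+.*?\s+gene/is;
}
print Dumper \@genes;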
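One caveat with the pattern /.*?\s+.*?\s+gene/is: it is not anchored to the third column, so it can also match a record whose attribute text happens to contain whitespace followed by "gene". The sketch below compares it with an anchored alternative. The anchored pattern and both sample records are my own assumptions for illustration, not part of the original answer.

```perl
use strict;
use warnings;

my $loose  = qr/.*?\s+.*?\s+gene/is;          # unanchored pattern
my $strict = qr/^\S+\s+\S+\s+gene(?:\s|$)/;   # third column must be exactly "gene"

# Made-up records: the second is an mRNA whose attribute text contains " gene".
my @sample = (
    "chr1\tmaker\tgene\t1978\t7195\t.\t+\t.\tID=g1",
    "chr1\tmaker\tmRNA\t1978\t7195\t.\t+\t.\tNote=putative gene",
);

my @loose_hits  = grep { $_ =~ $loose }  @sample;
my @strict_hits = grep { $_ =~ $strict } @sample;
print scalar(@loose_hits), " loose, ", scalar(@strict_hits), " strict\n";
```

On the sample shown in the question the two patterns select the same lines, because "gene" inside the attributes there is always preceded by a hyphen rather than whitespace.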